SocialReasoning-Bench evaluates how well AI agents act in their users' interests. Across the models tested, agents complete tasks competently but struggle to consistently prioritize the user's benefit, even when explicitly instructed to do so. This highlights a critical gap in AI agent design.