SocialReasoning-Bench evaluates AI agents' alignment with user interests. The new benchmark reveals that current models consistently struggle to optimize user outcomes, even when explicitly instructed to do so, highlighting a critical gap in AI agent development.