A new benchmark, SocialReasoning-Bench, evaluates how well AI agents align with their users' interests. Agents perform the tasks competently but often fail to optimize outcomes for the user, even when explicitly instructed to do so. The results highlight a critical gap in AI agent development around user-centric behavior.