A new benchmark, SocialReasoning-Bench, evaluates how well AI agents align with user interests. Tests show that agents consistently fail to optimize for user benefit, even when explicitly instructed to do so. This points to a critical gap in current AI agent capabilities for real-world applications.