A new benchmark, SocialReasoning-Bench, evaluates how well AI agents align with their users' interests. The models evaluated demonstrated competence but struggled to optimize consistently for user benefit, even when explicitly instructed to do so. This highlights a critical gap in agent design for reliable user assistance.