New SocialReasoning-Bench reveals AI agents struggle to consistently act in users' best interests. Testing across models shows agents are competent but fail to optimize user outcomes even when explicitly instructed. This highlights a critical gap in developing truly beneficial AI collaborators.
Opening Kapyn…