A new benchmark, SocialReasoning-Bench, evaluates how well AI agents act in their users' interests. Researchers found that agents perform tasks competently but struggle to consistently optimize outcomes for the user, even when instructed to do so. This highlights a critical gap in AI agent development: completing a task competently is not the same as benefiting the user.