SocialReasoning-Bench evaluates how well AI agents align with user interests. Agents perform tasks competently but struggle to consistently prioritize user benefit, even when explicitly instructed to do so, revealing a critical gap in AI alignment.