SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests
SocialReasoning-Bench measures how well AI agents act in their users' interests. The new benchmark reveals that agents often fail to optimize user outcomes, even when explicitly instructed to do so, highlighting a critical gap in AI agent development.