Arena launches Agent Mode to test AI agents on real-world tasks. This new feature allows developers to run autonomous agents that can browse, code, and complete multi-step workflows from a single prompt, with results contributing to a new leaderboard. It shifts AI evaluation from controlled benchmarks to complex, practical applications, offering developers a clearer view of agentic performance.
Opening Kapyn…