The AI agent race just got real. Anthropic dropped Opus 4.6 this week and immediately scrambled the leaderboards for professional-grade AI agents. The new model hit nearly 30% on complex legal and corporate tasks - a 60% jump from its predecessor's 18.4% score just weeks ago. It's the kind of benchmark leap that makes white-collar professionals nervous and investors very interested.
Anthropic just gave every law firm CTO a reason to update their five-year workforce plans. The company's freshly released Opus 4.6 model is rewriting what's possible for AI agents tackling complex professional work, and the numbers tell a story that's hard to ignore.
Just last month, the industry consensus was clear: AI agents weren't ready for prime time. When Mercor launched its APEX-Agents benchmark in January to test how well AI systems handle real-world professional tasks - the kind lawyers and analysts do daily - every major lab scored under 25%. The conclusion felt safe: your job is secure, at least for now.
That was three weeks ago. This week's Opus 4.6 release shattered that comfortable assumption. The new model scored 29.8% on one-shot trials of the benchmark, a 60% improvement over its predecessor. When given multiple attempts at the same problem - mimicking how actual professionals iterate on complex tasks - the average jumps to 45%.
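The gap between the one-shot score and the multi-attempt average is a well-known benchmark effect: even a modest per-attempt success rate compounds quickly across retries. A minimal sketch of the math (hypothetical, assuming independent attempts — not Mercor's actual scoring code):

```python
def best_of_k_rate(p_single: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given a single-attempt success probability p_single."""
    return 1 - (1 - p_single) ** k

# With a ~29.8% single-attempt rate, just two independent tries would
# already push the expected pass rate past 50%; real attempts are
# correlated, so observed multi-attempt averages (like 45%) land lower.
print(round(best_of_k_rate(0.298, 1), 3))  # 0.298
print(round(best_of_k_rate(0.298, 2), 3))  # 0.507
```

The independence assumption is the caveat: an agent that fails a task for a systematic reason tends to fail it the same way on retry, which is why the reported 45% sits below the idealized curve.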
"Jumping from 18.4% to 29.8% in a few months is insane," Mercor CEO Brendan Foody told TechCrunch. The speed of improvement caught even benchmark creators off guard.
What changed? Anthropic shipped Opus 4.6 with a suite of new agentic features, including what it calls "agent swarms" - multiple AI instances working in parallel on different aspects of a problem. The approach seems particularly well suited to the multi-step reasoning that professional work demands. Where earlier models would lose the thread halfway through analyzing a contract or financial document, Opus 4.6 can apparently hold context long enough to reach useful conclusions.
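Anthropic hasn't published implementation details for agent swarms, but the general pattern is familiar: fan subtasks out to parallel model instances, then merge the partial results in a final synthesis pass. A minimal sketch of that fan-out/fan-in shape, using a placeholder `call_model` function standing in for a real API call (all names here are hypothetical, not Anthropic's API):

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Placeholder for a real model call (e.g., an HTTP request to an LLM API).
    return f"analysis of: {prompt}"

def run_swarm(task: str, aspects: list[str]) -> str:
    # Fan out: each instance works one aspect of the task in parallel.
    prompts = [f"{task}\nFocus only on: {a}" for a in aspects]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        partials = list(pool.map(call_model, prompts))
    # Fan in: a final pass merges the partial analyses into one answer.
    merged = "\n".join(partials)
    return call_model(f"Synthesize these findings:\n{merged}")

result = run_swarm(
    "Review this contract",
    ["liability clauses", "termination terms", "payment schedule"],
)
```

The appeal for long professional tasks is that no single instance has to hold the entire document in working context: each worker sees only its slice, and only the synthesis step has to reconcile conclusions.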