OpenAI's GPT-5 matches humans in 40% of professional tasks

OpenAI just dropped a bombshell benchmark showing GPT-5 performs at human-expert levels in 40.6% of professional tasks across nine key industries. The GDPval test, spanning healthcare to finance, marks the closest AI has come to matching human economic output - a critical milestone toward artificial general intelligence that could reshape how millions of professionals work.

OpenAI just fired the latest shot in the AI arms race, and this one hits close to home for working professionals everywhere. The company's new GDPval benchmark reveals that GPT-5 now performs at or above human expert levels in over 40% of professional tasks - a dramatic leap that brings artificial general intelligence uncomfortably close to reality.

The test results landed Thursday like a carefully aimed disruption bomb. OpenAI pitted its GPT-5-high model against seasoned professionals across 44 occupations, from investment bankers crafting competitor analyses to nurses documenting patient care. The AI didn't just participate - it won or tied with human experts 40.6% of the time.

That number becomes even more striking when you consider the trajectory. GPT-4o, released just 15 months ago, managed only a 13.7% success rate. "The rate of progress is really encouraging," OpenAI evaluations lead Tejal Patwardhan told TechCrunch. The nearly threefold improvement suggests more than incremental gains, though two data points are too few to establish an exponential trend toward human-level AI.
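For readers who want to check the arithmetic behind that comparison, here is a minimal sketch. The two rates come from the figures reported above; the function and variable names are illustrative assumptions, not anything OpenAI published.

```python
def fold_change(old_rate: float, new_rate: float) -> float:
    """Return how many times larger new_rate is than old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


# Win-or-tie rates (in percent) as reported in the article.
GPT_4O_RATE = 13.7
GPT_5_RATE = 40.6

ratio = fold_change(GPT_4O_RATE, GPT_5_RATE)
point_gain = GPT_5_RATE - GPT_4O_RATE  # gain in percentage points

print(f"{ratio:.2f}x improvement, +{point_gain:.1f} percentage points")
```

The ratio comes out just under 3, which is why "nearly triple" is a fair description, while the gain in percentage points is about 27.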

Anthropic inadvertently stole some thunder here - its Claude Opus 4.1 actually outscored GPT-5 with a 49% success rate. But OpenAI quickly threw shade, suggesting Claude's advantage comes from "pleasing graphics" rather than raw analytical capability. It's the kind of diplomatic slight that reveals how intensely these companies are watching each other's benchmarks.

The GDPval test itself represents a clever approach to measuring AI's economic impact. Rather than abstract academic problems, OpenAI focused on the nine industries that contribute most to America's GDP - healthcare, finance, manufacturing, government, and others. They asked experienced professionals to blindly compare AI-generated reports with human work, then vote for the winner.
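The headline figure is a win-or-tie rate over those blind comparisons. A minimal sketch of how such a rate could be tallied follows; the vote labels, the sample data, and the function name are hypothetical, and OpenAI's actual grading pipeline is not described in this article.

```python
from collections import Counter


def win_or_tie_rate(votes: list[str]) -> float:
    """Share of blind comparisons where the model's output won or tied.

    Each vote is one grader verdict: "model", "human", or "tie".
    These labels are an assumption made for illustration.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    return (counts["model"] + counts["tie"]) / len(votes)


# Hypothetical verdicts from five blind comparisons (not real GDPval data).
sample_votes = ["model", "human", "tie", "human", "human"]
rate = win_or_tie_rate(sample_votes)
print(f"win-or-tie rate: {rate:.1%}")
```

Counting ties alongside wins matters here: a model that merely matches the expert's report is still scored as performing at expert level.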

"People in those jobs can now use the model to offload some of their work and do potentially higher value things," OpenAI chief economist Dr. Aaron Chatterji explained in an interview. It's the optimistic spin on what could be an uncomfortable reality for millions of knowledge workers.

But before anyone starts updating their resume, OpenAI freely admits GDPval-v0 only tests report-writing capabilities. Real professional work involves meetings, relationship building, crisis management, and a thousand other human skills that current AI can't touch. The company plans more comprehensive tests that account for interactive workflows and broader industry contexts.

This benchmark arrives as Silicon Valley's traditional AI tests are hitting ceiling effects. Popular measures like AIME 2025 math problems and GPQA Diamond's PhD-level science questions are approaching saturation, where multiple models score near-perfect results. Several AI researchers have been calling for real-world benchmarks that actually matter for economic productivity.

OpenAI clearly sees GDPval as its answer to that challenge. The test directly tackles the company's founding mission of developing artificial general intelligence that can outperform humans at "economically valuable work." While we're not there yet, the 40.6% win-or-tie rate suggests the gap is closing faster than most expected.

The competitive implications ripple far beyond OpenAI. If AI models can genuinely match human professionals on roughly 40% of the report-style tasks tested, that changes the economics of entire industries. Companies will face pressure to integrate AI tools not just for efficiency, but for competitive survival.

What makes this particularly significant is the breadth of occupations tested. We're not talking about narrow, specialized tasks but core professional functions across healthcare, finance, engineering, and journalism. The AI isn't just getting better at chess or protein folding - it's approaching competency in the daily work that drives the economy.

OpenAI expects this trajectory to continue accelerating. With GPT-6 likely in development and competitors like Anthropic and Google pushing their own models, the question isn't whether AI will match human professional performance, but when - and what happens to the humans caught in between.

The 40.6% win-or-tie rate against human experts isn't just a number - it's a watershed moment that signals AI's transition from useful tool to genuine competitor in professional work. While current tests only scratch the surface of real job complexity, the roughly threefold improvement from 13.7% to 40.6% in 15 months suggests we're approaching an inflection point where AI becomes indispensable for competitive advantage across industries. The question facing professionals isn't whether AI will reshape their work, but how quickly they can adapt to working alongside systems that increasingly match their capabilities.

the tech buzz
