A quiet revolution is reshaping AI capabilities, but it's not happening everywhere at once. While OpenAI's GPT-5 and Google's Gemini 2.5 have transformed coding workflows seemingly overnight, other AI applications remain stubbornly stuck. The culprit? Reinforcement learning is creating winners and losers based on one critical factor: whether success can be measured automatically.
The AI industry is accelerating unevenly, and the gains are concentrating in some capabilities while others stall. Russell Brandom's latest analysis for TechCrunch identifies a fundamental divide between AI tasks that can leverage reinforcement learning and those that can't.
Coding applications are seeing breakthrough improvements almost monthly. Last week's release of Claude Sonnet 4.5 continued a trend that began with OpenAI's GPT-5 and Google's Gemini 2.5, each making "a whole new set of developer tricks possible to automate," according to the report. But if you're using AI for email writing or general chatbot interactions, you're probably getting the same value you did a year ago.
The difference comes down to reinforcement learning's hunger for measurable outcomes. Software development comes with decades of systematic, automated testing built in - unit tests, integration tests, security tests. These pass-fail checks can be repeated "billions of times without having to stop for human input," creating the perfect training environment for AI systems.
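To see why a pass-fail check is such a convenient reward signal, here is a minimal sketch in Python: the grader is just a test suite's exit code, so it can run unattended as many times as training requires. The use of pytest and the repository layout are illustrative assumptions, not a description of any lab's actual training setup.

```python
import subprocess

def unit_test_reward(repo_dir: str) -> float:
    """Run a project's test suite and return a pass/fail reward.

    A hypothetical sketch of the idea described above: because the
    outcome is a simple exit code, no human grader is needed and the
    check can be repeated as often as training demands.
    """
    result = subprocess.run(
        ["pytest", "-q"],      # run the existing automated test suite
        cwd=repo_dir,
        capture_output=True,
        timeout=600,
    )
    # Exit code 0 means every test passed: reward 1.0, otherwise 0.0.
    return 1.0 if result.returncode == 0 else 0.0
```

An email or chatbot reply has no equivalent exit code, which is exactly the asymmetry the article describes.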
"There's no easy way to validate a well-written email or a good chatbot response," Brandom notes. "These skills are inherently subjective and harder to measure at scale." This creates what he calls the "reinforcement gap" - a growing divide between capabilities that can be automatically graded and those that require human judgment.
Google's senior director for dev tools recently confirmed that existing testing frameworks work just as well for validating AI-generated code as they do for human-written code. But the implications extend far beyond software development. The reinforcement gap is becoming "one of the most important factors for what AI systems can and can't do."
Some processes are proving more testable than expected. OpenAI's surprise release of Sora 2 demonstrates dramatic improvements in AI-generated video: objects no longer vanish randomly, faces maintain consistency, and the laws of physics are respected. The improvements suggest OpenAI has found ways to test video quality automatically, perhaps through physics-based metrics.