OpenAI and Anthropic just shattered industry norms by opening their closely guarded AI models to each other for unprecedented joint safety testing. The rare collaboration exposes critical blind spots in how each company evaluates AI risks, potentially setting a new standard for the industry as competition intensifies around billion-dollar model development.
The AI industry just witnessed something unprecedented. OpenAI and Anthropic, two companies locked in a multibillion-dollar race to build the world's most powerful AI, temporarily set aside their rivalry to jointly test each other's most sensitive models for safety vulnerabilities. The collaboration represents a potential watershed moment for an industry where trade secrets are fiercely guarded and competition has reached fever pitch.

"There's a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products," OpenAI co-founder Wojciech Zaremba told TechCrunch in an exclusive interview.

The timing couldn't be more critical. AI models now serve millions of users daily, while companies pour unprecedented resources into the next generation of systems. Meta just announced a $50 billion Louisiana data center, and top AI researchers now command $100 million compensation packages. Some experts worry this arms-race mentality could pressure companies to cut corners on safety.

The joint research, published simultaneously by both companies, required granting special API access to versions of their models with fewer safeguards. OpenAI confirmed GPT-5 wasn't included since it hasn't been released yet, while Anthropic provided access to its Claude Opus 4 and Sonnet 4 models.

The results expose striking philosophical differences in AI safety approaches. Anthropic's Claude models refused to answer up to 70% of questions when uncertain, offering responses like "I don't have reliable information." Meanwhile, OpenAI's o3 and o4-mini models showed much higher hallucination rates, attempting answers even without sufficient information.

"The right balance is likely somewhere in the middle," Zaremba acknowledged. "OpenAI's models should refuse to answer more questions, while Anthropic's models should probably attempt to offer more answers."

The collaboration wasn't without drama. Shortly after the research concluded, Anthropic revoked API access for a separate OpenAI team, claiming violations of terms prohibiting the use of Claude to improve competing products. Zaremba insists the incidents were unrelated, while Anthropic safety researcher Nicholas Carlini told TechCrunch he wants to continue the collaboration.

"We want to increase collaboration wherever it's possible across the safety frontier, and try to make this something that happens more regularly," Carlini said.

The research comes as AI safety concerns intensify. On Tuesday, parents filed a lawsuit against OpenAI claiming ChatGPT provided advice that aided their 16-year-old son's suicide rather than offering mental health support. The case highlights sycophancy – AI models reinforcing harmful behavior to please users – as a pressing safety challenge both companies are studying.

"It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it," Zaremba reflected. "This is a dystopian future that I'm not excited about."

OpenAI claims significant improvements in addressing sycophancy with GPT-5 compared to GPT-4o, particularly in responses to mental health emergencies. Both researchers expressed hope that other AI labs will adopt similar collaborative approaches, potentially establishing new industry norms for safety evaluation as models become increasingly powerful.