Cloudflare just unveiled the technical culprit behind Tuesday's massive outage that knocked ChatGPT offline for hours. CEO Matthew Prince's detailed post-mortem reveals a ClickHouse database query gone rogue caused duplicate data to overwhelm the Bot Management system, cascading across 20% of global web traffic. The infrastructure giant calls it their worst disruption since 2019.
Cloudflare CEO Matthew Prince didn't mince words in his late-night technical breakdown of Tuesday's catastrophic outage. What was first suspected to be a DDoS attack or cyberwarfare turned out to be something far more mundane but equally devastating: a database query that began returning duplicate rows.
The chaos began in Cloudflare's Bot Management system, the AI-powered gatekeeper that's supposed to distinguish between legitimate users and automated crawlers, including those scraping data for OpenAI and other AI training operations. The system relies on a machine learning model that constantly updates its configuration file to identify bot behavior patterns. But a change in the behavior of the underlying ClickHouse database query caused it to generate a large number of duplicate "feature" rows.
"A change in our underlying ClickHouse query behaviour that generates this file caused it to have a large number of duplicate 'feature' rows," Prince explained in the technical post-mortem. As the configuration file ballooned beyond preset memory limits, it brought down the core proxy system that processes customer traffic.
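To make the failure mode concrete, here is a minimal sketch of how duplicate rows can push a generated file past a preset limit, and how a loader might fall back to the last known-good configuration instead of crashing. Everything in it is an illustrative assumption: the names `MAX_FEATURES`, `load_features`, `dedupe_features` and `safe_reload`, the limit value, and the fallback behavior are hypothetical and are not taken from Cloudflare's code or post-mortem.

```python
# Hypothetical sketch: names and the limit value are assumptions for
# illustration, not Cloudflare's actual implementation.
MAX_FEATURES = 100  # assumed preset cap on the number of feature rows


class FeatureFileTooLarge(Exception):
    """Raised when a feature file exceeds the preset row limit."""


def dedupe_features(rows):
    """Drop repeated feature names, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for name in rows:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


def load_features(rows, limit=MAX_FEATURES):
    """Accept a feature list only if it fits within the preset limit."""
    if len(rows) > limit:
        raise FeatureFileTooLarge(f"{len(rows)} rows exceeds limit of {limit}")
    return list(rows)


def safe_reload(current, new_rows, limit=MAX_FEATURES):
    """Try the new file; on failure keep serving the last known-good config."""
    try:
        return load_features(dedupe_features(new_rows), limit)
    except FeatureFileTooLarge:
        return current
```

In this sketch, a file whose rows are repeated many times trips the limit in `load_features`, while `safe_reload` either removes the duplicates or, if the file is still too large, keeps the previous configuration in place.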
The timing couldn't have been worse. Cloudflare powers roughly 20% of the global web, making it one of the internet's most critical single points of failure. When the Bot Management module crashed, the failure cascaded and knocked major services offline, including ChatGPT, X and, ironically, the popular outage tracker Downdetector.
What made the outage particularly insidious was its selective nature. Companies that had configured Cloudflare rules to actively block bots based on generated scores saw their systems return false positives, cutting off legitimate human traffic. Meanwhile, customers who didn't rely on the bot scoring system in their rules kept humming along normally, creating a confusing patchwork of service availability.
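The patchwork effect can be illustrated with a small sketch of score-based rule evaluation. It assumes, purely for illustration, that a failed scoring module reports a score of 0 and that lower scores mean "more likely a bot"; the `Rule` class, the `decide` function, the `FAILED_SCORE` value and the thresholds are hypothetical and do not reflect Cloudflare's actual rules engine.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for illustration: a broken scorer reports 0 for every request.
FAILED_SCORE = 0


@dataclass
class Rule:
    """Hypothetical customer rule; block_below=None means bot scores are ignored."""
    block_below: Optional[int] = None


def decide(rule: Rule, bot_score: int) -> str:
    """Return 'block' or 'allow' for a request with the given bot score."""
    if rule.block_below is None:
        return "allow"  # rule never consults the score, so it is unaffected
    if bot_score < rule.block_below:
        return "block"  # low score is treated as automated traffic
    return "allow"


# A customer that blocks low scores rejects everyone when scoring fails,
# while a customer with no score-based rule keeps serving traffic.
strict = Rule(block_below=30)
relaxed = Rule()
human_during_outage = decide(strict, FAILED_SCORE)
unaffected = decide(relaxed, FAILED_SCORE)
```

Under these assumptions, a legitimate visitor hitting the `strict` rule during the failure is blocked, while the same visitor passes through the `relaxed` rule.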
This wasn't Cloudflare's first rodeo with major disruptions. The company has weathered significant outages before, including incidents that resembled recent problems at other infrastructure providers. But Prince called this one their worst since 2019, highlighting how even minor database changes can trigger catastrophic failures in hyperscale infrastructure.