A coalition of major web publishers just fired the opening shot in what could become the industry's biggest battle over AI training data. Reddit, Yahoo, Medium, Quora, and People Inc. announced support for the Really Simple Licensing (RSL) Standard - a new framework that lets publishers set exact pricing terms for AI companies scraping their content. The timing isn't coincidental: AI companies are burning through billions training next-generation models, and publishers want their cut.
The web's latest power play is aimed squarely at AI companies that have been feasting on free content for years. Reddit, Yahoo, Medium, Quora, and People Inc. are banding together behind the RSL Standard, a framework that transforms the humble robots.txt file into a pricing menu for AI training data.
The move represents the most coordinated publisher response yet to AI companies' massive data appetite. Where individual negotiations have yielded mixed results - News Corp and Vox Media secured deals with OpenAI, while others got nothing - the RSL Standard promises collective bargaining power.
"The goal is to create a new, scalable business model for the web," RSL Collective co-founder Eckart Walther told The Verge. Walther, who helped create RSS, is betting that unified publisher action can force AI companies to pay up. "RSL takes some of those early RSS ideas and creates a new layer for the entire internet where licensing rights and compensation rights are defined."
The technical implementation builds on the existing robots.txt protocol that's governed web crawling since the 1990s. But instead of simple allow/deny instructions, publishers can now embed licensing terms directly in their robots.txt files. The system supports multiple pricing models: subscription fees, pay-per-crawl charges, and even pay-per-inference fees that compensate publishers each time an AI model references their content in responses.
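To make the idea concrete, here is a minimal sketch of how a crawler might read licensing terms alongside ordinary robots.txt rules. It assumes a `License:` line that points at a terms document; that directive name, the `RobotsInfo` type, and the example.com URL are illustrative assumptions, not the published RSL syntax.

```python
from dataclasses import dataclass, field

# Hypothetical robots.txt: the "License:" directive is an assumption
# made for illustration, not a confirmed part of the RSL specification.
SAMPLE = """\
User-agent: *
Allow: /
License: https://example.com/license.xml
"""

@dataclass
class RobotsInfo:
    # Each rule is (user_agent, directive, path).
    rules: list = field(default_factory=list)
    license_urls: list = field(default_factory=list)

def parse_robots(text: str) -> RobotsInfo:
    """Collect allow/deny rules and any (assumed) License lines."""
    info = RobotsInfo()
    agent = "*"
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            agent = value
        elif key in ("allow", "disallow"):
            info.rules.append((agent, key, value))
        elif key == "license":
            info.license_urls.append(value)
    return info

info = parse_robots(SAMPLE)
```

A crawler that honors the terms would fetch each entry in `info.license_urls` before requesting any pages; one that does not would simply skip the unfamiliar line, which is why enforcement matters.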
This isn't just about blocking bots - it's about monetizing them. Publishers using the RSL Standard can set different rates for different types of crawling. Search engine bots and archival services can proceed as usual, while AI training crawlers face pricing walls.
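A publisher's rate card along these lines could be sketched as follows. The purpose labels, the `Terms` type, and the rates are all hypothetical; amounts are kept in integer micro-dollars to avoid floating-point rounding.

```python
from dataclasses import dataclass
from enum import Enum

class PricingModel(Enum):
    FREE = "free"
    SUBSCRIPTION = "subscription"
    PER_CRAWL = "pay-per-crawl"
    PER_INFERENCE = "pay-per-inference"

@dataclass(frozen=True)
class Terms:
    model: PricingModel
    rate_micro_usd: int = 0  # price per billable unit, in millionths of a dollar

# Hypothetical policy: search and archival crawlers pass for free,
# AI training crawlers pay per page fetched.
POLICY = {
    "search": Terms(PricingModel.FREE),
    "archive": Terms(PricingModel.FREE),
    "ai-training": Terms(PricingModel.PER_CRAWL, rate_micro_usd=2_000),
}

def estimate_cost_micro_usd(purpose: str, units: int) -> int:
    """Cost of `units` billable events (e.g. pages crawled) for a crawler purpose."""
    terms = POLICY[purpose]
    if terms.model is PricingModel.FREE:
        return 0
    return terms.rate_micro_usd * units

search_cost = estimate_cost_micro_usd("search", 1_000)
training_cost = estimate_cost_micro_usd("ai-training", 1_000)
```

Under these made-up rates, a thousand pages cost a search crawler nothing and cost a training crawler 2,000,000 micro-dollars, or two dollars.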
Behind the scenes, the RSL Collective is working with Fastly, a major content delivery network, to enforce these licensing terms. "Fastly is the bouncer at the door to the club, and they won't let people in unless they have the right ID," explains Doug Leeds, the collective's other co-founder and former CEO of IAC Publishing. "RSL is issuing the IDs."
The enforcement challenge looms large. Compliance with robots.txt is voluntary, and companies including Anthropic and Perplexity have faced accusations of unauthorized scraping. The RSL Standard cannot block crawlers by itself; it needs infrastructure partners like Fastly to act as gatekeepers.
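The gatekeeper role can be sketched as a simple admission check at the network edge. This is not Fastly's API; the `gate` function, the purpose labels, and the token scheme are assumptions made for illustration.

```python
from http import HTTPStatus
from typing import Optional, Set

# Hypothetical: crawler purposes the publisher lets through without a license.
FREE_PURPOSES = {"search", "archive"}

def gate(purpose: str, token: Optional[str], valid_tokens: Set[str]) -> HTTPStatus:
    """Decide whether a crawler request is served or turned away."""
    if purpose in FREE_PURPOSES:
        return HTTPStatus.OK
    if token is not None and token in valid_tokens:
        return HTTPStatus.OK  # crawler presented a recognized license "ID"
    return HTTPStatus.PAYMENT_REQUIRED  # HTTP 402: no valid license

issued = {"tok-abc123"}  # stands in for IDs issued by the collective
licensed = gate("ai-training", "tok-abc123", issued)
unlicensed = gate("ai-training", None, issued)
```

The sketch takes the crawler's declared purpose at face value. A real deployment would also need a way to verify who a crawler is, since a self-reported user agent can be spoofed.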
Leeds positions the collective as a digital rights organization similar to ASCAP in music licensing. "All participants in the collective rights organization participate in the enforcement of any infringement," he says, an arrangement that spreads legal costs across members. But unlike music rights, which rest on strong copyright precedent, the use of web content as AI training data sits in a legal gray area that is still being fought over in court.
The timing coincides with escalating legal battles across the industry. Reddit is suing Anthropic over alleged unauthorized access, while Getty Images and multiple publishers are pursuing copyright claims against AI companies. The RSL Standard attempts to sidestep these legal uncertainties by creating explicit licensing terms upfront.
"There has always been a question of whether bots have agreed to terms that they don't see," Leeds and Walther said in a statement. "RSL changes that fundamentally, putting crawlers on notice of what the terms are before they access a site."
Early adoption extends beyond the launch partners. O'Reilly, wikiHow, and Ziff Davis (owner of IGN) have also joined the collective, which remains free for publishers. The question now is whether AI companies will voluntarily adopt a standard that could cost them millions in licensing fees.
Previous individual licensing deals suggest there's precedent. OpenAI has struck agreements with Vox Media, News Corp, and others, reportedly worth tens of millions annually. But those deals involved major media companies with significant legal leverage - the RSL Standard aims to democratize that negotiating power.
The collective's success hinges on critical mass. If enough major publishers adopt RSL pricing, AI companies may find it more efficient to pay licensing fees than to navigate individual negotiations or risk legal challenges. But if adoption remains limited, AI companies could simply ignore the standard, as some are accused of doing with robots.txt files.
"What we're doing is not reinventing wheels or inventing wheels - we're just bringing them to a place that they haven't existed before," Leeds explains. The infrastructure exists in other media industries; the question is whether it can work for web content in an AI-driven world.
The RSL Standard represents the most ambitious attempt yet to monetize AI training data at web scale. Whether it succeeds depends on AI companies choosing cooperation over confrontation - and publishers maintaining unity as legal and technical challenges emerge. If it works, every website owner suddenly has a new revenue stream. If it fails, the AI industry's free lunch continues, leaving publishers to pursue costlier legal remedies one lawsuit at a time.