The gavel came down Thursday on what might be the most expensive lesson in AI training ethics yet. Anthropic, the $183 billion startup behind the Claude assistant, just saw a federal judge preliminarily approve its massive $1.5 billion settlement with a group of authors - making it the largest publicly reported copyright recovery in history.
The lawsuit, filed in the U.S. District Court for the Northern District of California by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, cut straight to the heart of AI's gray-market training practices. The authors alleged that Anthropic illegally downloaded their books from notorious shadow libraries, including Library Genesis and Pirate Library Mirror, to feed its language models.
"We are grateful for the Court's action today, which brings us one step closer to real accountability for Anthropic and puts all AI companies on notice they can't shortcut the law or override creators' rights," the authors said in a joint statement following the ruling.
The settlement terms read like a tech industry wake-up call. Anthropic agreed to pay roughly $3,000 per book plus interest - a figure that adds up fast when the claims span hundreds of thousands of downloaded works. More importantly for the industry, the company committed to destroying its copies of the datasets containing the allegedly pirated material.
U.S. District Judge William Alsup didn't rubber-stamp the deal. The judge initially expressed reservations about the settlement, particularly around ensuring authors would be properly informed of their rights. Only after "several weeks of rigorous assessment and review" did Alsup grant preliminary approval, according to court filings.
The case has been closely watched across Silicon Valley, where AI startups and media companies are still trying to figure out what copyright infringement means in the age of large language models. Anthropic, founded in 2021 by former OpenAI research executives including CEO Dario Amodei, has positioned itself as the "safety-first" alternative in the AI race.
That safety-first reputation took a hit when the lawsuit surfaced details about the company's training data sourcing. The allegations painted a picture of an industry willing to use questionable shortcuts to build competitive advantages, even as companies publicly championed responsible AI development.