Reddit just filed a bombshell lawsuit against AI search startup Perplexity, alleging the company used third-party data scrapers to steal Reddit content and circumvent the platform's protections. The lawsuit targets not just Perplexity but three data-scraping companies that Reddit says are fueling an "industrial-scale data laundering economy" in AI. This marks Reddit's most aggressive legal push yet to monetize its treasure trove of human conversation data that's become gold for AI training.
Reddit is declaring war on AI companies that won't pay up. The social platform just dropped a lawsuit against Perplexity AI and three data-scraping companies, accusing them of running an elaborate scheme to steal Reddit's most valuable asset - millions of authentic human conversations.
The complaint filed today reads like a heist movie. Reddit compares the defendants to "would-be bank robbers" who "knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead." The armored truck in this case? Third-party scrapers SerpApi, Oxylabs, and AWMProxy that Reddit says Perplexity hired to do its dirty work.
Here's where it gets juicy. Reddit sent Perplexity a cease-and-desist letter back in May 2024, demanding the AI search company stop scraping Reddit data. Perplexity promised they'd play nice and respect Reddit's robots.txt file. But according to the lawsuit, the volume of Reddit citations on Perplexity actually went up after that conversation.
Reddit didn't stop there. They set a trap - creating a post that could only be crawled by Google. "Within hours," the lawsuit claims, Perplexity had somehow accessed and used that content in its answer engine. "The only way that Perplexity could have obtained that Reddit content," Reddit argues, "is if it and/or its co-defendants scraped Google search results."
This lawsuit hits at the heart of the AI industry's biggest tension right now. Reddit's treasure trove of human-written, community-ranked content is exactly what AI companies need to train better models. But Reddit learned its lesson from the 2023 API pricing controversy that sparked massive protests - if AI companies want the data, they need to pay for it.
The strategy's been working. Reddit has already struck lucrative deals with OpenAI for ChatGPT integration and Google for AI training access. The company reportedly wants even better terms as these partnerships come up for renewal.
"AI companies are locked in an arms race for quality human content," Ben Lee, Reddit's chief legal officer, said in a statement. "That pressure has fueled an industrial-scale 'data laundering' economy." Lee painted a picture of a shadowy ecosystem where scrapers "mask their identities, hide their locations, and disguise their web scrapers" to steal content from Google search results.