The era of scraping free web data for AI training is ending. Companies like Turing Labs and Fyxer are now paying premium rates to hire artists, chefs, and construction workers to create custom datasets, betting that proprietary training data will become their biggest competitive advantage as the AI boom matures.
Taylor strapped a GoPro to her forehead every morning this summer, enduring headaches and red marks to help train the future of AI. For one week, she and her roommate filmed themselves painting, sculpting, and doing household chores - earning premium pay to create something no web scraper could deliver: perfectly synchronized, multi-angle footage of human problem-solving in action.
"We woke up, did our regular routine, and then strapped the cameras on our head and synced the times together," Taylor told TechCrunch. "Then we would make our breakfast and clean the dishes. Then we'd go our separate ways and work on art." The work paid well but came with a physical cost. "It would give you headaches. You take it off and there's just a red square on your forehead."
Taylor was working for Turing Labs, an AI company that's abandoning the old playbook of scraping free web data. Instead, Turing is contracting with chefs, construction workers, electricians, and artists to create proprietary video datasets for its vision models. The goal isn't teaching AI to paint, but developing abstract skills around sequential problem-solving and visual reasoning that can't be found in existing datasets.
"We are doing it for so many different kinds of blue-collar work, so that we have a diversity of data in the pre-training phase," Turing Chief AGI Officer Sudarshan Sivaraman told TechCrunch. "After we capture all this information, the models will be able to understand how a certain task is performed."
This shift from free web scraping to expensive custom collection represents a fundamental change in how AI companies think about competitive advantage. With foundation models becoming commoditized, proprietary training data is emerging as the new battleground. Companies are no longer just building better algorithms - they're building better datasets.
Fyxer, an email management startup, learned this lesson the hard way. Founder Richard Hollingsworth initially tried standard approaches but found that his AI needed something web scrapers couldn't provide: the nuanced judgment of experienced executive assistants who understand email etiquette and priority.
"We used a lot of experienced executive assistants, because we needed to train on the fundamentals of whether an email should be responded to," Hollingsworth said. In Fyxer's early days, executive assistants outnumbered engineers and managers four-to-one. "It's a very people-oriented problem. Finding great people is very hard."