The Laude Institute just dropped its first batch of Slingshots grants, targeting one of AI's thorniest problems: how to actually measure what these systems can do. As AI capabilities spread across every sector, the industry is still wrestling with that fundamental question, and the accelerator program is betting that better evaluations are part of the answer.
The institute announced 15 projects on Thursday, each tackling a different piece of the AI evaluation puzzle. Unlike traditional academic grants that leave researchers scrambling for compute, Slingshots offers the full package: funding, serious compute power, and dedicated engineering support that most university labs can only dream of.
The catch? Recipients need to deliver something concrete, whether that's a startup, open-source code, or another tangible artifact. It's a hybrid model that bridges the gap between academic research and Silicon Valley's move-fast mentality.
Several projects in the cohort should ring bells for anyone following AI development. Terminal Bench is back with its command-line coding benchmark, while the ARC-AGI project continues its long-running quest to create meaningful AGI tests.
But the really interesting action is happening with the newer approaches. Formula Code, a collaboration between Caltech and UT Austin researchers, is building evaluations specifically for AI agents' code optimization skills. Meanwhile, Columbia's BizBench wants to create comprehensive benchmarks for "white-collar AI agents" - the kind that might soon be handling your expense reports or client emails.
The star power extends beyond just the projects. SWE-Bench co-founder John Boda Yang is leading CodeClash, a dynamic competition-based framework that builds on his previous success in AI code evaluation. Yang's worried about something that should keep the entire industry up at night: benchmarks becoming proprietary company tools rather than shared scientific standards.
"I do think people continuing to evaluate on core third-party benchmarks drives progress," Yang told TechCrunch. "I'm a little bit worried about a future where benchmarks just become specific to companies."








