Tensormesh just threw down the gauntlet in the AI infrastructure race. The startup emerged from stealth this week with $4.5 million in seed funding, armed with technology that could fundamentally change how companies think about inference costs.
The funding round, led by Laude Ventures with participation from database pioneer Michael Franklin, comes at a time when AI companies are desperate to squeeze more performance out of their GPU investments. With inference costs spiraling and hardware in short supply, Tensormesh's promise of 10x efficiency gains isn't just compelling; it's potentially game-changing.
At the heart of Tensormesh's approach is an expanded form of key-value caching that rethinks how AI models handle memory. Traditional architectures discard the KV cache after each query, forcing models to reprocess information they've already seen. It's a wasteful approach that CEO Junchen Jiang compares to "having a very smart analyst reading all the data, but they forget what they have learned after each question."
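The mechanics are easier to see in miniature. The sketch below is a hypothetical Python toy, not Tensormesh's or LMCache's actual code: `compute_kv` stands in for the expensive transformer prefill step, and the cache indexes stored KV entries by token prefix so a shared prefix is only ever computed once.

```python
import hashlib
import time

# Toy stand-in for the expensive prefill step that produces key/value
# pairs for a token sequence. A real system runs transformer attention
# here, with new tokens attending over the cached prefix KV.
def compute_kv(tokens):
    time.sleep(0.001 * len(tokens))  # pretend cost scales with length
    return [(f"K({t})", f"V({t})") for t in tokens]

class KVCache:
    """Stores KV entries keyed by a hash of the token prefix, so a later
    query that shares a prefix with an earlier one skips recomputing it."""

    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def get_or_compute(self, tokens):
        tokens = tuple(tokens)
        # Walk backwards to find the longest already-cached prefix.
        for cut in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:cut]))
            if hit is not None:
                kv = hit + compute_kv(tokens[cut:])  # only the new suffix
                break
        else:
            kv = compute_kv(tokens)  # cold start: no shared prefix
        # Index every prefix so future queries can match partial overlaps.
        # (A toy policy; real systems index at block/page granularity.)
        for cut in range(1, len(tokens) + 1):
            self._store.setdefault(self._key(tokens[:cut]), kv[:cut])
        return kv
```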
Instead of throwing away that processed information, Tensormesh's system preserves and reuses it across queries. The technology builds on the open-source LMCache utility created by co-founder Yihua Cheng, which has already gained traction with major players. Google integrated LMCache into its Google Kubernetes Engine, while Nvidia built it into its own infrastructure tools.
The timing couldn't be better. As companies rush to deploy conversational AI and agentic systems, they're hitting memory walls that make traditional caching approaches increasingly inadequate. Chat interfaces need to constantly reference growing conversation histories, while AI agents accumulate expanding logs of actions and goals. Both scenarios create exactly the kind of repetitive processing that Tensormesh's persistent caching can optimize.
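Chat is the clearest case: every turn re-sends the whole conversation, so turn N's prompt is a strict extension of turn N-1's. Using the toy cache from the sketch above (again, an illustration rather than the company's implementation), only the newly appended messages cost anything:

```python
# Each turn's prompt extends the previous one, so prefix-keyed caching
# recomputes KV only for the tokens added since the last turn.
cache = KVCache()
history = ("SYSTEM:", "You", "are", "a", "helpful", "assistant.")

for user_msg, reply in [("hello", "Hi there!"),
                        ("recap our chat", "You said hello.")]:
    history += ("USER:", user_msg)
    cache.get_or_compute(history)   # hits the cached KV from the last turn
    history += ("ASSISTANT:", reply)
```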
"Keeping the KV cache in a secondary storage system and reused efficiently without slowing the whole system down is a very challenging problem," Jiang explains. The technical complexity is significant enough that some companies are dedicating entire teams to the challenge. "We've seen people hire 20 engineers and spend three or four months to build such a system," he notes.