Voice is about to push your phone back into your pocket. At Web Summit in Doha, ElevenLabs co-founder and CEO Mati Staniszewski made the case that voice will become the primary way humans interact with AI - not as a novelty feature, but as the fundamental interface replacing keyboards and touchscreens. The timing isn't coincidental. Fresh off a $500 million funding round that valued the voice AI startup at $11 billion, Staniszewski's vision reflects a broader industry shift as OpenAI, Google, and Apple race to embed conversational AI into wearables, cars, and everyday hardware.
ElevenLabs just laid out the roadmap for how you'll talk to machines in the next few years - and it doesn't involve staring at a screen. Speaking at Web Summit in Doha, co-founder and CEO Mati Staniszewski told TechCrunch that voice models have evolved beyond simply mimicking human speech to working in tandem with the reasoning capabilities of large language models. The result is a fundamental shift in how people interact with technology.
"Hopefully all our phones will go back in our pockets, and we can immerse ourselves in the real world around us, with voice as the mechanism that controls technology," Staniszewski said. It's a bold vision, but one that's increasingly shared across the AI industry - and one that just attracted $500 million in fresh funding at an $11 billion valuation.
The shift is already underway. OpenAI has made voice a central focus of its next-generation models, while Google has rolled out conversational capabilities across its AI products. Apple is quietly building voice-adjacent, always-on technologies through acquisitions like Q.ai. As AI spreads into wearables, cars, and other new hardware, control is becoming less about tapping screens and more about speaking.
Iconiq Capital general partner Seth Pierrepont echoed that view onstage at Web Summit, arguing that while screens will continue to matter for gaming and entertainment, traditional input methods like keyboards are starting to feel "outdated." The writing is on the wall - or rather, it's being spoken into the air.
But the technology isn't just about replacing keyboards with microphones. Staniszewski pointed to what he calls an "agentic shift" - AI systems that rely on persistent memory and context built up over time rather than requiring users to spell out every instruction. According to Pierrepont, as AI systems become more agentic, models will gain the guardrails, integrations, and context needed to respond with less explicit prompting from users.
That evolution will change how voice models are deployed. While high-quality audio models have largely lived in the cloud, Staniszewski said ElevenLabs is working toward a hybrid approach that blends cloud and on-device processing. It's a move aimed at supporting new hardware - headphones, wearables, and other devices - where voice becomes a constant companion rather than a feature you consciously activate.
ElevenLabs is already partnering with Meta to bring its voice technology to Instagram and Horizon Worlds, the company's virtual reality platform. Staniszewski said he'd be open to working with Meta on its Ray-Ban smart glasses as voice-driven interfaces expand into new form factors. The partnerships signal how quickly voice AI is moving from novelty to necessity.
But as voice becomes more persistent and embedded in everyday hardware, it raises serious questions about privacy and surveillance. How much personal data will voice-based systems store as they move closer to users' daily lives? Google has already agreed to a $68 million settlement over allegations that its voice assistant spied on users. Cases like this highlight the growing importance of ethical voice data practices: collecting voice data only with informed consent, storing it securely, and governing it in ways that prioritize user rights.
Staniszewski's vision of phones staying in pockets while voice handles everything might sound appealing, but it assumes users will trust always-on microphones embedded in glasses, earbuds, and other wearables. The technology is racing ahead. The privacy frameworks are still catching up.
What's clear is that the biggest tech companies are betting billions that voice will become the primary interface for AI. OpenAI is pushing conversational models into new hardware. Google is embedding voice across its product stack. Apple is quietly acquiring voice-adjacent startups like Q.ai. And ElevenLabs just raised half a billion dollars to make sure it's not left behind.
The question isn't whether voice will become a major interface for AI. It's whether the industry can build it responsibly - and whether users will actually want machines listening to everything they say.
Voice AI is moving from a feature to the feature: the primary way billions of people will interact with machines in the coming years. ElevenLabs' $11 billion valuation and partnerships with Meta signal how seriously the industry is taking this shift, while parallel investments from OpenAI, Google, and Apple suggest voice is the next major battleground in AI. But as always-on microphones embed themselves in glasses, earbuds, and other wearables, the race to build voice-first AI is also a race to address privacy concerns that could derail adoption. The technology is ready. The question is whether users are.