Google just turned its Gemini chatbot into a music studio. The company is rolling out beta access to DeepMind's Lyria 3 audio model directly inside the Gemini app, letting users generate 30-second tracks from text prompts, images, or videos without switching tabs. The move signals Google's push to make generative AI more creative and multimodal, putting music creation alongside text and image generation in one unified interface.
Google is making its biggest play yet in AI-generated audio, bringing music creation directly to the 100+ million people using its Gemini chatbot. Starting today, users can type a prompt like "upbeat jazz for a rainy afternoon" or upload a video clip and watch Gemini compose a custom 30-second soundtrack on the spot. No separate app, no export-import workflow - just music generation baked right into the conversation.
The feature runs on Lyria 3, the latest audio model from DeepMind, Google's AI research lab. Unlike earlier versions that required standalone tools or API access, Lyria 3 lives natively in the Gemini interface. Users can generate tracks from text descriptions, upload images to set the mood, or even drop in video clips to get matching background music. It's Google's answer to the multimodal AI race, where the goal isn't just understanding different media types but creating them too.
The rollout is global from day one, with support for English, German, Spanish, French, Hindi, Japanese, Korean, and Portuguese. Google says more languages are coming, though it didn't specify a timeline. Access is limited to adults 18 and older - likely a guardrail against copyright and content moderation headaches that have plagued other generative audio tools.
Timing matters here. Google has been playing catch-up in consumer AI since OpenAI took the world by storm with ChatGPT in late 2022. While OpenAI has hinted at audio generation features and Meta continues expanding its AI Studio, Google has been methodically building out Gemini's creative capabilities. The company already offers image generation and video tools through experimental features. Music was the missing piece.