Google DeepMind Rolls Out Gemini 3.1 Flash Text-To-Speech Model With Customisable Audio Tags

It enables the user to easily direct vocal style, delivery, and pace through text commands, Google said.

Prajwal Jayaraj
Technology
- Published On Apr 15, 2026 22:13 IST
- Last Updated On Apr 15, 2026 22:39 IST

Google DeepMind launched Gemini 3.1 Flash TTS for advanced text-to-speech control
The model offers vocal style, delivery, pace, and accent options including English variants
Users can select formats like RPG dialogue, podcast, audiobook, and news broadcast styles

Google DeepMind has rolled out its latest text-to-speech AI model, Gemini 3.1 Flash TTS, which enables users to "easily direct vocal style, delivery, and pace through text commands", according to a post from the company on the social media platform X on Wednesday.

As shown in the video, the model provides advanced controls over the generated voice, including inflection and tone options such as "enthusiastic", "positive surprise" and "informative".

Users can also pick from various accents within a language. English alone has numerous accent options, including American 'Valley' and 'Southern' accents, British variations such as 'Brixton' and 'RP', and 'Transatlantic', among many others.

Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet.

With new Audio Tags, you can easily direct vocal style, delivery, and pace through text commands. 🧵 pic.twitter.com/Bq4SD8eLUN

— Google DeepMind (@GoogleDeepMind) April 15, 2026

The application also appeared to provide 'Director Level' control with 'Style' and 'Pace' options, along with format templates for users to choose from, such as RPG (role-playing game) Dialogue, Support Agent, Podcast Conversation, Audiobook Narrator, Language Tutor, Voice Assistant, Wellness Guide and News Broadcast.

Google said users can "set the stage" by defining the environment and providing specific dialogue instructions, and can then easily export the setup as API code.

"This world-building context helps characters remain “in-character” and react to one another naturally across multiple turns," the company said in its blog post.

"Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms," it added.
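As a rough illustration of what "directing" a performance through text could look like, the sketch below assembles a scene description and tagged dialogue lines into a single script. It is only a sketch under stated assumptions: the function name `build_tts_prompt`, the `Scene:` prefix and the square-bracket tag syntax are hypothetical and are not taken from Google's documentation, and no Gemini API call is made.

```python
# Hypothetical sketch: builds a plain-text script only; it does not call any API.
# The "Scene:" line and the [tag] syntax are assumptions made for illustration.

def build_tts_prompt(scene, turns):
    """Compose a scene description followed by one tagged line per dialogue turn."""
    lines = [f"Scene: {scene}", ""]
    for speaker, tag, text in turns:
        # Each turn carries a speaker label, a style tag and the spoken text.
        lines.append(f"{speaker}: [{tag}] {text}")
    return "\n".join(lines)


prompt = build_tts_prompt(
    "A busy newsroom just before a live broadcast.",
    [
        ("Anchor", "informative", "Good evening, here are tonight's headlines."),
        ("Reporter", "enthusiastic", "We have breaking news from the launch event!"),
    ],
)
print(prompt)
```

The tag names "informative" and "enthusiastic" are among the tone options mentioned in the article; the actual format the model expects may differ.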

Google DeepMind said that the model offers "more natural sounding speech" along with support for more than 70 languages, including Japanese, Hindi and German. It also stated that SynthID watermarking is applied to all outputs, for reliable detection of AI-generated content.

The firm said that Gemini 3.1 Flash TTS ranked second in the Artificial Analysis Text-To-Speech Arena Quality Elo with a score of 1211.

Developers can access the preview via the Gemini API and Google AI Studio. For enterprises, it is rolling out in preview on Vertex AI. Everyone else can access it in Google Vids.
