Google DeepMind Rolls Out Gemini 3.1 Flash Text-To-Speech Model With Customisable Audio Tags

It enables the user to easily direct vocal style, delivery, and pace through text commands, Google said.

  • Google DeepMind launched Gemini 3.1 Flash TTS for advanced text-to-speech control
  • The model offers vocal style, delivery, pace, and accent options including English variants
  • Users can select formats like RPG dialogue, podcast, audiobook, and news broadcast styles

Google DeepMind has rolled out its latest text-to-speech AI model, Gemini 3.1 Flash TTS, which enables the user to "easily direct vocal style, delivery, and pace through text commands", according to a post from the company on social media platform 'X' on Wednesday.

As shown in the video, the model offers advanced controls over the generated voice, including inflection and tone settings such as "enthusiastic", "positive surprise", "informative" and more.

Users can also pick from various accents within a language. English alone has a wide range of options, including American 'Valley' and 'Southern' accents, British variations such as 'Brixton' and 'RP', and 'Transatlantic', among many others.

The application also appeared to provide 'Director Level' control through 'Style' and 'Pace' options, along with format templates such as RPG (role-playing game) Dialogue, Support Agent, Podcast Conversation, Audiobook Narrator, Language Tutor, Voice Assistant, Wellness Guide and News Broadcast.

Google said users can "set the stage" by defining the environment and providing specific dialogue instructions, and can then easily export the result as API code.

"This world-building context helps characters remain “in-character” and react to one another naturally across multiple turns," the company said in its blog post.

"Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms," it added.

Google DeepMind said that the model offers "more natural sounding speech" along with support for more than 70 languages, including Japanese, Hindi and German. It also stated that the model applies 'SynthID watermarking' to all outputs, for reliable detection of AI-generated content.

The firm said that Gemini 3.1 Flash TTS ranked second in the Artificial Analysis Text-To-Speech Arena Quality Elo with a score of 1211.

Developers can access the preview via the Gemini API and Google AI Studio. For enterprises, it is rolling out in preview on Vertex AI. Everyone else can access it on Google Vids.
