
Google’s Gemini AI App Can Now Turn Photos Into Short Video Clips

Google’s photo-to-video feature is powered by Veo 3, the company’s latest video generation model announced in May at its annual developer conference.

A Google Gemini generative artificial intelligence webpage arranged in Riga, Latvia, on Aug. 16, 2024. (Photo: Andrey Rudakov/Bloomberg)

Alphabet Inc.’s Google is adding the ability for paid users of its Gemini artificial intelligence assistant to turn their photos into short video clips, expanding access to a tool the tech giant launched earlier this year to a more limited audience.

People who subscribe to Google AI Ultra and Pro plans in select regions will be able to use the feature through the web version of Gemini starting Thursday, the company said in a statement. The tool will be rolled out on the Gemini mobile app throughout the week.

The new feature lets users create 8-second clips with sound based on a photo, as well as any text description of the scene they include in the prompt field. The videos will be created as an MP4 file at 720p resolution in a 16:9 landscape format, the company said. 

The update makes the feature accessible via Gemini’s chat interface, helping Google keep pace with US rivals like OpenAI and Runway AI Inc., a startup specializing in AI-generated video. Google faces fierce global competition in this space, too: China’s Alibaba Group Holding Ltd., AI startup Manus and Kuaishou Technology have all released new or updated video tools over the past few months.

Google’s photo-to-video feature is powered by Veo 3, the company’s latest video generation model announced in May at its annual developer conference. Veo 3 has been available to users through a standalone paid filmmaking tool called Flow.

Google says it has taken “significant steps behind the scenes to make sure video generation is an appropriate experience.” For example, it doesn’t allow video creation with images of publicly identifiable figures, such as celebrities, presidents or even some well-known CEOs. Its policy also prohibits outputs that encourage dangerous activities or incite violence or bullying against individuals or groups.

But the feature has its drawbacks. When Bloomberg News tested it on the web version of Gemini, uploading personal photos and asking the tool to generate a video of a person talking, the output changed the facial features, and sometimes even the race, of the subject in multiple instances.

While it was able to successfully respond to prompts to create videos of plants moving in the wind or a talking cat based on still images, it wasn’t able to follow more complicated prompts, such as making a person in a photo breakdance. It instead created a video of the person waving to the camera.

There is no instruction in the AI model to change a person's appearance, a Google spokesman said in response to Bloomberg's test results. The photo-to-video generation and face animation features are still a new technology and may build upon a single image in ways that aren't representative of the original image, he added.

The model is better at bringing other scenes to life, such as animating everyday objects, drawings and paintings, and adding movement to nature photos, he said. The company will continue improving the model, including face animation, in future updates.
