What Is Gemini Embedding 2 — Google's First Multimodal AI Model That Maps Text, Images, Video, Audio Together?

Google has launched Gemini Embedding 2, its first fully multimodal embedding model.

Author: Madhur Chaturvedi
Technology
Mar 11, 2026 12:24 IST

Google has launched Gemini Embedding 2, its first fully multimodal embedding model built on Gemini. Unlike older text-only embedding models, it maps text, images, video, audio, and documents into one shared embedding space, where each input is represented as a list of numbers (a vector) and inputs with similar meaning sit close together. According to Google, it works across more than 100 languages and captures the meaning of an idea however it appears, whether written, spoken, or shown visually.

How Is Google's Gemini Embedding 2 Different?

Traditional unimodal AI models handle only one kind of data at a time, and most AI models keep different types of data in separate “spaces.” For example, they would treat the word “cake” in a text file and a cake shown in a video as unrelated items. A multimodal model such as Gemini Embedding 2 can take several data types together in a single request, for example an image alongside text or video.

Gemini Embedding 2 also places these data types in a “single, unified embedding space,” which simplifies complex pipelines and makes it easier to work across formats. For users, this means the model can process more than one input (for example, an image plus text) in a single request, instead of needing separate but similar requests for each type of input. It can also relate different forms of data to one another and represent real-world content more accurately.
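As a rough illustration of what a shared embedding space makes possible, the Python sketch below compares items from different formats using cosine similarity, a common way to measure how close two embedding vectors are. The three-number vectors and their labels are invented for this example and do not come from Gemini Embedding 2, whose real vectors are far longer; the function names are hypothetical as well.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings of three different inputs.
# Keys are (format, description); values are toy 3-dimensional vectors.
embeddings = {
    ("text", "cake"): [0.9, 0.1, 0.0],
    ("video", "cake clip"): [0.8, 0.2, 0.1],
    ("audio", "traffic noise"): [0.0, 0.1, 0.9],
}


def nearest(query_key):
    """Return the key of the stored item most similar to the query item."""
    query = embeddings[query_key]
    scored = [
        (key, cosine_similarity(query, vector))
        for key, vector in embeddings.items()
        if key != query_key
    ]
    return max(scored, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    print(nearest(("text", "cake")))
```

With these toy numbers, the text entry for cake lands closest to the cake video rather than the unrelated audio clip, which is the kind of cross-format match a unified space is meant to enable.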

Google Gemini Embedding 2 Capabilities

According to Google, Gemini Embedding 2 can handle text inputs of up to 8,192 tokens (tokens are the small units of data that AI models use to process, understand, and generate language). For images, it can process up to six per request in PNG or JPEG format.

The model supports videos of up to 120 seconds in MP4 or MOV format. It works directly with audio, without first transcribing it into text, and can also embed PDFs of up to six pages.
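The limits reported in this section can be gathered into a simple pre-flight check, sketched here in Python. The numbers are the ones cited in this article; the class, field, and function names are hypothetical and not part of any Google SDK, and the sketch assumes token counts and video durations have already been measured elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Limits as reported in the article; the names are illustrative only.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_FORMATS = {"png", "jpeg"}
VIDEO_FORMATS = {"mp4", "mov"}


@dataclass
class EmbeddingRequest:
    """Hypothetical description of the inputs in one request."""
    text_tokens: int = 0
    image_formats: Tuple[str, ...] = ()
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_limits(req):
    """Return a list of problems; an empty list means within the limits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens, limit is {MAX_TEXT_TOKENS}")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"{len(req.image_formats)} images, limit is {MAX_IMAGES}")
    for fmt in req.image_formats:
        if fmt.lower() not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {fmt}")
    if req.video_format is not None:
        if req.video_format.lower() not in VIDEO_FORMATS:
            problems.append(f"unsupported video format: {req.video_format}")
        if req.video_seconds > MAX_VIDEO_SECONDS:
            problems.append(f"video is {req.video_seconds}s, limit is {MAX_VIDEO_SECONDS}s")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages, limit is {MAX_PDF_PAGES}")
    return problems


if __name__ == "__main__":
    request = EmbeddingRequest(text_tokens=500, image_formats=("png", "jpeg"))
    print(check_limits(request) or "within reported limits")
```

A check like this would run before any request is sent, so oversized inputs can be trimmed or split in advance.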
