Google officially launched Gemini Embedding 2, its first natively multimodal embedding model. Now available in Public Preview via the Gemini API and Vertex AI, this model marks a massive shift from text-only systems to a unified “multimodal” brain.
Unlike traditional pipelines that require converting images to captions or audio to transcripts before processing, Gemini Embedding 2 understands multiple formats in a single, shared embedding space.
Key Capabilities & Modal Limits
The model allows you to map various data types into a single vector space, enabling complex “cross-modal” searches (e.g., using a text query to find a specific moment in a 2-minute video).
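In practice, cross-modal search reduces to nearest-neighbor lookup in that shared space: embed the text query, then rank the stored image, video, and audio vectors by cosine similarity. A minimal sketch with hypothetical, pre-computed toy vectors (a real application would obtain these from the embedding API; the filenames and 4-dimensional vectors here are illustrative only):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed embeddings (toy 4-d vectors; the real model
# returns much larger vectors for text, images, video, and audio alike).
index = {
    "cat_photo.png":  np.array([0.9, 0.1, 0.0, 0.1]),
    "dog_clip.mp4":   np.array([0.1, 0.9, 0.2, 0.0]),
    "meow_sound.wav": np.array([0.8, 0.2, 0.1, 0.2]),
}
query_vec = np.array([0.85, 0.15, 0.05, 0.15])  # stand-in embedding of the text "cat"

# Rank every stored item -- regardless of modality -- against the text query.
ranked = sorted(index.items(),
                key=lambda kv: cosine_similarity(query_vec, kv[1]),
                reverse=True)
print([name for name, _ in ranked])  # cat items outrank the dog clip
```

Because all modalities live in one space, the same ranking loop serves text-to-image, text-to-video, and text-to-audio queries without any per-modality conversion step.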
| Modality | Input Limits / Specs |
| --- | --- |
| Text | Up to 8,192 tokens per request. |
| Images | Processes up to 6 images (PNG/JPEG) in a single request. |
| Video | Supports clips up to 120 seconds (MP4/MOV). |
| Audio | Natively embeds audio without needing a text transcript. |
| Documents | Directly embeds PDFs up to 6 pages long. |
| Languages | Supports semantic intent across 100+ languages. |
Advanced Technical Features
- Interleaved Input: You can now pass a mix of modalities (e.g., an image + a text description) in a single request. This helps the AI understand the “nuanced relationship” between a visual and its context.
- Flexible Dimensionality (MRL): Using Matryoshka Representation Learning, the model allows you to scale the output dimensions down from the default 3,072 to 1,536 or 768.
  - Tip: This lets you use smaller vectors for fast, cheap candidate retrieval and full-size vectors only when you need maximum precision.
- RAG & Semantic Search: The model is specifically optimized for Retrieval-Augmented Generation (RAG) and data clustering, simplifying the infrastructure needed for “AI-powered knowledge bases.”
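Matryoshka embeddings can be shortened by keeping only the leading dimensions and re-normalizing, which is what makes the two-stage retrieval tip above cheap to implement. A sketch of that recipe (the 3,072/1,536/768 sizes come from the announcement; the truncate-and-renormalize step is the standard MRL technique, not an official code snippet):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` MRL dimensions and re-normalize to unit length."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

# Stand-in for a full-size model output (real vectors come from the API).
full = np.random.default_rng(0).standard_normal(3072)
full /= np.linalg.norm(full)

fast = truncate_embedding(full, 768)      # small vector for cheap candidate retrieval
precise = truncate_embedding(full, 1536)  # mid-size precision/cost trade-off

print(fast.shape, precise.shape)
```

A typical pipeline stores both sizes: the 768-dimension vectors feed the first-pass nearest-neighbor search, and the full 3,072-dimension vectors re-rank the short candidate list.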
Why This Matters
Previously, developers had to manage separate “file cabinets” (vector indexes) for text and images. If you had a photo of a cat and a text document about cats, the AI might not realize they were the same concept.
Gemini Embedding 2 fixes this: it treats the word “cat,” the sound of a “meow,” and the image of a cat as the same semantic point. This enables a new generation of apps where you can search your entire company's video archives, meeting recordings, and PDF manuals using a single search bar.
How to Get Started
The model is listed in the API as `gemini-embedding-2-preview`. Major vector database partners like Qdrant, ChromaDB, and Weaviate announced day-one support for the new architecture.
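A request looks like any other embedContent call. The helper below sketches the REST request body (field names follow the existing Gemini embedContent API, but verify them against the current docs; the model name is the preview identifier above, and `outputDimensionality` is how the smaller MRL sizes are requested):

```python
def build_embed_request(model: str, text: str, output_dim: int = 3072) -> dict:
    """Build an embedContent-style JSON body (sketch; check fields against the docs)."""
    return {
        "model": f"models/{model}",
        "content": {"parts": [{"text": text}]},
        "outputDimensionality": output_dim,
    }

body = build_embed_request("gemini-embedding-2-preview", "find the cat video", 768)
```

POST this body to the model's `:embedContent` endpoint with your API key; per the interleaved-input feature described earlier, multimodal inputs (inline image, audio, or video data) would presumably travel as additional parts alongside the text part.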

