Google launches ‘Gemini Embedding 2’

Google has officially launched Gemini Embedding 2, its first natively multimodal embedding model. Now available in Public Preview via the Gemini API and Vertex AI, the model moves embeddings from text-only systems to a single representation shared by text, images, video, audio, and documents.

Unlike traditional pipelines that require converting images to captions or audio to transcripts before processing, Gemini Embedding 2 embeds multiple formats directly into a single, shared mathematical space.

https://lapaasvoice.b-cdn.net/wp-content/uploads/2026/03/gemini-2-multimodal-embeddings.mp4

Key Capabilities & Modal Limits

The model allows you to map various data types into a single vector space, enabling complex “cross-modal” searches (e.g., using a text query to find a specific moment in a 2-minute video).

Modality     Input Limits / Specs
Text         Up to 8,192 tokens per request.
Images       Processes up to 6 images (PNG/JPEG) in a single request.
Video        Supports clips up to 120 seconds (MP4/MOV).
Audio        Natively embeds audio without needing a text transcript.
Documents    Directly embeds PDFs up to 6 pages long.
Languages    Supports semantic intent across 100+ languages.
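
Because these limits are fixed numbers, a client can reject oversized input before it ever calls the API. The sketch below is only an illustration: `EmbedRequest` and `check_limits` are hypothetical names, not part of any Google SDK, the caller is assumed to supply its own token, page, and duration counts, and the figures are copied from the table above. No audio limit is listed, so none is checked.

```python
from dataclasses import dataclass, field

# Limits copied from the table above; constant names are made up for this sketch.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_TYPES = {"image/png", "image/jpeg"}
VIDEO_TYPES = {"video/mp4", "video/quicktime"}  # MP4 and MOV


@dataclass
class EmbedRequest:
    """Hypothetical summary of one embedding request (not an SDK type)."""
    text_tokens: int = 0
    image_types: list = field(default_factory=list)  # one MIME type per image
    video_seconds: float = 0.0
    video_type: str = ""
    pdf_pages: int = 0


def check_limits(req):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {req.text_tokens} tokens exceeds {MAX_TEXT_TOKENS}")
    if len(req.image_types) > MAX_IMAGES:
        problems.append(f"images: {len(req.image_types)} exceeds {MAX_IMAGES}")
    for mime in req.image_types:
        if mime not in IMAGE_TYPES:
            problems.append(f"images: unsupported type {mime}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {req.video_seconds}s exceeds {MAX_VIDEO_SECONDS}s")
    if req.video_seconds > 0 and req.video_type not in VIDEO_TYPES:
        problems.append(f"video: unsupported type {req.video_type!r}")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"pdf: {req.pdf_pages} pages exceeds {MAX_PDF_PAGES}")
    return problems
```

A longer video or PDF would have to be split into chunks that each pass this check, with one embedding stored per chunk.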

Advanced Technical Features

  • Interleaved Input: You can now pass a mix of modalities (e.g., an image + a text description) in a single request. This helps the AI understand the “nuanced relationship” between a visual and its context.
  • Flexible Dimensionality (MRL): Using Matryoshka Representation Learning, the model allows you to scale the output dimensions down from the default 3,072 to 1,536 or 768.
    • Tip: This lets you use smaller vectors for fast, cheap candidate retrieval and full-size vectors only when you need maximum precision.
  • RAG & Semantic Search: The model is specifically optimized for Retrieval-Augmented Generation (RAG) and data clustering, simplifying the infrastructure needed for “AI-powered knowledge bases.”
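
The tip about smaller vectors can be sketched as a two-stage search. With Matryoshka-style embeddings, the leading components of a vector are themselves a usable lower-dimensional embedding, so a client can slice the full vector and rescale it to unit length. Everything below is an assumption for illustration: the function names are hypothetical, the index is a plain dictionary rather than a vector database, and whether the API already returns normalized vectors at reduced sizes is not stated in this article, so the code renormalizes to be safe.

```python
import math


def truncate(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, index, small_dim=768, shortlist=50, top_k=5):
    """Shortlist with truncated vectors, then rerank with full-size vectors.

    `index` maps an item id to its full-size unit vector. A real system would
    precompute and store the truncated vectors instead of slicing per query.
    """
    q_small = truncate(query, small_dim)
    coarse = sorted(
        index,
        key=lambda item: dot(q_small, truncate(index[item], small_dim)),
        reverse=True,
    )[:shortlist]
    return sorted(coarse, key=lambda item: dot(query, index[item]), reverse=True)[:top_k]
```

The first pass touches every item but only reads the short vectors; the second pass reads full-size vectors for the shortlist alone, which is where the cost saving comes from.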

Why This Matters

Previously, developers had to manage separate “file cabinets” (vector indexes) for text and images. If you had a photo of a cat and a text document about cats, the AI might not realize they were the same concept.

Gemini Embedding 2 addresses this by mapping the word “cat,” the sound of a “meow,” and an image of a cat to nearby points in one vector space. This makes it possible to build apps that search a company’s video archives, meeting recordings, and PDF manuals from a single search bar.


How to Get Started

The model is listed in the API as gemini-embedding-2-preview. Major vector database partners like Qdrant, ChromaDB, and Weaviate announced day-one support for the new architecture.
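
As a rough starting point, the sketch below builds (but does not send) an HTTP request for that model id using only the Python standard library. It assumes the preview model accepts the same `embedContent` request shape the Gemini API uses for its existing embedding models, and that an image can be interleaved as an `inlineData` part; neither detail is confirmed by this article, so check the official API reference before relying on it. `build_embed_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-embedding-2-preview"  # model id as given in this article


def build_embed_request(api_key, text, image_png=None, dims=768):
    """Build an embedContent request; request shape is an assumption (see above)."""
    parts = [{"text": text}]
    if image_png is not None:
        # Assumed way to interleave an image with text in one request.
        parts.append({
            "inlineData": {
                "mimeType": "image/png",
                "data": base64.b64encode(image_png).decode("ascii"),
            }
        })
    body = {
        "model": f"models/{MODEL}",
        "content": {"parts": parts},
        "outputDimensionality": dims,  # 3,072 by default; 1,536 or 768 per the article
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{MODEL}:embedContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` requires a valid API key; for the existing embedding models the vector comes back under `embedding.values` in the JSON response, and the same is assumed here.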
