Wednesday, March 11, 2026


Google launches ‘Gemini Embedding 2’

Google officially launched Gemini Embedding 2, its first natively multimodal embedding model. Now available in Public Preview via the Gemini API and Vertex AI, this model marks a massive shift from text-only systems to a unified “multimodal” brain.

Unlike traditional pipelines that require converting images to captions or audio to transcripts before processing, Gemini Embedding 2 understands multiple formats in a single, shared mathematical space.

Key Capabilities & Modal Limits

The model allows you to map various data types into a single vector space, enabling complex “cross-modal” searches (e.g., using a text query to find a specific moment in a 2-minute video).

| Modality | Input Limits / Specs |
| --- | --- |
| Text | Up to 8,192 tokens per request. |
| Images | Processes up to 6 images (PNG/JPEG) in a single request. |
| Video | Supports clips up to 120 seconds (MP4/MOV). |
| Audio | Natively embeds audio without needing a text transcript. |
| Documents | Directly embeds PDFs up to 6 pages long. |
| Languages | Supports semantic intent across 100+ languages. |
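The cross-modal search idea can be sketched with mock vectors. In the snippet below, the three-dimensional vectors and file names are invented stand-ins for real 3,072-dimensional Gemini Embedding 2 output; the point is only that items from any modality live in one space and can be ranked against a single text query by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Mock embeddings standing in for real model output: in the actual model,
# a cat photo, a "meow" audio clip, and the text "cat" would all land near
# each other in the same shared space.
index = {
    "photo_cat.jpg":        [0.9, 0.1, 0.0],
    "meow.mp3":             [0.8, 0.2, 0.1],
    "quarterly_report.pdf": [0.0, 0.1, 0.9],
}
query_text_cat = [0.85, 0.15, 0.05]  # pretend embedding of the text query "cat"

# Rank every indexed item -- regardless of modality -- against the text query.
ranked = sorted(index, key=lambda k: cosine(query_text_cat, index[k]), reverse=True)
print(ranked)
```

With real embeddings the ranking logic is identical; only the vectors (and their dimensionality) change.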

Advanced Technical Features

  • Interleaved Input: You can now pass a mix of modalities (e.g., an image + a text description) in a single request. This helps the AI understand the “nuanced relationship” between a visual and its context.
  • Flexible Dimensionality (MRL): Using Matryoshka Representation Learning, the model allows you to scale the output dimensions down from the default 3,072 to 1,536 or 768.
    • Tip: This lets you use smaller vectors for fast, cheap candidate retrieval and full-size vectors only when you need maximum precision.
  • RAG & Semantic Search: The model is specifically optimized for Retrieval-Augmented Generation (RAG) and data clustering, simplifying the infrastructure needed for “AI-powered knowledge bases.”

Why This Matters

Previously, developers had to manage separate “file cabinets” (vector indexes) for text and images. If you had a photo of a cat and a text document about cats, the AI might not realize they were the same concept.

Gemini Embedding 2 fixes this: it treats the word “cat,” the sound of a “meow,” and the image of a cat as the same semantic point. This enables a new generation of apps where you can search your entire company’s video archives, meeting recordings, and PDF manuals using a single search bar.


How to Get Started

The model is listed in the API as gemini-embedding-2-preview. Major vector database partners like Qdrant, ChromaDB, and Weaviate announced day-one support for the new architecture.
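As a minimal sketch of what a request might look like, the helper below builds an `embedContent`-style JSON body with interleaved parts (text plus an inline image), mirroring the part schema the existing Gemini REST API uses. Whether the preview model accepts exactly this shape, and the `outputDimensionality` field for MRL sizing, are assumptions based on the current API, not confirmed details of the new model.

```python
import base64
import json

MODEL = "gemini-embedding-2-preview"  # model ID from the announcement

def embed_payload(text=None, image_bytes=None, mime_type="image/jpeg", dim=None):
    """Build an embedContent-style JSON body with interleaved parts.
    Assumption: the preview model reuses the existing Gemini part schema."""
    parts = []
    if text is not None:
        parts.append({"text": text})
    if image_bytes is not None:
        parts.append({"inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }})
    body = {"content": {"parts": parts}}
    if dim is not None:
        body["outputDimensionality"] = dim  # MRL: 3072, 1536, or 768
    return json.dumps(body)

# Interleaved request: one image plus the text that describes it.
print(embed_payload(text="our new logo",
                    image_bytes=b"\x89PNG...",  # placeholder bytes
                    mime_type="image/png",
                    dim=768))
```

Sending the payload (and authenticating) is left out here; the official Gemini API or Vertex AI client libraries would normally handle that step.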
