Thursday, March 5, 2026

Google Launches Gemini 3.1 Flash-Lite

Google officially launched Gemini 3.1 Flash-Lite, currently the fastest and most cost-efficient model in the Gemini 3 series.

Designed for high-volume developer workloads and latency-sensitive tasks, the model is now available in public preview through Google AI Studio and Vertex AI.


Key Specifications & Capabilities

Gemini 3.1 Flash-Lite is optimized for “Intelligence Density”: delivering high-level reasoning in a lightweight package.

Feature             Specification
Context Window      1 million tokens
Output Limit        64K tokens (default 65,535)
Multimodality       Natively supports text, image, audio, video, and PDF inputs
Speed               Up to 380 tokens per second (roughly 45% faster than 2.5 Flash)
Knowledge Cutoff    January 2025

The “Thinking Mode” Advantage

Following the trend set by the 3.0 series, Flash-Lite includes Adaptive Thinking Levels. This allows developers to toggle between four distinct reasoning modes to balance cost and accuracy:

  1. Minimal: Near-instant responses for simple classification or routing.
  2. Low: Balanced for data extraction and translation.
  3. Medium: Optimized for logic-heavy tasks and code review.
  4. High: Maximum reasoning for complex multimodal analysis (though this significantly increases output token usage and latency).
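One way to picture the cost/accuracy trade-off is a simple dispatcher that maps task types to one of the four modes. This is an illustrative sketch only; the task categories, mapping, and function name are assumptions, while the level names come from the list above.

```python
# Illustrative sketch: choose one of the four reasoning modes based on
# the kind of task being sent to the model. The mapping below is a
# hypothetical example, not an official API.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

# Assumed task-to-level mapping, loosely following the list above.
TASK_TO_LEVEL = {
    "classification": "minimal",
    "routing": "minimal",
    "extraction": "low",
    "translation": "low",
    "code_review": "medium",
    "logic": "medium",
    "multimodal_analysis": "high",
}

def select_thinking_level(task_type: str) -> str:
    """Pick a reasoning mode for a task, defaulting to 'medium'."""
    return TASK_TO_LEVEL.get(task_type, "medium")
```

In a real pipeline, the returned level would be passed to the model's thinking configuration when building the request.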

Pricing: Built for Scale

Google has positioned Flash-Lite as the “workhorse” of the Gemini family: a cheaper alternative to the standard 3.1 Flash.

  • Input Price: $0.25 per 1 million tokens.
  • Output Price: $1.50 per 1 million tokens (including thinking tokens).
  • Batch Pricing: For non-urgent workloads, Flex/Batch pricing drops the cost even further to $0.125 (Input) and $0.75 (Output).
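To see what these rates mean in practice, here is a back-of-the-envelope cost estimator using the per-million-token prices quoted above. Preview pricing can change, so treat the numbers as illustrative; the function and tier names are ours.

```python
# Estimate request cost in USD from token counts, using the quoted
# preview prices (USD per 1M tokens). Batch/Flex is half the standard
# rate, per the figures above.

PRICES = {
    # tier: (input_per_1m, output_per_1m)
    "standard": (0.25, 1.50),
    "batch": (0.125, 0.75),
}

def estimate_cost(input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> float:
    """Cost in USD; note output pricing includes thinking tokens."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: 1M tokens in and 1M tokens out at standard pricing
# costs 0.25 + 1.50 = 1.75 USD.
```

Since output pricing includes thinking tokens, running at the High reasoning level can raise the effective per-request cost even though the rate card is unchanged.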

Performance Benchmarks

Despite its small size, Gemini 3.1 Flash-Lite approaches or surpasses the performance of the previous generation’s larger models:

  • GPQA Diamond (Science): 86.9% (surpassing Gemini 2.5 Flash).
  • MMMU Pro (Multimodal): 76.8%.
  • Humanity’s Last Exam: 16.0% (an elite-level reasoning score for a “lite” model).

Top Use Cases

  • High-Volume Translation: Processing chat logs, support tickets, or reviews at scale.
  • Real-time UI Generation: Powering the logic behind Google’s new Canvas feature to build interactive layouts instantly.
  • Multimodal Labeling: Automatically tagging thousands of images or videos with high consistency.
  • Lightweight Agents: Handling “intent routing” and entity extraction in complex agentic pipelines.
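The “intent routing” pattern from the last bullet can be sketched as a fast first pass that labels a request so the pipeline can dispatch it to the right handler. In the sketch below a keyword matcher stands in for the model call; all names and keywords are illustrative assumptions, not part of any Google API.

```python
# Minimal sketch of intent routing: classify an incoming message into
# a coarse intent bucket, then a pipeline dispatches on that label.
# In production the classification step would be a cheap, fast model
# call; a keyword lookup stands in for it here.

INTENT_KEYWORDS = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account",
    "login": "account",
    "translate": "translation",
}

def route_intent(message: str) -> str:
    """Return the first matching intent label, or 'general' as fallback."""
    lowered = message.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return "general"
```

The appeal of a low-latency model in this slot is that the router sits in front of every request, so its speed and cost dominate the pipeline's baseline overhead.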
