Google has officially launched Gemini 3.1 Flash-Lite, currently the fastest and most cost-efficient model in the Gemini 3 series.
Designed for high-volume developer workloads and latency-sensitive tasks, the model is now available in public preview through Google AI Studio and Vertex AI.
Key Specifications & Capabilities
Gemini 3.1 Flash-Lite is optimized for “Intelligence Density”—delivering high-level reasoning in a lightweight package.
| Feature | Specification |
| --- | --- |
| Context Window | 1 million tokens |
| Output Limit | 64K tokens (65,535 by default) |
| Multimodality | Native support for text, image, audio, video, and PDF inputs |
| Speed | Up to 380 tokens per second (roughly 45% faster than 2.5 Flash) |
| Knowledge Cutoff | January 2025 |
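To make the 1-million-token window concrete, the sketch below does rough prompt budgeting with the common ~4-characters-per-token heuristic. The heuristic and the hard-coded limits are assumptions for illustration; real budgeting should use the API's own token counter.

```python
# Rough context-window budgeting for Gemini 3.1 Flash-Lite.
# The 4-chars-per-token ratio is a common heuristic, not an exact tokenizer.
CONTEXT_WINDOW = 1_000_000   # input context, per the spec table
OUTPUT_LIMIT = 65_535        # default output limit, per the spec table

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserved_output: int = OUTPUT_LIMIT) -> bool:
    """Check whether a prompt still leaves room for the reserved output budget."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOW

print(estimate_tokens("Hello, Gemini!"))  # ~3 tokens by this heuristic
```

By this estimate, the window comfortably holds several full-length novels of input text in a single request.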
The “Thinking Mode” Advantage
Following the trend set by the 3.0 series, Flash-Lite includes Adaptive Thinking Levels, letting developers switch among four distinct reasoning modes to balance cost and accuracy:
- Minimal: Near-instant responses for simple classification or routing.
- Low: Balanced for data extraction and translation.
- Medium: Optimized for logic-heavy tasks and code review.
- High: Maximum reasoning for complex multimodal analysis (though this significantly increases output token usage and latency).
Pricing: Built for Scale
Google has priced Flash-Lite to be the “workhorse” of the Gemini family, positioned as a cheaper alternative to the standard 3.1 Flash.
- Input Price: $0.25 per 1 million tokens.
- Output Price: $1.50 per 1 million tokens (including thinking tokens).
- Batch Pricing: For non-urgent workloads, Flex/Batch pricing drops the cost even further to $0.125 (Input) and $0.75 (Output).
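Using the list prices above, a small helper makes the trade-off concrete. Prices are hard-coded from this article, and thinking tokens count as output, per the note above.

```python
# Per-million-token list prices from the article (USD).
INPUT_PRICE = 0.25
OUTPUT_PRICE = 1.50   # includes thinking tokens
BATCH_DISCOUNT = 0.5  # batch pricing halves both rates

def request_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimated USD cost of one request at Flash-Lite list prices."""
    cost = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

# 10K tokens in, 2K tokens out (thinking included):
print(round(request_cost(10_000, 2_000), 6))              # 0.0055
print(round(request_cost(10_000, 2_000, batch=True), 6))  # 0.00275
```

At these rates, a million such requests per day would cost roughly $5,500 on the standard tier, which is the kind of scale the "workhorse" positioning targets.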

Performance Benchmarks
Despite its small size, Gemini 3.1 Flash-Lite approaches or surpasses the performance of the previous generation’s larger models:
- GPQA Diamond (Science): 86.9% (Surpassing Gemini 2.5 Flash).
- MMMU Pro (Multimodal): 76.8%.
- Humanity’s Last Exam: 16.0% (An elite-level reasoning score for a “lite” model).
Top Use Cases
- High-Volume Translation: Processing chat logs, support tickets, or reviews at scale.
- Real-time UI Generation: Powering the logic behind Google’s new Canvas feature to build interactive layouts instantly.
- Multimodal Labeling: Automatically tagging thousands of images or videos with high consistency.
- Lightweight Agents: Handling “intent routing” and entity extraction in complex agentic pipelines.
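For the lightweight-agent case, intent routing with a small fast model usually amounts to a constrained classification prompt answered at the cheapest thinking level. The sketch below builds such a prompt; the intent set and wording are illustrative assumptions, and the actual model call is omitted.

```python
# Build a constrained intent-routing prompt for a fast model like Flash-Lite.
# The intents and wording are illustrative; a real pipeline would send this
# prompt with a "minimal" thinking level and parse the single-word reply.
INTENTS = ["billing", "technical_support", "account", "other"]

def routing_prompt(user_message: str, intents=INTENTS) -> str:
    """Ask the model to pick exactly one intent from a fixed list."""
    options = ", ".join(intents)
    return (
        "Classify the user message into exactly one intent.\n"
        f"Intents: {options}\n"
        f"Message: {user_message}\n"
        "Answer with only the intent name."
    )

print(routing_prompt("My invoice is wrong"))
```

Constraining the answer to a fixed label set keeps parsing trivial and pairs well with the "minimal" thinking mode described earlier.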