Google has officially launched Gemini 3.1 Flash-Lite, currently the fastest and most cost-efficient model in the Gemini 3 series.
Designed for high-volume developer workloads and latency-sensitive tasks, the model is now available in public preview through Google AI Studio and Vertex AI.
Key Specifications & Capabilities
Gemini 3.1 Flash-Lite is optimized for “Intelligence Density”—delivering high-level reasoning in a lightweight package.
| Feature | Specification |
| --- | --- |
| Context Window | 1 million tokens |
| Output Limit | 64K tokens (65,535 by default) |
| Multimodality | Native support for text, image, audio, video, and PDF inputs |
| Speed | Up to 380 tokens per second (roughly 45% faster than 2.5 Flash) |
| Knowledge Cutoff | January 2025 |
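To make the 1-million-token window concrete, the sketch below does rough prompt budgeting with the common ~4-characters-per-token heuristic. The heuristic and the hard-coded limits are assumptions for illustration; real budgeting should use the API's own token counter.

```python
# Rough context-window budgeting for Gemini 3.1 Flash-Lite.
# The 4-chars-per-token ratio is a common heuristic, not an exact tokenizer.
CONTEXT_WINDOW = 1_000_000   # input context, per the spec table
OUTPUT_LIMIT = 65_535        # default output limit, per the spec table

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserved_output: int = OUTPUT_LIMIT) -> bool:
    """Check whether a prompt still leaves room for the reserved output budget."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOW

print(estimate_tokens("Hello, Gemini!"))  # ~3 tokens by this heuristic
```

By this estimate, the window comfortably holds several full-length novels of input text in a single request.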
The “Thinking Mode” Advantage
Following the trend set by the 3.0 series, Flash-Lite includes Adaptive Thinking Levels, letting developers switch among four distinct reasoning modes to balance cost and accuracy:
- Minimal: Near-instant responses for simple classification or routing.
- Low: Balanced for data extraction and translation.
- Medium: Optimized for logic-heavy tasks and code review.
- High: Maximum reasoning for complex multimodal analysis (though this significantly increases output token usage and latency).
Pricing: Built for Scale
Google has priced Flash-Lite to be the “workhorse” of the Gemini family, positioned as a cheaper alternative to the standard 3.1 Flash.
- Input Price: $0.25 per 1 million tokens.
- Output Price: $1.50 per 1 million tokens (including thinking tokens).
- Batch Pricing: For non-urgent workloads, Flex/Batch pricing drops the cost even further to $0.125 (Input) and $0.75 (Output).
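Using the list prices above, a small helper makes the trade-off concrete. Prices are hard-coded from this article, and thinking tokens count as output, per the note above.

```python
# Per-million-token list prices from the article (USD).
INPUT_PRICE = 0.25
OUTPUT_PRICE = 1.50   # includes thinking tokens
BATCH_DISCOUNT = 0.5  # batch pricing halves both rates

def request_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimated USD cost of one request at Flash-Lite list prices."""
    cost = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

# 10K tokens in, 2K tokens out (thinking included):
print(round(request_cost(10_000, 2_000), 6))              # 0.0055
print(round(request_cost(10_000, 2_000, batch=True), 6))  # 0.00275
```

At these rates, a million such requests per day would cost roughly $5,500 on the standard tier, which is the kind of scale the "workhorse" positioning targets.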

Performance Benchmarks
Despite its small size, Gemini 3.1 Flash-Lite approaches or surpasses the performance of the previous generation’s larger models:
- GPQA Diamond (Science): 86.9% (Surpassing Gemini 2.5 Flash).
- MMMU Pro (Multimodal): 76.8%.
- Humanity’s Last Exam: 16.0% (An elite-level reasoning score for a “lite” model).
Top Use Cases
- High-Volume Translation: Processing chat logs, support tickets, or reviews at scale.
- Real-time UI Generation: Powering the logic behind Google’s new Canvas feature to build interactive layouts instantly.
- Multimodal Labeling: Automatically tagging thousands of images or videos with high consistency.
- Lightweight Agents: Handling “intent routing” and entity extraction in complex agentic pipelines.
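For the lightweight-agent case, intent routing with a small fast model usually amounts to a constrained classification prompt answered at the cheapest thinking level. The sketch below builds such a prompt; the intent set and wording are illustrative assumptions, and the actual model call is omitted.

```python
# Build a constrained intent-routing prompt for a fast model like Flash-Lite.
# The intents and wording are illustrative; a real pipeline would send this
# prompt with a "minimal" thinking level and parse the single-word reply.
INTENTS = ["billing", "technical_support", "account", "other"]

def routing_prompt(user_message: str, intents=INTENTS) -> str:
    """Ask the model to pick exactly one intent from a fixed list."""
    options = ", ".join(intents)
    return (
        "Classify the user message into exactly one intent.\n"
        f"Intents: {options}\n"
        f"Message: {user_message}\n"
        "Answer with only the intent name."
    )

print(routing_prompt("My invoice is wrong"))
```

Constraining the answer to a fixed label set keeps parsing trivial and pairs well with the "minimal" thinking mode described earlier.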