Alibaba’s Qwen team has completed its newest-generation lineup with the launch of the Qwen3.5 Small Series.
Following the release of the massive 397B flagship in February, this new family focuses on “Intelligence Density”—delivering high-level reasoning and native multimodality in packages small enough to run on consumer laptops, mobile phones, and IoT devices.
The Qwen3.5 Small Lineup
The series consists of four models, all released under the Apache 2.0 license and available on Hugging Face and ModelScope.
| Model | Parameters | Primary Use Case | Key Performance Highlight |
| --- | --- | --- | --- |
| Qwen3.5-0.8B | 800 Million | Edge/IoT devices | Ultra-fast inference with low VRAM footprint. |
| Qwen3.5-2B | 2 Billion | Mobile/On-device | Smallest model to support “Thinking Mode” by default. |
| Qwen3.5-4B | 4 Billion | Lightweight Agents | Balanced “Goldilocks” model for multimodal agents. |
| Qwen3.5-9B | 9 Billion | Desktop/Consumer GPU | Beats the previous generation’s 30B models. |
Technical Breakthroughs
- Thinking Mode at Scale: The 2B model is a major milestone, as it is reportedly the smallest model in the industry to feature a toggleable “Thinking Mode.” When the mode is on, the model reasons step by step before answering, which raises its scores on harder tasks (e.g., its score on IFEval, an instruction-following benchmark, rises from 61.2 to 78.6).
- Native Multimodality: Unlike previous small models that used external “vision towers,” the Qwen3.5 small series is natively multimodal. Even the 0.8B version can process images and video directly, with the 9B model reportedly outperforming GPT-5-Nano on vision benchmarks.
- Hybrid Architecture: The models utilize a 3:1 hybrid of Gated DeltaNet (linear attention) and standard Gated Attention. This design allows for high-throughput decoding and a native 262K context window, extensible up to 1 million tokens.
- Hardware Efficiency:
  - The 2B model fits into roughly 4GB of VRAM, making it viable for Raspberry Pi-class devices.
  - The 9B model, when 4-bit quantized, requires only 5GB of VRAM, allowing it to run on older hardware like the NVIDIA RTX 3060 or base M1 Macs.
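The 3:1 layer ratio and the memory figures in the list above can be sanity-checked with some quick arithmetic. The sketch below is illustrative only: the function names are hypothetical, the layer ordering (three linear-attention layers followed by one full-attention layer) is an assumption about how the ratio is arranged, and the 2B figure is assumed to refer to 16-bit weights. Only the weights are counted, so the KV cache, activations, and runtime overhead come on top.

```python
"""Back-of-envelope checks for the ratio and memory figures quoted above.

All helper names here are hypothetical; nothing below is taken from the
Qwen codebase.
"""


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes.

    Ignores the KV cache, activations, and runtime overhead, so real
    usage will be higher than this figure.
    """
    return num_params * bits_per_weight / 8 / 1e9


def hybrid_layer_schedule(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Label each layer under an assumed repeating N:1 pattern.

    Assumption: every group is `linear_per_full` linear-attention layers
    followed by one full-attention layer. The real ordering may differ.
    """
    period = linear_per_full + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "linear_attention"
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    print(weight_memory_gb(2e9, 16))  # 2B parameters at 16 bits per weight
    print(weight_memory_gb(9e9, 4))   # 9B parameters at 4 bits per weight
    print(hybrid_layer_schedule(8))   # two groups of three linear + one full
```

Under these assumptions, 2 billion parameters at 16 bits come to 4.0 GB and 9 billion parameters at 4 bits come to 4.5 GB of weights, which is consistent with the roughly 4GB and 5GB figures quoted once some overhead is allowed for.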
Market Impact
The launch has been praised by tech leaders, including Elon Musk, who described the series on X as having “impressive intelligence density.” By releasing base models alongside instruction-tuned variants, Alibaba is positioning itself as the primary infrastructure provider for the “on-device AI” era, challenging both the Google Pixel and Apple Intelligence ecosystems.
“Frontier-level reasoning at a fraction of the compute bill is no longer a theoretical promise. It’s a benchmark result.” — Alibaba Qwen Team
