Alibaba Launches 'Qwen3.5' Small Series

Alibaba’s Qwen team has officially completed its newest-generation lineup with the launch of the Qwen3.5 Small Series.

Following the release of the massive 397B flagship in February, this new family focuses on “Intelligence Density”—delivering high-level reasoning and native multimodality in packages small enough to run on consumer laptops, mobile phones, and IoT devices.

The Qwen3.5 Small Lineup

The series consists of four models, all released under the Apache 2.0 license and available on Hugging Face and ModelScope.

Model	Parameters	Primary Use Case	Key Performance Highlight
Qwen3.5-0.8B	800 Million	Edge/IoT devices	Ultra-fast inference with low VRAM footprint.
Qwen3.5-2B	2 Billion	Mobile/On-device	Smallest model to support “Thinking Mode” by default.
Qwen3.5-4B	4 Billion	Lightweight Agents	Balanced “Goldilocks” model for multimodal agents.
Qwen3.5-9B	9 Billion	Desktop/Consumer GPU	Beats the previous generation’s 30B models.

Technical Breakthroughs

Thinking Mode at Scale: The 2B model is a major milestone, as it is the smallest model in the industry to feature a toggleable “Thinking Mode.” This allows the model to perform step-by-step reasoning, significantly boosting its performance on complex logic tasks (e.g., pushing its IFEval score from 61.2 to 78.6).
Native Multimodality: Unlike previous small models that used external “vision towers,” the Qwen3.5 small series is natively multimodal. Even the 0.8B version can process images and video directly, with the 9B model reportedly outperforming GPT-5-Nano on vision benchmarks.
Hybrid Architecture: The models utilize a 3:1 hybrid of Gated DeltaNet (linear attention) and standard Gated Attention. This design allows for high-throughput decoding and a native 262K context window, extensible up to 1 million tokens.
Hardware Efficiency:
- The 2B model fits into roughly 4GB of VRAM, making it viable for Raspberry Pi-class devices.
- The 9B model, when 4-bit quantized, requires only 5GB of VRAM, allowing it to run on older hardware like the NVIDIA RTX 3060 or base M1 Macs.
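
The VRAM figures above can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8 to get bytes. The sketch below is a rough estimate only. The helper name weight_memory_gb is hypothetical, and the 16-bit precision for the 2B model is an assumption, since the article does not state the precision behind its 4GB figure.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes.

    Ignores KV cache, activations, and quantization metadata, so real
    usage will be somewhat higher than this number.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# 2B model at 16-bit precision (assumed; precision is not given in the article)
print(weight_memory_gb(2e9, 16))  # 4.0

# 9B model at 4-bit quantization, as described in the article
print(weight_memory_gb(9e9, 4))   # 4.5
```

Under these assumptions, the 9B weights alone come to about 4.5GB, which would be consistent with the quoted 5GB once runtime overhead is included.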

Market Impact

The launch has been praised by tech leaders, including Elon Musk, who described the series on X as having “impressive intelligence density.” By releasing base models alongside instruction-tuned variants, Alibaba is positioning itself as the primary infrastructure provider for the “on-device AI” era, challenging both the Google Pixel and Apple Intelligence ecosystems.

“Frontier-level reasoning at a fraction of the compute bill is no longer a theoretical promise. It’s a benchmark result.” — Alibaba Qwen Team

Lapaas Voice
