Tuesday, March 3, 2026

Alibaba Launches ‘Qwen3.5’ Small Series

Alibaba’s Qwen team has completed its newest-generation lineup with the launch of the Qwen3.5 Small Series.

Following the release of the massive 397B flagship in February, this new family focuses on “Intelligence Density”—delivering high-level reasoning and native multimodality in packages small enough to run on consumer laptops, mobile phones, and IoT devices.


The Qwen3.5 Small Lineup

The series consists of four models, all released under the Apache 2.0 license and available on Hugging Face and ModelScope.

| Model | Parameters | Primary Use Case | Key Performance Highlight |
|---|---|---|---|
| Qwen3.5-0.8B | 800 Million | Edge/IoT devices | Ultra-fast inference with low VRAM footprint. |
| Qwen3.5-2B | 2 Billion | Mobile/On-device | Smallest model to support “Thinking Mode” by default. |
| Qwen3.5-4B | 4 Billion | Lightweight Agents | Balanced “Goldilocks” model for multimodal agents. |
| Qwen3.5-9B | 9 Billion | Desktop/Consumer GPU | Beats the previous generation’s 30B models. |

Technical Breakthroughs

  1. Thinking Mode at Scale: The 2B model is a major milestone, as it is the smallest model in the industry to feature a toggleable “Thinking Mode.” This allows the model to perform step-by-step reasoning, significantly boosting its performance on complex logic tasks (e.g., pushing its IFEval score from 61.2 to 78.6).
  2. Native Multimodality: Unlike previous small models that used external “vision towers,” the Qwen3.5 small series is natively multimodal. Even the 0.8B version can process images and video directly, with the 9B model reportedly outperforming GPT-5-Nano on vision benchmarks.
  3. Hybrid Architecture: The models utilize a 3:1 hybrid of Gated DeltaNet (linear attention) and standard Gated Attention. This design allows for high-throughput decoding and a native 262K context window, extensible up to 1 million tokens.
  4. Hardware Efficiency:
    • The 2B model fits into roughly 4GB of VRAM, making it viable for Raspberry Pi-class devices.
    • The 9B model, when 4-bit quantized, requires only 5GB of VRAM, allowing it to run on older hardware like the NVIDIA RTX 3060 or base M1 Macs.
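
Points 3 and 4 above can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below is illustrative only and is not Qwen code: the function names, the 8-layer example, and the exact interleaving order (three linear-attention layers followed by one full-attention layer) are assumptions, since the article states only the 3:1 ratio. The memory estimate counts weights alone; reading the 2B model's "roughly 4GB" as 16-bit weights is likewise an inference, not something the source states.

```python
# Illustrative arithmetic only; names and layout are assumptions, not Qwen code.


def hybrid_layer_pattern(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Lay out layers as `linear_per_full` linear-attention layers followed by
    one full-attention layer, repeated. The ordering is an assumption; the
    article only gives the 3:1 ratio."""
    period = linear_per_full + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "linear_attention"
        for i in range(num_layers)
    ]


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes).
    Excludes KV cache, activations, and runtime overhead."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 8-layer stack, just to show the repeating pattern.
    print(hybrid_layer_pattern(8))
    # 2B parameters at 16 bits per weight (assumed precision).
    print(weight_memory_gb(2e9, 16))
    # 9B parameters at 4 bits per weight, as described for the quantized 9B.
    print(weight_memory_gb(9e9, 4))
```

Under these assumptions the 2B model's weights come to about 4 GB, and the 4-bit 9B model's weights to about 4.5 GB, which leaves roughly half a gigabyte of the quoted 5GB for cache and runtime overhead.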

Market Impact

The launch has been praised by tech leaders, including Elon Musk, who described the series on X as having “impressive intelligence density.” By releasing base models alongside instruction-tuned variants, Alibaba is positioning itself as the primary infrastructure provider for the “on-device AI” era, challenging both the Google Pixel and Apple Intelligence ecosystems.

“Frontier-level reasoning at a fraction of the compute bill is no longer a theoretical promise. It’s a benchmark result.” — Alibaba Qwen Team
