DeepSeek cut V4-Pro API pricing by 75%

The global AI price war has hit a staggering new low. DeepSeek has officially slashed the API pricing for its newly released flagship model, DeepSeek-V4-Pro, by 75%, while simultaneously reducing input context caching costs across its entire API suite to an unbelievable one-tenth of prior launch rates.

The aggressive undercut lands just days after the open-source debut of DeepSeek-V4 on April 24, 2026, putting massive commercial pressure on Western frontier offerings like OpenAI’s GPT-5.5, Anthropic’s Claude 4.7, and Google’s Gemini 3.1.

1. The Revamped API Price Sheet

The 75% promotional discount—originally set to expire early in the month—has been officially extended by DeepSeek through May 31, 2026, with internal platform indicators suggesting the company intends to make these ultra-low baselines permanent as domestic hardware infrastructure matures.

Token Type (Per 1 Million Tokens)	Standard List Price	Discounted Rate (Through May 31, 2026)
V4-Pro Input (Cache Miss)	$1.74	$0.435
V4-Pro Input (Cache Hit)	$0.145	$0.003625
V4-Pro Output	$3.48	$0.87

For high-volume production applications, its lighter sibling, DeepSeek-V4-Flash, sits at a permanent standard rate of $0.14 per million input tokens ($0.0028 for cache hits) and $0.28 per million output tokens, making it up to 99% cheaper than competing mini and nano models.

2. Micro-Optimization: The Power of Context Caching

The true disruption for enterprise developers lies in the 90% slash to the context caching architecture. Because agentic workflows, long-horizon coding tasks, and multi-file summaries repeatedly pass the same system instructions or codebases, maximizing cache hits effectively drives the operating cost down near zero.

Real-World Math: Under the new framework, running a complex multi-turn developer session utilizing a 100,000-token codebase prefix costs a mere $0.00036 per subsequent turn if the context hits the cache, completely eliminating the compounding token tax that traditionally cripples large-scale agentic deployments.

3. Architecture Over Shifting Hardware

DeepSeek’s ability to sustain these profit-melting prices without relying on massive, scarce Nvidia clusters stems from deep mathematical upgrades built directly into the V4 architecture:

Sparse Attention & Dimension Compression: DeepSeek-V4-Pro utilizes a massive Mixture-of-Experts (MoE) configuration housing 1.6 trillion total parameters, but fine-grained routing means only 49 billion parameters are active during any single token invocation. This reduces per-token compute demands by 73% compared to legacy architectures.
Domestic Infrastructure Synergy: DeepSeek has optimized V4 to run natively on Huawei Ascend 950 supernode clusters and Cambricon hardware rather than Western silicon. DeepSeek explicitly stated that as mass production of these domestic supernodes accelerates through late 2026, inference costs are structurally positioned to decline even further.

4. Native Tool Integration

To capture immediate Western developer mindshare, the updated API endpoints feature native, out-of-the-box compatibility handles for both standard OpenAI ChatCompletions and Anthropic API harnesses. This allows the 1-million-context V4-Pro model to be dropped directly as a drop-in replacement into popular autonomous coding frameworks like Claude Code, Cline, and OpenClaw CLI by simply swapping the baseline URL.

Get the day’s top stories in your inbox

One concise email. No spam, unsubscribe anytime.

1. The Revamped API Price Sheet

2. Micro-Optimization: The Power of Context Caching

3. Architecture Over Shifting Hardware

4. Native Tool Integration

Related Stories

Tencent release Hy3

Reddit is using LLMs to solve a problem LLMs largely created

Anthropic discovers new internal workspace J-Space

Leave a Comment Cancel reply