DeepSeek dropped V4 on April 24, and the numbers are worth looking at. One million tokens of context. A Pro version and a Flash version. Input pricing at 2 yuan per million tokens for Pro, 0.5 yuan for Flash. The company also open-sourced the weights and published a technical report. This is not a teaser. It is a shipping product.
I have been watching Chinese AI long enough to know that “preview” often means “barely works.” DeepSeek V4-Pro is different. The company claims it beats every other open model on agentic coding and matches or exceeds top closed-source models on math and STEM benchmarks. On LiveCodeBench, DeepSeek's internal tests put it above Claude Sonnet 4.5 and close to Opus 4.6 in non-thinking mode. Its world knowledge scores trail only Gemini 3.1 Pro. Those are specific claims, and they are testable.
The architecture change is the real story. DeepSeek replaced standard attention with something it calls DSA (DeepSeek Sparse Attention), which compresses tokens instead of computing full attention maps. That is how they hit 1M context without burning a data center. It also explains the pricing. At 2 yuan per million input tokens, V4-Pro undercuts most Western APIs by an order of magnitude. Flash is cheaper still. This is the same playbook DeepSeek ran with R1: make the model good enough, then make the price impossible to ignore.
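To see why skipping full attention maps matters at this scale, here is a rough back-of-the-envelope sketch. It does not model DSA itself. It assumes a generic scheme in which each query token scores at most a fixed number of keys, and the budget of 2,048 keys is a made-up figure for illustration, not something DeepSeek has published.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Score entries for dense attention: one per (query, key) pair."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Score entries when each query looks at no more than `keys_per_query` keys."""
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    # ASSUMPTION: hypothetical per-query key budget, chosen only for illustration.
    KEYS_PER_QUERY = 2048
    for n in (8_000, 128_000, 1_000_000):
        dense = full_attention_pairs(n)
        sparse = sparse_attention_pairs(n, KEYS_PER_QUERY)
        print(f"{n:>9,} tokens: dense={dense:.2e}  sparse={sparse:.2e}  "
              f"ratio={dense / sparse:,.0f}x")
```

Under those assumptions, a 1M-token prompt needs about a trillion score entries per head per layer with dense attention and about two billion with the fixed budget, a gap of roughly 488 times. The real savings depend on details the sketch ignores, such as how the keys are selected and what that selection step costs.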
So who is losing sleep?
Alibaba shipped Qwen3.5-Plus in late February. It is a 397B-parameter MoE with only 17B active, built on linear attention and licensed under Apache 2.0. ReLE benchmark tests put it at 74.6% accuracy, second only to ByteDance’s Doubao-Seed-2.0-pro. It is 73% faster than the old Qwen3-Max and 47% cheaper. On BrowseComp, an agent search benchmark, Qwen3.5 scores 78.6 against Claude Opus 4.5’s 67.8. That is a real gap. But DeepSeek V4-Pro is now the open-source benchmark to beat, and Qwen3.5 is chasing it.
Kimi released K2.5 on January 27, right before Chinese New Year. Moonshot AI calls it an “all-in-one” model with native multimodality, thinking and non-thinking modes, and agent clustering. The company claims it trained K2 using roughly 1% of the compute budget of top US labs. Impressive if true, though I have not seen independent verification of that 1% figure. What I do know is that K2.5’s release video had Yang Zhilin personally presenting it, and the company has since stopped its domestic user acquisition burn. Moonshot is pivoting to overseas developers and vertical agents. Smart move, or an admission that they cannot outspend ByteDance and Tencent on customer acquisition cost (CAC)? My gut says a bit of both.
Baidu open-sourced ERNIE 4.5 in June 2025 under Apache 2.0, covering models from 0.3B to 47B parameters. It was a decent release — the 21B text model matched Qwen3 at the same size, and the VL-28B multimodal variant beat OpenAI o1 on some vision tasks. But Baidu has been quiet since then. The developer conference in April 2026 announced ERNIE 4.5 Turbo and X1 Turbo, though benchmark details were thin. Baidu’s problem is not model quality. It is relevance. When developers talk about Chinese open models, the conversation starts with DeepSeek and Qwen. ERNIE is an afterthought.
The “Six Little Tigers” are fragmenting. Zhipu and MiniMax both listed on the Hong Kong Stock Exchange in January 2026. Zhipu at a 55 billion yuan valuation, MiniMax above 90 billion. Good for them. But going public does not make your model better. GLM-5 scores 71.0% on ReLE, behind Qwen3.5 and DeepSeek-V3.2-Think. MiniMax makes most of its money from Talkie and StarWild, AI companion apps that generated roughly $70 million in 2024. That is a consumer entertainment business wearing a foundation model hat. Nothing wrong with that, but it is not the same game DeepSeek is playing.
ByteDance and Tencent are the elephants. Doubao-Seed-2.0-pro still tops the ReLE leaderboard at 76.5%. ByteDance put its Volcano Engine (火山引擎) brand on the 2026 Spring Festival Gala. Tencent spent a reported 1 billion yuan pushing Yuanbao. These are platform companies using AI to defend their moats. They do not need to win on Hugging Face downloads. They need to keep users inside their apps.
Here is what strikes me. A year ago, the narrative was that China was behind on LLMs, catching up to GPT-4. That framing is dead. The question now is which Chinese model you use, not whether Chinese models are competitive. DeepSeek V4-Pro is priced like a commodity and performs like a flagship. Qwen3.5 is the best open-weights alternative. The rest are either niche players or burning cash to stay visible.
The open-source leaderboard is not everything. Closed-source models from Google and OpenAI still lead on some benchmarks. Gemini 3 Pro reportedly scores 23.4% on MathArena where most rivals are under 2%. But the gap is narrowing, and the price gap is widening. For most real-world tasks, DeepSeek V4 at 2 yuan per million tokens is good enough. That is the definition of disruption.
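For a sense of what those prices mean per request, here is a minimal cost sketch. It uses only the input prices quoted at the top of this post (2 yuan per million tokens for Pro, 0.5 yuan for Flash) and ignores output tokens and any caching discounts, so treat the result as a floor rather than a bill. The function and table names are my own.

```python
# Input prices in yuan per million tokens, as quoted in this post.
PRICE_YUAN_PER_MILLION_INPUT = {
    "V4-Pro": 2.0,
    "V4-Flash": 0.5,
}


def input_cost_yuan(model: str, input_tokens: int) -> float:
    """Estimated input-side cost of one request, in yuan (output tokens not included)."""
    return PRICE_YUAN_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    # A prompt that fills the whole 1M-token context window.
    for model in PRICE_YUAN_PER_MILLION_INPUT:
        print(f"{model}: {input_cost_yuan(model, 1_000_000):.2f} yuan")
```

Filling the entire context window once costs 2 yuan on Pro and 0.5 yuan on Flash on the input side. Even a workload that does that 500 times a day stays around 1,000 yuan of input spend on Pro.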
One thing to watch: DeepSeek’s old API names — deepseek-chat and deepseek-reasoner — get sunset on July 24, 2026. The company is forcing migration to V4-Flash and V4-Pro. That is confidence. Or arrogance. With DeepSeek, the two often look the same.
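If you have the old names hardcoded, the migration can be as small as a lookup table. The sketch below is an assumption-heavy placeholder: the new identifiers `deepseek-v4-flash` and `deepseek-v4-pro` are hypothetical strings, and the pairing of chat with Flash and reasoner with Pro is my guess, not something DeepSeek has confirmed here. Check the official API documentation before shipping anything like it.

```python
from datetime import date

# Sunset date for the old API names, as stated in this post.
SUNSET_DATE = date(2026, 7, 24)

# ASSUMPTION: both the new identifiers and the old-to-new pairing are
# hypothetical placeholders. Replace them with the values from the official docs.
MODEL_MIGRATION = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-pro",
}


def migrate_model_name(name: str) -> str:
    """Return the replacement for a deprecated model name, or the name unchanged."""
    return MODEL_MIGRATION.get(name, name)


def days_until_sunset(today: date) -> int:
    """Days left before the old names stop working (negative once the date has passed)."""
    return (SUNSET_DATE - today).days


if __name__ == "__main__":
    print(migrate_model_name("deepseek-chat"))
    print(days_until_sunset(date.today()))
```

Counting from the April 24 launch, that leaves 91 days to switch.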