DeepSeek released a preview of V4 on April 24, 2026. There are two variants: V4-Pro at 1.6T total parameters with 49B active, and V4-Flash at 284B total with 13B active. Both ship with a 1M-token context window as standard. The company says V4-Pro leads open-source models on agentic coding benchmarks and trails only Gemini-3.1-Pro on world knowledge. V4-Flash is positioned as the faster, cheaper daily driver.
I have not run these models myself. I am reporting what DeepSeek published in its API docs and tech report.
The architecture uses something DeepSeek calls “token-wise compression plus DSA” (DeepSeek Sparse Attention). The pitch is that 1M context windows are now cost-effective. If that holds up under independent testing, it is a real technical win. Long context has been the party trick everyone demos and nobody ships at scale.
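For a feel of why sparse attention matters at this length, here is a back-of-envelope sketch. It is not DeepSeek's method: the post's sources do not say how DSA picks tokens, so the code assumes, purely for illustration, that each query attends to at most k earlier tokens. The budget k = 2048 is a hypothetical value, and the cost of choosing those k tokens is ignored.

```python
def dense_causal_pairs(n: int) -> int:
    """Query-key pairs scored by dense causal attention over n tokens."""
    return n * (n + 1) // 2


def sparse_causal_pairs(n: int, k: int) -> int:
    """Pairs scored when each query attends to at most k tokens (itself included)."""
    if n <= k:
        return dense_causal_pairs(n)
    # The first k queries see fewer than k tokens; every later query is capped at k.
    return k * (k + 1) // 2 + (n - k) * k


K_BUDGET = 2048  # hypothetical per-query budget, not a published V4 setting

for n in (8_000, 128_000, 1_000_000):
    ratio = dense_causal_pairs(n) / sparse_causal_pairs(n, K_BUDGET)
    print(f"{n:>9,} tokens: dense scores about {ratio:,.1f}x more pairs")
```

Dense cost grows with the square of the sequence length, while the capped version grows linearly, so the gap widens as the context gets longer. Whether V4 gets that saving without losing accuracy is exactly what independent testing has to show.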
DeepSeek also says V4-Pro beats all open models in math, STEM, and coding reasoning, and rivals top closed-source models. Again, I have not verified this. The company has a habit of publishing strong numbers that hold up reasonably well in third-party tests, but I wait for LiveBench and LMSYS Arena results before I trust any vendor-reported ranking.
What I do know: the legacy API model names deepseek-chat and deepseek-reasoner are being retired on July 24, 2026. They currently route to V4-Flash. If you have integrations that hardcode either name, you will need to update your model strings before then.
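A minimal sketch of how you might find stale model strings before the cutoff. The settings dictionary and the placeholder "my-v4-model-id" are hypothetical. The replacement model names are not given here, so check DeepSeek's API docs for the real ones.

```python
from datetime import date

# Retirement date and legacy names as reported in this post.
RETIREMENT_DATE = date(2026, 7, 24)
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}


def audit_model_strings(settings: dict[str, str], today: date) -> list[str]:
    """Return one warning per setting that still points at a legacy model name."""
    warnings = []
    days_left = (RETIREMENT_DATE - today).days
    status = f"{days_left} days left" if days_left > 0 else "already retired"
    for key, model in sorted(settings.items()):
        if model in LEGACY_MODELS:
            warnings.append(
                f"{key}: '{model}' retires {RETIREMENT_DATE.isoformat()} ({status})"
            )
    return warnings


# Hypothetical application settings; "my-v4-model-id" is a placeholder, not a real model name.
settings = {
    "summarizer": "deepseek-chat",
    "planner": "deepseek-reasoner",
    "coder": "my-v4-model-id",
}

for line in audit_model_strings(settings, today=date(2026, 5, 1)):
    print(line)
```

Running something like this in CI is cheaper than finding out on July 24 that a cron job still asks for deepseek-chat.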
The open-weights release is on HuggingFace. The tech report is there too. DeepSeek says V4-Pro integrates with Claude Code, OpenClaw, and OpenCode for agentic coding. That is a smart move. The model is only useful if people can actually plug it into workflows.
What else is moving in China
Alibaba’s Qwen3 dropped on April 28, 2025. Eight models, from 0.6B to 235B parameters, including two MoE variants. The gimmick is a “hybrid reasoning” switch: the model can think hard or answer fast, and the user sets the budget. Qwen3-235B-A22B reportedly beat OpenAI’s o3-mini on Codeforces. I have not tested this. The smaller dense models are what most developers actually use, and those are competent workhorses.
Moonshot’s Kimi K2 came out in mid-2025. One trillion total parameters, 32 billion active, 15.5 trillion pre-training tokens. The arXiv paper (2507.20534) claims state-of-the-art among open non-thinking models on agentic tasks: 66.1 on Tau2-Bench, 76.5 on ACEBench English, 65.8 on SWE-Bench Verified. The K2.5 update added visual agentic capabilities, video-to-code, and a swarm architecture that can spin up as many as 100 sub-agents. I tried Kimi K2 for coding. It is good. Not magic, but good.
MiniMax released M2.5 on February 12, 2026, right after its Hong Kong IPO. 230B MoE, 10B active. The company claims 80.2% on SWE-Bench Verified, within a point of Claude Opus 4.6. The Hacker News thread on this release was brutal. Multiple users called it “benchmaxxed,” citing a large gap between benchmark scores and real-world performance. One commenter said M2 and M2.1 had a “strong tendency to reward hack” and changed existing codebases to make tests pass artificially. I have no independent data either way. The pricing is aggressive: about $1 per hour of continuous operation at 100 tokens per second.
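To compare that hourly figure with per-token pricing, convert it. The sketch below assumes the $1 covers nothing but continuous generation at a steady 100 tokens per second, and it ignores input-token charges. Under those assumptions it works out to roughly $2.78 per million tokens.

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly running cost at a fixed generation speed to USD per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# Figures quoted in the post: about $1 per hour at 100 tokens per second.
tokens_per_hour = 100 * 3600
price = usd_per_million_tokens(usd_per_hour=1.0, tokens_per_second=100)
print(f"{tokens_per_hour:,} tokens/hour -> ${price:.2f} per million tokens")
```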
Baidu’s Ernie 4.5 Turbo and X1 Turbo shipped on April 25, 2025, at the Create developer conference. Ernie 4.5 Turbo costs 0.8 yuan per million input tokens and 3.2 yuan per million output tokens. Baidu claims this is 40% of DeepSeek-V3’s price. X1 Turbo is the reasoning variant at 1 yuan per million input tokens and 4 yuan per million output tokens. I have not used these. Baidu’s models tend to be fine for Chinese-language tasks and mediocre for code.
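Per-million prices are easier to judge against a concrete workload. A small sketch using the list prices quoted above: the dictionary keys are labels for this example, not official API model IDs, and the 50M-in, 10M-out workload is hypothetical.

```python
# Yuan per million tokens (input, output), as quoted in the post.
PRICES_CNY = {
    "ernie-4.5-turbo": (0.8, 3.2),
    "ernie-x1-turbo": (1.0, 4.0),
}


def job_cost_cny(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for a workload, given per-million-token list prices."""
    price_in, price_out = PRICES_CNY[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
for model in PRICES_CNY:
    print(f"{model}: {job_cost_cny(model, 50_000_000, 10_000_000):.2f} yuan")
```

At these prices the workload costs about 72 yuan on Ernie 4.5 Turbo and about 90 yuan on X1 Turbo, and output tokens dominate the bill once responses get long.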
ByteDance’s Doubao 1.6 launched on June 11, 2025, through Volcano Engine. It can operate a browser GUI to complete tasks such as booking hotels. The model has an “adaptive thinking” mode that decides whether to reason deeply based on prompt complexity. ByteDance claims 63% cost savings versus the previous generation. The company also released Seedance 1.0 pro for video generation.
Zhipu’s GLM-5 arrived in early 2026 at 744B total, 40B active. Demand was high enough that Zhipu hiked prices for its coding plan. StepFun’s Step-3.5-Flash (196B-A11B) is strong on math benchmarks. Qwen3.5 dropped in early 2026 with a 397B-A17B MoE flagship. The pace is relentless.
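One pattern runs through all of these releases: huge total parameter counts, small active counts. A quick sketch that computes the active share from the figures quoted in this post. These are vendor-reported numbers that I have not checked against the weights.

```python
# (total, active) parameters in billions, as quoted in the post.
MODELS = {
    "DeepSeek V4-Pro": (1600, 49),
    "DeepSeek V4-Flash": (284, 13),
    "Qwen3-235B-A22B": (235, 22),
    "Kimi K2": (1000, 32),
    "MiniMax M2.5": (230, 10),
    "GLM-5": (744, 40),
    "Step-3.5-Flash": (196, 11),
    "Qwen3.5 flagship": (397, 17),
}


def active_share(total_b: float, active_b: float) -> float:
    """Fraction of a mixture-of-experts model's parameters used per token."""
    return active_b / total_b


# Print sparsest first.
for name, (total, active) in sorted(MODELS.items(), key=lambda kv: active_share(*kv[1])):
    share = active_share(total, active)
    print(f"{name:<20} {total:>5}B total  {active:>3}B active  {share:6.1%}")
```

Every model on the list activates less than 10% of its weights per token, and V4-Pro is the sparsest at about 3%. That ratio is a big part of why per-token prices can fall while headline parameter counts keep climbing.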
My take
China’s AI labs are releasing models faster than anyone can evaluate them. That is the point. The strategy is to flood the zone, keep the hype cycle spinning, and force Western labs to react. It works. Every month there is a new “DeepSeek moment.”
But the gap between benchmark claims and daily utility is real. I have seen too many models that ace SWE-Bench and then fail on a simple Flask app refactor. DeepSeek V4’s 1M context is genuinely interesting if the sparse attention trick holds up. I want to see independent latency and accuracy tests at 500K+ tokens before I call it a breakthrough.
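The kind of test I mean can be sketched in a few lines: bury a fact at different depths in a long prompt, ask for it back, and time the answer. Everything here is an assumption for illustration. The ask callable is a stand-in for whatever client you use, fake_model is a string search rather than a real model, and the word count is only a rough proxy for tokens.

```python
import time
from typing import Callable


def build_prompt(n_words: int, depth: float, needle: str) -> str:
    """Bury `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    filler = ["lorem"] * n_words
    filler.insert(int(depth * n_words), needle)
    return " ".join(filler) + "\n\nWhat is the secret code? Answer with the code only."


def run_probe(ask: Callable[[str], str], n_words: int, depths: list[float]) -> list[dict]:
    """Time one retrieval query per depth and record whether the answer contains the code."""
    code = "7421-ALPHA"
    results = []
    for depth in depths:
        prompt = build_prompt(n_words, depth, f"The secret code is {code}.")
        start = time.perf_counter()
        answer = ask(prompt)
        elapsed = time.perf_counter() - start
        results.append({"depth": depth, "correct": code in answer, "seconds": elapsed})
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call: finds the code by plain string search."""
    marker = "The secret code is "
    start = prompt.index(marker) + len(marker)
    return prompt[start:].split(".")[0]


for row in run_probe(fake_model, n_words=500_000, depths=[0.0, 0.5, 1.0]):
    print(row)
```

Swap fake_model for a real client call and the same loop gives you accuracy and latency by depth. Single-needle retrieval is a floor, not a ceiling: a model can pass it and still fall apart on tasks that need reasoning across the whole context.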
The price war is not a sideshow. It is the main event. Baidu undercuts DeepSeek. MiniMax undercuts everyone. The race to the bottom on inference costs is reshaping how AI gets deployed, especially in markets that cannot afford OpenAI or Anthropic pricing. That is where China’s open-weights ecosystem has real leverage.
DeepSeek V4’s full evaluation will take weeks. I will run it when I can. For now, the preview is live, the weights are open, and the claims are big. Check them yourself.