Chinese AI labs have spent the past year in a release cycle that looks less like research and more like a product sprint. DeepSeek, Alibaba, Moonshot AI, Zhipu AI, ByteDance, and StepFun have all dropped major models since April 2025. Here is what actually shipped, what the numbers say, and what I do not buy.
DeepSeek V4: the one-million context model that finally arrived
DeepSeek released V4 in preview on April 26, 2026. It comes in two sizes: V4-Pro and V4-Flash. Both support 1 million tokens of context. That is not a typo. One million. The previous V3.1 topped out at 128K. DeepSeek says it achieved this with a new sparse attention mechanism it calls DSA, plus token-level compression. The company claims V4-Pro beats every other open-source model on math, STEM, and coding benchmarks, and sits close to closed frontier models on world-knowledge tests.
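To see why a jump from 128K to 1M tokens needs something like sparse attention, here is a rough back-of-the-envelope sketch. It only counts how many query-key pairs an attention layer has to score. It is not DeepSeek's DSA: the function names and the `HYPOTHETICAL_TOP_K` value are my own assumptions for illustration, and a real sparse scheme also pays for the step that selects which keys to keep.

```python
# Back-of-the-envelope count of query-key pairs an attention layer scores.
# Illustrative only: HYPOTHETICAL_TOP_K is a made-up value, not a DeepSeek setting.


def dense_pairs(n_tokens: int) -> int:
    """Every token scores every token (causal masking would roughly halve this)."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, top_k: int) -> int:
    """Each query scores at most top_k selected keys."""
    return n_tokens * min(top_k, n_tokens)


OLD_CONTEXT = 128_000        # V3.1 limit quoted in the article
NEW_CONTEXT = 1_000_000      # V4 limit quoted in the article
HYPOTHETICAL_TOP_K = 2_048   # assumption for illustration only

# How much more work dense attention does at 1M than at 128K.
growth = dense_pairs(NEW_CONTEXT) / dense_pairs(OLD_CONTEXT)

# How many fewer pairs a top-k scheme scores at 1M, under the assumed k.
saving = dense_pairs(NEW_CONTEXT) / sparse_pairs(NEW_CONTEXT, HYPOTHETICAL_TOP_K)

print(f"Dense cost growth from 128K to 1M: {growth:.1f}x")
print(f"Dense vs top-{HYPOTHETICAL_TOP_K} sparse at 1M: {saving:.0f}x fewer pairs")
```

Under these assumptions, dense attention does about 61 times more pair scoring at 1M tokens than at 128K, while a top-2,048 selection would cut the 1M case by a factor of roughly 488. The real numbers depend on details DeepSeek has not been quoted on here.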
I have not run my own evals on V4, so I will repeat DeepSeek’s claims with the appropriate salt. What I can verify: the API is live, the weights are on Hugging Face, and the 1M context is real. Users on IT之家 (ITHome) reported seeing it in a staged (gray-release) rollout as early as February 11, 2026. DeepSeek also says its internal staff now use V4-Pro for agentic coding and prefer it to Claude Sonnet 4.5. That is a bold claim. I will believe it when I see independent SWE-bench scores.
Pricing: V4-Pro input is 2 yuan per million tokens, output is 8 yuan. Flash is cheaper. Old API names deepseek-chat and deepseek-reasoner will sunset on July 24, 2026.
Alibaba’s Qwen3 and Qwen3.5: the omnivore approach
Alibaba dropped Qwen3 on April 29, 2025. It was a big release: eight models ranging from 0.6B to 235B parameters, both dense and MoE. The flagship Qwen3-235B-A22B uses only 22B active parameters and still beats DeepSeek-R1 (671B) on code and math benchmarks, according to Alibaba’s own tests. The trick is a “thinking / non-thinking” mode switch inside the same model. Users can dial reasoning depth up or down. That saves tokens on simple queries and burns them on hard ones.
Then on March 30, 2026, Alibaba shipped Qwen3.5-Omni, a native multimodal model that handles text, image, audio, and video in one architecture. It supports up to 10 hours of audio input and 256K context. Speech recognition covers 113 languages and 39 dialects. Video is capped at 400 seconds, so this is still mainly a speech-and-text play. The model uses a Hybrid-Attention MoE architecture for both its Thinker and Talker modules.
My take: Qwen3.5-Omni is the most interesting thing Alibaba has done since Qwen2.5. Ten-hour audio understanding is not a benchmark stunt. It is a product feature. Conference transcription, legal depositions, podcast analysis — those are real workflows. Whether the quality holds across a full 10 hours is another question. I have not tested it end-to-end.
Moonshot AI’s Kimi K2: one trillion parameters, open-sourced
On July 11, 2025, Moonshot AI released Kimi K2. Total parameters: 1 trillion. Active parameters: 32 billion. It is an MoE model built on a modified DeepSeek V3 architecture. Moonshot claims K2 achieves lower loss than V3 at comparable training and inference cost. The company open-sourced both base and instruct versions under a modified MIT license.
What makes K2 notable is not just the parameter count. Moonshot built an automated agent data pipeline to train it. The idea: let the model synthesize its own tool-use trajectories, then filter the good ones. A Moonshot engineer described it as a “fully automated agent data production factory.” That sounds like marketing, but the GitHub repo is public, and the community produced MLX ports and 4-bit quantizations within 24 hours of release.
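The synthesize-then-filter idea can be sketched as a toy loop. To be clear, this is not Moonshot's pipeline: `Trajectory`, `synthesize`, `keep`, and `build_dataset` are hypothetical names, the "model" is a random number generator, and the filter rule (succeeded, at most four tool calls) is an assumption I picked to make the shape of the loop visible.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    tool_calls: list[str]
    succeeded: bool


def synthesize(task: str, rng: random.Random) -> Trajectory:
    """Stand-in for a model rolling out one tool-use attempt (hypothetical)."""
    n_calls = rng.randint(1, 6)
    calls = [f"tool_{rng.randint(0, 3)}" for _ in range(n_calls)]
    # A real pipeline would run the calls and verify the result;
    # here success is just a coin flip.
    return Trajectory(task, calls, succeeded=rng.random() < 0.4)


def keep(traj: Trajectory, max_calls: int = 4) -> bool:
    """Assumed filter: keep attempts that succeeded without too many tool calls."""
    return traj.succeeded and len(traj.tool_calls) <= max_calls


def build_dataset(tasks, samples_per_task: int = 8, seed: int = 0):
    """Sample several attempts per task and keep only those passing the filter."""
    rng = random.Random(seed)
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            traj = synthesize(task, rng)
            if keep(traj):
                kept.append(traj)
    return kept


dataset = build_dataset(["book a flight", "fix a failing test"])
print(f"kept {len(dataset)} of 16 sampled trajectories")
```

The interesting engineering is everything this toy leaves out: how tasks are generated, how success is checked, and how strict the filter is.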
On February 24, 2026, 36氪 reported that Kimi K2.5 generated more revenue in 20 days than Moonshot’s entire 2025 annual total. The company is now valued at $10–12 billion. Founder Yang Zhilin says they have stopped paid user acquisition and are betting on model quality as the only growth channel. That is either confidence or desperation. I am not sure which yet.
Zhipu GLM-4.5: the efficiency play
Zhipu AI released GLM-4.5 on July 28, 2025. It is a 355B-parameter MoE model with 32B active parameters. A lighter version, GLM-4.5-Air, runs at 106B total / 12B active. Zhipu claims GLM-4.5 ranks third globally and first among Chinese open-source models on a 12-benchmark average. More interestingly, it uses half the parameters of DeepSeek-R1 and one-third of Kimi K2, yet scores higher on several coding benchmarks.
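The "half" and "one-third" figures are easy to sanity-check from the parameter counts quoted in this article. The sketch below is just that arithmetic. The helper names are mine, and DeepSeek-R1's active parameter count is left as `None` because the article does not quote it.

```python
# Parameter counts in billions as quoted in this article: (total, active).
# DeepSeek-R1's active count is not given in the article, so it stays None.
MODELS = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
    "Kimi K2": (1000, 32),
    "Qwen3-235B-A22B": (235, 22),
    "DeepSeek-R1": (671, None),
}


def size_ratio(model: str, reference: str) -> float:
    """Total parameters of `model` as a fraction of `reference`."""
    return MODELS[model][0] / MODELS[reference][0]


def active_fraction(model: str):
    """Share of parameters used per token, or None if the active count is unknown."""
    total, active = MODELS[model]
    return None if active is None else active / total


print(f"GLM-4.5 vs DeepSeek-R1: {size_ratio('GLM-4.5', 'DeepSeek-R1'):.2f}")
print(f"GLM-4.5 vs Kimi K2:     {size_ratio('GLM-4.5', 'Kimi K2'):.2f}")

for name in MODELS:
    frac = active_fraction(name)
    label = "n/a" if frac is None else f"{frac:.1%}"
    print(f"{name:>16}: active share {label}")
```

The ratios come out to about 0.53 and 0.36, so Zhipu's framing holds up on total parameters. Note that GLM-4.5 and Kimi K2 both activate 32B per token, so the per-token compute gap is much smaller than the headline size gap.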
API pricing is aggressive: 0.8 yuan per million input tokens, 2 yuan per million output tokens. The high-speed version claims over 100 tokens per second. Zhipu is positioning GLM-4.5 as a coding agent base model, and the SWE-bench Verified scores look competitive. I would like to see independent reproduction before calling it a Claude-4-Sonnet replacement, but the direction is clear.
ByteDance Doubao 1.6: the price war continues
ByteDance released Doubao 1.6 on June 11, 2025. The series has three variants: seed-1.6 (general), seed-1.6-thinking (deep reasoning), and seed-1.6-flash (real-time). All support 256K context and adaptive thinking. The headline is pricing: in the 0–32K input range, Doubao 1.6 costs 0.8 yuan per million input tokens and 8 yuan per million output tokens. That is one-third the cost of Doubao 1.5’s deep-thinking model or DeepSeek-R1.
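To put the price sheets side by side, here is a small sketch that prices one hypothetical workload against the per-million-token rates quoted in this article. The workload numbers (100,000 requests a day, 20,000 input and 1,000 output tokens each) are my assumption, chosen to stay inside Doubao's 0–32K input tier. Real bills depend on caching discounts, tiers, and exchange rates that are not modeled here.

```python
# (input, output) prices in yuan per million tokens, as quoted in this article.
PRICES_YUAN_PER_MILLION = {
    "DeepSeek V4-Pro": (2.0, 8.0),
    "GLM-4.5": (0.8, 2.0),
    "Doubao 1.6 (0-32K input tier)": (0.8, 8.0),
}


def token_cost(input_tokens: int, output_tokens: int,
               price_in: float, price_out: float) -> float:
    """Cost in yuan for a given number of input and output tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def daily_cost(model: str, requests_per_day: int,
               input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan per day for identical requests against one model."""
    price_in, price_out = PRICES_YUAN_PER_MILLION[model]
    return token_cost(requests_per_day * input_tokens,
                      requests_per_day * output_tokens,
                      price_in, price_out)


# Hypothetical workload, kept under 32K input tokens per request.
REQUESTS_PER_DAY = 100_000
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 1_000

for model in PRICES_YUAN_PER_MILLION:
    cost = daily_cost(model, REQUESTS_PER_DAY, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model:<32} {cost:>10,.0f} yuan/day")
```

For this input-heavy workload the totals work out to roughly 4,800, 1,800, and 2,400 yuan per day. Flip the ratio toward long outputs and Doubao's 8 yuan output rate starts to matter a lot more than its 0.8 yuan input rate.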
ByteDance also released Seedance 1.0 Pro, a video generation model that costs 3.67 yuan for a 5-second 1080P clip. ByteDance claims this is the cheapest video generation API on the market.
My gut says the Doubao pricing is sustainable because ByteDance owns the inference infrastructure. For everyone else, matching these prices means burning cash. As of May 2025, Doubao was handling 16.4 trillion tokens per day, which gives ByteDance a data flywheel that smaller labs cannot replicate.
StepFun Step 3: the chip-friendly model
StepFun released Step 3 on July 25, 2025, at WAIC in Shanghai. It is a 321B-parameter MoE model with 38B active parameters. The pitch is not raw benchmark scores — though it did hit SOTA on MMMU, MathVision, and LiveCodeBench. The pitch is inference efficiency. StepFun claims Step 3 runs at 3x the speed of DeepSeek-R1 on domestic Chinese chips like Huawei Ascend, and 70% faster than R1 on NVIDIA Hopper, all without cutting active parameters or attention capacity.
StepFun formed a “model-chip ecosystem alliance” with nine domestic chip vendors including Huawei, MetaX (Muxi), Biren, and Cambricon. Huawei Ascend already runs Step 3 in production. That is a political statement as much as a technical one. In a world where US export controls are tightening, being able to train and serve on domestic silicon is a survival feature.
What this all means
A little over a year ago, the story was DeepSeek-R1 shocking everyone with cheap reasoning. Now every major Chinese lab has a reasoning model, an agent framework, a multimodal variant, and a pricing sheet that undercuts OpenAI by an order of magnitude. The differentiation is no longer “can we build a smart model?” It is “can we build a smart model that runs cheap on our own hardware?”
Alibaba is betting on omnimodal long-context understanding. Moonshot is betting on open-source trillion-parameter models. Zhipu is betting on parameter efficiency. ByteDance is betting on vertical integration and price. StepFun is betting on chip independence. DeepSeek is betting on context length and internal adoption.
They cannot all be right. My guess: two of these labs will pull ahead by early 2027, and the rest will pivot to enterprise verticals or get absorbed. The hardware constraint is real. The cash burn is real. The talent war is real. What is also real is that Chinese open-source models now define the price floor for the entire global market. If you are building an AI product and not benchmarking against Qwen, DeepSeek, or Kimi, you are overpaying.