May 2026 China AI Roundup: DeepSeek on Huawei Chips, Ring-2.6-1T, and Agent Wars

Chinese AI labs shipped more models in April than most countries ship in a year. Between mid-April and early May, at least nine frontier models landed from Chinese developers. Some are good. Some are interesting for reasons that have nothing to do with benchmark scores. Here is what happened.

**DeepSeek V4 goes native on Huawei silicon**

DeepSeek V4 (April 24) is the first top-tier Chinese model that runs entirely on Huawei Ascend 950PR chips. DeepSeek migrated the full stack from Nvidia CUDA to Huawei’s CANN framework. The Ascend 950PR supports FP4 precision at 1.56 PFLOPS, has 112GB HBM, and claims 2.87x the single-card performance of an Nvidia H20.

This is not a minor engineering detail. It means DeepSeek V4 — which scores 93.8 on the May 2026 composite benchmark, second globally behind Kimi K2.6 — now runs on domestic hardware. Huawei has already shipped the Atlas 350 accelerator card. The model supports 1M context windows. Baidu Qianfan, Cambricon, and Moore Threads all had V4 running within 48 hours of release.

DeepSeek also released V4-Flash, a cheaper distilled variant priced at 0.14 yuan per million input tokens. That is roughly $0.02. The API cost gap with GPT-5.5 is now about 30:1.
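The pricing arithmetic above can be checked with a short sketch. The exchange rate of 7.2 yuan per dollar is an assumption (no rate is stated here), and the comparator price is only what the quoted 30:1 ratio would imply if it is read as applying to V4-Flash input pricing. It is not a published figure.

```python
# Back-of-the-envelope check of the V4-Flash pricing quoted above.
# ASSUMPTION: 7.2 CNY per USD; the article does not give an exchange rate.
CNY_PER_USD = 7.2


def cny_to_usd(amount_cny: float, cny_per_usd: float = CNY_PER_USD) -> float:
    """Convert a yuan amount to US dollars at the assumed rate."""
    return amount_cny / cny_per_usd


flash_price_cny = 0.14  # yuan per million input tokens (from the article)
flash_price_usd = cny_to_usd(flash_price_cny)

# ASSUMPTION: the quoted 30:1 gap is applied to V4-Flash input pricing.
# The result is an implied comparator price, not a published one.
quoted_ratio = 30
implied_comparator_usd = flash_price_usd * quoted_ratio

print(f"V4-Flash: ${flash_price_usd:.3f} per million input tokens")
print(f"Implied comparator at {quoted_ratio}:1: ${implied_comparator_usd:.2f}")
```

At the assumed rate, the yuan price rounds to the $0.02 quoted above. A different exchange rate shifts the dollar figure slightly but not the ratio.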

**Ant Group drops a trillion-parameter reasoning model**

On May 9, Ant Group’s Bailing team released Ring-2.6-1T. It has 1 trillion total parameters with 63B active. The trick is a “dynamic reasoning intensity” mechanism that lets you dial up or down how much thinking the model does before responding. On PinchBench it scored 87.60, above GPT-4o and Claude Sonnet 4.6. It also scored 96.4 on MATH-500.

Ant made it available on OpenRouter with a free trial week. The model targets agent collaboration, engineering, and research tasks. Ant claims hallucination rates dropped from 3% to 0.6%, an 80% relative reduction, on its insurance use cases.

**Xiaomi open-sources MiMo-V2.5**

Xiaomi dropped MiMo-V2.5-Pro (309B total, 15B active) under MIT license. It supports 1M context, uses MoE, and costs 2.5% of what international closed-source flagships charge for inference. On open-source leaderboards it tied for first on the composite intelligence index. On OpenRouter, MiMo-V2-Pro captured 30%+ market share in a week, hitting 4.82 trillion tokens in weekly calls.
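For a sense of how sparse these mixture-of-experts models are, here is a minimal sketch that computes the share of parameters active per token from the totals quoted above for Ring-2.6-1T and MiMo-V2.5-Pro. The dictionary layout and the function name are illustrative only.

```python
# Share of parameters active per token for the two MoE models described above.
# Parameter counts are the figures quoted in the article, in billions.
models = {
    "Ring-2.6-1T": {"total_b": 1000, "active_b": 63},
    "MiMo-V2.5-Pro": {"total_b": 309, "active_b": 15},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters that are active for a single token."""
    return active_b / total_b


for name, params in models.items():
    share = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models activate only a small fraction of their weights on each token, which is one reason a very large parameter count does not translate directly into inference cost.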

The model runs on almost every domestic inference chip. Xiaomi says the phone-side 100B-parameter variant can run locally. We will see how that works in practice.

**The composite picture**

China’s daily token call volume hit 140 trillion in March, up 40% from late 2025. OpenRouter data showed Chinese models holding 61% of global weekly call volume, leading the US for five consecutive weeks. The cost gap is the main driver: Chinese models charge between one-fifth and one-thirtieth of what comparable US models do.
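Two of the figures above invite a quick sanity check: what the 40% growth implies about the late-2025 baseline, and what a 5x to 30x price gap means as a share of the comparator price. The sketch below uses only the numbers in the preceding paragraph. The helper names are illustrative.

```python
def baseline_before_growth(current: float, growth: float) -> float:
    """Value before a fractional increase, e.g. growth=0.40 for 'up 40%'."""
    return current / (1 + growth)


def price_share(times_cheaper: float) -> float:
    """Fraction of the comparator price paid when something is N times cheaper."""
    return 1 / times_cheaper


# 140 trillion tokens per day in March, up 40% from late 2025 (from the article).
late_2025 = baseline_before_growth(140, 0.40)
print(f"Implied late-2025 volume: {late_2025:.0f} trillion tokens/day")

# The 5x and 30x ends of the quoted price gap.
for n in (5, 30):
    print(f"{n}x cheaper -> {price_share(n):.1%} of the comparator price")
```

Read this way, the March figure implies a late-2025 baseline of about 100 trillion tokens per day, and the price gap runs from roughly a fifth down to about 3% of the comparator price.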

Everything now revolves around agents. Every April release (Qwen3.6-Plus, DeepSeek V4, Kimi K2.6) emphasized agent and tool-use capabilities. The question is whether any of them work reliably in production, not just in demos. DeepSeek released an “expert mode” on April 8 with domain-specific distillation and multi-step reasoning visualization. Ant built Ring-2.6-1T specifically for agent workflows.

I am skeptical that agent reliability is solved. But the Chinese approach — cheaper inference, domestic chips, relentless iteration — has turned the cost equation upside down. If the agent era actually arrives, Chinese models are positioned to supply the compute at a fraction of US prices. Whether that means “winning” depends on whether reliability catches up.