DeepSeek makes its V4-Pro price cut permanent, ByteDance plans $70B AI spend

China’s AI labs are not slowing down. In the past week alone, DeepSeek locked in a permanent 75% API price cut, ByteDance floated a $70 billion annual AI infrastructure budget, and at least three domestic teams dropped new models or frameworks that claim to punch above their weight class. Here is what actually happened.

DeepSeek V4-Pro: the discount is forever

On May 22, DeepSeek announced that the promotional 75% discount on V4-Pro API calls will become permanent after May 31. The new list prices are one-quarter of the original:

Input (cache hit): 0.1 yuan per million tokens
Input (cache miss): 12 yuan per million tokens
Output: 24 yuan per million tokens

That is roughly $0.014 / $1.67 / $3.34 per million tokens at current exchange rates (about 7.2 yuan to the dollar). For context, OpenAI's GPT-4o list price is still several times higher on output. DeepSeek is not running a charity; it is running a volume play. The move pressures every other domestic lab to match or explain why it cannot.
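To make the price list concrete, here is a minimal sketch that estimates a bill from the three rates above. The function names and the sample workload are hypothetical, and the exchange rate of 7.18 yuan per dollar is an assumption back-solved from the dollar figures in this article, not an official rate.

```python
# Prices in yuan per million tokens, as listed in this article.
PRICE_YUAN_PER_M = {
    "input_cache_hit": 0.1,
    "input_cache_miss": 12.0,
    "output": 24.0,
}

# Assumption: about 7.18 yuan per US dollar, implied by the article's conversions.
ASSUMED_YUAN_PER_USD = 7.18


def estimate_cost_yuan(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> float:
    """Return the estimated bill in yuan for one workload (hypothetical helper)."""
    million = 1_000_000
    return (
        cache_hit_tokens / million * PRICE_YUAN_PER_M["input_cache_hit"]
        + cache_miss_tokens / million * PRICE_YUAN_PER_M["input_cache_miss"]
        + output_tokens / million * PRICE_YUAN_PER_M["output"]
    )


def yuan_to_usd(yuan: float, rate: float = ASSUMED_YUAN_PER_USD) -> float:
    """Convert yuan to dollars at the assumed rate."""
    return yuan / rate


if __name__ == "__main__":
    # Made-up workload: 8M cached input, 2M uncached input, 1M output tokens.
    cost = estimate_cost_yuan(8_000_000, 2_000_000, 1_000_000)
    print(f"{cost:.2f} yuan, about ${yuan_to_usd(cost):.2f}")
```

In this sample workload the uncached input and the output dominate the bill, which is why cache-hit rates matter so much at these prices.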

Separately, the open-source community is squeezing costs even further. A project called Reasonix, built specifically for DeepSeek’s prefix-cache mechanism, claims 99.82% cache-hit rates in long coding sessions. One user reported a bill dropping from $61 to $12 on a 400-million-token run. I have not independently verified those numbers, but the repo is gaining stars fast.
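To see why cache-hit rate matters at these prices, the sketch below computes a blended input price as a weighted average of the cache-hit and cache-miss rates listed earlier. The function name is hypothetical, it covers input tokens only, and it does not attempt to reproduce the $61-to-$12 report, which depends on details I do not have.

```python
# Input prices in yuan per million tokens, from the price list in this article.
CACHE_HIT_PRICE = 0.1
CACHE_MISS_PRICE = 12.0


def blended_input_price(hit_rate: float) -> float:
    """Average input price per million tokens at a given cache-hit rate (hypothetical helper)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_PRICE + (1.0 - hit_rate) * CACHE_MISS_PRICE


if __name__ == "__main__":
    # 0.9982 is the hit rate Reasonix claims; the other values are for comparison.
    for rate in (0.0, 0.5, 0.9, 0.9982):
        price = blended_input_price(rate)
        print(f"hit rate {rate:.2%}: {price:.4f} yuan per million input tokens")
```

Output tokens are billed at a flat rate regardless of caching, so a real bill falls less steeply than the input price alone suggests.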

ByteDance’s $70 billion shopping list

Bloomberg reported on May 27 that ByteDance is discussing up to $70 billion in spending this year on data centers and AI infrastructure. That would be roughly triple last year’s capital expenditure. The company is already China’s largest buyer of AI chips and servers, and it recently signed a deal for millions of Qualcomm chips to power its AI agent services.

My gut says the final number will be lower. ByteDance adjusts capex quarterly, and the report itself calls the figures preliminary. Still, the directional signal is clear: ByteDance thinks the AI race is a winner-take-most market, and it is willing to outspend almost anyone except maybe Microsoft and Google.

New models and frameworks

MiniMax M3. MiniMax teased its M3 series on May 27. The announcement came alongside an arXiv paper on the M2.x family: 229.9B total parameters, 9.8B active per token, 192K context window, 29.2T training tokens. The paper also describes a reinforcement-learning system called Forge and a “self-evolution” prototype that can debug its own training failures and modify scaffolding code. MiniMax claims the system now handles 30-50% of routine engineering iterations internally. That sounds like marketing fluff until you remember that MiniMax already runs consumer products such as Hailuo AI and Talkie at scale.

Kuaishou Keye-VL 2.0. On May 26, Kuaishou released Keye-VL-2.0-30B-A3B, a multimodal model that adopts the DeepSeek Sparse Attention (DSA) architecture. It handles 256K context lengths and scored at or near the top on the TimeLens video-understanding benchmark. Kuaishou is primarily a short-video company; the fact that it is building its own vision-language model with agent capabilities tells you how vertically integrated Chinese tech firms have become.

ModelBest ForgeTrain. Beijing startup ModelBest (the team behind MiniCPM) unveiled ForgeTrain, which it calls the first production-grade pre-training framework written entirely by AI. The framework was generated by a model, then used to train a new 1B-parameter model called MiniCPM5-1B. ModelBest says ForgeTrain outperforms NVIDIA’s Megatron on training speed, including a 10% speed-up on Huawei Ascend chips. The 1B model is small enough to run as a desktop pet. Cute demo. The real claim is about the engineering pipeline: if AI can write its own training frameworks, the cost of spinning up new models drops toward zero.

Kunlun SkyClaw. Kunlun Tech released SkyClaw-v1.0 and a lite version on May 22, positioning them as native agent models rather than general LLMs with tool wrappers bolted on. Pricing is aggressive: 0.5 yuan per million input tokens for the full model, 0.3 yuan for lite. Both are free for now. Kunlun says SkyClaw-v1.0 can trade blows with Claude Opus 4.6 on OpenClaw tasks. I have not tested it. The “native agent” pitch is the same one OpenAI and Anthropic are making, so at minimum Kunlun is keeping pace with the narrative.

Tencent Hunyuan Hy-MT2. Tencent open-sourced Hy-MT2, a translation model family in 1.8B, 7B, and 30B-A3B sizes. The 7B and 30B variants beat larger open-source rivals on Flores-200 and approach Gemini 3.1 Pro. The 1.8B model, quantized to 440MB, runs on-phone. Tencent also launched a WeChat mini-program called Tencent Hy Translate. Translation is not the flashiest AI benchmark, but it is a real product with real users. Tencent is shipping.

The gap is 2.7%? Sure.

At a finance summit on May 27, Huawei CTO Zheng Jun cited a Stanford report claiming China’s AI models trail the US by only 2.7% overall. He also said Chinese model call volumes have “crushed” American ones since February. The 2.7% figure is almost certainly an aggregate across many benchmarks, some of which matter and some of which do not. The call-volume claim is more interesting: it suggests Chinese users are actually using these models at scale, not just benchmarking them. Usage is the only metric that ultimately counts.

One data point supports him. The National Data Administration says China’s AI training and inference data totaled 199.48 exabytes in 2025, up 42.86% year over year. Inference data exceeded training data for the first time, at 101.34 exabytes. That suggests Chinese AI workloads are shifting from lab experiments to live production. About time.
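The arithmetic behind those figures is easy to check. This is a small sketch with hypothetical function names; the training volume and the prior-year total are back-calculated from the three reported numbers, not figures the agency published in this form.

```python
# Reported figures for 2025, in exabytes, as cited in this article.
TOTAL_2025_EB = 199.48
INFERENCE_2025_EB = 101.34
YOY_GROWTH = 0.4286  # 42.86% year-over-year growth in the total


def training_eb(total: float, inference: float) -> float:
    """Training data volume implied by the total and the inference figure."""
    return total - inference


def inference_share(total: float, inference: float) -> float:
    """Fraction of the total attributed to inference."""
    return inference / total


def implied_prior_year_total(total: float, growth: float) -> float:
    """Prior-year total implied by the growth rate (a derived estimate)."""
    return total / (1.0 + growth)


if __name__ == "__main__":
    print(f"training: {training_eb(TOTAL_2025_EB, INFERENCE_2025_EB):.2f} EB")
    print(f"inference share: {inference_share(TOTAL_2025_EB, INFERENCE_2025_EB):.1%}")
    print(f"implied 2024 total: {implied_prior_year_total(TOTAL_2025_EB, YOY_GROWTH):.2f} EB")
```

The margin is thin: inference edges past training by only a few exabytes, so “exceeded for the first time” is a crossover, not a landslide.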