The Chinese AI startup DeepSeek has unveiled a preview of its new line of language models. The flagship model, V4-Pro, surpasses Claude Opus 4.6 and GPT-5.4, becoming the top open-source system.
DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params.…
DeepSeek (@deepseek_ai), April 24, 2026
Architecture and Scale
V4-Pro has approximately 1.6 trillion parameters, but only 49 billion are activated for any given token. The second model, V4-Flash, has a total of 284 billion parameters, with 13 billion activated.
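To put these counts in proportion, the short calculation below works out what share of each model's parameters is active per token. It uses only the approximate figures quoted above; the dictionary and function names are illustrative.

```python
# Approximate parameter counts as quoted in the article.
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that is activated per token."""
    return active / total


for name, params in MODELS.items():
    share = active_fraction(params["total"], params["active"])
    print(f"{name}: {share:.1%} of parameters active")
# V4-Pro: 3.1% of parameters active
# V4-Flash: 4.6% of parameters active
```

In both cases, fewer than one parameter in twenty takes part in processing a given token.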
Both models use a "Mixture of Experts" (MoE) architecture: only the relevant subnetworks (experts) are activated for each token processed. This approach is more cost-effective than dense architectures, in which every parameter is used for every token, while maintaining comparable performance.
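The general idea of expert routing can be sketched in a few lines of Python. This is a toy illustration of generic top-k MoE gating, not DeepSeek's implementation: the number of experts, the value of `k`, the softmax gate, and the renormalization of weights are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route(router_scores, k))


# Toy experts: each one just scales its input (hypothetical stand-ins).
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, -1.0, 1.5]  # in a real model these come from a learned router

chosen = route(scores, k=2)
print([i for i, _ in chosen])  # experts 1 and 3 are selected
print(moe_layer(10.0, experts, scores, k=2))
```

Only two of the four toy experts run for this input; the other two are skipped entirely, which is where the compute savings of a sparse model come from.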
Pre-training was conducted on a dataset of over 32 trillion tokens. Developers then fine-tuned the models in stages, dedicating separate stages to coding, mathematics, logic, and instruction following. The final version integrates these skills through distillation.
Long Context Made Affordable
A key feature of V4 is the optimization for processing long sequences. While other models also offer a context window of 1 million tokens, using it typically incurs high costs and delays.
DeepSeek claims that the new version significantly reduces the resource intensity of such operations. Compared to V3.2, V4-Pro requires about 27% of the computations and 10% of the KV-cache memory when operating at maximum context. For V4-Flash, these figures are 10% and 7%, respectively.
Source: Hugging Face.
The team achieved these results through a hybrid attention architecture: two mechanisms compress data and reduce the load when dealing with long texts. They also employed special hyperconnections for stability and the Muon optimizer to accelerate training.
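The relative figures DeepSeek quotes against V3.2 can also be read as reduction factors. The snippet below does that conversion using only the percentages reported above; the names are illustrative.

```python
# Cost at maximum context relative to V3.2, as quoted in the article.
RELATIVE_COST = {
    "V4-Pro": {"compute": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"compute": 0.10, "kv_cache": 0.07},
}


def reduction_factor(relative_cost: float) -> float:
    """How many times smaller the cost is compared with the baseline."""
    return 1.0 / relative_cost


for model, costs in RELATIVE_COST.items():
    for resource, rel in costs.items():
        factor = reduction_factor(rel)
        print(f"{model} {resource}: {rel:.0%} of V3.2, about {factor:.1f}x less")
# V4-Pro compute: 27% of V3.2, about 3.7x less
# V4-Pro kv_cache: 10% of V3.2, about 10.0x less
# V4-Flash compute: 10% of V3.2, about 10.0x less
# V4-Flash kv_cache: 7% of V3.2, about 14.3x less
```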
Reasoning Modes and Agent Capabilities
DeepSeek V4 supports three reasoning modes:
- Non-think: quick responses to simple questions without extended reasoning.
- Think High: in-depth analysis for complex tasks and planning.
- Think Max: the most thorough mode, in which the model outlines each step and explores all options.
In agent tasks, the Max mode now retains the chain of intermediate steps within a single task. In the previous version, some of this context was lost during user interactions.
Testing Results
According to DeepSeek, the flagship version demonstrates results comparable to leading systems in several areas:
- In programming tasks on Codeforces, the model achieved a rating of 3206, equivalent to 23rd place among human competitors worldwide, matching GPT-5.4;
- In mathematics, it scored 95.2 on HMMT 2026 and 89.8 on IMOAnswerBench, outperforming most competitors;
- In knowledge tests, it scored 57.9 on SimpleQA Verified (Opus 4.6 scored 46.2, while Gemini 3.1 Pro scored 75.6);
- In reasoning tasks, the models lag behind GPT-5.4 and Gemini 3.1 Pro by only three to six months;
- In an internal DeepSeek test involving development, debugging, and refactoring tasks, the model achieved 67%, placing it between Sonnet 4.5 (47%) and Opus 4.5 (70%);
- In agent scenarios and development tasks, V4-Pro-Max demonstrated 80.6% on SWE Verified and 67.9% on Terminal Bench.
V4 was specifically trained on real-world scenarios: data analysis, reporting, document editing, and iterative internet searches using tools.
To assess the model's suitability for development, the startup conducted internal testing on its engineers' tasks. In a survey of 85 developers and researchers, 52% stated they were ready to use V4-Pro as their primary coding model, while another 39% indicated they were leaning towards this decision.
It's worth noting that on April 23, OpenAI released GPT-5.5, which is positioned as "a new level of intelligence for real work and agent management."
