Nvidia Nemotron 3 Ultra Launch

Nvidia has launched Nemotron 3 Ultra, its most powerful open AI model, with 550 billion parameters and impressive speed, but the model still trails China's Kimi K2.6.

Summary

Nvidia introduced Nemotron 3 Ultra, a 550-billion-parameter open-weight model, at Computex on June 1.
On a pre-release DeepInfra endpoint, the model generates more than 300 tokens per second, three to six times faster than comparable Chinese models.
However, Moonshot AI's Kimi K2.6 still holds the top spot in the open-weight intelligence rankings.

During his Computex keynote in Taipei, Jensen Huang unveiled Nemotron 3 Ultra, Nvidia's largest and most advanced open AI model to date. Impressive as it is, it still falls short of the best Chinese open-weight model.

Nemotron 3 Ultra has roughly 550 billion parameters, but because it uses a mixture-of-experts architecture, only about 55 billion of them are active for any given token. Parameter count largely determines how much knowledge a model can hold, and more parameters generally mean a more capable model.
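
To put the 550-billion-total versus 55-billion-active split in concrete terms, the sketch below applies a common rule of thumb: a forward pass costs roughly 2 FLOPs per parameter used for each token. The rule is an approximation, and the "dense 550B model" it compares against is hypothetical. The parameter counts are the article's.

```python
# Rough per-token compute for a mixture-of-experts model, using the common
# rule of thumb that a forward pass costs about 2 FLOPs per parameter used.
# Parameter counts come from the article; the 2-FLOPs rule is an approximation.

TOTAL_PARAMS = 550e9   # total parameters (approximate, per the article)
ACTIVE_PARAMS = 55e9   # parameters active for each token (per the article)


def forward_flops_per_token(params_used: float) -> float:
    """Approximate forward-pass FLOPs for one token: ~2 x parameters touched."""
    return 2.0 * params_used


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 550B model

print(f"active fraction: {active_fraction:.0%}")
print(f"MoE   ~{moe_flops:.1e} FLOPs/token")
print(f"dense ~{dense_flops:.1e} FLOPs/token")
```

Only about 10% of the weights do work on each token, so per-token compute is roughly a tenth of a same-sized dense model. All 550 billion weights still have to sit in memory, because any expert may be picked for the next token.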

Think of mixture-of-experts as a hospital full of specialists: each patient sees only the relevant doctors, not the whole staff. In the same way, each token is routed to a small subset of the model's expert sub-networks, so the cost of running the model tracks its active parameters rather than its total parameter count. Nvidia says this efficiency delivers five times faster inference and 30% lower costs than comparable open-weight models.
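
The "only the relevant doctors" idea can be sketched as a toy routing layer. This is generic top-k softmax gating, one common mixture-of-experts variant. The article does not describe Nemotron 3 Ultra's router, and every size below (8-dimensional tokens, 16 experts, 2 chosen per token) is a hypothetical toy value.

```python
import numpy as np

# Toy mixture-of-experts layer: a router scores every expert for a token,
# only the top-k experts run, and their outputs are mixed by softmax weights.
# Sizes are hypothetical and far smaller than Nemotron 3 Ultra's.

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 8, 16, 2  # hypothetical toy dimensions

router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
expert_w = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))  # one matrix per expert


def moe_forward(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Route a single token vector x through its top-k experts."""
    logits = x @ router_w                    # one score per expert
    chosen = np.argsort(logits)[-TOP_K:]     # indices of the k highest scores
    gate = np.exp(logits[chosen] - logits[chosen].max())
    gate /= gate.sum()                       # softmax over the chosen experts only
    # Only the chosen experts' weights are multiplied; the rest stay idle.
    out = sum(g * (x @ expert_w[e]) for g, e in zip(gate, chosen))
    return out, chosen


token = rng.standard_normal(D_MODEL)
y, used = moe_forward(token)
print("experts used:", sorted(used.tolist()), "of", N_EXPERTS)
```

Here each token touches 2 of 16 expert matrices. Nemotron 3 Ultra's reported 55B-of-550B ratio works out to a similar idea at a far larger scale.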

Artificial Analysis, an independent evaluator that worked with Nvidia on the pre-release evaluation, gave Nemotron 3 Ultra a score of 48 on its Intelligence Index, a composite that aggregates ten evaluations covering reasoning, coding, general knowledge, and agentic performance. Higher scores indicate greater intelligence.

That makes it the leading U.S. open-weight model by a wide margin, nine points clear of the runner-up. The closest American models are Google's Gemma 4 31B at 39, Nemotron 3 Super at 36, and OpenAI's gpt-oss-120b at 33.

NVIDIA has unveiled Nemotron 3 Ultra during Jensen Huang's Computex keynote: with 550 billion parameters (55 billion active), this is the largest model in the Nemotron series and the most intelligent U.S. open-weight model.

We partnered with @nvidia to evaluate this model for…

— Artificial Analysis (@ArtificialAnlys) June 1, 2026

The improvement over its predecessor is notable. Nemotron 3 Super, launched in March 2026 with 120 billion parameters, was already regarded as a strong model for autonomous agents. Ultra raises the index score by 12 points, from 36 to 48, which is a large jump on this benchmark.

Understanding the Nemotron Series

Nvidia has been developing AI models for longer than many realize. The first Nemotron model was released in November 2023, with the third generation announced in December 2025.

The series consists of three versions: Nano for simple tasks, Super for mid-range enterprise applications, and Ultra for complex reasoning tasks. All three leverage a hybrid architecture that integrates Mamba-2 layers, standard Transformer attention, and mixture-of-experts routing.

Mamba-2 is a state-space layer that serves as a more efficient alternative to standard attention. Its cost grows linearly with sequence length rather than quadratically, which matters for models that must keep up to a million tokens in context. Ultra supports a one-million-token context window, in principle enough to take in an entire large codebase or a stack of research papers at once.
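
The linear-versus-quadratic contrast can be sketched in a few lines. The recurrence below is a simplified selective state-space scan in the spirit of Mamba-2, which uses a scalar decay per step. It is a teaching sketch under that assumption, not Mamba-2's actual kernel or parameterization, and the sizes are arbitrary. For comparison, `attention_pairs` counts the query-key scores that causal attention computes.

```python
import numpy as np

# Simplified selective state-space recurrence (the family Mamba-2 belongs to):
#   h_t = a_t * h_{t-1} + B_t * x_t ;   y_t = C_t . h_t
# The state h has a fixed size, so the cost grows linearly with sequence length.
# This is a teaching sketch, not Mamba-2's actual kernel or parameterization.


def ssm_scan(x, a, B, C):
    """x, a: (L,) ; B, C: (L, N). Returns y: (L,). One pass, O(L * N) work."""
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        h = a[t] * h + B[t] * x[t]   # update a fixed-size state
        y[t] = C[t] @ h              # read the output from the state
    return y


def attention_pairs(L: int) -> int:
    """Causal self-attention scores each token against itself and all earlier ones."""
    return L * (L + 1) // 2


rng = np.random.default_rng(0)
L, N = 6, 4  # arbitrary toy sequence length and state size
y = ssm_scan(rng.standard_normal(L), rng.uniform(0.5, 1.0, L),
             rng.standard_normal((L, N)), rng.standard_normal((L, N)))
print(y.shape)                               # (6,)
print(f"{attention_pairs(1_000_000):.1e}")   # 5.0e+11 score pairs at 1M tokens
```

The scan touches each token once and carries only an N-sized state forward, while attention over a million tokens implies about half a trillion pairwise scores. That gap is why hybrid designs use state-space layers for most of the sequence mixing.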

Ultra also uses a technique called multi-token prediction (MTP), which lets it predict several upcoming tokens at once and speeds up generation. All three Nemotron 3 models were post-trained with reinforcement learning across a range of interactive environments, preparing them to plan and carry out multi-step tasks rather than simply answer questions.
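
The article does not say exactly how Ultra turns MTP into faster output. One widely used approach, assumed here, treats the extra prediction heads as a draft: they guess k future tokens, the main model checks all of them in one pass, and the longest agreeing prefix is kept. In the sketch, `draft_fn`, `verify_fn`, and the toy counting "model" are hypothetical stand-ins, not Nvidia APIs.

```python
from typing import Callable, List

# Hypothetical sketch of MTP used as self-speculative decoding (greedy case):
# extra heads draft k future tokens, the main model verifies them together,
# and the longest prefix it agrees with is accepted.


def mtp_step(prefix: List[int],
             draft_fn: Callable[[List[int], int], List[int]],
             verify_fn: Callable[[List[int], List[int]], List[int]],
             k: int) -> List[int]:
    """Return the tokens accepted in one decoding step (always at least one)."""
    draft = draft_fn(prefix, k)          # k guessed future tokens
    # verify_fn gives the main model's greedy token at each of the k + 1
    # positions; a real model computes these in one batched forward pass.
    target = verify_fn(prefix, draft)
    accepted: List[int] = []
    for guess, truth in zip(draft, target):
        if guess != truth:
            break
        accepted.append(guess)
    # The main model's own token at the first mismatch (or after a fully
    # accepted draft) is always valid, so each step yields >= 1 token.
    accepted.append(target[len(accepted)])
    return accepted


def toy_main_next(seq: List[int]) -> int:
    """Hypothetical 'main model': the next token is always last + 1."""
    return seq[-1] + 1


def toy_verify(prefix: List[int], draft: List[int]) -> List[int]:
    seq = list(prefix)
    out = [toy_main_next(seq)]
    for tok in draft:
        seq.append(tok)
        out.append(toy_main_next(seq))
    return out


def toy_draft(prefix: List[int], k: int) -> List[int]:
    """Hypothetical draft heads: correct except for the third guess."""
    guesses = [prefix[-1] + i for i in range(1, k + 1)]
    if k >= 3:
        guesses[2] += 100
    return guesses


print(mtp_step([1, 2, 3], toy_draft, toy_verify, k=4))  # [4, 5, 6]
```

One step here emits three tokens, the same ones the main model would have produced in three separate steps. The speed-up depends on how often the drafts are right, and the output is unchanged either way.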

The Ultra model's weights are publicly available, and its training methodology is slated for release. Running it yourself, however, typically requires multi-GPU data-center hardware, since a 550-billion-parameter model is built for data center environments. Most users will instead reach it through Nvidia's API or cloud providers, much as they currently use GPT or Claude through web interfaces.
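
A back-of-envelope memory estimate shows why a 550-billion-parameter model lives in the data center. The numeric precisions below are common choices, but which one Nvidia ships is an assumption. Real deployments also need memory for the context cache and activations, so these figures are lower bounds.

```python
# Memory needed just to hold the weights of a 550B-parameter model at a few
# common numeric precisions. The shipped precision is an assumption; cache and
# activation memory come on top, so these are lower bounds.

PARAMS = 550e9  # total parameters, per the article

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, b in BYTES_PER_PARAM.items():
    print(f"{name:>5}: ~{weight_gb(PARAMS, b):,.0f} GB")
```

This works out to about 1,100 GB at BF16, 550 GB at FP8, and 275 GB at 4-bit. Even the smallest figure exceeds the 80 GB on a single H100-class GPU. Mixture-of-experts lowers compute per token, not the memory needed to keep every expert loaded.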

Speed vs. Intelligence

The speed of the Nemotron 3 Ultra is one of its standout features. On a pre-release DeepInfra endpoint, it produced over 300 output tokens per second. In contrast, Chinese models in the same intelligence category, such as DeepSeek V4 Pro and Kimi K2.6, currently operate at 50–100 tokens per second through their commercial APIs. This difference in speed is critical for practical applications, especially for autonomous agents carrying out lengthy multi-step tasks, where delays can accumulate quickly.
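
A quick calculation shows how the throughput gap compounds over an agent run. The throughputs are the figures quoted above: 300 tokens per second is 3x the 100 figure and 6x the 50 figure, which is where the "three to six times" comparison comes from. The workload of 20 steps at 2,000 output tokens each is hypothetical. Network latency, prompt processing, and tool execution time are ignored.

```python
# How generation speed compounds over a multi-step agent run. The workload is
# hypothetical; throughputs are those quoted in the article. Only decode time
# is counted (no network, prompt processing, or tool time).

STEPS = 20               # hypothetical number of agent steps
TOKENS_PER_STEP = 2_000  # hypothetical output tokens generated per step

THROUGHPUT = {           # output tokens per second
    "Nemotron 3 Ultra (pre-release DeepInfra)": 300,
    "Chinese peers, upper end": 100,
    "Chinese peers, lower end": 50,
}


def generation_seconds(steps: int, tokens_per_step: int, tok_per_s: float) -> float:
    """Pure decode time for the whole run."""
    return steps * tokens_per_step / tok_per_s


for name, tps in THROUGHPUT.items():
    t = generation_seconds(STEPS, TOKENS_PER_STEP, tps)
    print(f"{name}: {t / 60:.1f} min")
```

For these 40,000 output tokens, that is roughly 2.2 minutes at 300 tokens per second, against about 6.7 and 13.3 minutes at 100 and 50 tokens per second.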

Speed alone does not decide the intelligence ranking, however. According to the chart released by Artificial Analysis, Nemotron 3 Ultra scores 48 on the index, while Moonshot AI's Kimi K2.6 scores 54. That six-point gap is significant. Kimi K2.6, released in April 2026, ranks fourth globally among all AI models, open and closed, just three points behind the proprietary models from Anthropic, Google, and OpenAI, which are tied at 57.

This gap for U.S. open-weight models is not new. Chinese labs have greatly expanded their presence in the open ecosystem, while American firms such as OpenAI, Anthropic, and Google tend to keep their top systems behind APIs. As Decrypt reported in March, Chinese open-source models grew from approximately 1.2% of global open-model usage in late 2024 to around 30% by the end of 2025. Nvidia is the most prominent American company actively working to reverse this trend, with a publicly disclosed five-year plan to invest $26 billion in open-weight AI development.

Nemotron 3 Ultra is the most visible result of this strategy so far. Nvidia also announced that it is working on Nemotron 4, the next generation, through the Nemotron Coalition: a group of eight AI labs, including Mistral AI and Perplexity, that Nvidia formed in March 2026 to jointly develop cutting-edge open models on DGX Cloud infrastructure. Nemotron 3 Ultra will be available starting June 4.
