OpenAI Launches GPT-5.4 with Computer Vision

OpenAI has launched GPT-5.4 and GPT-5.4 Pro, featuring enhanced reasoning, coding, and computer vision capabilities, improving performance across various tasks.

OpenAI has released GPT-5.4 and GPT-5.4 Pro just two days after the launch of GPT-5.3 Instant.

GPT-5.4 Thinking and GPT-5.4 Pro are now rolling out in ChatGPT.

GPT-5.4 is also available via the API and Codex.

This version integrates our advancements in reasoning, coding, and agentic workflows into a single model.
— OpenAI (@OpenAI) March 5, 2026

The standard version of GPT-5.4 is accessible through the ChatGPT web interface, API, and Codex tool. The GPT-5.4 Thinking version is available to Plus, Team, and Pro subscribers.

GPT-5.4 Pro is available to Pro users and Enterprise clients, and also through the API.

The base model costs $2.50 per million input tokens and $15 per million output tokens. The Pro version is significantly more expensive: $30 per million input tokens and $180 per million output tokens.
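To make these rates concrete, here is a minimal sketch that estimates the cost of a single request at the prices quoted above. The workload of 20,000 input tokens and 2,000 output tokens is a hypothetical figure chosen for illustration, and the class and function names are not part of any OpenAI library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates as quoted in the article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_4_PRO = Pricing(input_per_million=30.00, output_per_million=180.00)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of one request."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical workload (an assumption, not a figure from OpenAI).
base = request_cost(GPT_5_4, input_tokens=20_000, output_tokens=2_000)
pro = request_cost(GPT_5_4_PRO, input_tokens=20_000, output_tokens=2_000)

print(f"GPT-5.4:     ${base:.2f}")
print(f"GPT-5.4 Pro: ${pro:.2f}")
```

Because both Pro rates are twelve times the base rates, the Pro version costs twelve times as much for any mix of input and output tokens.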

Performance in Work Tasks

GPT-5.4 delivers more stable and higher-quality results in real-world applications. In the GDPval benchmark, which assesses task performance across 44 professions, this version achieved a score of 83%. In other words, the model's work matched or exceeded that of industry professionals in 83% of comparisons. GPT-5.2 scored 70.9%.

Source: OpenAI.

The developers paid particular attention to work with spreadsheets, presentations, and documents. In tasks typical for a junior investment bank analyst, GPT-5.4 scored 87.3%, compared to 68.4% for GPT-5.2.

Evaluators preferred presentations generated by the new model 68% of the time due to better aesthetics, variety, and effective use of image generation.

Source: OpenAI.

GPT-5.4 is also OpenAI's most factually accurate model to date. In tests involving prompts with known errors, compared to GPT-5.2:

individual statements were false 33% less often;
complete answers contained errors 18% less often.

Computer Vision

This version is the first to include built-in computer vision and PC control capabilities. The model can use a mouse and keyboard based on screenshots and can write code for automation using Playwright.

Its behavior can be tailored to specific scenarios, taking into account an acceptable level of risk.
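One way to picture risk-aware tailoring is a gate that lets low-risk actions run automatically and asks for confirmation before anything riskier. The sketch below is purely illustrative: the `Risk` levels, the `Action` type, and the `run_actions` function are hypothetical names invented here, not OpenAI's API, and the "execution" step only writes to a log instead of driving a real mouse or keyboard.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class Risk(IntEnum):
    LOW = 1     # e.g. moving the pointer or scrolling
    MEDIUM = 2  # e.g. typing into a form field
    HIGH = 3    # e.g. deleting data or submitting a payment


@dataclass(frozen=True)
class Action:
    kind: str    # "click", "type", ...
    target: str  # human-readable description of the UI element
    risk: Risk


def run_actions(actions: List[Action],
                max_auto_risk: Risk,
                confirm: Callable[[Action], bool]) -> List[str]:
    """Run each action, asking `confirm` first if it exceeds max_auto_risk."""
    log = []
    for action in actions:
        if action.risk > max_auto_risk and not confirm(action):
            log.append(f"skipped {action.kind} on {action.target}")
            continue
        # A real agent would send mouse or keyboard events here.
        log.append(f"executed {action.kind} on {action.target}")
    return log


# Hypothetical actions a model might propose after reading a screenshot.
proposed = [
    Action("click", "search box", Risk.LOW),
    Action("type", "search box", Risk.MEDIUM),
    Action("click", "Delete account button", Risk.HIGH),
]

# Deny every confirmation request to show the gate in effect.
result = run_actions(proposed, max_auto_risk=Risk.MEDIUM, confirm=lambda a: False)
for line in result:
    print(line)
```

Raising or lowering `max_auto_risk` changes how much the agent may do without asking, which is the kind of per-scenario adjustment described above.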

In the OSWorld-Verified benchmark (desktop management), GPT-5.4 successfully completed 75% of tasks, surpassing the previous version (47.3%) and outperforming humans (72.4%). This progress is attributed to improved visual perception:

in the MMMU-Pro test (understanding and logic), the score was 81.2%, compared to 79.5% for GPT-5.2;
in OmniDocBench (document analysis), the average error rate decreased from 0.140 to 0.109.

Programming

In coding, the model has matched the specialized GPT-5.3-Codex but operates faster.

Codex now features a /fast mode, which speeds up generation by 1.5 times without sacrificing quality. Internal tests showed that GPT-5.4 performed well in complex frontend development tasks.

An experimental Playwright skill (Interactive) has also been introduced, allowing the model to visually debug web and Electron applications while testing its own code during development.

Tools

GPT-5.4 introduces a Tool Search feature. Previously, the system had to preload the descriptions of all available tools into the context, adding thousands of unnecessary tokens to each request and increasing costs.

Now, the model receives only a basic list and can look up and load a tool's full parameters on its own when it needs them. In tests based on MCP Atlas, this approach reduced token consumption by 47% without loss of accuracy.
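The difference between preloading every tool and loading definitions on demand can be sketched in a few lines. Everything here is an assumption made for illustration: the tool names and definitions are invented, `count_tokens` is a crude word count rather than a real tokenizer, and the functions do not reflect how OpenAI implements Tool Search.

```python
from typing import Dict

# Hypothetical tool definitions; real ones would be much longer JSON schemas.
FULL_DEFINITIONS: Dict[str, str] = {
    "get_weather": "get_weather city units returns current temperature wind and humidity for a city",
    "send_email": "send_email to subject body attachments sends a message from the user's account",
    "query_db": "query_db sql timeout runs a read-only SQL query and returns rows as dictionaries",
    "create_event": "create_event title start end attendees adds an event to the user's calendar",
}


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def eager_context() -> str:
    """Old approach: every full definition is placed in the context up front."""
    return "\n".join(FULL_DEFINITIONS.values())


def lazy_context() -> str:
    """New approach: only a basic list of tool names is placed in the context."""
    return "\n".join(FULL_DEFINITIONS.keys())


def load_tool(name: str) -> str:
    """Fetch one full definition when the model decides it needs that tool."""
    return FULL_DEFINITIONS[name]


eager_tokens = count_tokens(eager_context())
# The model sees the short list, then loads only the one tool it needs.
lazy_tokens = count_tokens(lazy_context()) + count_tokens(load_tool("get_weather"))

print(f"eager: {eager_tokens} tokens, lazy: {lazy_tokens} tokens")
```

The saving grows with the number of tools that are registered but never used in a given request, since their definitions never enter the context.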

Web search has also become more efficient: in the BrowseComp benchmark, performance improved by 17%, with the Pro version achieving a record 89.3%. GPT-5.4 Thinking gathers information from multiple sources more effectively, handles complex queries better, and provides more structured responses.

Manageability and Context

When dealing with complex queries, GPT-5.4 Thinking in ChatGPT first presents a plan of action to the user. This allows for real-time adjustments without restarting the generation or making unnecessary clarifications. This feature is already available on the website and in the Android app, with iOS support coming soon.

The model also retains context better in long dialogues and can spend longer reasoning about complex tasks. This helps maintain coherence and relevance in responses, even when processing large amounts of information.

As a reminder, earlier in March, users boycotted ChatGPT amid OpenAI's deal with the Pentagon.
