Anthropic has released Claude Opus 4.8 and separately introduced dynamic workflows for Claude Code.
In its announcement post, the company wrote: "Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price." (Claude, @claudeai, May 28, 2026)
Dynamic workflows let the model autonomously write orchestration scripts that launch dozens or hundreds of parallel sub-agents and verify their work before delivering results to the user.
It is designed for complex tasks in large codebases, including security audits, bug detection, framework and programming language migration, and project modernization.
The feature is available in beta in the Claude Code command-line interface, the desktop app, and the VS Code extension, as well as through the API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.
The mode can be started with a direct command to create a workflow or through ultracode, which maximizes compute effort and lets the model decide on its own when to use a multi-step approach.
Anthropic has warned that dynamic workflows consume significantly more tokens than a standard Claude Code session.
The model breaks a task down into subtasks, distributes them among parallel agents, and merges the outputs only after the agents have cross-checked one another's work and tried to refute the proposed solutions.
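The split, fan-out, verify, and merge pattern described above can be illustrated with a minimal Python sketch. This is not Anthropic's implementation or the Claude Code API: `split_task`, `run_agent`, `verify`, and `orchestrate` are hypothetical names, and the agents are stubs that return canned strings instead of calling a model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    subtask: str
    output: str
    verified: bool = False


def split_task(task: str, parts: int) -> list[str]:
    """Hypothetical planner: break one task into numbered subtasks."""
    return [f"{task} [part {i + 1}/{parts}]" for i in range(parts)]


async def run_agent(subtask: str) -> Result:
    """Stub sub-agent; a real one would call a model here."""
    await asyncio.sleep(0)  # yield control, standing in for I/O
    return Result(subtask=subtask, output=f"done: {subtask}")


async def verify(result: Result) -> Result:
    """Stub reviewer; a real one would try to refute the result."""
    await asyncio.sleep(0)
    result.verified = result.output.startswith("done:")
    return result


async def orchestrate(task: str, parts: int, max_parallel: int = 8) -> list[Result]:
    """Fan subtasks out to agents, verify each result, keep what passes."""
    limit = asyncio.Semaphore(max_parallel)  # cap on concurrent agents

    async def worker(subtask: str) -> Result:
        async with limit:
            return await verify(await run_agent(subtask))

    # gather() returns results in the same order as the subtasks
    results = await asyncio.gather(*(worker(s) for s in split_task(task, parts)))
    return [r for r in results if r.verified]


if __name__ == "__main__":
    for r in asyncio.run(orchestrate("migrate module", parts=3)):
        print(r.output)
```

The semaphore caps how many stub agents run at once, a simple stand-in for the cost control that matters when every extra agent consumes additional tokens.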
As an example, Anthropic cited the migration of Bun from the Zig programming language to Rust. Developer Jarred Sumner used dynamic workflows to generate approximately 750,000 lines of Rust code. The port achieved a 99.8% pass rate on the existing test suite, and the process took 11 days from the first commit to the merge. Anthropic noted that this version is not yet in production.
Performance metrics for the new Opus 4.8 model include:
- 69.2% in SWE-Bench Pro;
- 49.8% in Humanity’s Last Exam without tools and 57.9% with them;
- 83.4% in OSWorld-Verified;
- 1890 points in GDPval-AA;
- 53.9% in Finance Agent v2.
In Terminal-Bench 2.1, Opus 4.8 fell short of GPT-5.5, scoring 74.6% compared to 78.2%.
Anthropic stated that Opus 4.8 has become noticeably "more honest" when performing agentic tasks: the model more often flags uncertainty, less often claims unverified progress, and better identifies issues in its own code before delivering results to the user.
As a reminder, in May, Anthropic published its first report on Project Glasswing—a vulnerability detection program using the Claude Mythos model.
