Anthropic Warns of AI Self-Improvement Risks

Anthropic highlights the risks of AI self-improvement as it increasingly relies on AI for model development, raising concerns about future autonomy and safety.

Members of the Anthropic team are increasingly delegating much of the development of new models to AI systems. The company sees this as a sign of approaching recursive self-improvement.

According to internal data, over 80% of the code for the company's current products was written by Claude. Additionally, in the second quarter, the amount of code produced per engineer increased eightfold compared to 2024.

Source: Anthropic Institute.

Marina Favaro, head of the Anthropic Institute, and co-founder Jack Clark stated that with sufficient computational power, this trend could lead to a system capable of "fully autonomously designing and developing its successor."

“We have not yet reached a point of no return, and recursive self-improvement is not inevitable. However, it may occur sooner than most institutions are prepared for,” the experts emphasized.

Benchmarks and Metrics

In April, Claude completed over 800 fixes—an engineer estimated that a human would have taken four years to accomplish the same.

For open tasks, the success rate of Claude's sessions rose to 76% in May 2026—a 50 percentage point increase over six months.

Source: Anthropic Institute.

Anthropic reported that the duration of tasks that AI can reliably perform independently doubles approximately every four months (down from seven).

In a task aimed at accelerating the training of a small AI model, Claude Opus 4 in May 2025 achieved an average speed increase of about three times, while Mythos Preview in April 2026 achieved approximately 52 times.

Source: Anthropic Institute.

During internal tests, the Mythos Preview model demonstrated the ability to tackle research tasks in AI safety. Over 800 hours of work, a group of agents closed 97% of the problem gap in the experiment, while two human researchers managed only 23% in a week.

New Bottlenecks

Despite successes in coding, humans still hold an advantage in "research judgment" and setting strategic goals.

Anthropic believes that in the near future, the role of developers will shift from writing lines of code to conducting in-depth reviews of neural network outputs. Human oversight may become the main bottleneck in the speed of developing new models.

The company also suggested that it might be beneficial for the world to have the ability to slow down or temporarily halt the development of advanced AI systems, allowing societal institutions and alignment research to keep pace with progress.

At the same time, startup representatives warned that unilateral slowing could backfire on those who hesitate—less cautious players could close the gap. Without a global coordination mechanism, safety decisions will have to be made under competitive and geopolitical pressure.

As a reminder, in May, Anthropic published its first report on Project Glasswing—a program for identifying vulnerabilities using the Claude Mythos model.

In the same month, the company released Claude Opus 4.8 and separately introduced a dynamic workflow feature for Claude Code.