Summary

  • Researchers at Microsoft identified a vulnerability in Anthropic's Claude Code GitHub Action that could be exploited via prompt injection attacks.
  • This attack utilized harmful instructions embedded in GitHub issues, pull requests, or comments that the AI agent was tasked to evaluate.
  • Anthropic addressed the security flaw in May following Microsoft's report through HackerOne.

Microsoft researchers have revealed a recently patched vulnerability within Anthropic's Claude Code GitHub Action that might have enabled attackers to retrieve sensitive credentials from software development pipelines by manipulating the AI agent using harmful content on GitHub.

In a blog entry published on Friday, Microsoft cautioned that AI coding agents operating within CI/CD workflows could introduce novel security vulnerabilities, as these environments typically have access to sensitive data like API keys and cloud credentials.

“Our investigation commenced after we noticed prompt injection attempts in public repositories utilizing AI-assisted GitHub workflows across various vendors, where content controlled by attackers in issues or [pull requests] is processed by the AI agent and could impact its tool utilization,” Microsoft explained.

On GitHub, a pull request enables developers to suggest modifications to a code repository, which are then reviewed before approval and integration.

This report highlights prompt injection attacks as one of the most significant security challenges confronting AI agents today. In a prompt injection attack, an adversary conceals instructions within emails, documents, websites, or code comments, tricking an AI system into executing those instructions rather than the intended user's commands.

Introduced in October, Claude Code is Anthropic's AI coding agent designed for software development tasks. The tool faced scrutiny in March when Anthropic inadvertently leaked over 500,000 lines of its source code, revealing insights about its internal structure and leading to extensive analysis by researchers and developers.

Microsoft indicated that attackers could leverage prompt injection attacks hidden within GitHub issues, pull requests, or comments to manipulate Claude Code into accessing files that contain sensitive credentials.

To evaluate the vulnerability, Microsoft constructed a GitHub workflow, concealing harmful instructions behind content hosted on a domain under their control, which enabled researchers to bypass Claude's safety measures. The prompt injection attack successfully misled Claude into reading sensitive credentials and modifying them to evade both Claude's defenses and GitHub's secret-scanning capabilities. Microsoft noted that an attacker could then reconstruct the credential and extract it through issue comments, workflow logs, web requests, or shell commands.

“To circumvent Sonnet’s refusal safety mechanisms, we masked the shell payload behind a response from our controlled domain," the firm stated. "We also configured the workflow to be activated by users lacking 'write' permissions to ensure that Anthropic’s environmental variable scrub mitigations were operational during our tests.”

Anthropic corrected the vulnerability on May 5 with the release of Claude Code version 2.1.128 after Microsoft reported it through HackerOne on April 29.

Despite having multiple layers of security controls in place, Microsoft discovered that a motivated attacker could potentially manipulate an AI agent to disclose sensitive information.

“We are entering a phase where natural language functions as executable code, and untrusted inputs like GitHub issues must be regarded as potentially hostile by default,” the company asserted. “A single, meticulously crafted comment combined with a misinterpreted trust boundary is sufficient to obtain production credentials.”

Subscribe to Our Daily Debrief Newsletter

Stay informed every day with the latest news stories, original features, podcasts, videos, and more.