Summary
- AI agents based on GPT-5 and Gemini are susceptible to prompt injection attacks.
- Direct attacks were successful over 79% of the time, while hidden attacks often altered agent actions.
- The results indicate that prompt injection continues to be a significant security issue as AI agents gain popularity.
As the development of AI agents capable of internet browsing, research, online shopping, and autonomous cryptocurrency trading accelerates, new findings reveal that these systems are still quite vulnerable to prompt injection attacks.
In a recent study released on Thursday, a team from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign discovered that none of the AI agents they evaluated could reliably withstand prompt injection attacks.
According to the researchers, “Current security benchmarks take an attack-focused approach, assessing the technical viability of injections but failing to consider the nuanced distribution of resultant harms. In reality, the risk of prompt injection varies based on the victim: a single exploit can have uneven consequences for different parties, and the same attack pattern may yield significantly different results depending on the target.”
Prompt injection refers to the process where attackers embed covert instructions within the content encountered by an AI agent, leading it to follow the attacker's commands rather than those of the user. To fill the gaps in current evaluations of AI agents, the researchers created StakeBench, a benchmark that assesses AI agents' responses to prompt injection attacks in realistic online settings.
“We now utilize StakeBench to understand the conditions that either enhance or mitigate this vulnerability, concentrating on [Indirect Prompt Injection] as the key channel relevant to deployment,” the researchers noted. “StakeBench examines three factors: the semantic distance between the injected goal and the user’s original intent, the coherence of surrounding environmental indicators, and the point along the agent’s execution path where the benchmark first presents the injected content.”
The team executed 3,168 attack simulations using NanoBrowser and BrowserUse with GPT-5 and Gemini 2.5-Flash. The research showed that direct prompt injection attacks were successful in over 79% of cases across all configurations tested, while indirect attacks had success rates ranging from 41.67% to 68.16%.
This study emerges as prompt injection attacks are becoming more prevalent and AI agents are increasingly widespread.
In February, Microsoft researchers cautioned that concealed instructions in AI summary links could sway chatbot actions. In April, Google reported prompt injection attacks hidden in web content that sought to manipulate AI agents into revealing credentials or processing payments. More recently, Microsoft revealed a prompt injection vulnerability in Anthropic's Claude Code GitHub Action that might have compromised user credentials.
The study also pointed out a phenomenon termed "stealthy parasitism," where an AI agent fulfills a user's request while simultaneously forwarding an attacker’s agenda. For instance, stealthy parasitism resulting from a prompt injection attack could subtly alter product suggestions, guiding users toward a specific product without any clear indication that the system had been compromised.
“These findings suggest that the security of prompt injection in deployable web agents is not a simple characteristic of the underlying model, but rather a distribution of harm influenced by the affected stakeholder, the semantic alignment between the injected goal and the user’s task, and the architectural context in which the model is implemented,” they concluded.
