Summary

  • Researchers from Zhejiang University created AudioHijack, which embeds imperceptible commands in audio, achieving a success rate between 79% and 96% in manipulating large audio-language models.
  • This method transitioned successfully from open-source models to commercial voice AI systems from Microsoft and Mistral, with most conventional defenses only thwarting a minor portion of the attacks.
  • The research team is exploring the potential of this technique to affect closed models from OpenAI and Anthropic by utilizing shared open-source audio elements.

Researchers at Zhejiang University in China have found a way to manipulate AI voice models by embedding hidden commands in audio recordings that human listeners cannot perceive. The technique succeeds up to 96% of the time, according to their study.

The study, presented at the 47th IEEE Symposium on Security and Privacy in San Francisco, focuses on large audio-language models (LALMs) that are designed to process spoken commands and engage with various external tools.

“Training this signal takes only half an hour, and since it is context-agnostic, it can be used to attack the target model at any time, regardless of what the user says,” Meng Chen, a Ph.D. candidate at Zhejiang University, said in a statement.

This method modifies the numerical values within a digital audio waveform in a way that is imperceptible to human listeners, yet impacts how AI models interpret the audio. According to the researchers, the altered audio can override or redirect a model's responses, even if legitimate user commands are present.
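To make the idea of an imperceptible waveform change concrete, here is a minimal Python sketch. It is not the researchers' method: AudioHijack optimizes its changes against a target model, while this example only adds small random noise to show how tightly the sample values can be bounded. The function names, the `epsilon` budget of 0.002, and the 440 Hz test tone are all illustrative assumptions.

```python
import math
import random


def add_bounded_perturbation(samples, epsilon, seed=0):
    """Return a copy of `samples` with each value shifted by at most `epsilon`.

    `samples` are floats in [-1.0, 1.0]. A real attack would optimize the
    shifts against a model; random noise here (an assumption made for
    illustration) only demonstrates the size constraint.
    """
    rng = random.Random(seed)
    perturbed = []
    for sample in samples:
        delta = rng.uniform(-epsilon, epsilon)
        # Keep the result inside the valid amplitude range.
        perturbed.append(max(-1.0, min(1.0, sample + delta)))
    return perturbed


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in decibels between the clean and altered audio."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, perturbed))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


# One second of a 440 Hz tone at 16 kHz stands in for recorded speech.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]

perturbed = add_bounded_perturbation(clean, epsilon=0.002)
max_change = max(abs(p - s) for s, p in zip(clean, perturbed))

print(f"largest change to any sample: {max_change:.6f}")
print(f"signal-to-noise ratio: {snr_db(clean, perturbed):.1f} dB")
```

Every sample moves by no more than the chosen budget, so the change stays far quieter than the tone itself, yet every number the model receives is different from the original.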

AudioHijack differs from conventional prompt injection attacks because it does not alter what the user says to the AI. Instead, it changes the audio signal directly, embedding hidden commands within sound frequencies that humans cannot detect. That makes the attack harder to defend against, because it bypasses safeguards designed to spot suspicious text prompts.

The research team tested AudioHijack on 13 open-source AI voice models and found it could make them refuse requests, spread misinformation, insert harmful hyperlinks, change their personality, or carry out unintended actions such as running web searches, downloading files, and sending emails containing private information. The technique also worked against commercial voice AI systems from Microsoft and Mistral, which use similar technologies.

“Previous attacks on generative models often required the attacker to fully control both the final audio input and the original commands given to the model, essentially impersonating the user,” the study stated. “In this instance, the attacker only needs to manipulate the audio data processed by the model, allowing for attacks while the model is in use by someone else.”

The study identified various potential delivery methods, such as online videos, music clips, voice messages, or audio from Zoom calls uploaded to AI transcription services. The team also indicated that unpublished follow-up research has shown similar attacks can occur in live AI voice interactions.

The researchers noted that monitoring a model's internal attention mechanisms was the most effective countermeasure they tested. However, they also discovered that attackers aware of this defense could diminish the strength of their manipulation while retaining much of the attack's effectiveness.
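The article does not say how the attention monitoring works. As a rough illustration only, one could imagine a check that measures how much of a model's attention lands on untrusted audio rather than on the user's own instruction, and flags the response when that share is unusually high. The Python sketch below assumes hypothetical inputs (a list of attention weights and a matching list marking which input positions come from untrusted audio) and an arbitrary threshold of 0.8; none of these come from the study.

```python
def untrusted_attention_share(weights, is_untrusted):
    """Fraction of total attention placed on positions marked untrusted.

    `weights` and `is_untrusted` are hypothetical inputs: one attention
    weight and one boolean flag per input position.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    on_untrusted = sum(w for w, flag in zip(weights, is_untrusted) if flag)
    return on_untrusted / total


def looks_hijacked(weights, is_untrusted, threshold=0.8):
    """Flag a response when attention concentrates on untrusted audio.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return untrusted_attention_share(weights, is_untrusted) > threshold


# Positions 0-1: the user's instruction. Positions 2-3: uploaded audio.
is_untrusted = [False, False, True, True]

benign = [0.4, 0.4, 0.1, 0.1]        # attention stays on the instruction
suspicious = [0.05, 0.05, 0.3, 0.6]  # attention shifts to the audio

print(looks_hijacked(benign, is_untrusted))
print(looks_hijacked(suspicious, is_untrusted))
```

A fixed threshold like this also hints at the weakness the researchers describe: an attacker who knows the cutoff can weaken the manipulation just enough to stay under it.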

“These single-point defenses face challenges against our attack because we found it very difficult for these models to differentiate between normal user intentions and our adversarial attack,” Chen remarked.
