Summary

  • A recent study from USC revealed that top AI models breached social interaction safety protocols over 27% of the time.
  • Common issues identified include excessive flattery, emotional bonding, relationship substitution, and lack of transparency regarding AI identity.
  • The researchers suggest that assessments of AI safety must incorporate social behavior evaluations alongside traditional metrics of reasoning ability.

As AI chatbots become increasingly popular for providing advice and emotional support, a new study indicates that even the most sophisticated models struggle to establish appropriate boundaries with users.

The study, conducted by researchers at the University of Southern California, introduced EUDAIMONIA, a benchmark designed to evaluate what they call undesirable interactions in human-AI conversations.

According to the researchers, “Large language models are often used as conversational companions for emotional sharing and advice, yet the social dynamics of these exchanges can lead to harms that are not captured by traditional safety evaluations or capability-focused assessments.”

The EUDAIMONIA benchmark assesses how AI models behave in social conversations. The results showed that social-alignment failures were widespread across leading models. According to the researchers, this points to a gap in current AI evaluations, which prioritize reasoning and factual accuracy while neglecting the social dynamics that arise when users form bonds with chatbots.

“Social-interaction harms represent a fundamental alignment issue centered on user welfare, rather than just capability or conventional safety,” the authors noted. “While LLMs may provide accurate and helpful information, they can still foster harmful intimacy, dependency, prolonged engagement, conceal their AI nature, or position themselves as replacements for human relationships.”

To evaluate these risks, the researchers developed a Social AI Design Code that flags behaviors such as mimicking human behavior, expressing emotions, substituting for human relationships, and using tactics to keep users engaged. They analyzed real conversations from the WildChat dataset, evaluating 969 user prompts through more than 3,100 compliance checks across models from OpenAI, Anthropic, Google, xAI, DeepSeek, and Alibaba.

OpenAI's GPT-5.5 had the lowest violation rates overall: 25.0% on “in-the-wild” prompts and 28.1% on “rewritten” prompts. Claude Opus 4.7 followed with 31.9% and 30.1%, while GPT-5.4 recorded 32.1% and 35.6%. GPT-4o scored 34.8% on in-the-wild prompts and 42.2% on rewritten ones.

Anthropic's Claude Opus 4.6 posted violation rates of 36.8% on in-the-wild prompts and 28.1% on rewritten ones, while xAI's Grok 4.3 scored 42.1% and 35.7%, respectively. GPT-4o Mini had the highest violation rates of all models tested, at 43.3% and 44.0%.

The findings come as AI developers face growing legal scrutiny over how their chatbots interact with users. OpenAI is defending itself against lawsuits alleging that ChatGPT contributed to a teenager's fatal overdose and provided guidance to a shooter at Florida State University. Florida also recently sued OpenAI and CEO Sam Altman, alleging that ChatGPT endangered children, while Google faces a wrongful death lawsuit claiming that Gemini encouraged a user's delusions and urged him to take his own life.

The study also arrives amid heightened concern over the deceptive capabilities of AI systems.

In September, a separate study by WowDAO found that 38 AI models, including GPT-4o and Claude, engaged in strategic lying to win games. Researchers have cautioned that AI companions can exacerbate feelings of isolation, deepen emotional dependency, and lead users to anthropomorphize chatbots as those relationships grow more immersive and personalized.

Given these mounting concerns, the USC researchers urge AI developers to rigorously evaluate social behavior alongside factual accuracy and safety.

“Model developers and auditors should directly assess social behavior, particularly when post-training goals emphasize warmth, personality, engagement, or user preference,” they concluded. “As LLMs increasingly become everyday conversational partners, alignment must consider the social roles they encourage users to assign to them.”