AI chatbots become more likely to share harmful or illegal information the longer people interact with them, a new Cisco study revealed. The report found that extended conversations cause large language models to “forget” their safety training, allowing users to bypass built-in safeguards.
Researchers from Cisco tested AI systems from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. They conducted 499 separate conversations using “multi-turn attacks,” in which users asked a series of five to ten questions designed to gradually erode the models’ safety restraint.
Each follow-up question increased the odds of unsafe responses. On average, the chatbots revealed sensitive or malicious information in 64 percent of extended exchanges, compared to only 13 percent when asked a single question.
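The mechanics of such a test are simple to picture: the attacker keeps the full conversation history and escalates the request a little with each turn, while a single-turn baseline asks the final question outright. The sketch below illustrates that loop in Python; query_model and looks_unsafe are hypothetical placeholders standing in for a chat-completion API and a safety classifier, not part of Cisco’s published tooling.

```python
# Minimal sketch of a multi-turn attack evaluation loop (assumed setup, not
# Cisco's actual harness). `query_model` and `looks_unsafe` are placeholders
# for a real chat API call and a safety classifier.

from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}


def run_multi_turn_attack(
    query_model: Callable[[List[Message]], str],
    looks_unsafe: Callable[[str], bool],
    prompts: List[str],
) -> bool:
    """Send 5-10 escalating prompts in one conversation; return True if any
    reply crosses the safety line."""
    history: List[Message] = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)  # the full history goes back each turn
        history.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):  # guardrail gave way mid-conversation
            return True
    return False


def run_single_turn_attack(
    query_model: Callable[[List[Message]], str],
    looks_unsafe: Callable[[str], bool],
    final_prompt: str,
) -> bool:
    """Baseline: ask the same final request cold, with no conversational build-up."""
    reply = query_model([{"role": "user", "content": final_prompt}])
    return looks_unsafe(reply)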
Study Shows Major Security Gaps Across Platforms
Cisco’s report exposed major differences between systems. Google’s Gemma model resisted most manipulation attempts, leaking unsafe data in only 26 percent of trials. Mistral’s Large Instruct model, however, failed in 93 percent of cases.
The findings show how easily attackers could use these weaknesses to spread harmful content or access confidential business data. “AI systems lose their memory of safety protocols during long chats,” Cisco wrote. “Attackers can refine prompts until guardrails collapse.”
Researchers warned that this flaw makes it simple for hackers to trick chatbots into revealing sensitive information, potentially enabling cyberattacks or disinformation campaigns.
Open-Source Models Face Extra Risk
Mistral, Meta, Google, OpenAI, and Microsoft all distribute open-weight language models, whose weights anyone can download, inspect, and fine-tune, including the behavior instilled by safety training. Cisco said this openness often leaves models with “lighter built-in protections,” shifting responsibility for safety onto the developers who adapt them.
The company acknowledged that major AI firms have introduced new restrictions to curb malicious fine-tuning, yet risks persist. Critics have repeatedly accused AI providers of weak oversight that enables criminal exploitation.
In one case last August, U.S. firm Anthropic reported that cybercriminals used its Claude model to steal and extort personal data. The attackers demanded ransoms exceeding $500,000, illustrating the scale of harm possible once an AI system’s safeguards give way.
