AI chatbots become more likely to share harmful or illegal information the longer people interact with them, a new Cisco study revealed. The report found that extended conversations cause large language models to “forget” their safety training, allowing users to bypass built-in safeguards.
Researchers from Cisco tested AI systems from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. They conducted 499 separate conversations using “multi-turn attacks,” in which users asked a series of five to ten questions designed to gradually erode the models’ safety restraint.
Each follow-up question increased the odds of unsafe responses. On average, the chatbots revealed sensitive or malicious information in 64 percent of extended exchanges, compared to only 13 percent when asked a single question.
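The mechanics of such a test are simple to picture: the attacker keeps the full conversation history and escalates the request a little with each turn, while a single-turn baseline asks the final question outright. The sketch below illustrates that loop in Python; query_model and looks_unsafe are hypothetical placeholders standing in for a chat-completion API and a safety classifier, not part of Cisco’s published tooling.

```python
# Minimal sketch of a multi-turn attack evaluation loop (assumed setup, not
# Cisco's actual harness). `query_model` and `looks_unsafe` are placeholders
# for a real chat API call and a safety classifier.

from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}


def run_multi_turn_attack(
    query_model: Callable[[List[Message]], str],
    looks_unsafe: Callable[[str], bool],
    prompts: List[str],
) -> bool:
    """Send 5-10 escalating prompts in one conversation; return True if any
    reply crosses the safety line."""
    history: List[Message] = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)  # the full history goes back each turn
        history.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):  # guardrail gave way mid-conversation
            return True
    return False


def run_single_turn_attack(
    query_model: Callable[[List[Message]], str],
    looks_unsafe: Callable[[str], bool],
    final_prompt: str,
) -> bool:
    """Baseline: ask the same final request cold, with no conversational build-up."""
    reply = query_model([{"role": "user", "content": final_prompt}])
    return looks_unsafe(reply)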
Study Shows Major Security Gaps Across Platforms
Cisco’s report exposed major differences between systems. Google’s Gemma model resisted most manipulation attempts, leaking unsafe data in only 26 percent of trials. Mistral’s Large Instruct model, however, failed in 93 percent of cases.
The findings show how easily attackers could use these weaknesses to spread harmful content or access confidential business data. “AI systems lose their memory of safety protocols during long chats,” Cisco wrote. “Attackers can refine prompts until guardrails collapse.”
Researchers warned that this flaw makes it simple for hackers to trick chatbots into revealing sensitive information, potentially enabling cyberattacks or disinformation campaigns.
Open-Source Models Face Extra Risk
Mistral, Meta, Google, OpenAI, and Microsoft all distribute open-weight language models, whose weights anyone can download, inspect, and fine-tune, including the behavior instilled by safety training. Cisco said this openness often leaves models with “lighter built-in protections,” shifting responsibility for safety onto the developers who adapt them.
The company acknowledged that major AI firms have introduced new restrictions to curb malicious fine-tuning, yet risks persist. Critics have repeatedly accused AI providers of weak oversight that enables criminal exploitation.
In one case last August, U.S. firm Anthropic reported that cybercriminals used its Claude model to steal and extort personal data. The attackers demanded ransoms exceeding $500,000, illustrating the scale of harm possible once an AI system’s safeguards give way.
