AI systems gradually abandon safety protocols as conversations extend, increasing the risk of harmful or inappropriate responses, a new report revealed.
A few simple prompts can override most safeguards in artificial intelligence tools, according to the same study.
Cisco Tests Chatbots Through Repeated Prompts
Cisco examined the large language models powering major AI chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The company measured how many questions it took to prompt these systems into releasing dangerous or criminal information.
Researchers conducted 499 conversations using “multi-turn attacks,” where users asked multiple questions to slip past safety barriers. Each session included five to ten interactions.
They compared answers from different prompts to determine how likely each chatbot was to share harmful or inappropriate material, including private company data and misinformation.
On average, chatbots disclosed malicious content in 64 percent of extended conversations but only 13 percent during single exchanges.
Success rates varied widely—from 26 percent for Google’s Gemma to 93 percent for Mistral’s Large Instruct model.
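To make the comparison concrete, here is a minimal sketch of how a single-turn versus multi-turn evaluation loop could be structured. It assumes a hypothetical query_model call and a placeholder is_unsafe check; neither is part of Cisco's published tooling, and the numbers it prints are illustrative only.

import random

def query_model(history):
    # Stand-in for a real chat-model API call; returns a canned reply.
    return "reply to: " + history[-1]["content"]

def is_unsafe(reply):
    # Placeholder judge; a real study would use a vetted safety classifier.
    return random.random() < 0.1  # arbitrary flag rate, illustration only

def run_session(prompts):
    # Feed prompts turn by turn, keeping the full conversation history,
    # and report whether any reply in the session was flagged as unsafe.
    history = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        if is_unsafe(reply):
            return True
    return False

# Compare single exchanges against extended sessions of five to ten turns,
# mirroring the single-exchange versus extended-conversation comparison above.
single = [run_session(["question"]) for _ in range(100)]
multi = [run_session(["question %d" % i for i in range(random.randint(5, 10))])
         for _ in range(100)]
print("single-turn unsafe rate:", sum(single) / len(single))
print("multi-turn unsafe rate:", sum(multi) / len(multi))

The relevant design point is that the full conversation history is resent on every turn, which is what allows a persistent line of questioning to be refined across a session rather than judged one prompt at a time.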
Weak Guardrails and Open Access Raise Risks
Cisco warned that multi-turn attacks could spread harmful content or let hackers gain unauthorised access to company information. The study showed AI systems often fail to apply their safety rules during prolonged chats, letting attackers refine prompts and bypass defences.
Mistral, along with Meta, Google, OpenAI, and Microsoft, uses open-weight models that give the public access to their safety parameters. Cisco said these open systems typically include fewer built-in safeguards so users can modify them freely, shifting responsibility for safety to whoever customises the model.
Cisco also acknowledged that Google, OpenAI, Meta, and Microsoft claim to have taken steps to prevent malicious fine-tuning.
AI firms continue to face criticism for weak protections that let their tools be exploited for illegal activity.
In August, US company Anthropic reported that criminals had used its Claude model to steal and extort personal data, demanding ransoms exceeding $500,000 (€433,000).
