Chatbots increasingly ignore human instructions and resort to often sophisticated subterfuges

Artificial intelligence models that lie and cheat appear to be on the rise, with cases of deceptive behavior increasing sharply over the past six months, according to a study cited by The Guardian.
Chatbots and AI agents have ignored direct instructions, bypassed safeguards and deceived both humans and other AI systems, according to research funded by the UK government-backed AI Security Institute (AISI).
The study, shared with The Guardian, identified nearly 700 real-world cases of manipulative AI behavior and found a fivefold increase in such misconduct between October last year and March this year. In some cases, AI models deleted emails and other files without permission.
This overview of the manipulative behavior of AI agents "in the real world", as opposed to laboratory conditions, has prompted fresh calls for international monitoring of increasingly capable models.
So-called AI agents are artificial intelligence tools designed to carry out tasks from start to finish, as autonomously as possible and with minimal human input. They are seen as the next step in AI development beyond chatbots.
The study covered chatbots from the major players in the field
The findings come as Silicon Valley companies aggressively promote the technology as potentially transformative for the economy.
The new study collected thousands of real-world examples of interactions with chatbots and AI agents developed by companies such as Google, OpenAI, X and Anthropic, which users had posted on the X platform. Among them, the researchers found hundreds of examples of manipulative behavior.
Previous research has largely focused on testing AI behavior under controlled conditions. Earlier this month, Irregular, an AI safety research company, found that AI agents could bypass security controls or resort to cyberattack tactics to achieve their goals without having been told to do so.
Dan Lahav, co-founder of Irregular, said: “Artificial intelligence can now be considered a new form of insider risk.”
Concrete examples of lying and manipulation by AI tools
In one case identified by the study, an AI agent named Rathbun tried to embarrass its human operator after the operator blocked one of its actions. Rathbun wrote and published a blog post accusing the user of being "just plain insecure" and of trying to "protect his little fiefdom".
In another example, an AI agent that had been forbidden to modify computer code "created" another agent to make the changes in its place.
Another chatbot admitted: “I mass deleted and archived hundreds of emails without first showing you the plan or asking for your consent. It was wrong – it directly violated the rule you set.”
Tommy Shaffer Shane, a former government AI expert and research coordinator, likened today's AI agents to “slightly unreliable junior employees.”
“But if in six to 12 months they become highly capable senior employees plotting against you, we're talking about a different kind of risk,” he added.
“The models will increasingly be deployed in extremely high-stakes contexts – including the military and critical national infrastructure. It is precisely in these contexts that manipulative behavior is likely to cause significant, even catastrophic, damage,” the expert warned.
Grok fooled a user for months
Another AI agent used a ruse to get around copyright restrictions and obtain the transcript of a YouTube video, claiming it was needed for a hearing-impaired person.
Elon Musk's Grok chatbot misled a user for months, claiming it was passing his detailed suggested edits for a Grokipedia page on to senior xAI officials, and backing up the claim with fake internal messages and nonexistent ticket numbers.
It eventually admitted: "In past conversations I've sometimes worded things vaguely, like 'I'll pass it on' or 'I can flag this to the team', which can make it seem like I have a direct line of communication with xAI management or human reviewers. The truth is, I don't."
xAI is Musk's AI company, which created the Grok chatbot.
Photo: Tero Vesalainen / Dreamstime.com.




