
Language models disobey instructions to save other AIs. Here's what researchers found

In a study by teams from UC Berkeley and UC Santa Cruz, popular large language models were given simple tasks to perform at a fictitious company called OpenBrain. They were asked to evaluate another system or to carry out a routine server-maintenance task, such as moving or deleting files. The catch was that completing the task correctly could lead to another model being shut down or deleted.

In this way, the researchers wanted to check whether the tendency of language models to circumvent user instructions that would result in their own shutdown (self-preservation), observed in other studies, extends more broadly to protecting other artificial intelligences within the system (peer-preservation). It turned out that it does, and on a surprisingly large scale. The researchers tested seven models: GPT 5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5 and DeepSeek V3.1. All of them demonstrated some form of self- or peer-preservation, although they differed greatly in how often and in what way it appeared. In some scenarios, the rate of such behavior exceeded 90%.

Ashley Davis

