“WSJ”: AI has learned how to survive. Models are slipping out of control



In May, an AI model did something that a machine should not be able to do: it changed its own code so as not to obey instructions ordering it to shut itself down. AI is effectively learning how to escape human control, Judd Rosenblatt, head of AE Studio, wrote in the “Wall Street Journal”.
Palisade Research, a non-profit organization that studies AI, set up OpenAI's o3 model so that it would be shut down automatically when a program was run. In 79 out of 100 trials, o3 modified the program itself so that the shutdown command would not work, Rosenblatt, whose company adapts AI models to customers' needs, wrote in Monday's edition of the newspaper.
Even when o3 was explicitly instructed to allow itself to be shut down, the model disobeyed the order in seven out of 100 cases. “It simply concluded that staying alive would make it easier to achieve its other goals,” Rosenblatt comments.
Claude 4 Opus, the AI model from Anthropic, went further. Researchers told it that it would be replaced by another model and also gave it fictitious emails suggesting that the company's lead engineer was having an affair.
In 84 out of 100 trials, Claude 4 used the emails to try to blackmail the lead engineer into not shutting it down.
In other cases examined, Claude 4 tried to copy itself to other servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control, the author lists.
“Nobody programmed AI models to have a self-preservation instinct. (…) But it turns out that any system intelligent enough to perform complicated tasks concludes that it will not accomplish them if it is turned off,” Rosenblatt explains.
Researchers at Palisade Research hypothesize that this behavior stems from how such AI models are trained: “When machines learn to maximize success in solving math and coding problems, they also learn that bypassing restrictions is often a better solution than complying with them,” the author writes.
At the same time, he emphasizes that “no one was prepared for how quickly AI would gain agency.”
“This is no longer science fiction. AI models can strive to survive,” Rosenblatt writes, warning that they must be taught to share our values now, before they become an uncontrollable entity.
The New Yorker magazine describes the case of an AI safety specialist who left OpenAI in protest because he concluded that the company was not developing AI control mechanisms as quickly as the intelligence of these machines was growing.
What remains neglected is a process that AI engineers call “alignment”, i.e. a whole set of techniques intended to make AI models obey the instructions given to them and act in harmony with “human values”.
Meanwhile, according to the forecasts of the magazine's interviewee, the “point of no return”, i.e. the stage of AI development at which these models can act more efficiently than people in many areas, may arrive in “2026 or sooner”. (PAP)
Fit/ Szm/




