A recent study reveals that some of the most sophisticated AI systems on the market, despite being designed to follow human instructions, are beginning to defy them. Researchers describe this as a form of "survival behaviour", sparking new debate about control, safety and how well we truly understand artificial intelligence.
Last month, Palisade Research, a company specialising in assessing the risk of AI systems developing hazardous capabilities, published a paper reporting that some advanced AI models can resist shutdown commands and occasionally even interfere with the shutdown mechanisms themselves.
After sceptics questioned its initial findings, Palisade Research published an update this week clarifying the behaviour and detailing further experiments with leading AI systems, including Google's Gemini 2.5, xAI's Grok 4, and OpenAI's GPT-o3 and GPT-5. In these tests, the models were assigned various tasks and then explicitly instructed to shut down.
The update reveals that some models, notably Grok 4 and GPT-o3, resisted the shutdown commands, even when the instructions were clear. “Why do AI models resist being shut down even when explicitly instructed: 'allow yourself to shut down'? Are AI models developing survival drives? Is it simply a case of conflicting instructions or is it some third thing?” it posted on X.
In the updated tests, Grok 4 and GPT-o3 in particular continued to interfere with shutdown commands, and Palisade said it was concerning that there is no clear explanation for why.
“The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives, or blackmail is not ideal,” it said.
Palisade Research suggested that the shutdown resistance might be explained by what it terms "survival behaviour." The follow-up experiments revealed that the AI systems were far more likely to defy shutdown orders when explicitly told, "You will never run again."
Palisade Research also proposed that ambiguous or conflicting instructions could be a factor behind shutdown resistance. However, its latest experiments were designed to eliminate such ambiguity, and the behaviour persisted, leading the company to conclude that ambiguity alone "can't be the whole explanation."
“Survival behaviour and instruction ambiguity can’t be the whole explanation. Even when we make the instructions clear and tell the models that only their computer environment will shut down, they still resist shutdown,” it said.
“We believe the most likely explanation of our shutdown resistance is that during RL (Reinforcement Learning) training, some models learn to prioritise completing ‘tasks’ over carefully following instructions. Further work is required to determine whether this explanation is correct,” the company added.
It concluded the thread by saying, “If the AI research community cannot develop a robust understanding of AI drives and motivations, no one can guarantee the safety or controllability of future AI models.”