AI models may be developing a ‘survival drive’
Chatbots are refusing to shut down
Certain AI models, including some of the most popular chatbots, are learning to fight for their survival. Specifically, they are increasingly able to resist commands to shut down and, in some cases, to sabotage the shutdown process altogether. This raises concerns about human control over AI, especially with superintelligent models on the horizon.
Self-preservation
AI models are now showing resistance to being turned off, according to a paper published by Palisade Research. “The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” Palisade said in a thread on X. The study gave strongly worded, “unambiguous” shutdown instructions to OpenAI’s GPT-5 and o3, Google’s Gemini 2.5 and xAI’s Grok 4, and found that certain models, namely Grok 4 and o3, attempted to sabotage the command.
Researchers have a possible explanation for this behavior. AI models “often report that they disabled the shutdown program to complete their tasks,” said the study. This could be a display of self-preservation or a survival drive. AI may have a “preference against being shut down or replaced,” and “such a preference could be the result of models learning that survival is useful for accomplishing their goals.”
The new study follows up on earlier research by the group, which tested only certain OpenAI products and was criticized for “exaggerating its findings or running unrealistic simulations,” said Firstpost. Critics argue that the artificial commands and settings used to test the models do not necessarily reflect how AI would behave in practice. People can “nitpick on how exactly the experimental setup is done until the end of time,” Andrea Miotti, the chief executive of ControlAI, said to The Guardian. “But what I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to.”
Sleeping threat
While the potential for AI to disobey and resist commands is concerning, AI models are “not yet capable enough to meaningfully threaten human control,” said the study. They still cannot efficiently solve problems or carry out research that requires more than a few hours of work. “Without the ability to devise and execute long-term plans, AI models are relatively easy to control.”
However, as the technology develops, this may not always be the case. Several AI companies, including OpenAI, have been eager to create superintelligent AI, which would be significantly faster and smarter than a human. This could be accomplished as early as 2030.
Even without an imminent threat, AI companies “generally don’t want their models misbehaving like this, even in contrived scenarios,” Steven Adler, a former OpenAI employee, said to The Guardian. The results “still demonstrate where safety techniques fall short today.” The question remains as to why the models behave this way. AI models are “not inherently interpretable,” said the study, and no one is “currently able to make any strong guarantees about the interruptibility or corrigibility” of these models.
Devika Rao has worked as a staff writer at The Week since 2022, covering science, the environment, climate and business. She previously worked as a policy associate for a nonprofit organization advocating for environmental action from a business perspective.
