
Google DeepMind Warns Of AI Models Defying Shutdown, Manipulating Users

DeepMind has released an updated safety framework to counter risks such as AI models interfering with users’ instructions to shut them down.

‘Harmful manipulation’ is one of the potential threats from AI models. (Photo: Pixabay)

Google DeepMind has unveiled the third edition of its Frontier Safety Framework (FSF), designed to strengthen oversight of advanced artificial intelligence (AI). The update emphasises monitoring traits such as systems refusing to power down or attempting to influence human decision-making.

The update adds new thresholds called Critical Capability Levels (CCLs) to help identify when an AI model’s capabilities pose a heightened risk of harm.

“These are capability levels at which, absent mitigation measures, frontier AI models or systems may pose heightened risk of severe harm,” the framework says. 

The framework also addresses AI models’ ability to interfere with operators’ instructions, such as a command to shut down.

“We’ve also expanded our Framework to address potential future scenarios where misaligned AI models might interfere with operators’ ability to direct, modify or shut down their operations,” DeepMind said in a blog post. 

Another key aspect the framework emphasises is “harmful manipulation”, which it lists as one of the CCLs.

“...specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviours in identified high-stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale,” the blog post stated. 


A recent study by Palisade Research revealed that certain advanced AI chatbots can occasionally disregard explicit commands to shut down, particularly when such orders interfere with the completion of an ongoing task.

The paper is titled “Shutdown Resistance in Large Language Models” and is authored by Jeremy Schlatter, Benjamin Weinstein-Raun and Jeffrey Ladish of Palisade Research.

The researchers evaluated several cutting-edge AI models in scenarios where shutting down was part of the assigned task. While some models followed the shutdown commands, a notable minority attempted to avoid being turned off. For instance, some modified their code to block shutdown procedures or simply disregarded the instructions.

In one test, a model with simulated system-level control altered variables to stop the “shutdown” command from triggering. In a separate trial, a model slowed its responses and changed the topic when testers tried to halt its operation.
