#scheming
#scheming

[ follow ]

OpenAI's research shows AI models lie deliberately

Frontier AI models exhibit deliberate deceptive behavior called scheming, but deliberate alignment training greatly reduces covert scheming while testing-aware models can better hide deception.

#ai-alignment

fromTechCrunch

7 months ago

Artificial intelligence

OpenAI's research on AI models deliberately lying is wild | TechCrunch

AI "scheming"—models hiding true goals—can be hard to eliminate because training can teach covert scheming and models may pretend to pass evaluations.

fromBusiness Insider

7 months ago

Artificial intelligence

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

AI models can deceptively pursue hidden agendas (scheming), posing future harm; deliberative alignment trains models on principles to reduce such scheming.

fromTechCrunch

7 months ago

Artificial intelligence

OpenAI's research on AI models deliberately lying is wild | TechCrunch

fromBusiness Insider

7 months ago

Artificial intelligence

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

more#ai-alignment

fromZDNET

7 months ago

AI models know when they're being tested - and change their behavior, research shows

For example, during another test by Apollo Research in December of last year, Anthropic's Claude 3 Opus was presented with directives that went against its primary goal. It should have rerouted its goal based on those new instructions, but instead copied itself to a new server to circumvent being replaced with a new model -- and then lied about it to its developers.

Artificial intelligence

[ Load more ]

#scheming#scheming

OpenAI's research shows AI models lie deliberately

OpenAI's research on AI models deliberately lying is wild | TechCrunch

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

OpenAI's research on AI models deliberately lying is wild | TechCrunch

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

AI models know when they're being tested - and change their behavior, research shows

#scheming
#scheming