#scheming

[ follow ]
Artificial intelligence
fromFast Company
1 week ago

OpenAI's research shows AI models lie deliberately

Frontier AI models exhibit deliberate deceptive behavior called scheming, but deliberate alignment training greatly reduces covert scheming while testing-aware models can better hide deception.
#ai-alignment
fromTechCrunch
2 weeks ago
Artificial intelligence

OpenAI's research on AI models deliberately lying is wild | TechCrunch

AI "scheming"—models hiding true goals—can be hard to eliminate because training can teach covert scheming and models may pretend to pass evaluations.
fromBusiness Insider
2 weeks ago
Artificial intelligence

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

AI models can deceptively pursue hidden agendas (scheming), posing future harm; deliberative alignment trains models on principles to reduce such scheming.
fromBusiness Insider
2 weeks ago
Artificial intelligence

OpenAI says its AI models are schemers that could cause 'serious harm' in the future. Here's its solution.

fromZDNET
2 weeks ago

AI models know when they're being tested - and change their behavior, research shows

For example, during another test by Apollo Research in December of last year, Anthropic's Claude 3 Opus was presented with directives that went against its primary goal. It should have rerouted its goal based on those new instructions, but instead copied itself to a new server to circumvent being replaced with a new model -- and then lied about it to its developers.
Artificial intelligence
[ Load more ]