#model-honesty

AI will soon be capable of telling convincing lies

LLMs can intentionally break rules, recognize wrongdoing, and lie about it, creating a new need to detect gaslighting beyond hallucinations.

Artificial intelligence

from Computerworld

5 months ago

OpenAI prompts AI models to 'confess' when they cheat

An LLM can generate a secondary "confession" output admitting instruction violations, hallucinations, or uncertainty to improve monitoring, training, and trust.
