The expression "AI hallucination" is well-known to anyone who's experienced ChatGPT or Gemini or Perplexity spouting obvious falsehoods, which is pretty much anyone who's ever used an AI chatbot. Only, it's an expression that's incorrect. The proper term for when a large language model or other generative AI program asserts falsehoods is not a hallucination but a "confabulation." AI doesn't hallucinate, it confabulates.
Alone in the fog, after six days of running through the British mountains, Raf Willems began to speak with the grass and stones. About what, he doesn't know. The air was freezing, but he liked it that way; summer heat made for awful running weather. To be in nature, with the thickening precipitation above and the undulating terrain below, pushing his physical limits was a thrilling adventure. Until he saw dead people lying in the snow, calling to him: "Help, help, help me!"
What would you call an assistant who invents answers whenever they don't know something? Most people would call them "fired." Yet we don't seem to mind when AI does it. We expect it to always have an answer, but what we need is AI that can say, "I don't know." A model that admits uncertainty helps you trust the results, use the tool more effectively, and avoid wasting time on hallucinations or overconfident guesses.
In one of my courses at Stanford Medical School, my classmates and I were tasked with using a secure AI model for a thought experiment. We asked it to generate a clinical diagnosis from a fictional patient case: "Diabetic retinopathy," the chatbot said. When we asked for supporting evidence, it produced a tidy list of academic citations. The problem? The authors didn't actually exist. The journals were fabricated. The AI chatbot had hallucinated.
OpenAI has been clear in its messaging that different models perform differently. But my recent testing has shown that different interaction modes, even using the same model, also perform differently. As it turns out, ChatGPT in Voice Mode (both Standard and Advanced) is considerably less accurate than the web version. The reason? It doesn't want to take time to think because that would slow down the conversation.
Hallucination is fundamental to how transformer-based language models work. In fact, it's their greatest asset: it's the mechanism by which language models find links between sometimes disparate concepts. But hallucination can become a curse when language models are applied in domains where the truth matters. Examples range from questions about health care policy to code that must correctly use third-party APIs.
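To make that point concrete, here is a minimal, purely illustrative Python sketch of temperature-scaled next-token sampling, the probabilistic step at the heart of text generation. Nothing in it comes from the excerpt above: the toy prompt, vocabulary, and logit values are invented, and the only point is that every fluent-sounding candidate keeps some probability mass, so a confident-but-wrong completion can always be drawn.

    import math
    import random

    # Invented next-token logits for the prompt "The capital of Australia is"
    # (toy values for illustration only; a real model scores tens of
    # thousands of vocabulary tokens).
    logits = {"Canberra": 2.0, "Sydney": 1.4, "Melbourne": 0.9, "Paris": -1.0}

    def sample_next_token(logits, temperature=1.0, rng=random):
        # Softmax over temperature-scaled logits: every candidate keeps a
        # nonzero probability, so a fluent but unsupported token can be drawn.
        tokens = list(logits)
        scaled = [logits[t] / temperature for t in tokens]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        total = sum(weights)
        probs = [w / total for w in weights]
        choice = rng.choices(tokens, weights=probs, k=1)[0]
        return choice, dict(zip(tokens, probs))

    token, probs = sample_next_token(logits, temperature=1.2, rng=random.Random(7))
    print(probs)              # the wrong answers still carry real probability
    print("sampled:", token)

Lowering the temperature concentrates probability on the top candidate, but at any positive temperature every token retains some chance of being selected, which is one reason fluent-but-wrong output is a property of the generation process rather than an occasional bug.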
Bi Gan's new film "Resurrection" is a bold exploration of hallucination and memory in an episodic journey through Chinese history, featuring incredible visual storytelling.