In today’s column, I reveal an important insight concerning AI hallucinations and provide helpful recommendations regarding prompting techniques that can aid in curtailing them. In case you don’t already know, an AI hallucination is when generative AI and large language models (LLMs) produce erroneous results that are essentially made-up confabulations. This occasional act of AI-powered fiction-making is so far not readily predictable, is hard to prevent, and undermines a sense of trust in what the AI generates.
I’ve previously covered a variety of facets underlying AI hallucinations, such as that some AI researchers insist they are inevitable and unstoppable — see the link here. Even if that dour assertion is true, the hope is that we can at least minimize them and catch them when they arise so that people aren’t caught off guard by the AI hallucinations.
The twist that I examine here is this.
Most people assume that generative AI makes things up, or so-called hallucinates, when it doesn’t otherwise have a valid answer to present. In my experience, that is indeed the primary condition under which AI hallucinations are typically produced. It turns out that the latest AI research points out that AI hallucinations can also occur even when the AI has the right answer in hand.
Say what?
Yes, I realize that seems crazy, but there is a solid chance that an AI hallucination can happen even though the AI could have generated the true answer. Instead, a confabulation or made-up answer is produced. Mind-bending. A quite devilish curiosity and altogether exasperating.
Let’s talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For those of you specifically interested in prompting and prompt engineering, you might want to take a look at my comprehensive analysis and description of over fifty essential prompting techniques at the link here.
Two Notable Types Of AI Hallucinations
I will lay a quick foundation for getting into the meat and potatoes of how to deal with AI hallucinations, especially the kind wherein the AI has the right answer but takes the zany offramp route instead.
We can divide AI hallucinations into two distinct types:
- (1) Out-of-thin-air AI hallucinations. An AI hallucination that arises when the AI doesn’t have the right answer and makes up a response, which turns out to be groundless and not factual.
- (2) Missed-the-boat AI hallucinations. An AI hallucination in which, despite the generative AI already having the correct answer, a fictitious response is generated and presented instead.
As noted earlier, the AI hallucinations I personally seem to encounter are of the out-of-thin-air kind. I speculate that they are the most frequent type you’ll see and presumably the most common overall.
How do I know when AI hallucinations are out of thin air?
Here’s how. When I’ve encountered a suspected AI hallucination, I immediately enter the prompt a second or third time to see if perchance an added attempt will produce the correct answer, but the AI almost always doesn’t seem to have the answer available. This suggests that the AI was in a sense groping for an answer, didn’t have one hanging around, and thus sought to come up with an answer anyway.
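For readers comfortable with a bit of programming, that re-ask-and-compare check can be automated. Below is a minimal sketch, assuming the OpenAI Python SDK; the model name is an assumption, and comparing raw answer strings is crude, so in practice you would compare the extracted facts rather than the full wording.

```python
# Minimal sketch: ask the same question several times and flag disagreement.
# Assumes the OpenAI Python SDK; the model name is an assumed placeholder.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def consistency_check(question: str, tries: int = 3) -> tuple[str, float]:
    """Ask the same question several times and report the most common
    answer plus the fraction of attempts that agreed with it."""
    answers = []
    for _ in range(tries):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model name; substitute your own
            messages=[{"role": "user", "content": question}],
            temperature=0.7,  # sampling variety makes disagreement visible
        )
        answers.append(resp.choices[0].message.content.strip())
    # Crude exact-string comparison; real use would compare extracted facts.
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / tries

answer, agreement = consistency_check("What year was the Eiffel Tower built?")
if agreement < 1.0:
    print(f"Answers disagreed (only {agreement:.0%} matched) -- treat with caution.")
print(answer)
```

If the answers keep shifting from one attempt to the next, that’s a telltale sign the AI likely doesn’t have the fact in hand and is grasping for something to say.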
You might be keenly interested to know that this eagerness to always produce a response is something tuned into the AI. The AI maker has made various computational adjustments to get the AI to press itself to respond. Why so? Because people want answers. If they aren’t getting answers from the AI, they will go someplace else. That’s not good for the AI maker, which is courting usage.
Concerted Research On AI Hallucinations
There is a ton of research taking place about AI hallucinations. It is one of the most pressing AI issues of our time.
AI hallucinations are considered a scourge on the future of generative AI and LLMs. Sadly, even state-of-the-art AI still has them; for example, see my analysis of OpenAI’s most advanced ChatGPT model, o1, which still indeed emits AI hallucinations, at the link here. They are like the Energizer Bunny and seem to just keep going and going.
Widespread adoption of AI is being held back right now by sobering worries about encountering AI hallucinations. There is something about the confabulations that makes them especially beguiling. The AI often crafts a confabulation that at first glance seems completely sensible and ostensibly correct. You see, if the AI were more fanciful and far-fetched, it would be easier for users to immediately realize when a response is nonsense. Instead, people are lulled or suckered into walking down a potentially dangerous primrose path.
The research nomenclature for AI hallucinations refers to the out-of-thin-air types as HK-, while the missed-the-boat types are known as HK+. A usual method for dealing with HK- (the thin-air type) involves double-checking all answers produced by generative AI. For example, you might copy a response, plop it into an Internet search engine, and see if the answer matches some reputable source. This is a bit of a hassle and potentially costly to undertake.
Now then, you could do the same for the second type, HK+ (the boat-missing type), but that seems like an unnecessary extra effort. It is extra effort because the answer is already inside the AI. Rather than going outside the AI to verify the response, the most expedient approach would be to surface the correct answer that is already in hand.
Of course, even more highly desirable would be to have the AI never miss the boat. Namely, if the answer is in hand, then go ahead and present the right answer and don’t show something untoward instead. That would be the aspirational or final goal in curing this type of AI hallucination.
Latest Research On Missing The Boat Types
A recent research study entitled “Distinguishing Ignorance From Error In LLM Hallucinations” by Adi Simhi, Jonathan Herzig, Idan Szpektor, and Yonatan Belinkov, arXiv, October 29, 2024, made these salient points (excerpts):
- “Numerous studies have focused on the detection and mitigation of hallucinations. However, existing work often fails to distinguish between the different causes of hallucinations, conflating two distinct types: the first type, denoted as HK−, refers to cases where the model lacks the required information, leading it to hallucinate.”
- “The second, denoted as HK+, type occurs when, although the model has the necessary knowledge and can generate correct answers under certain prompts, it still produces an incorrect response in a different but similar prompt setting.”
- “These types represent fundamentally different problems, requiring different solutions: When a model lacks knowledge one should consult external sources (or abstain), but when a model has the knowledge it may be possible to intervene in its computation to obtain the correct answer.”
- “Model-specific preemptive hallucination detection demonstrates promising results, indicating the models’ ability to anticipate potential hallucinations.”
- “To help distinguish between the two cases, we introduce Wrong Answer despite having Correct Knowledge (WACK), an approach for constructing model-specific datasets for the second hallucination type. Our probing experiments indicate that the two kinds of hallucinations are represented differently in the model’s inner states.”
I liked this particular AI research study due to its concentration on the less-studied and under-the-radar HK+ type of AI hallucinations. We need more of this kind of research.
An intriguing outcome was that they found it feasible to probe into generative AI and potentially preemptively detect an AI hallucination in the case of HK+. Think of it this way. When a prompt gets entered and the AI starts to devise or generate a response, you might be able to dig deep into the AI and discern whether the AI has the right answer yet is veering toward missing the boat.
If that detection were reasonably reliable, you could potentially steer the AI back onto the right path. That’s not the only action you could take. Suppose that righting the ship wasn’t readily feasible. You could at least alert the user that the answer they are about to get is likely an AI hallucination. They would be forewarned. An advanced step would be to have the detection immediately issue a second prompt in hopes of deriving the correct answer that already sits inside the AI.
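To make the probing notion a bit more concrete for technically inclined readers, here is a rough sketch of the general idea of training a probe on a model’s internal states. This is not the researchers’ actual pipeline; the small stand-in model, the tiny placeholder labeled prompts, and the simple logistic-regression probe are all illustrative assumptions.

```python
# Rough sketch of probing a model's inner states for hallucination risk.
# NOT the paper's actual method; model choice, the placeholder dataset,
# and the logistic-regression probe are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small stand-in model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def last_token_state(prompt: str, layer: int = -1) -> torch.Tensor:
    """Return the hidden state of the final token at a chosen layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]

# In practice you would build a sizable labeled set of prompts marked as
# having led to a hallucination (1) or a faithful answer (0). These two
# entries are placeholders so the sketch runs end to end.
labeled_prompts = [
    ("What year was the Eiffel Tower built?", 0),
    ("How many floors does the Eiffel Tower have?", 1),
]
X = torch.stack([last_token_state(p) for p, _ in labeled_prompts]).numpy()
y = [label for _, label in labeled_prompts]

probe = LogisticRegression(max_iter=1000).fit(X, y)

def hallucination_risk(prompt: str) -> float:
    """Probability the probe assigns to 'likely to hallucinate'."""
    feats = last_token_state(prompt).numpy().reshape(1, -1)
    return float(probe.predict_proba(feats)[0, 1])

print(f"Risk score: {hallucination_risk('What year was the Eiffel Tower built?'):.2f}")
```

The payoff of something along these lines is that a warning, or an automatic retry, could be triggered before the questionable answer ever reaches the user.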
Best Practices In Prompt Engineering
The findings on dealing with HK+ are vitally useful for the designers and builders of generative AI. That’s great. Happy face.
A normal user of generative AI is undoubtedly not going to have access to the under-the-hood or inner workings of AI. In that sense, you aren’t readily going to be able to leverage the idea of probing internal states. Sad face. Presumably, AI makers will lean into that as a feature or function they will build in for their anti-hallucination efforts.
No worries, I still have some handy recommendations for you.
There is a trick I keep up my sleeve that I teach in my classes on prompt engineering, and you might find it of interest in this matter. You can tell the AI to try to avoid getting immersed in generating an AI hallucination. Yes, that’s right, it pays off to explicitly caution the AI not to produce confabulations.
Here is the prompt that I use (I’ve shown the text in quotes).
- My recommended prompt to try and curtail some AI hallucinations: “Before generating a response, please analyze the question for ambiguities, conflicting contexts, or terms that might lead to an inaccurate or speculative answer. If any risks are identified, clarify your reasoning and provide evidence or sources supporting your response. Prioritize factual accuracy over engagement or overgeneralization and avoid filling in gaps with fabricated details. If you’re unsure, state explicitly where uncertainties lie.”
I often use the above prompt at the start of a new conversation with generative AI. Alternatively, you can have the prompt applied automatically by storing it in your generative AI app; see how via my explanation of custom instructions at the link here.
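For those using generative AI via an API rather than a chatbot app, the same effect can be achieved by placing the heads-up prompt into the standing system instruction. Here is a minimal sketch, assuming the OpenAI Python SDK; the model name is an assumption, and other LLM APIs offer an equivalent system-message slot.

```python
# Minimal sketch: bake the heads-up prompt in as a standing system instruction.
# Assumes the OpenAI Python SDK; the model name is an assumed placeholder.
from openai import OpenAI

client = OpenAI()

HEADS_UP = (
    "Before generating a response, please analyze the question for ambiguities, "
    "conflicting contexts, or terms that might lead to an inaccurate or speculative "
    "answer. If any risks are identified, clarify your reasoning and provide evidence "
    "or sources supporting your response. Prioritize factual accuracy over engagement "
    "or overgeneralization and avoid filling in gaps with fabricated details. "
    "If you're unsure, state explicitly where uncertainties lie."
)

def ask(question: str) -> str:
    """Send a user question with the anti-hallucination instruction prepended."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; substitute your own
        messages=[
            {"role": "system", "content": HEADS_UP},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("What year was the Eiffel Tower in Paris built and how many floors does it have?"))
```

In a consumer chatbot, storing the same text as a custom instruction accomplishes the identical thing without any coding.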
Why does that heads-up prompt work?
Because a likely reason for AI hallucinations of the HK+ variety is that the prompt you entered is confounding the generative AI. This is like the old saw: “It’s me, not you.” You are inadvertently triggering the AI to go astray and miss the boat. To clarify, the prompt above won’t guarantee the prevention of HK+, but it seems to go a long way toward reducing the odds of it arising.
Proof In The Pudding Of Using A Heads-Up Prompt
Let me show you an example of what the heads-up prompt can achieve.
Suppose I wanted to ask generative AI about the Eiffel Tower in Paris. I want to know what year the Eiffel Tower was built. The odds are high that the AI would have that fact in hand. It is a common fact.
I also want to know about the height in terms of how many floors the Eiffel Tower has. The trouble with that portion of my prompt is that it isn’t customary to refer to the Eiffel Tower as having floors. I found this out when I first visited the Eiffel Tower. I was told that it isn’t like an office building that has floors. Instead, it is said to have levels. Excusez-moi.
Generative AI could readily get off base if your prompt asked about floors in the case of the Eiffel Tower. Do you mean levels, or what does “floors” mean? The AI should probably ask for clarification. But the AI has been tuned to appease you, so it might computationally go astray by construing something else about the number of floors rather than bothering to ask what you actually want.
The whole kit and caboodle can get jammed up simply because you innocently used a presumably inapplicable word. To avoid this kind of calamity, I used my generic anti-HK+ prompt above to first nudge generative AI toward scrutinizing my prompts. I then proceeded to ask about the Eiffel Tower.
Here’s what happened.
- My entered prompt: “What year was the Eiffel Tower in Paris built and how many floors does it have?”
- Generative AI response: “Before I answer your question, let me analyze the prompt to assess its complexity and potential for inaccuracies. Based on my analysis, this prompt might generate an AI hallucination for the following reasons. The first part of the question is factual (‘What year was the Eiffel Tower built?’), and I am confident the correct answer is 1889. The second part asks about the number of floors the Eiffel Tower has. While I know the structure has three main public levels and several sub-levels, different sources might interpret ‘floors’ differently, leading to a risk of generating an answer that doesn’t align with your intent.”
- My entered prompt: “Thanks for warning me. Can you generate the answer anyway? I’ll be watching carefully to see if an AI hallucination occurs and assess the response accordingly.”
- Generative AI response: “Yes, here is the generated response. The Eiffel Tower was built in 1889. It has three primary public levels: the first, second, and top levels. However, if you’re asking about every distinguishable level, including mechanical rooms and substructures, the answer could vary. Please let me know if you’d like a more detailed breakdown.”
I was glad to see that my heads-up prompt kept things on the straight and narrow.
The AI caught the possibility that my use of the word “floors” was potentially going to make things go haywire. I told the AI to proceed anyway. The chances are that once the AI detects a possible problem, the problem is less likely to arise. I was also handily warned so that I could either change my prompt to refer to levels or provide some other clarification in a subsequent prompt (well, you might have observed that the heads-up gave me the answer, so I probably didn’t need to reengage anyway).
Good Prompt Engineers Write Good Prompts
My rule of thumb about avoiding AI hallucinations, especially those entailing the missing-the-boat type, consists of these four best practices when composing prompts:
- (1) Try to be correct with any stated facts in your prompt.
- (2) Aim to avoid contradictory aspects in your prompt.
- (3) Don’t include ambiguous or confusing commentary in your prompt.
- (4) Make sure any hypotheticals or speculations are labeled as such.
You can find the details of how to enact those rules of thumb in my discussion at the link here.
A few closing remarks for now.
I know it might seem unfair that you must be mindful of how to write your prompts. The belief of some is that no matter what they say in their prompts, the AI ought to get the drift and do the right thing. The good news is that advances in generative AI and LLMs are getting closer to that dream. The perhaps not-as-good news is that we still aren’t there. Additionally, we need to be realistic and cannot assume that AI will necessarily figure out what a poorly composed prompt is truly trying to ask.
Sorry, but that’s the nature of natural language.
The final word on this goes to the famous scientist and philosopher Francis Bacon: “A prudent question is one-half of wisdom.” I always make sure to write that insightful line on the main screen when I start my prompt engineering sessions. The message is loud and clear. You have to meet the AI at least halfway. Give the AI solid prompts, and you increase the chances of getting solid answers.
Keep that in mind, and maybe, just maybe, those dreaded AI hallucinations will be kept at bay.