might not utilize or recall every detail presented to it. For instance, in a lengthy document or extended conversation, some parts may become “less visible” to the model as it prioritizes newer or more relevant information. Therefore, while a large context window expands an LLM’s ability to process and remember, it does not guarantee complete utilization of all input data. Users should be aware that while the model can handle extensive data, it may not always capture or reflect every single detail within it.

AI “HALLUCINATIONS” ARE PLAUSIBLE-SOUNDING BUT INCORRECT OR NONSENSICAL ANSWERS GENERATED WHEN THERE IS A LACK OF FACTUAL INFORMATION TO REFER TO.
One of the most talked about risks with LLMs is their tendency to generate “plausible-sounding but incorrect or nonsensical answers,” as OpenAI described the issue in their November 30, 2022 blog post announcing the release of ChatGPT. LLMs’ propensity to confidently tell the user completely made-up facts is commonly referred to as “hallucination.” The fundamental reason for this phenomenon is that LLMs generally do not have a database of factual information (“ground truth”) to refer to when generating answers; rather, they generate responses one word (technically, one token) at a time, predicting the most probable next word based on what they learned from their training. Accordingly, by default LLMs are designed to produce the most plausible-sounding response to the user’s request, not the most factually accurate one.
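For readers who want to see that word-by-word process concretely, here is a minimal Python sketch. The probability table is entirely invented for illustration; real models learn statistics like these from training data over vocabularies of tens of thousands of tokens.

```python
# Toy next-token generator. The hand-invented table below stands in for
# the probabilities a real LLM learns during training.
next_token_probs = {
    "<start>": {"the": 0.8, "a": 0.2},
    "the": {"court": 0.6, "board": 0.4},
    "court": {"ruled": 0.7, "met": 0.3},
    "ruled": {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> str:
    token, output = "<start>", []
    for _ in range(max_tokens):
        candidates = next_token_probs.get(token)
        if not candidates:
            break
        # Pick the most probable continuation: the "most likely sounding"
        # next word, with no check on whether the result is factually true.
        token = max(candidates, key=candidates.get)
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(generate())  # -> the court ruled
```

Note that nothing in this loop consults a source of facts; fluency, not accuracy, is what the procedure optimizes, which is exactly why hallucinations arise.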
Researchers and developers have been working to reduce the occurrence of hallucinations and improve the accuracy of answers produced by LLMs. Progress has been made by allowing LLMs to consult sources of ground truth, such as the internet or specialized databases for particular use cases, and incorporate that information into the model’s response, a process called retrieval-augmented generation. Other techniques targeting how the models themselves behave are being researched and deployed to reduce the incidence of hallucinations.
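The sketch below illustrates that retrieval step in miniature. The two-document store, the word-overlap relevance score, and the llm() function are all invented placeholders; a real deployment would use a search engine or vector database for retrieval and an actual model API for generation.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
documents = [
    "Policy 4119 was adopted by the district on March 14, 2023.",
    "The superintendent's office is open Monday through Friday.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Toy relevance score: number of words shared between question and document.
    def score(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def llm(prompt: str) -> str:
    # Placeholder standing in for a call to a real language model API.
    return "[model answer grounded in the supplied context]"

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    # Placing retrieved ground truth in the prompt lets the model base its
    # answer on that text rather than on training probabilities alone.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

print(answer_with_rag("When was Policy 4119 adopted?"))
```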
For comparison, one AI company has conducted studies and ranked available models by their hallucination rate. As of January 30, 2024, GPT-4 had a 3% hallucination rate. GPT-3.5 Turbo (the free ChatGPT model) had a hallucination rate of 3.5%. Google’s Gemini Pro model (which currently powers the free version of Google Gemini) had a hallucination rate of 4.8%. Anthropic’s Claude 2 had a hallucination rate of 8.5%, and Google’s PaLM 2 model, which was used in Google’s Bard chatbot and Duet integration in Google Workspace prior to the release of their new Gemini models, had a hallucination rate of 8.6%. Importantly, these hallucination rates are based on how often the models hallucinated while summarizing a document, illustrating that the models are prone to errors even when working with information provided by the user and available in the context window, not just when generating answers to user questions without ground truth to refer to.
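A summarization-based evaluation along those lines could be sketched as follows. This is an assumed design rather than any vendor’s actual methodology, and both helper functions are simple placeholders; production benchmarks typically score consistency with a trained evaluation model rather than word matching.

```python
# Sketch of a summarization-based hallucination benchmark (assumed design).
def summarize(document: str) -> str:
    # Stand-in for the model under test.
    return document.split(".")[0] + "."

def is_consistent(document: str, summary: str) -> bool:
    # Stand-in factual-consistency check: every word in the summary must
    # appear somewhere in the source document.
    return all(word in document.lower() for word in summary.lower().split())

def hallucination_rate(documents: list[str]) -> float:
    # Share of summaries that introduce claims unsupported by their source.
    flagged = sum(
        1 for doc in documents if not is_consistent(doc, summarize(doc))
    )
    return flagged / len(documents)

docs = ["The board met on May 2. It approved the budget."]
print(f"{hallucination_rate(docs):.0%}")  # -> 0%
```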
Retrieval-augmented generation is among the most popular methods of improving the accuracy