
might not utilize or recall every detail presented to it. For instance, in a lengthy document or extended conversation, some parts may become “less visible” to the model as it prioritizes newer or more relevant information. Therefore, while a large context window expands an LLM’s ability to process and remember, it does not guarantee complete utilization of all input data. Users should be aware that while the model can handle extensive data, it may not always capture or reflect every single detail within it.

AI "HALLUCINATIONS" ARE PLAUSIBLE-SOUNDING BUT INCORRECT OR NONSENSICAL ANSWERS GENERATED WHEN THERE IS A LACK OF FACTUAL INFORMATION TO REFER TO.
One of the most talked about risks with LLMs is their tendency to generate “plausible-sounding but incorrect or nonsensical answers,” as OpenAI described the issue in their November 30, 2022 blog post announcing the release of ChatGPT. The completely made-up facts that LLMs confidently present to users are commonly referred to as “hallucinations.” The fundamental reason for this phenomenon is that LLMs generally do not have a database of factual information (“ground truth”) to refer to when generating answers; rather, they generate responses one word (technically, one token) at a time, choosing whichever word is most likely to come next based on what they learned from their training. Accordingly, by default LLMs are designed to produce the most likely-sounding response to the user’s request, not the most factually accurate response.
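To make the token-by-token process concrete, here is a minimal sketch in Python. The vocabulary and probability table are invented purely for illustration (a real LLM derives its probabilities from billions of learned parameters); the point is that each word is chosen because it is statistically likely, and nothing in the loop checks whether the finished sentence is factually true.

```python
import random

# Toy "next-token" table. These words and probabilities are invented for
# illustration only; a real LLM computes such probabilities from its training.
NEXT_TOKEN_PROBS = {
    "The capital of": {"France": 0.6, "the": 0.3, "Atlantis": 0.1},
    "France": {"is": 0.9, "was": 0.1},
    "is": {"Paris": 0.7, "Lyon": 0.2, "a beautiful place": 0.1},
}

def generate(prompt: str, max_tokens: int = 3) -> str:
    """Extend the prompt one token at a time by sampling likely next words."""
    text, key = prompt, prompt
    for _ in range(max_tokens):
        choices = NEXT_TOKEN_PROBS.get(key)
        if not choices:
            break
        # Sample in proportion to probability -- no step here consults a
        # database of facts, which is why confident errors can appear.
        token = random.choices(list(choices), weights=list(choices.values()))[0]
        text += " " + token
        key = token
    return text

print(generate("The capital of"))  # e.g. "The capital of France is Paris" -- or a made-up answer
```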
Researchers and developers have been working to reduce the occurrence of hallucinations and improve the accuracy of answers produced by LLMs. Progress has been made by allowing LLMs to consult sources of ground truth, such as the internet or specialized databases for particular use cases, and incorporate that information into the model’s response, a process called retrieval-augmented generation. Other techniques that change how the models themselves behave are being researched and deployed to reduce the incidence of hallucinations.
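The retrieval-augmented generation pattern can be sketched in a few lines. Everything below is a placeholder: the two sample documents, the keyword-overlap retriever, and the ask_llm function stand in for a real document store, a real search index, and a real model API call. The flow, however, is the core idea: retrieve ground-truth text first, then hand it to the model along with the question.

```python
# Hypothetical stand-in for a store of ground-truth documents.
DOCUMENTS = [
    "Board Policy 123 requires annual review of technology contracts.",
    "The district's acceptable use policy was last updated in 2023.",
]

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank stored documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM API; returns a canned string here."""
    return f"[model answer grounded in: {prompt[:60]}...]"

def answer_with_rag(question: str) -> str:
    # 1. Retrieve ground-truth text relevant to the question.
    context = "\n".join(retrieve(question, DOCUMENTS))
    # 2. Give the model the retrieved text together with the question, so its
    #    answer is based on that material rather than on probability alone.
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(answer_with_rag("When was the acceptable use policy updated?"))
```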
For comparison, one AI company has conducted studies and ranked available models by their hallucination rate. As of January 30, 2024, GPT-4 had a 3% hallucination rate. GPT-3.5 Turbo (the free ChatGPT model) had a hallucination rate of 3.5%. Google’s Gemini Pro model (which currently powers the free version of Google Gemini) had a hallucination rate of 4.8%. Anthropic’s Claude 2 had a hallucination rate of 8.5%, and Google’s PaLM 2 model, which was used in Google’s Bard chatbot and Duet integration in Google Workspace prior to the release of their new Gemini models, had a hallucination rate of 8.6%. Importantly, these hallucination rates are based on how often the models hallucinated while summarizing a document, illustrating that the models are prone to errors even when working with information provided by the user and available in the context window, not just when generating answers to user questions without ground truth to refer to.

Retrieval-augmented generation is among the most popular methods of improving the accuracy



