
known as “transformers,” inspired by the human brain’s way of processing information. This structure is composed of layers and nodes, paralleling the neurons and synapses in our brains. In these models, each layer processes information before passing it on to the next, deeper layer, a process that underpins the “deep” in “deep learning.” The initial layers manage the basic structure of language, such as grammar and common phrases. As the information progresses to deeper layers, the model’s understanding becomes more refined, enabling it to discern nuances in context, tone, and even humor.

A key feature of these models is the “attention mechanism,” which enables LLMs to discern the varying importance of different parts of a sentence, thereby understanding its meaning more effectively. Without this mechanism, the model would process each word as if it had equal importance, necessitating greater processing power and potentially diminishing the model’s inferential capabilities. The development of attention mechanisms is one of the most important advances making the current generation of LLMs possible. This is similar to how humans process language, focusing on key information and paying less attention to irrelevant details. For example, humans are often able to understand the meaning of a grammatically incorrect sentence by focusing on key words or phrases. Attention mechanisms allow LLMs to do something similar. Additionally, the incorporation of parallel processing, where multiple parts of a sentence are analyzed simultaneously by the transformers, enhances the speed and efficiency of LLMs in understanding and generating language.

[Figure: Transformer-Based LLM Model Architecture]
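For technically inclined readers, the short Python sketch below shows the core of this attention computation in toy form. It is a minimal illustration, not production code: the function and variable names are ours, and real transformers add learned projection matrices and multiple attention “heads” on top of this calculation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, as in the original transformer paper.

    Q, K, V: arrays of shape (seq_len, d_k); each row is the query,
    key, or value vector for one token in the sentence.
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key: shape (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row becomes a probability distribution over the tokens:
    # the "importance" the model assigns to every other word.
    weights = softmax(scores, axis=-1)
    # Each output is a weighted average of the value vectors, so
    # important words contribute more to a token's representation.
    return weights @ V, weights

# Toy example: 4 tokens, 8-dimensional vectors. All tokens are
# processed in a single matrix multiplication, which is the parallel
# processing mentioned above.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(attn.round(2))  # each row sums to 1: per-token importance weights
```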


                                                                 2. DATA AND TRAINING PROCESS

To understand how LLMs are trained and function, it is helpful to imagine a complex mathematical formula with numerous variables. In LLMs, these variables represent the “nodes” or “parameters” of the model, with other variables signifying “weights” and “biases” that determine the relationships between these parameters. To provide a sense of scale, GPT-3 has 175 billion parameters.
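To make the “formula with many variables” analogy concrete, the sketch below builds a single toy layer in Python. The sizes and names are invented for illustration and are vastly smaller than GPT-3’s; the point is only that “parameters” are the entries of the weight and bias arrays, and that their count grows quickly as layers stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer of the "complex formula": outputs = weights @ inputs + biases.
# The entries of `weights` and `biases` are the layer's trainable
# parameters; training nudges their values so the formula's outputs
# better match the training data.
n_inputs, n_nodes = 512, 1024  # illustrative sizes, far smaller than GPT-3's
weights = rng.normal(size=(n_nodes, n_inputs))
biases = np.zeros(n_nodes)

def layer(inputs):
    # Each node combines every input, weighted by how strongly the two
    # are related, then shifts the result by its bias.
    return np.tanh(weights @ inputs + biases)

n_params = weights.size + biases.size
print(f"parameters in this one layer: {n_params:,}")  # 525,312
# A model like GPT-3 stacks many such layers until the total parameter
# count reaches the billions.
```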




