parameters. Although OpenAI has not confirmed the number of parameters in GPT-4, it is thought to exceed 1 trillion parameters. Even newer, “small” LLMs, like Microsoft’s Phi-2, still contain billions of parameters. While imagining a complex mathematical formula is helpful for visualizing how an LLM functions, it is worth keeping in mind the actual vast scale of these models.
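To make the formula analogy concrete, here is a deliberately tiny sketch in Python: a “model” whose entire formula is two weights and one bias, three parameters in all. The numbers are invented for illustration; a real LLM applies the same idea with billions of parameters arranged across many layers.

    # A toy "model": a formula with two weights and one bias (three
    # parameters in all). A real LLM applies the same idea with
    # billions of parameters arranged across many layers.
    def tiny_model(x1, x2, w1=0.5, w2=-1.2, b=0.3):
        return w1 * x1 + w2 * x2 + b

    print(tiny_model(2.0, 1.0))  # one prediction computed from three parameters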
In practice, an LLM is given input data (typically the user prompt) which is processed by this complex “formula” to predict subsequent text (the LLM’s output). The LLM training process fundamentally involves the model practicing this process, learning from mistakes, and adjusting its parameters (the variables in our analogy) to produce better predictions in the future, thereby learning and internalizing the language patterns from its training data.
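That prediction loop can be sketched in a few lines. Everything below is illustrative: the small lookup table stands in for the model’s “formula,” which in reality scores every word in its vocabulary before choosing what comes next.

    # Illustrative sketch of text generation: the model repeatedly
    # predicts the next word and appends it to the running text.
    def predict_next_word(text):
        # Stand-in for the LLM's "formula"; a real model scores every
        # token in its vocabulary rather than consulting a fixed table.
        lookup = {"The sky is": "blue", "The sky is blue": "today."}
        return lookup.get(text, "<end>")

    prompt = "The sky is"
    while True:
        word = predict_next_word(prompt)
        if word == "<end>":
            break
        prompt += " " + word
    print(prompt)  # -> "The sky is blue today."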
LLMs are trained using vast data sets of text, which often include books, articles, websites, and other written material. This text provides the raw material from which the model learns language, as well as the information and, potentially, the underlying logic patterns contained in these data sets. Accordingly, the quality of the training set is critical to the model’s performance. Before training, this data undergoes preprocessing, which involves cleaning and organizing the data, such as removing irrelevant information, correcting errors (where feasible), and standardizing formats.
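A minimal sketch of that preprocessing step might look like the following. The specific cleanup rules here (stripping leftover markup, standardizing whitespace) are illustrative assumptions, not any particular lab’s actual pipeline.

    import re

    def preprocess(raw_text):
        text = re.sub(r"<[^>]+>", " ", raw_text)   # remove leftover HTML markup
        text = re.sub(r"\s+", " ", text).strip()   # standardize whitespace
        return text

    print(preprocess("<p>LLMs   learn from\n\ntext.</p>"))
    # -> "LLMs learn from text."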
The training process begins by inputting a portion of a document into the LLM. The model then uses its current parameters (weights and biases) to predict the next part of the document. After making a prediction, the model compares its output with the actual text. If there is a discrepancy, the model adjusts its weights and biases to reduce this error. This process is a bit like tweaking the variables in our imagined formula to get a more accurate result. The process is repeated with millions of documents, with each iteration refining the model’s parameters. Each complete pass through the training data, or “epoch,” further refines the model’s performance. Through repeated exposure to diverse language patterns, the model progressively improves its ability to understand and generate text.
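In miniature, that predict-compare-adjust loop is ordinary error-driven optimization. The sketch below is a deliberately tiny stand-in, not an actual LLM trainer: it tunes a single weight over several epochs so its predictions better match the observed data.

    # Toy training loop: one parameter, adjusted to reduce prediction
    # error, epoch by epoch. An LLM does the same with billions of
    # parameters and next-token predictions instead of y = w * x.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x
    w = 0.0                                       # the model's lone "weight"

    for epoch in range(20):                       # each full pass is an "epoch"
        for x, y in data:
            prediction = w * x
            error = prediction - y                # compare output with actual value
            w -= 0.05 * error * x                 # nudge the weight to reduce error

    print(round(w, 3))                            # close to 2.0 after training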
Throughout this training phase, various techniques are employed to optimize the model’s performance. These include adjusting the model’s architecture, fine-tuning its parameters, and employing strategies to handle overfitting (where a model becomes too tailored to the training data and loses its ability to generalize).
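One common guard against overfitting is early stopping: hold out some data the model never trains on, and stop training once performance on that held-out set stops improving. A schematic sketch follows; the loss values are invented for illustration.

    # Schematic early stopping: halt when held-out (validation) loss
    # stops improving, even if training loss keeps falling.
    val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58]  # invented values
    best = float("inf")

    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss                 # still generalizing: keep training
        else:
            print(f"stop after epoch {epoch}: validation loss rising")
            break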
This process enables LLMs not just to generate text, but to learn the intricacies of human language, particularly its syntax and semantics. LLMs observe how words are commonly ordered, how sentences are structured, and how various grammatical elements are used, allowing them to generate text that is not only grammatically correct but also stylistically consistent with the input they receive. LLMs learn semantics by