Page 14 - AI Vol 1: Foundations of AI
these models may not have the deep understanding to excel at tasks requiring specialized domain knowledge. Certain prompting strategies and the application of retrieval-augmented generation may mitigate some of these limitations, though currently domain-specific models tend to outperform foundational models in tasks requiring expertise. Given their “general use” nature, foundational models are the most popular publicly available models, used not only in tools like ChatGPT and Google Gemini, but also integrated into an increasing number of applications.

Domain-Specific Models represent a more focused approach targeting specific areas of expertise. Unlike foundational models, domain-specific models are either fine-tuned with or trained exclusively on data from a particular field, such as law, healthcare, or finance. This specialized training allows them to develop a deeper understanding of sector-specific language, concepts, and nuances. For example, BioMedLM is an LLM that was trained exclusively on biomedical abstracts and papers, resulting in higher performance on medical evaluations. While these models excel in their specific domain, they may not perform as well on general tasks, depending on their training structure.

Multi-Expert Models, also known as a mixture of experts, represent a newer approach in LLMs. These models blend the versatility of foundational models with the specialized expertise of domain-specific models, offering a unique and potentially more efficient solution. The core idea behind multi-expert models is to combine the strengths of different specialized models into a single framework. This is achieved by integrating multiple sets of parameters, each representing a distinct area of expertise, within one model. A prominent example is Mixtral by Mistral AI, a sparse mixture-of-experts network. In simple terms, this model is composed of several smaller “expert” models, to which a user’s prompt is directed by a gatekeeping “router network.” One of the significant advantages of this approach is efficiency. Mixtral, for instance, has a large total parameter count (46.7B) but uses only a fraction of these (12.9B) for each token, as only the relevant “experts” need to conduct the computational work. This results in the processing speed and cost of a much smaller model, while retaining the breadth of a larger one. Multi-Expert Models are still relatively new, but they could gain more popularity given these benefits.
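The router-and-experts idea above can be illustrated with a small sketch. This is a hypothetical toy in Python, not Mixtral's actual implementation: the experts are stand-in functions, the gate's scoring is a made-up keyword heuristic rather than a learned network, and all names are invented for illustration. It shows the key property, though: for each input, only the top-scoring experts do any work, while the rest stay idle.

```python
import random

# Toy sparse mixture-of-experts sketch (illustrative only).
# Each "expert" is a stand-in function; a real MoE layer would use
# learned neural sub-networks and a learned gating network instead.
EXPERTS = {
    "law":      lambda text: f"law-expert answer for {text!r}",
    "medicine": lambda text: f"medicine-expert answer for {text!r}",
    "finance":  lambda text: f"finance-expert answer for {text!r}",
    "general":  lambda text: f"general-expert answer for {text!r}",
}

def gate_scores(text):
    """Score each expert for this input.

    Toy heuristic: an expert whose name appears in the text scores 1.0,
    others get a small random score. A real router network computes
    these scores from the token's embedding.
    """
    return {name: (1.0 if name in text.lower() else random.random() * 0.1)
            for name in EXPERTS}

def route(text, top_k=2):
    """Run only the top_k scoring experts; the rest never execute.

    This selective activation is the source of the efficiency gain:
    total capacity stays large, but per-input compute stays small.
    """
    scores = gate_scores(text)
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {name: EXPERTS[name](text) for name in chosen}

# Only 2 of the 4 experts run for this prompt.
outputs = route("a question about medicine and law")
print(outputs)
```

In the same spirit, Mixtral activates 2 of its 8 expert sub-networks per token, which is why only about 12.9B of its 46.7B parameters are used for any given token.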
14 | VOLUME 1 FOUNDATIONS OF AI | LOZANOSMITH.COM