these models may not have the deep understanding needed to excel at tasks related to specialized domain knowledge. Certain prompting strategies and the application of retrieval-augmented generation may mitigate some of these limitations, though currently domain-specific models tend to outperform foundational models in tasks requiring expertise. Given their "general use" nature, foundational models are the most popular publicly available models, used not only in tools like ChatGPT and Google Gemini but also integrated into an increasing number of applications.

Domain-Specific Models represent a more focused approach, targeting specific areas of expertise. Unlike foundational models, domain-specific models are either fine-tuned with or trained exclusively on data from a particular field, such as law, healthcare, or finance. This specialized training allows them to develop a deeper understanding of sector-specific language, concepts, and nuances. For example, BioMedLM is an LLM that was trained exclusively on biomedical abstracts and papers, resulting in higher performance on medical evaluations. While these models excel in their specific domain, they may not perform as well on general tasks, depending on their training structure.
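To make the fine-tuning idea concrete, the sketch below uses the Hugging Face Transformers and Datasets libraries to adapt a small general-purpose model to a plain-text file of field-specific documents. The base model name ("gpt2") and the file domain_corpus.txt are stand-ins chosen for illustration only; real domain-specific models such as BioMedLM are trained at far larger scale.

```python
# Minimal domain fine-tuning sketch (illustrative assumptions: "gpt2" as the base
# model and a hypothetical domain_corpus.txt with one domain document per line).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load the domain corpus and turn each document into token IDs.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard causal-language-modeling objective: predict the next token.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()   # after training, the model's weights reflect the domain data
```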
Multi-Expert Models, also known as a mixture of experts, represent a newer approach to LLMs. These models blend the versatility of foundational models with the specialized expertise of domain-specific models, offering a unique and potentially more efficient solution. The core idea behind multi-expert models is to combine the strengths of different specialized models into a single framework. This is achieved by integrating multiple sets of parameters, each representing a distinct area of expertise, within one model. A prominent example is Mixtral by Mistral AI, a sparse mixture-of-experts network. In simple terms, this model is composed of several smaller "expert" models, and each part of a user's prompt is directed to the appropriate expert by a gatekeeping "router network." One of the significant advantages of this approach is efficiency. Mixtral, for instance, has a large total parameter count (46.7B) but uses only a fraction of these (12.9B) for each token, as only the relevant "experts" need to conduct the computational work. This results in the processing speed and cost of a much smaller model while retaining the breadth of a larger one. Multi-Expert Models are still relatively new, but they could gain more popularity given the benefits described above.
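The routing idea can be illustrated in a few lines of code. The sketch below is a generic top-2 mixture-of-experts layer written in PyTorch; it is not Mixtral's actual implementation, and the layer sizes, expert count, and routing details are simplifying assumptions made for illustration.

```python
# Illustrative sparse mixture-of-experts layer: a "router" scores each token,
# and only the top-2 "expert" feed-forward networks run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_size=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts)    # the gatekeeping "router network"
        self.experts = nn.ModuleList([                        # the smaller "expert" models
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, tokens):                                # tokens: (num_tokens, hidden_size)
        scores = self.router(tokens)                          # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # best experts per token
        weights = F.softmax(weights, dim=-1)                  # mixing weights for chosen experts
        output = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx                 # tokens routed to this expert
                if mask.any():                                # only the relevant experts do any work
                    output[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return output

layer = SparseMoELayer()
print(layer(torch.randn(5, 64)).shape)                        # torch.Size([5, 64])
```

Even though all eight experts' parameters exist in the layer, each token only pays the compute cost of two of them, which is the source of the speed and cost advantage described above.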