Page 22 - AI Vol 1: Foundations of AI

02 D. GENERATIVE AI: BEYOND TEXT




Our focus to this point has been on LLMs, as these are currently at the forefront of AI development and are the most likely to affect public agencies. However, AI is not limited to text-based interactions. AI encompasses a broad spectrum of models, each specializing in different types of data processing and generation. While LLMs primarily deal with text, other models excel in understanding and creating visual, auditory, and other types of content. These models use the same underlying machine learning principles we have previously discussed, but are tailored to interpret and generate specific types of data. This section briefly explores several key types of AI models beyond LLMs.

1. VISION MODELS

Vision models are a class of AI that specialize in interpreting and processing visual information. At their core, these models use deep learning techniques to analyze and understand images and videos. This analysis can range from recognizing objects in a photograph to understanding complex scenes in a video. As with LLMs, vision models are trained on large datasets of images and videos. The quality and diversity of this training data significantly impact the model's accuracy and ability to generalize to new visual inputs.

One of the fundamental tasks of vision models is image classification, where the model identifies the main subject or other components of an image. A significant application of vision models is in the field of facial recognition, which has become prevalent in security systems. While incredibly powerful, vision models also bring challenges and ethical considerations, especially concerning privacy and bias. The way these models are trained and used can greatly influence their effectiveness and societal impact.

2. IMAGE/VOICE/MUSIC/VIDEO GENERATION MODELS

Beyond analyzing and classifying data, AI has made remarkable strides in the realm of generation: creating new content in various forms such as images, voice, music, and videos. These generation models use advanced machine learning techniques, often building upon foundations laid by GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders), to produce novel and sometimes startlingly realistic outputs.

In image generation, models like GANs have been trained to create everything from art to photorealistic images. These models learn from a vast array of existing images, understanding intricate patterns and styles, and then generate new images that can range from novel artworks to realistic depictions of objects or scenes that never existed.

Voice generation models have evolved not only to mimic human speech but also to capture the nuances of emotion, tone, and accent. These models find extensive use in virtual assistants, audiobook narration, and even in creating realistic dialogues for movies or video games. They work by analyzing patterns in speech and then





     22   |    VOLUME  1                                                        FOUNDATIONS OF AI  |  LOZANOSMITH.COM