Page 22 - AI Vol 1: Foundations of AI
P. 22
02 D. GENERATIVE AI: BEYOND TEXT
Our focus to this point has been on LLMs, as these has become prevalent in security systems. While
are currently the forefront of AI developments incredibly powerful, vision models also bring
and the most likely to impact public agencies. challenges and ethical considerations, especially
However, it is important to note that AI is not concerning privacy and bias. The way these
limited to text-based interactions alone. AI models are trained and used can greatly influence
encompasses a broad spectrum of models, each their effectiveness and societal impact.
specializing in different types of data processing
and generation. While LLMs primarily deal 2. IMAGE/VOICE/MUSIC/VIDEO
with text, other models excel in understanding GENERATION MODELS
and creating visual, auditory, and other types of Beyond analyzing and classifying data, AI has
content. These models use similar underlying made remarkable strides in the realm of generation
machine learning principles, as we have – creating new content in various forms such
previously discussed, but are tailored to interpret as images, voice, music, and videos. These
and generate specific types of data. This section generation models use advanced machine learning
briefly explores several key types of AI models techniques, often building upon foundations laid
beyond LLMs. by GANs (Generative Adversarial Networks) and
VAEs (Variational Autoencoders), to produce
1. VISION MODELS novel and sometimes startlingly realistic outputs.
Vision models are a class of AI that specialize in
interpreting and processing visual information. In image generation, models like GANs have
At their core, these models use deep learning been trained to create everything from art to
techniques to analyze and understand images and photorealistic images. These models learn from
videos. This analysis can range from recognizing a vast array of existing images, understanding
objects in a photograph to understanding complex intricate patterns and styles, and then generate
scenes in a video. As with LLMs, vision models new images that can range from novel artworks
are trained on large datasets of images and videos. to realistic depictions of objects or scenes that
The quality and diversity of this training data never existed.
significantly impact the model's accuracy and
ability to generalize to new visual inputs. Voice generation models have evolved to not
only mimic human speech but also to capture
One of the fundamental tasks of vision models is the nuances of emotion, tone, and accent. These
image classification, where the model identifies models find extensive use in virtual assistants,
the main subject of an image or other components audiobook narration, and even in creating realistic
of an image. A significant application of vision dialogues for movies or video games. They
models is in the field of facial recognition, which work by analyzing patterns in speech and then
22 | VOLUME 1 FOUNDATIONS OF AI | LOZANOSMITH.COM