Learn · Course · Intermediate

Generative AI beyond text

How machines learned to make images, audio, and video.

Seven chapters on generative AI beyond the chatbot — how machines make images, audio, video, and 3D, how the pieces fit together in multimodal models, and the real risks of synthetic media. Each pairs a plain-language explanation with optional dive-deepers.

Written for the curious, for builders, and for anyone working with generative media. No heavy math required; the dive-deepers go further into the research where it helps.

7 chapters · 73 min total

Chapters

  1. Chapter 01 · 9 min

    Beyond the chat box

    Once you know how to learn the shape of a thing, you can learn the shape of anything — words, pictures, sound.

  2. Chapter 02 · 12 min

    How image generation works

    A sculptor doesn't add marble. They start with a rough block and remove everything that isn't the statue.

  3. Chapter 03 · 10 min

    Controlling images

    A prompt is shouting an order across a noisy room. Control is putting the blueprint in their hands.

  4. Chapter 04 · 10 min

    Audio & music

    Sound is just a wiggling line in time. Teach a machine the shapes of the wiggles, and it can draw new ones.

  5. Chapter 05 · 10 min

    Video & 3D

    A flipbook only works if every page agrees with the last. That agreement is the hard part.

  6. Chapter 06 · 11 min

    Multimodal models

    Teach two languages in the same classroom and they start finishing each other's sentences.

  7. Chapter 07 · 11 min

    Risks & reality

    When anyone can forge a photograph, the question stops being "is it fake?" and becomes "can you prove it's real?"
