Generative AI beyond text
How machines learned to make images, audio, and video.
Seven chapters on generative AI beyond the chatbot — how machines make images, audio, video, and 3D, how the pieces fit together in multimodal models, and the real risks of synthetic media. Each pairs a plain-language explanation with optional dive-deepers.
Written for the curious, for builders, and for anyone working with generative media. No heavy math required; the dive-deepers go further into the research where it helps.
Chapters
- Chapter 01 · 9 min
Beyond the chat box
“Once you know how to learn the shape of a thing, you can learn the shape of anything — words, pictures, sound.”
- Chapter 02 · 12 min
How image generation works
“A sculptor doesn't add marble. They start with a rough block and remove everything that isn't the statue.”
- Chapter 03 · 10 min
Controlling images
“A prompt is shouting an order across a noisy room. Control is putting the blueprint in their hands.”
- Chapter 04 · 10 min
Audio & music
“Sound is just a wiggling line in time. Teach a machine the shapes of the wiggles, and it can draw new ones.”
- Chapter 05 · 10 min
Video & 3D
“A flipbook only works if every page agrees with the last. That agreement is the hard part.”
- Chapter 06 · 11 min
Multimodal models
“Teach two languages in the same classroom and they start finishing each other's sentences.”
- Chapter 07 · 11 min
Risks & reality
“When anyone can forge a photograph, the question stops being ‘is it fake?’ and becomes ‘can you prove it's real?’”