A thin model inside a thick system

“A power tool needs a workbench. The model is the blade; the system is everything that keeps your fingers attached.”

Thin model, thick system

A demo is a model with a prompt. A product is a model wrapped in retrieval, tools, guardrails, fallbacks, logging, and evals. The model might be 5% of the code and 90% of the magic, but the other 95% of the code is what makes the magic reliable enough to charge money for.

The single most useful mental shift for building with AI is this: stop trying to make the model better, and start making the system around it better. You usually can't retrain the model. You can always improve what you feed it, what you let it touch, and how you check its work.

The four jobs of the system

Everything you build around the model is doing one of four jobs: getting the right information in (retrieval), giving the model real capabilities (tools), keeping bad input and output from causing harm (guardrails), and knowing whether any of it works (evals). The rest of this course is one chapter per job, plus how to ship and operate the whole thing.

These are the same four levers from the fundamentals course, now seen from the builder's side: each one is a part of the system you write, not the model you call.
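To make the four jobs concrete, here is a minimal sketch of how they might sit around a single model call. Every name in it (System, retrieve, model, tools, is_safe, REFUSAL) is hypothetical and chosen for illustration; a real system would put much more behind each one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

REFUSAL = "Sorry, I can't help with that."  # hypothetical canned response


@dataclass
class System:
    # All four collaborators are hypothetical callables supplied by the builder.
    retrieve: Callable[[str], List[str]]                   # retrieval: get the right information in
    model: Callable[[str, Dict[str, Callable]], str]       # the model call, handed the tools it may use
    tools: Dict[str, Callable]                             # tools: real capabilities
    is_safe: Callable[[str], bool]                         # guardrail: applied to input and output
    log: List[Tuple[str, str]] = field(default_factory=list)  # raw material for evals

    def answer(self, question: str) -> str:
        if not self.is_safe(question):
            output = REFUSAL
        else:
            context = self.retrieve(question)
            prompt = "\n".join(context + [question])
            output = self.model(prompt, self.tools)
            if not self.is_safe(output):
                output = REFUSAL
        # Record every exchange so evals can later tell you whether any of it works.
        self.log.append((question, output))
        return output
```

The model appears on exactly one line of answer; everything else in the class is system.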

Probabilistic core, deterministic shell

The model is probabilistic: the same input can produce different output, and the model has no built-in notion of whether that output is correct. Your system should be as deterministic as possible everywhere else. Parse the model's output into a strict schema. Validate it. If validation fails, retry or fall back. Don't pass unvalidated model output downstream and hope.
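As an illustration of "parse into a strict schema, then validate", here is a minimal sketch using only the Python standard library. The Ticket shape, the label set, and the parse_ticket function are assumptions made up for this example; in practice you might reach for a schema library instead.

```python
import json
from dataclasses import dataclass

ALLOWED_LABELS = {"billing", "bug", "other"}  # hypothetical label set


@dataclass(frozen=True)
class Ticket:
    label: str
    confidence: float


def parse_ticket(raw: str) -> Ticket:
    """Turn raw model text into a Ticket, raising ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"label", "confidence"}:
        raise ValueError("expected exactly the keys 'label' and 'confidence'")
    label, confidence = data["label"], data["confidence"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Ticket(label=label, confidence=float(confidence))
```

Everything downstream of parse_ticket can then rely on a Ticket being well-formed, and any failure surfaces as a single exception type that a retry or fallback path can catch.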

The pattern that ships: treat every model call like a network call to an unreliable third party. It can be slow, wrong, malformed, or down. Wrap it accordingly (timeouts, retries, schema validation, a fallback path) exactly as you would any flaky dependency.
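A wrapper along these lines is one way to apply that pattern. This is a sketch under stated assumptions: guarded_call is a hypothetical helper, call_model is assumed to accept a timeout in seconds and raise TimeoutError or ConnectionError on failure, and parse is assumed to raise ValueError on malformed output.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def guarded_call(
    call_model: Callable[[float], str],  # hypothetical: takes a timeout, returns raw text
    parse: Callable[[str], T],           # hypothetical: raises ValueError on bad output
    fallback: T,
    *,
    timeout_s: float = 10.0,
    max_attempts: int = 3,
    backoff_s: float = 0.5,
) -> T:
    """Call the model with a timeout, validate, retry with backoff, then fall back."""
    for attempt in range(max_attempts):
        try:
            raw = call_model(timeout_s)
            return parse(raw)
        except (TimeoutError, ConnectionError, ValueError):
            # Slow, down, or malformed: wait a little longer each time, then retry.
            if attempt + 1 < max_attempts:
                time.sleep(backoff_s * 2 ** attempt)
    # Every attempt failed, so return the deterministic fallback.
    return fallback
```

The caller always gets a value of the expected type back, whether it came from the model or from the fallback, so nothing downstream has to know the model misbehaved.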

Start with the cheapest thing that could work

There's a natural order of escalation, cheapest first. Try a better prompt. Then add examples. Then add retrieval. Then add tools. Only after all of those fail should you consider fine-tuning: it's the most expensive lever and the one most teams reach for too early.

Prompt: minutes, free. Always try this first.
Few-shot examples: minutes, nearly free.
Retrieval (RAG): days, moderate. The right answer for "it doesn't know our stuff."
Tools: days, moderate. The right answer for "it can't do our stuff."
Fine-tuning: weeks, expensive. The last resort, not the first move.
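The ladder above can be read as a small search procedure: enable levers one at a time, cheapest first, and stop at the first combination that meets your bar. The sketch below assumes a hypothetical evaluate function that scores a system built with the given levers; the names LADDER and cheapest_sufficient are illustrative only.

```python
from typing import Callable, Optional, Tuple

# Levers in order of escalation, cheapest first.
LADDER = ("prompt", "few-shot examples", "retrieval", "tools", "fine-tuning")


def cheapest_sufficient(
    evaluate: Callable[[Tuple[str, ...]], float],  # hypothetical eval harness
    target: float,
) -> Optional[Tuple[str, ...]]:
    """Return the shortest prefix of LADDER whose eval score meets the target."""
    for n in range(1, len(LADDER) + 1):
        enabled = LADDER[:n]
        if evaluate(enabled) >= target:
            return enabled
    # Even fine-tuning did not reach the target.
    return None
```

The loop only works if evaluate exists, which is one reason evals count as one of the system's four jobs: without a score, there is no way to know when to stop escalating.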

In one line each

The model is a component, not the product. The system around it is where reliability comes from.
Keep the probabilistic part small and the deterministic shell large: validate every output against a schema.
Treat model calls like flaky network calls: timeouts, retries, fallbacks.
Escalate from cheapest to most expensive: prompt → examples → retrieval → tools → fine-tuning.

Where to go next

Chapter 2: Prompts as software