Chapter 01 · 10 min

The shape of an AI feature

Most failed AI products fail in the same way: they treat the model as the product. The model is a component. The product is the system you build around it. This chapter is about that system — where the model fits, and where it must not.

Figure: A thin model inside a thick system. Three nested rings. The small accent core in the middle is the model. Around it sits a layer of guardrails, retrieval, tools, and evals. The outer ring is your product. Most of the engineering lives outside the model.

A power tool needs a workbench. The model is the blade; the system is everything that keeps your fingers attached.

Thin model, thick system

A demo is a model with a prompt. A product is a model wrapped in retrieval, tools, guardrails, fallbacks, logging, and evals. The model might be 5% of the code and 90% of the magic — but the other 95% of the code is what makes the magic reliable enough to charge money for.

The single most useful mental shift for building with AI is this: stop trying to make the model better, and start making the system around it better. You usually can't retrain the model. You can always improve what you feed it, what you let it touch, and how you check its work.

The four jobs of the system

Everything you build around the model is doing one of four jobs: getting the right information in (retrieval), giving the model real capabilities (tools), keeping bad input and output from causing harm (guardrails), and knowing whether any of it works (evals). The rest of this course is one chapter per job, plus how to ship and operate the whole thing.
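To make the four jobs concrete, here is a minimal sketch of how they might sit around a single model call. Every name in it (Pipeline, retrieve, call_model, is_safe, the "TOOL name arg" reply format) is hypothetical and chosen only for illustration; a real system would plug in its own retrieval, tool protocol, and safety checks.

```python
from dataclasses import dataclass, field
from typing import Callable

REFUSAL = "Sorry, I can't help with that."

# Hypothetical sketch: none of these names come from a real library.
@dataclass
class Pipeline:
    retrieve: Callable[[str], list[str]]      # job 1: get the right information in
    call_model: Callable[[str], str]          # the model itself, a thin core
    tools: dict[str, Callable[[str], str]]    # job 2: real capabilities
    is_safe: Callable[[str], bool]            # job 3: guardrails on input and output
    log: list[dict] = field(default_factory=list)  # job 4: records for evals to score

    def answer(self, question: str) -> str:
        # Guardrail on the way in.
        if not self.is_safe(question):
            return REFUSAL
        # Retrieval: put relevant documents in front of the model.
        docs = self.retrieve(question)
        reply = self.call_model("\n".join(docs) + "\n\nQuestion: " + question)
        # Tools: a reply shaped like "TOOL name arg" is routed to real code
        # (a toy protocol, assumed here purely for the example).
        parts = reply.split(" ", 2)
        if len(parts) == 3 and parts[0] == "TOOL" and parts[1] in self.tools:
            reply = self.tools[parts[1]](parts[2])
        # Guardrail on the way out.
        if not self.is_safe(reply):
            reply = REFUSAL
        # Evals: keep enough detail to measure quality later.
        self.log.append({"question": question, "docs": docs, "reply": reply})
        return reply
```

Note how little of this is the model: one line calls it, and everything else decides what it sees, what it can touch, and what gets recorded about its work.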

Figure: The four levers of a working AI system. A central node labeled "model" with four levers extending in the cardinal directions: prompting (format, examples), retrieval (give it the docs), tools (let it call code), and evals (measure it). Most production systems pull at least three of these.
The same four levers from the fundamentals course, now seen from the builder's side: each is a part of the system you write, not the model you call.

Probabilistic core, deterministic shell

The model is probabilistic: the same input can produce different output, and it has no notion of "correct." Your system should be as deterministic as possible everywhere else. Parse the model's output into a strict schema. Validate it. If it fails, retry or fall back — don't pass unvalidated model output downstream and hope.
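As a rough sketch of "parse, validate, retry or fall back", the example below assumes the model was asked to return a JSON object describing a support ticket. The Ticket schema, the allowed categories, and the call_model parameter are all invented for illustration; only the standard-library json and dataclasses modules are real.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: the fields and allowed values are assumptions.
@dataclass(frozen=True)
class Ticket:
    category: str
    priority: int

ALLOWED_CATEGORIES = {"billing", "bug", "other"}

def parse_ticket(raw: str) -> Ticket:
    """Turn raw model output into a Ticket, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"category", "priority"}:
        raise ValueError("expected exactly the keys 'category' and 'priority'")
    category, priority = data["category"], data["priority"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 3:
        raise ValueError("priority must be between 1 and 3")
    return Ticket(category, priority)

def classify(call_model, text: str, max_attempts: int = 2) -> Ticket:
    """Retry on invalid output, then fall back to a fixed, safe default."""
    for _ in range(max_attempts):
        try:
            return parse_ticket(call_model(text))
        except ValueError:
            continue  # malformed output: ask again
    return Ticket("other", 3)  # deterministic fallback
```

Downstream code only ever sees a Ticket, never a raw string, so the probabilistic part stops at the boundary of parse_ticket.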

The pattern that ships: treat every model call like a network call to an unreliable third party. It can be slow, wrong, malformed, or down. Wrap it accordingly — timeouts, retries, schema validation, a fallback path — exactly as you would any flaky dependency.
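One way to sketch that wrapper is below, using only the Python standard library. The function name call_model_safely and its parameters are assumptions made for this example, and call_model stands in for whatever client function actually reaches the model. Many real clients accept a timeout of their own, which is preferable when available, because a thread-based timeout like this one stops waiting but cannot stop the underlying call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool. A timed-out call keeps occupying a worker until it
# finishes, so size the pool with that in mind.
_executor = ThreadPoolExecutor(max_workers=4)

def call_model_safely(call_model, prompt, *, validate, fallback,
                      timeout_s=10.0, max_attempts=3, backoff_s=0.5):
    """Call the model with a timeout, retries, validation, and a fallback.

    validate(output) should return the cleaned value or raise an exception.
    """
    for attempt in range(max_attempts):
        future = _executor.submit(call_model, prompt)
        try:
            output = future.result(timeout=timeout_s)  # slow -> FutureTimeout
            return validate(output)                    # wrong or malformed -> raises
        except FutureTimeout:
            future.cancel()  # only helps if the call has not started yet
        except Exception:
            pass             # down, or failed validation: retry
        if attempt < max_attempts - 1:
            time.sleep(backoff_s * 2 ** attempt)       # exponential backoff
    return fallback          # every attempt failed: take the fallback path
```

The caller gets either a validated value or the fallback, never an exception and never raw output, which is the same contract you would want from any flaky dependency.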

Start with the cheapest thing that could work

There's a natural order of escalation, cheapest first. Try a better prompt. Then add examples. Then add retrieval. Then add tools. Only after all of those fail should you consider fine-tuning — it's the most expensive lever and the one most teams reach for too early.

  • Prompt — minutes, free. Always try this first.
  • Few-shot examples — minutes, nearly free.
  • Retrieval (RAG) — days, moderate. The right answer for "it doesn't know our stuff."
  • Tools — days, moderate. The right answer for "it can't do our stuff."
  • Fine-tuning — weeks, expensive. The last resort, not the first move.
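The ladder above can also be read as a loop: build the cheapest version, measure it, and only climb when the measurement says you must. The sketch below assumes a hypothetical score_with function that builds the system up to a given lever and returns its eval score; the function names and the idea of a single numeric target are simplifications for illustration.

```python
from typing import Callable, Optional

# Ordered cheapest first, mirroring the list above.
LADDER = ["prompt", "few-shot examples", "retrieval", "tools", "fine-tuning"]

def first_lever_that_passes(score_with: Callable[[str], float],
                            target: float) -> Optional[str]:
    """Return the cheapest lever whose eval score meets the target, else None.

    score_with is a stand-in for "build the system with this lever and run
    the evals"; it is an assumption of this sketch, not a real API.
    """
    for lever in LADDER:
        if score_with(lever) >= target:
            return lever  # stop climbing as soon as something is good enough
    return None
```

The point of the loop is the early return: a more expensive lever is never tried while a cheaper one already meets the bar.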

In one line each

  • The model is a component, not the product. The system around it is where reliability comes from.
  • Keep the probabilistic part small and the deterministic shell large: validate every output against a schema.
  • Treat model calls like flaky network calls — timeouts, retries, fallbacks.
  • Escalate from cheapest to most expensive: prompt → examples → retrieval → tools → fine-tuning.