“Stop asking the model to do arithmetic in its head. Give it a calculator and let it press the buttons.”
What "function calling" actually means
You describe a set of functions to the model: their names, what they do, and their parameters. When the model decides it needs one, it does not answer directly; it emits a structured request such as "call get_order_status with order_id=4471." Your code runs the real function, returns the result, and the model continues with that result in hand.
The model never runs anything itself. It only proposes a call in a structured form; your code decides whether and how to execute it. That boundary is the whole safety story, and we lean on it hard in the security course.
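To make that boundary concrete, here is a minimal sketch of the executing side. It assumes the model's proposal arrives as a JSON string with "name" and "arguments" fields; that format, the TOOLS registry, and the stubbed get_order_status body are all illustrative assumptions, not any particular vendor's API.

```python
import json

# Hypothetical tool implementation; a real one would query an order system.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# Allowlist: the only functions the model is permitted to request.
TOOLS = {"get_order_status": get_order_status}

def execute_tool_call(raw_call: str) -> dict:
    """Parse a model-proposed call and run it only if it is on the allowlist."""
    call = json.loads(raw_call)
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        # The model proposed something we never offered; refuse and say why.
        return {"error": f"unknown tool {name!r}; available: {sorted(TOOLS)}"}
    try:
        return {"result": TOOLS[name](**args)}
    except TypeError as exc:
        # Typically wrong or missing argument names; report back, do not crash.
        return {"error": f"bad arguments for {name}: {exc}"}

# What the model might emit in place of a prose answer (assumed format).
proposed = '{"name": "get_order_status", "arguments": {"order_id": "4471"}}'
outcome = execute_tool_call(proposed)
```

The point of the sketch is that the model's output is only data until execute_tool_call chooses to act on it. Anything outside the allowlist comes back as an error message, never as an executed call.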
Why tools fix the model's worst weaknesses
Recall the things models are bad at: arithmetic, current facts, and anything with a deterministic right answer. Tools fix all three by handing the work to systems that are good at them.
- Calculator / code interpreter — the model stops confabulating numbers and computes them.
- Database / API query — the model stops hallucinating data and fetches it.
- Search — the model stops relying on stale training knowledge and looks things up.
- Validators (type checker, linter, schema) — the model can check its own output against ground truth.
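As an illustration of the first item, a calculator tool can be very small. The sketch below is one possible implementation, not a prescribed one: the function name calculate and the choice to support only +, -, *, / over numeric literals are assumptions made for the example. It walks Python's ast instead of calling eval(), so the model cannot smuggle arbitrary code through the expression.

```python
import ast
import operator

# Only these operators are permitted; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, and so on all end up here.
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval"))

product = calculate("1234 * 5678")
```

The model only has to produce the string "1234 * 5678"; the multiplication itself is done by the interpreter, which does not confabulate.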
Designing good tools
The model uses a tool the way a new hire uses an unfamiliar API: from the name and description alone. So tool design is API design for a reader who won't read the source. Names must be unambiguous, descriptions must say when to use the tool (not just what it does), and parameters should be hard to misuse.
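Here is roughly what that advice looks like applied to the order lookup. The layout below borrows JSON Schema keywords for the parameters, but the overall shape of the dictionary is an assumption for illustration; check the exact format your model provider expects. The 6-digit rule and the sample value are likewise illustrative.

```python
# Hypothetical tool definition in a JSON-Schema-like layout.
GET_ORDER_STATUS_TOOL = {
    # Unambiguous verb-plus-noun name; no overlap with other tools.
    "name": "get_order_status",
    # Says what it does, when to use it, and when not to.
    "description": (
        "Look up the current shipping status of one existing order. "
        "Use this when the user asks where an order is or whether it has shipped. "
        "Do not use it to search for orders by customer name."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                # A pattern makes the wrong shape of argument easy to detect.
                "pattern": "^[0-9]{6}$",
                "description": "Exactly 6 digits, for example 004471.",
            },
        },
        "required": ["order_id"],
        # Reject invented parameters instead of silently ignoring them.
        "additionalProperties": False,
    },
}
```

Most of the design effort goes into the description and the constraints, because those are the only parts the model ever sees.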
Return errors the model can act on. "Error 400" is useless; "order_id must be 6 digits; you passed 4471 (4 digits)" lets the model correct itself and retry. Treat the model as the consumer of your error messages, and write them for that reader.
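A validator that produces the second kind of message might look like the sketch below. The function name validate_order_id and the {"ok": ..., "error": ...} return shape are assumptions for the example; the 6-digit rule comes from the error message above.

```python
def validate_order_id(order_id: str) -> dict:
    """Return {'ok': True}, or an error message the model can act on."""
    if not (order_id.isascii() and order_id.isdigit()):
        return {
            "ok": False,
            "error": f"order_id must contain only digits; you passed {order_id!r}",
        }
    if len(order_id) != 6:
        # State the rule, echo the bad value, and say how it broke the rule.
        return {
            "ok": False,
            "error": (
                f"order_id must be 6 digits; "
                f"you passed {order_id} ({len(order_id)} digits)"
            ),
        }
    return {"ok": True}

check = validate_order_id("4471")
```

Each message carries three things: the rule, the value that was received, and the way the value violated the rule. That is usually enough for the model to fix the argument on the next attempt.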
Read tools and write tools are different animals
A tool that reads (look up an order, search docs) is low-risk: the worst case is a wrong answer. A tool that writes (send an email, charge a card, delete a record) can cause real-world damage the model can't undo. The two kinds deserve completely different treatment.
Make write tools idempotent where you can, so a retry is safe. Put a human approval step in front of high-stakes actions. Give the model a dry-run mode so it can test before committing. And log every call. The model will eventually call a write tool with bad arguments; design so that's survivable, not catastrophic.
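The four safeguards can live together in one small wrapper. What follows is a sketch under stated assumptions, not a payment integration: charge_card, the in-memory _completed store, the approval threshold, and the approved_by parameter are all hypothetical, and the actual payment call is left as a comment.

```python
import logging
from typing import Dict, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tools")

APPROVAL_THRESHOLD_CENTS = 10_000      # hypothetical cutoff for human sign-off
_completed: Dict[str, dict] = {}       # idempotency_key -> earlier result

def charge_card(customer_id: str, amount_cents: int, idempotency_key: str,
                dry_run: bool = True, approved_by: Optional[str] = None) -> dict:
    # Log every call, including the ones that end up rejected.
    log.info("charge_card customer=%s amount=%d key=%s dry_run=%s approved_by=%s",
             customer_id, amount_cents, idempotency_key, dry_run, approved_by)
    if amount_cents <= 0:
        return {"status": "rejected",
                "error": f"amount_cents must be positive; you passed {amount_cents}"}
    # Idempotency: a retry with the same key returns the earlier result
    # instead of charging a second time.
    if idempotency_key in _completed:
        return _completed[idempotency_key]
    # Dry run: report what would happen without doing it.
    if dry_run:
        return {"status": "dry_run", "would_charge_cents": amount_cents}
    # Approval gate: large charges need a named human approver.
    if amount_cents > APPROVAL_THRESHOLD_CENTS and approved_by is None:
        return {"status": "needs_approval",
                "reason": f"charges above {APPROVAL_THRESHOLD_CENTS} cents require approved_by"}
    result = {"status": "charged", "amount_cents": amount_cents}  # real payment call goes here
    _completed[idempotency_key] = result
    return result
```

Note that dry_run defaults to True in this sketch, so a call that omits the flag does nothing irreversible. Whether that default suits your system is a judgment call, but it makes the careless path the safe one.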
In one line each
- Function calling: the model proposes a structured call; your code executes the real tool and returns the result.
- Tools fix the model's worst weaknesses (arithmetic, current facts, anything with a deterministic right answer) by delegating to systems that are good at them.
- Tool design is API design for a reader who won't read the source: clear names, descriptions that say when to use the tool, hard-to-misuse parameters, instructive errors.
- Read tools are low-risk; write tools need idempotency, dry-runs, bounds, approval gates, and logging.
Where to go next