Run it cheap: model routing and spend

“You don't put your principal architect on every drywall screw. Match the worker to the task and the bill takes care of itself.”

Match model to task: cheap models for simple steps, strong ones where it counts.

Route teammates to a cheaper model

Keep the lead on Opus 4.7, since it has to reason about architecture and coordination. Push the teammates, which do focused execution, onto Sonnet 4.6. On well-scoped tasks the quality is comparable, at a fraction of the spend.

# Lead runs on whatever model you're using (Opus 4.7 for complex work).
# Teammates and subagents use this model instead:
export CLAUDE_CODE_SUBAGENT_MODEL="claude-sonnet-4-6"
# (the "sonnet" alias also resolves to Sonnet 4.6)
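
An `export` applies to every later command in that shell. If you would rather scope the override to a single launch, a small wrapper can do it. This is a minimal sketch: `run_with_cheap_teammates` is a hypothetical helper name, not part of Claude Code, and it assumes the `claude` CLI is on your PATH.

```shell
# Hypothetical helper: run one command with the subagent model overridden,
# without changing the environment of the calling shell.
run_with_cheap_teammates() {
  # env sets the variable only for the command it launches.
  env CLAUDE_CODE_SUBAGENT_MODEL="claude-sonnet-4-6" "$@"
}

# Usage (assumes the claude CLI is installed):
#   run_with_cheap_teammates claude
```

Because `env` sets the variable only for the child process, other sessions started from the same terminal keep whatever routing they had.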

Effort and thinking

Two more knobs affect how hard, and how expensively, agents work. One sets the effort level explicitly. The other, on the 4.6 models only, reverts adaptive thinking to a fixed budget; it has no effect on Opus 4.7.

export CLAUDE_CODE_EFFORT_LEVEL=high          # or another level name, or "auto"
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1  # no effect on Opus 4.7
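
Before starting a team, it can help to confirm what the session will actually inherit. The sketch below prints the three variables named in this chapter. `show_agent_env` is a hypothetical helper written for illustration; it only reads your shell environment and does not query Claude Code itself.

```shell
# Hypothetical helper: list the routing and effort variables from this chapter
# and show whether each one is set in the current shell.
show_agent_env() {
  for name in CLAUDE_CODE_SUBAGENT_MODEL CLAUDE_CODE_EFFORT_LEVEL CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING; do
    # POSIX-compatible indirect lookup: read the variable whose name is in $name.
    eval "value=\${$name:-}"
    printf '%s=%s\n' "$name" "${value:-<unset>}"
  done
}

show_agent_env
```

Anything reported as `<unset>` falls back to whatever default the tool applies.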

Capping spend: the honest version

The source recommends a `--max-budget-usd` flag to cap a session's total cost. That flag isn't in Claude Code's documentation, so don't count on it. The levers that do exist: route teammates to a cheaper model (above), optionally set a per-workspace output/token ceiling on chatty work, and rely on your plan and API usage tier as the hard limit. Then watch the usage dashboard while a team runs.
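
With no documented hard cap, a back-of-envelope estimate before launching a team is one way to sanity-check the bill. The sketch below is plain arithmetic, not a Claude Code feature: `estimate_cost` is a hypothetical helper, and the token counts and per-million-token prices in the usage lines are placeholders, not real rates. Substitute the figures from your own pricing page and usage dashboard.

```shell
# Hypothetical helper: estimate_cost TOKENS PRICE_PER_MILLION
# Prints the estimated cost in the same currency unit as PRICE_PER_MILLION.
estimate_cost() {
  awk -v tokens="$1" -v price="$2" 'BEGIN { printf "%.2f\n", tokens / 1000000 * price }'
}

# Placeholder numbers only: 2,000,000 teammate tokens at two made-up prices.
estimate_cost 2000000 10   # teammates on a pricier model
estimate_cost 2000000 2    # same work routed to a cheaper model
```

Running the same token volume through both prices shows how much of the bill comes from teammate routing alone.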

In one line each

Keep the lead on Opus 4.7; route teammates to Sonnet 4.6 via CLAUDE_CODE_SUBAGENT_MODEL (or the 'sonnet' alias).
Set effort with CLAUDE_CODE_EFFORT_LEVEL (not the made-up DEFAULT_EFFORT); the adaptive-thinking override applies to the 4.6 models only.
There is no documented --max-budget-usd flag. Real spend control is model routing, optional token ceilings, and your usage tier.

Where to go next

Chapter 5: Manage and choose