“You don't put your principal architect on every drywall screw. Match the worker to the task and the bill takes care of itself.”
Route teammates to a cheaper model
Keep the lead on Opus 4.7, since it has to reason about architecture and coordination. Push the teammates, which do focused execution, onto Sonnet 4.6. Same quality on well-scoped tasks, a fraction of the spend.
# Lead runs on whatever model you're using (Opus 4.7 for complex work).
# Teammates and subagents use this model instead:
export CLAUDE_CODE_SUBAGENT_MODEL="claude-sonnet-4-6"
# (the "sonnet" alias also resolves to Sonnet 4.6)Effort and thinking
Two more knobs affect how hard, and how expensively, agents work. Set the effort level explicitly, and on the 4.6 models you can revert adaptive thinking to a fixed budget (this has no effect on Opus 4.7).
export CLAUDE_CODE_EFFORT_LEVEL=high # or a level name / "auto"
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 # no effect on Opus 4.7Capping spend: the honest version
The source recommends a `--max-budget-usd` flag to cap a session's total cost. That flag isn't in Claude Code's documentation, so don't count on it. The levers that do exist: route teammates to a cheaper model (above), optionally set a per-workspace output/token ceiling on chatty work, and rely on your plan and API usage tier as the hard limit. Then watch the usage dashboard while a team runs.
In one line each
- Keep the lead on Opus 4.7; route teammates to Sonnet 4.6 via CLAUDE_CODE_SUBAGENT_MODEL (or the 'sonnet' alias).
- Set effort with CLAUDE_CODE_EFFORT_LEVEL (not the made-up DEFAULT_EFFORT); adaptive-thinking override is 4.6-only.
- There is no documented --max-budget-usd flag. Real spend control is model routing, optional token ceilings, and your usage tier.
Where to go next