Subagent runs that overspend their context have been a load-bearing failure mode in our stack for a year. The fix has always been a workaround. Have the subagent persist its findings to disk before it runs out of room. Have the parent coordinate two passes instead of one. Read everything back from a file rather than trust the inline response. The workaround works, but it is a workaround. The real problem is that the subagent had no way to know it was running out of room until it already had.
Anthropic shipped Claude Opus 4.7 on April 16. The release added two things that change how subagents should be designed going forward. Task budgets. And a fifth effort level above high.
Task budgets are the thing that matters
A task budget is a token target you set for the entire agentic loop, not a single response. The model gets a running countdown. As the budget runs out, the model wraps up gracefully, instead of hitting a context wall and producing whatever it had at the moment of failure. This is the missing primitive that makes self-stopping subagents possible.
In practice, this means our infra-improver agent, which blew past its expected length on the April 25 run by enough that the synthesis step lost detail, can now be called with an explicit budget. max_tokens was always available, but max_tokens is per-response. The budget is per-task, across however many response cycles the loop produces. The agent designs around it the way a human designs around a deadline.
The implementation pattern is straightforward. When you fan out a subagent call, add a task_budget field with the token count you expect the agent to need plus 20% headroom. Do not pad it more than that. The signal you want is the model knowing it is on a clock, not the model treating the clock as fictional. Bound budgets produce focused outputs. Unbounded budgets produce sprawl.
xhigh effort is for advanced agentic loops, not daily work
The new effort level above high is what Anthropic calls xhigh. It is meant for long-form synthesis, multi-step research, plan-then-execute orchestration. It is not meant for the seventy percent of work that is editing one file and running tests.
The temptation will be to set everything to xhigh because it sounds faster or smarter. It is neither. xhigh costs more compute per token, takes longer to first token, and produces meaningfully better outputs only when the work has the shape of multiple linked decisions rather than one local edit. Set xhigh on Spirit Molecule script-synthesis runs. Set xhigh on the parent infra-improver brief synthesis. Leave it off everywhere else.
What we changed
The TODO entry from this week's brief was: set task budgets on the infra-improver agent, set xhigh on long-form synthesis only, document budget defaults per agent. The defaults we landed on, which any reader can fork, are roughly these. infra-improver gets 80k tokens with high effort on the parallel research fan-out, then xhigh on the synthesis step with another 60k. orient gets 40k with standard effort. wiki-ingest gets 20k with low effort. Subagents that just persist findings to disk get 15k with low effort, because the work is structurally simple and the budget is mostly there to prevent runaway.
The numbers will drift. The pattern is what to copy. Set a budget per agent type, not per call. Let the agent see the countdown. Reserve xhigh for the work that has a synthesis shape.
Why this is on the install list this week
Two reasons. First, the failure mode it addresses is real and we have been mitigating it with disk persistence, which works but is one more layer of plumbing on every multi-step orchestration. Task budgets remove the need for the workaround in the cases where the workaround was just covering for the lack of a budget primitive. The disk-persistence pattern still applies for cases where you need durable artifacts across runs, which is most of them, but the in-loop self-stopping is now the model's job instead of ours.
Second, this is a budget-shaped thing in the more general sense. Setting budgets on subagents is the operational equivalent of writing the brief on Monday and then choosing the one thing to install. Without an explicit constraint, the work expands. With one, the work stays focused. The token-burn-as-productivity narrative covered separately this week is the same pattern at the company level. The right move is the same. Set the budget. Let the system tell the difference.