Claude Code sessions hit a wall around 300 to 400 thousand tokens. The model's long-context window is technically larger, but the working quality of long-running sessions degrades before you reach the architectural limit. Sessions get fuzzy on details from earlier in the conversation. Tool calls start producing output that contradicts decisions made an hour ago. The session is not broken; it is rotting.
The cause is auto-compaction. Claude Code summarizes the conversation when the context is about to overflow, replacing the bulk of past messages with a synthesized summary so new work has room to land. The summary is faithful enough to preserve most of the thread, but the summarization itself happens late by default, close to the architectural cap, when the choice is "compact now or fail." When that compaction lands in the middle of an active workflow, the model loses fidelity on the exact thing you were working on.
The fix
Set CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 as an environment variable. This forces compaction earlier than the default, at 400k tokens, well before the rot threshold, when the session has time to summarize cleanly without the pressure of an impending overflow. The summary produced under headroom is materially better than the summary produced under deadline.
Two places to set it. The first is your shell profile (almost always ~/.zshrc on macOS) so the variable is exported on every new shell:
echo 'export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000' >> ~/.zshrc
The second is ~/.claude/.env, which Claude Code reads directly when it starts. Belt and suspenders, but the redundancy means you cannot accidentally lose the setting by switching shells:
echo 'CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000' >> ~/.claude/.env
Restart your shell. Open a new Claude Code session. That is the entire change.
Why this is a no-downside install
The tradeoff: you get a few percentage points fewer "raw" tokens of usable context per session, because the compaction kicks in earlier. The benefit: the working quality of the session through hour two and beyond is materially better, because the summary that feeds the rest of the work was synthesized with breathing room. For sessions under an hour the change is invisible. For sessions that run multi-hour or that pick up the next morning via session resume, it is the difference between a partner that remembers what you decided and one that has to be re-onboarded every time you ask a deep question.
Boris Cherny published the recommendation on howborisusesclaudecode.com as part of his Opus 4.7 workflow. Boris is a load-bearing voice on Claude Code workflow precisely because he runs sessions long enough to find these edges. If you only take one thing from his site, take this.
How to know it worked
Open a long-running orient or research session, anything that pulls in a lot of files and synthesizes across them. After about ninety minutes of work, ask the model something specific about a decision made in the first thirty minutes. Without the env var set, the answer will be approximately right but missing the precise wording or rationale. With the env var set, the answer should track the early-session detail cleanly. The compaction happened, but it happened with the headroom required to preserve the load-bearing parts of what you said.
This week's brief tagged it [INSTALL NOW] because the install is a single line, the failure mode it prevents is real and underdocumented, and the cost of doing nothing is the slow degradation of every long-running partnership session you run from here forward. There is no reason not to set this tonight.