OCDevel
OCDevel Claude Code Podcast
The OCDevel Claude Code Podcast is a technical show for software developers using Anthropic's Claude Code CLI and developer tools in production. Over a structured 30-episode series, the show teaches you how to move from running a single manual terminal session to orchestrating fully automated pipelines. Learn how to configure CLAUDE.md, manage permissions in settings.json, build custom slash commands, connect MCP servers, and set up autonomous review-and-fix loops. Subscribe to learn how to build a system where Claude safely implements features, runs tests, and deploys to production for you.
This show was made with OCDevel Podcaster, which turns any topic or text into an AI-narrated podcast episode that drops right into your feed.

Context windows and CLAUDE.md hierarchies: why long sessions go dumb, and how to keep them sharp

10h ago

Two hours into a session, Claude starts re-reading files and forgetting the conventions you set at the start. That's a full context window, not a smarter assistant. Learn to read the /context meter, when to /compact versus /clear, how to push noisy work into a subagent, and how to structure a multi-file CLAUDE.md hierarchy so the right instructions load at the right depth instead of silently eating your token budget.

Show Notes

The context window is the scarce resource Claude Code thinks inside of, and managing it turns out to be the same skill as managing your CLAUDE.md files. This episode covers both halves and the one idea underneath them: a fuller window is a slower, more forgetful, more expensive Claude, not a smarter one. Anthropic frames the whole craft in Effective context engineering for AI agents as finding the smallest set of high-signal tokens that does the job, and we build practical habits around that.

First, the mechanics. What's already loaded before you type (system prompt, tools, skills, MCP servers, and your CLAUDE.md), and why long sessions degrade: context rot, the attention budget, the quadratic cost of attention, and lost in the middle. The context-window docs ship an interactive walkthrough and the load order we use throughout.

Then the workflow. Reading the /context meter (including the autocompact buffer it reserves), /compact with focus instructions, what survives a compaction versus what silently vanishes, and the /clear-versus-/compact-versus-fresh-session decision rule. Plus offloading noisy work to a subagent (the docs' own example reads 6,100 tokens and returns 400), and quick memory with the # shortcut and /memory. Sources: Manage costs effectively.

The second half is the memory hierarchy: managed policy, user, project, and local CLAUDE.md files, how they concatenate rather than override, and how Claude discovers them by walking up the directory tree at launch and loading nested subdirectory files on demand. We cover @-imports (and why they don't save context), the monorepo pattern with path-scoped rules in .claude/rules/, and what belongs in CLAUDE.md versus a skill or a hook.

The pitfall: a bloated, stale CLAUDE.md silently eats your window on every turn and you can't see it in the terminal. How to catch it with /context and /memory, and how to fix it by moving instructions to where they load on demand. Earlier episodes referenced: subagents, skills, hooks, and MCP servers.

Transcript

The session that got dumb

Here's a session every Claude Code user eventually has. You start clean, and the first hour is great. Claude's sharp. It remembers the file you touched twenty minutes ago, it threads a change through three modules without being told twice, it's a pleasure. Then somewhere past the two-hour mark it goes a little dumb. It re-reads a file it already read. It forgets the convention you set at the very start of the session. It cheerfully suggests a fix you explicitly rejected an hour ago.

Nothing crashed. No error in the terminal. It just quietly got worse. What happened is the whole subject of this episode. You filled up the context window, and a full window is not a smarter Claude. It's a slower, more forgetful, more expensive one.

So today is two skills that turn out to be the same skill seen from two angles. Managing the context window, the scarce resource Claude thinks inside of. And managing your CLAUDE.md files, the memory hierarchy you load into that window on every single turn, whether you use it or not. Anthropic puts the goal plainly in their writeup on effective context engineering: context is a finite resource with diminishing returns, and the job is to find the smallest set of high-signal tokens that gets you the outcome you want. Everything today serves that one sentence.

What's already in the window before you type

Let's start by looking at what's actually in there, because most of it is invisible. The context window is the total span of tokens the model can see at once. On the older models that's two hundred thousand tokens. On the newer ones, Opus four-six and Sonnet four-six as of this recording, it's a full million, and on the Max, Team, and Enterprise plans the Opus upgrade to a million happens automatically with nothing to configure. The exact list of million-token models moves fast, so don't trust my numbers, trust the meter, and I'll show you the meter in a minute.

Here's the part people miss. Before you type a single character, that window is already partly full. Claude Code ships an interactive walkthrough in its context-window docs that lays out the load order, and it's worth knowing by heart. First the system prompt, the core instructions you never see, a few thousand tokens. Then Claude's own auto-memory notes from past sessions. Then a little environment info, your working directory, your shell, your git branch and recent commits.

Then the tools. Your skills get a one-line description each so Claude knows they exist, with the full body deferred until you actually invoke one. Your MCP servers now work the same way, with just the tool names up front and the heavy schemas loaded on demand. Then your global preferences from your home-directory CLAUDE.md. Then your project's CLAUDE.md, which the docs call the most important file you can create. And only after all of that, finally, your actual prompt, which is usually tiny compared to everything stacked in front of it.

So when you open a session and ask one small question, you're not starting from zero. You're starting from a baseline that you, partly, control. Hold that thought, because the second half of the episode is entirely about keeping that baseline small.

Why a full window makes Claude worse

Now, why does filling it up actually hurt? It's tempting to think more context is just more memory, strictly better. It isn't, and Anthropic's context-engineering post names the reasons. The first is context rot. As the number of tokens in the window goes up, the model's ability to accurately recall any specific one of them goes down. More haystack, same needle, worse odds.

The second is the attention budget. A transformer has something like a limited working memory, and every token you add spends from it. This isn't a metaphor they're being loose with, it's architectural. Attention works by having every token look at every other token, so the cost grows with the square of the length. Double the context and you've roughly quadrupled that pairwise work, not doubled it. Longer sequences are just inherently harder to reason over accurately.
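
To put a number on that square, here's a rough sketch in Python. It only counts token pairs, which is an assumption that ignores everything else a real model does, and the function names are mine for illustration.

```python
def attention_pairs(num_tokens: int) -> int:
    """Count token-to-token comparisons if every token looks at every token."""
    return num_tokens * num_tokens


def relative_cost(old_tokens: int, new_tokens: int) -> float:
    """How many times more pairwise work the longer context needs."""
    return attention_pairs(new_tokens) / attention_pairs(old_tokens)


# Going from 100k to 200k tokens doubles the length.
print(relative_cost(100_000, 200_000))
```

The ratio depends only on how much longer the context got, not on the absolute size, which is why the same penalty shows up at every scale.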

The third has a name you'll want, because it'll explain your bad sessions for the rest of your career. Lost in the middle. Models attend well to the beginning of a long context and well to the end, and they lose things buried in the middle. So that convention you set at the very start, the one Claude forgot two hours in? It didn't get deleted. It got buried in the soft middle of a hundred-and-eighty-thousand-token transcript where the model's attention is thinnest.

And then there's money and time, which compound all of it. You pay to carry the entire context on every single turn. A bloated window isn't a one-time cost, it's a tax on every message you send for the rest of the session. Prompt caching softens the stable part of that, the system prompt and your CLAUDE.md, but the moment you edit that stable prefix you've invalidated the cache and you're paying full freight again. For a sense of scale, Claude Code's cost docs put a typical engineer somewhere around thirteen dollars a day of active use. Sloppy context management is a real line item, not a rounding error.
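
If you want to see the tax on paper, here's a minimal sketch. It assumes the whole window is re-sent on every turn and ignores prompt caching entirely, and every number in it is hypothetical, picked only to show the shape.

```python
def total_input_tokens(baseline: int, tokens_per_turn: int, turns: int) -> int:
    """Sum the context carried on each turn of a session.

    baseline: always-loaded tokens (system prompt, tools, CLAUDE.md).
    tokens_per_turn: how much the history grows after each turn.
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += baseline + history  # the whole window rides along every turn
        history += tokens_per_turn
    return total


# Hypothetical sessions that differ only in their always-loaded baseline.
lean = total_input_tokens(baseline=10_000, tokens_per_turn=2_000, turns=50)
bloated = total_input_tokens(baseline=30_000, tokens_per_turn=2_000, turns=50)
print(bloated - lean)  # extra tokens carried purely because of the baseline
```

The gap is just the extra baseline multiplied by the number of turns, so a heavier baseline costs more the longer the session runs.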

The meter: read it before you guess

So here's the first concrete move, and it's the one I want you to build a reflex around. Before you do anything big, run the context command. You type a slash and the word context, and Claude Code draws you a colored grid and an itemized breakdown of exactly where your tokens are going, with the count and the percentage for each category.

Let me walk a real readout, because once you've read one you'll read them forever. System prompt, a few thousand tokens. System tools, a bit more. MCP tools split into loaded and deferred. Then memory files, broken out per file, so you'll literally see your CLAUDE.md listed at, say, seventeen hundred tokens, and your auto-memory next to it. Skills, usually tiny. Then messages, your conversation history, which is almost always the biggest and fastest-growing line on the board. Then free space. And one line that confuses everyone the first time, the autocompact buffer, around thirty-three thousand tokens held in reserve.

That buffer is headroom Claude keeps so that when it has to compact mid-thought, it's got room to finish the sentence it's writing. It's not lost, it's escrow. Earlier builds reserved less, more like thirteen thousand, so again, read your own meter rather than memorizing mine. The single most useful thing about this command is that it prints each memory file's token cost by name, which makes it the tool you'll use later to catch a bloated CLAUDE.md red-handed. There's a companion, the usage command, that shows session cost and even attributes spend to individual skills, subagents, and MCP servers, and you can pin the context meter right into your status line so it's always in the corner of your eye.
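
Here's a small sketch of the arithmetic behind a readout like that. Only the CLAUDE.md figure and the buffer size come from this episode. The other category sizes are made-up placeholders, and this is not how Claude Code itself computes the meter.

```python
WINDOW = 200_000  # tokens; the older-model window size mentioned earlier

# Illustrative readout. Placeholder values except CLAUDE.md and the buffer.
usage = {
    "system prompt": 3_000,
    "system tools": 12_000,
    "memory: CLAUDE.md": 1_700,
    "messages": 90_000,
    "autocompact buffer": 33_000,
}


def free_space(window: int, categories: dict[str, int]) -> int:
    """Tokens left once every listed category is accounted for."""
    return window - sum(categories.values())


def percent_of_window(tokens: int, window: int) -> float:
    """Share of the window one category takes, as a percentage."""
    return round(100 * tokens / window, 1)


for name, tokens in usage.items():
    print(f"{name:22} {tokens:>7} {percent_of_window(tokens, WINDOW):>5}%")
print(f"{'free space':22} {free_space(WINDOW, usage):>7}")
```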

Compact: summarize and keep going

When the window fills and you want to keep going on the same work, you compact. You run slash compact, and Claude replaces the verbatim back-and-forth with a structured summary, then continues from there. You'll see a short "conversation compacted" note and that's it.

What survives matters, so know the deal you're making. Compaction keeps your requests and your intent, the key technical concepts, the files you examined and changed with the important snippets, the errors you hit and how you fixed them, and the pending to-dos. What it drops is the bulk, the full tool outputs and the intermediate reasoning. So Claude will remember that it refactored the auth module and why, but it won't have the exact four hundred lines of the old file it read along the way. In the docs' own simulation, a compact near the end of a long session collapsed the transcript to roughly a tenth of its former size. That's the trade, an order of magnitude of space back, in exchange for losing the verbatim detail.

Here's the copyable trick most people never find. You can steer the summary. Run slash compact followed by a focus instruction, something like, focus on the auth refactor, or, focus on code samples and API usage, and Claude weights the summary toward what you said matters. Even better, you can make that standing policy. Drop a short compact-instructions section into your CLAUDE.md telling Claude what to preserve when it compacts, and because your CLAUDE.md gets re-read from disk after every compaction, those instructions survive the very event they're about. That's a genuinely nice loop once you see it.
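
As a sketch of making that standing policy, here's a tiny helper that adds such a section to CLAUDE.md text exactly once. The heading and the wording of the instruction are my assumptions for illustration, so check the docs for the phrasing they recommend.

```python
# Hypothetical heading and wording; adjust both to your project.
HEADING = "# Compact instructions"
BODY = "When compacting, preserve the auth refactor decisions, failing test output, and pending to-dos."


def with_compact_section(claude_md_text: str) -> str:
    """Return the text with the section appended, unless it is already there."""
    if HEADING in claude_md_text:
        return claude_md_text  # already present, leave the file alone
    prefix = (claude_md_text.rstrip("\n") + "\n\n") if claude_md_text.strip() else ""
    return f"{prefix}{HEADING}\n\n{BODY}\n"


print(with_compact_section("# Project\n\nRun the test suite before committing.\n"))
```

Running it twice leaves the text unchanged the second time, so it's safe to call from a setup script.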

Auto-compaction happens to you

Now, you don't actually have to remember to do this, because Claude Code will compact on its own when you get close to the ceiling. Practitioners peg the trigger at around ninety-five percent of the window, so near a hundred and ninety thousand tokens on a two-hundred-thousand window. That reserved buffer we saw in the meter is exactly what guarantees there's room to do it gracefully when it fires. Some builds even let you move the trigger earlier with an environment variable override, though that one's community-documented rather than official, so verify before you lean on it.
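
The arithmetic is simple enough to sketch. The ninety-five percent default below is the practitioner estimate quoted above, not an official constant, so treat it as an assumption.

```python
def autocompact_trigger(window_tokens: int, trigger_fraction: float = 0.95) -> int:
    """Token count at which auto-compaction is expected to fire."""
    return round(window_tokens * trigger_fraction)


def headroom(window_tokens: int, used_tokens: int, trigger_fraction: float = 0.95) -> int:
    """Tokens left before the trigger; negative means you are past it."""
    return autocompact_trigger(window_tokens, trigger_fraction) - used_tokens


print(autocompact_trigger(200_000))
print(headroom(200_000, used_tokens=150_000))
```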

A quick word on names, because the tool moves and the vocabulary hasn't fully settled. The docs say auto-compaction. You'll see people online say microcompact for smaller incremental trims. Don't get attached to the label. The behavior is what matters: near the ceiling, Claude will silently summarize to make room, and you should know it's coming rather than be surprised by it. The healthier habit is to compact at a natural breakpoint you chose, not to let the auto-trigger bail you out at ninety-five percent when you're least ready for it.

What actually survives, and the gotcha that bites everyone

This is the part to tattoo somewhere, because it's where people lose work without noticing. The context-window docs publish a table of what survives a compaction, and the pattern is the thing to internalize. Your project-root CLAUDE.md and your unscoped rules survive, because they get re-injected fresh from disk every time. Your auto-memory, same, re-read from disk. The system prompt was never in the message history to begin with, so it's untouched.

But here's what does not survive. Any rule that's path-scoped, that only loads when you touch a matching file, is gone after a compaction until you read a matching file again. A nested CLAUDE.md down in a subdirectory, gone after compaction until you read a file in that subtree again. And the big one, anything you told Claude only in the chat, only by typing it, is the most fragile thing in the room, because it lives nowhere but the transcript that just got summarized away.

So the practical rule writes itself. If an instruction has to outlive a compaction, and most of your important ones do, it does not belong in a throwaway chat message and it does not belong only in some deep nested file. It belongs in your project-root CLAUDE.md, where it gets re-injected on every compaction, every single time. That one habit will save you more silent regressions than any other thing in this episode.
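
If it helps to see the survival pattern as data, here's a minimal sketch. The categories follow what was just described. The enum names and the lookup helper are mine, not anything from the docs.

```python
from enum import Enum


class AfterCompaction(Enum):
    REINJECTED = "re-read from disk"
    UNTOUCHED = "never in the message history"
    UNTIL_RETRIGGERED = "gone until a matching file is read again"
    SUMMARIZED = "survives only if the summary kept it"


SURVIVAL = {
    "system prompt": AfterCompaction.UNTOUCHED,
    "project-root CLAUDE.md": AfterCompaction.REINJECTED,
    "unscoped rules": AfterCompaction.REINJECTED,
    "auto-memory": AfterCompaction.REINJECTED,
    "path-scoped rules": AfterCompaction.UNTIL_RETRIGGERED,
    "nested CLAUDE.md": AfterCompaction.UNTIL_RETRIGGERED,
    "chat-only instructions": AfterCompaction.SUMMARIZED,
}


def durable(kind: str) -> bool:
    """True if this kind of instruction reliably outlives a compaction."""
    return SURVIVAL[kind] in (AfterCompaction.REINJECTED, AfterCompaction.UNTOUCHED)


for kind, fate in SURVIVAL.items():
    print(f"{kind:24} {fate.value}")
```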

Clear, compact, or start fresh

Compaction's quieter cousin is the clear command, slash clear, and it does something blunter. It wipes the conversation history entirely. Empty transcript, fresh start, but your CLAUDE.md and your setup all reload. The cost docs are direct about why you'd want this: stale context wastes tokens on every message after it, so when you're done with a thing, get rid of it.

So here's the decision rule I want you to carry. Clear when the next task shares nothing with the last one. You just fixed an auth bug, now you're writing a database migration, those have nothing to say to each other, so clear and start the migration with a clean slate. Compact when you're continuing the same thread of work but the window's filling and you need the gist carried forward, and use a focus instruction when you do. And spin up a genuinely new session for a separate workstream you might want to come back to later. One pro touch: rename a session before you clear it, so you can find it again and resume it later if it turns out you weren't as done as you thought. Resume and continue, the way you reload a whole prior session, we covered in an earlier episode. Just remember they're the exact opposite of clear, they pour an entire old context back in, so reach for them deliberately.
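
That decision rule fits in a few lines. This is a sketch of the rule as stated here, with hypothetical parameter names, not a feature of Claude Code.

```python
def next_move(same_thread: bool, window_filling: bool, separate_workstream: bool) -> str:
    """Pick a command using the clear / compact / fresh-session rule."""
    if separate_workstream:
        return "new session"  # a parallel thread you may want to resume later
    if not same_thread:
        return "/clear"  # the next task shares nothing with the last one
    if window_filling:
        return "/compact <focus instruction>"  # carry the gist forward
    return "keep going"


# Auth bug fixed, database migration next: unrelated work.
print(next_move(same_thread=False, window_filling=True, separate_workstream=False))
```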

Push the noise into a subagent

There's a move that beats all of this, though, which is not filling the main window in the first place. Remember the subagent from a few episodes back, the fresh Claude instance you hand a noisy, self-contained job to. This is where it pays off for context, not just for tidiness.

The docs' own numbers make the case better than I can. A research subagent in their example read sixty-one hundred tokens of files to answer a question, and handed back a four-hundred-token result. Sixty-one hundred tokens of reading happened, and your main session paid four hundred. That's the whole game. The subagent gets its own fresh window, does the messy grepping and file-reading over there, and returns only the conclusion. When you've got a verbose investigation, "go read these twelve files and tell me which one defines the rate limiter," that is a subagent's job precisely because the twelve files never enter your window.
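
Here's that trade as arithmetic, using the two figures from the docs' example. The helper name is mine.

```python
def main_window_savings(tokens_read: int, tokens_returned: int) -> tuple[int, float]:
    """Return (tokens kept out of the main window, percent reduction)."""
    saved = tokens_read - tokens_returned
    return saved, round(100 * saved / tokens_read, 1)


saved, percent = main_window_savings(6_100, 400)
print(f"{saved} tokens never entered the main window ({percent}% less)")
```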

Same spirit, smaller scale, applies to a couple of other patterns the cost docs point at. A pre-tool hook that greps a ten-thousand-line log down to the handful of matching lines before Claude ever sees it. A code-intelligence plugin so "go to definition" replaces reading five files to find one function. And just writing specific prompts, "add input validation to the login function in auth-dot-ts" instead of "improve this codebase," so Claude doesn't go scan the whole tree to figure out what you meant. Every one of those is the same idea: keep the noise out of the window that you pay for on every turn.
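
For the log case, here's a sketch of just the filtering step. It is plain Python, not the hook API, and how you wire it into a hook is left out on purpose. The log contents are invented for the example.

```python
import re
from typing import Iterable


def matching_lines(lines: Iterable[str], pattern: str, limit: int = 50) -> list[str]:
    """Keep only lines that match, capped so the result stays small."""
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        if regex.search(line):
            kept.append(line.rstrip("\n"))
            if len(kept) >= limit:
                break
    return kept


# Hypothetical log: 10,000 lines with an error on every 2,500th line.
log = [f"line {i} {'ERROR timeout' if i % 2500 == 0 else 'ok'}" for i in range(1, 10_001)]
print(len(log), "lines in,", len(matching_lines(log, r"ERROR")), "lines out")
```

The cap matters as much as the pattern: a filter that can still return thousands of lines hasn't protected the window.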

Quick memory: the pound shortcut and slash memory

Two small tools before we switch sides. If, mid-session, you want to jot something down for Claude to remember, start a line with the pound sign, the hash, and type your note, something like, the API tests need a local Redis instance running. Claude saves it. You can tell it to put that in your CLAUDE.md specifically, or let it go to auto-memory. Verify the exact keybinding in your build, but the pound-to-remember flow is the quick path.

And the slash-memory command is your inspector. It lists every memory file loaded in the current session, your CLAUDE.md, any local file, any rules files, lets you toggle auto-memory on or off, and opens any of them in your editor. When you're not sure what's actually loaded right now, that's the command that tells you the truth instead of your assumptions.

The loop, in one breath

So that's the first half as a workflow you can run. Scope your tasks small. Run the context command at the start of anything big to see your baseline. Clear between unrelated tasks. Compact, with a focus instruction, at natural breakpoints in related work. Push noisy research and log-reading into subagents. Watch the meter instead of waiting for the ninety-five-percent auto-trigger to rescue you. And keep the always-loaded stuff lean so that baseline starts low in the first place. Which is the perfect handoff, because the biggest piece of always-loaded stuff is your CLAUDE.md.

The memory hierarchy: many files, one window

You already met CLAUDE.md back in the very first episode. What you maybe didn't see is that it's not one file, it's a hierarchy, and Claude Code stitches the whole thing together every time it starts. All of this is from the memory docs, and the load order runs from broadest to most specific.

At the top, managed policy memory. This is the org-wide file your IT or platform team can drop in a system location, and the important thing about it is that an individual can't exclude it. If your company sets it, it applies. Below that, your user memory, the CLAUDE.md in your home claude directory, your personal preferences across every project you touch. Below that, the project memory, the CLAUDE.md checked into the repo root and shared with your team, the one that does most of the work. And finally, a local file, CLAUDE-dot-local-dot-md, for your private per-project notes that you keep out of git.

A quick status note on that last one, because you'll hear conflicting things. The local file isn't deprecated, the docs still document it and it still loads. But it's de-emphasized, partly because a gitignored file only exists in the one worktree where you made it. The cleaner modern move, if you want personal instructions to follow you across worktrees, is to import a file from your home directory into your CLAUDE.md instead. Some practitioners will flatly call the local file dead in favor of imports and user memory. The docs don't go that far, so I won't either, but lean toward imports.

Concatenated, not overridden

Here's the subtlety that trips people up. These files don't override each other, they're all concatenated into context together. So it's not that the project file replaces the user file. They're both in there. What gives you override-like behavior is ordering: the more specific files load later, and when two instructions actually conflict, the later, more specific one tends to win.

The docs' example is perfect. Your personal user CLAUDE.md says use two-space indentation. The project CLAUDE.md says four-space. The project wins, because it's more specific to the work at hand and it loads after yours. The one hard rule, not a tendency but a rule, is that managed policy file at the top. It can't be excluded, so org instructions always apply no matter what your local files say.
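
A minimal sketch of concatenate-don't-override, assuming the broadest-to-most-specific order described above. The bracketed labels are only there so you can see the seams, and the function is illustrative, not Claude Code's loader.

```python
# Broadest first, most specific last.
LOAD_ORDER = ["managed policy", "user", "project", "local"]


def build_memory(files: dict[str, str]) -> str:
    """Concatenate every memory file that exists; nothing is replaced."""
    parts = [f"[{scope}]\n{files[scope].strip()}" for scope in LOAD_ORDER if scope in files]
    return "\n\n".join(parts)


memory = build_memory({
    "project": "Use four-space indentation.",
    "user": "Use two-space indentation.",
})
print(memory)
```

Both indentation rules end up in the result. The project one simply comes later, which is where the override-like behavior comes from.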

How Claude finds them: up the tree, and on demand

Discovery works two different ways, and the difference is the whole reason nested files are cheap. The first way is upward recursion, and it happens fully at launch. When you start Claude Code, it walks up the directory tree from wherever you launched, checking each directory for a CLAUDE.md on the way to the repo root, and loads all of them. Start down in packages-slash-web-slash-components and it'll pull the components file, the web-package file, and the root file, all at startup. Closest-to-you loads last, so your most specific instructions sit nearest the end where the model's attention is strong.

The second way is on-demand, and it's the clever part. Claude also discovers CLAUDE.md files in subdirectories below you, but it does not load those at launch. It loads a subdirectory's file only when it actually reads a file in that subdirectory. So a giant monorepo with thirty package-level CLAUDE.md files doesn't cost you thirty files' worth of tokens up front. You only pay for the ones whose code you actually touch. And now that earlier gotcha clicks into place: those nested files load on demand, which is exactly why they're the ones that vanish after a compaction until you read into that subtree again.
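
Here's a sketch of both discovery modes over a pretend repo. A set of paths stands in for the disk, the repo layout is hypothetical, and the real loader surely handles cases this ignores.

```python
from pathlib import PurePosixPath


def launch_time_files(start: str, repo_root: str, existing: set[str]) -> list[str]:
    """CLAUDE.md paths found walking up from start to repo_root, broadest first."""
    root = PurePosixPath(repo_root)
    current = PurePosixPath(start)
    found = []
    while True:
        candidate = str(current / "CLAUDE.md")
        if candidate in existing:
            found.append(candidate)
        if current == root or current == current.parent:
            break
        current = current.parent
    return list(reversed(found))  # closest-to-you ends up last


def triggered_by_read(file_path: str, repo_root: str, existing: set[str], loaded: list[str]) -> list[str]:
    """Nested CLAUDE.md files that load only when Claude reads file_path."""
    folder = str(PurePosixPath(file_path).parent)
    return [p for p in launch_time_files(folder, repo_root, existing) if p not in loaded]


EXISTING = {"repo/CLAUDE.md", "repo/packages/web/CLAUDE.md", "repo/packages/api/CLAUDE.md"}
loaded = launch_time_files("repo/packages/web/components", "repo", EXISTING)
print(loaded)
print(triggered_by_read("repo/packages/api/routes/users.ts", "repo", EXISTING, loaded))
```

The api package's file costs nothing until something under that package is read.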

Imports, and the trap inside them

You can also pull files into a CLAUDE.md explicitly with an import, the at-sign followed by a path. You'll see things like, see at-README for an overview and at-package-dot-json for the available commands, or a line that points at a separate git-workflow doc. Relative paths resolve from the file doing the importing, not from your working directory, and you can import from your home directory too. Imports can nest, up to four hops deep.

But here's the trap, and it's a common misconception, so let me be blunt. Imports do not save you context. An imported file still loads into the window at launch, fully, just like it was pasted inline. Splitting a giant CLAUDE.md into ten neat at-imports makes it more organized, more readable, easier to maintain. It does absolutely nothing for your token budget. If a chunk of instructions needs to leave the always-loaded set to save space, an import is not how you do it. We'll get to how you actually do it in a second.
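
To make the trap concrete, here's a sketch that inlines imports the way the paragraph describes and then measures the result. The regex, the in-memory file table, and the hop limit taken from this episode are all assumptions, and the real parser is certainly more careful.

```python
import posixpath
import re

IMPORT = re.compile(r"@([\w./~-]+)")
MAX_HOPS = 4  # the nesting limit quoted in this episode; check your docs


def expand(path: str, files: dict[str, str], hops: int = 0) -> str:
    """Inline every @import, resolving paths relative to the importing file."""
    text = files[path]
    if hops >= MAX_HOPS:
        return text  # stop following imports past the limit

    def inline(match: re.Match) -> str:
        target = posixpath.normpath(posixpath.join(posixpath.dirname(path), match.group(1)))
        return expand(target, files, hops + 1) if target in files else match.group(0)

    return IMPORT.sub(inline, text)


# Hypothetical three-file setup.
FILES = {
    "CLAUDE.md": "Root rules.\n@docs/style.md\n",
    "docs/style.md": "Two-space indentation.\n@testing.md\n",
    "docs/testing.md": "Run tests before committing.\n",
}
expanded = expand("CLAUDE.md", FILES)
print(len(expanded), "characters loaded")
```

The expanded text is as long as all three files together, minus only the import lines themselves. Splitting the file changed where the words live, not how many of them load.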

The monorepo pattern, and path-scoped rules

So how do you actually keep instructions out of the base window until they're relevant? Two tools. The first you've already seen, nested CLAUDE.md files that load on demand. A root file for repo-wide conventions, then a web-package file with your Next.js and TypeScript rules, an api-package file with your Postgres and endpoint rules, and each one only enters context when Claude reads code in that package. That's the right-instructions-at-the-right-depth pattern, and for a monorepo it's most of the battle.

The second tool is the modern answer, and it's worth adopting: a rules directory, dot-claude-slash-rules, with topic files like code-style, testing, security. A plain rules file loads at launch like the rest of your memory. But a path-scoped rule, one with a little bit of YAML at the top declaring which file globs it applies to, loads only when Claude touches a matching file. So a rule that says "every API endpoint must validate its input" can be set to load only when Claude opens something under your api source glob. The rest of the time it costs you nothing. That is how you get a big, opinionated set of standards without paying for all of it on every unrelated turn. And in a giant monorepo where you keep inheriting other teams' ancestor files, there's a settings option to exclude specific CLAUDE.md paths by glob, so you're not carrying instructions for code you never touch.
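
Here's a sketch of the matching logic behind a path-scoped rule. The rule names and globs are hypothetical, and Python's fnmatch is a stand-in for whatever glob engine Claude Code actually uses.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Rule:
    name: str
    text: str
    paths: list[str] = field(default_factory=list)  # empty means unscoped


def rules_for(touched_file: str, rules: list[Rule]) -> list[str]:
    """Names of rules in context once Claude touches touched_file."""
    active = []
    for rule in rules:
        # Unscoped rules always load; scoped ones need a glob match.
        # fnmatch's "*" also crosses "/" here, a simplification of real globs.
        if not rule.paths or any(fnmatchcase(touched_file, g) for g in rule.paths):
            active.append(rule.name)
    return active


RULES = [
    Rule("code-style", "Use two-space indentation."),
    Rule("api-validation", "Every API endpoint must validate its input.", ["src/api/*"]),
]
print(rules_for("src/web/page.tsx", RULES))
print(rules_for("src/api/users/create.ts", RULES))
```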

What belongs in CLAUDE.md, and what doesn't

Which brings us to the judgment call: what actually goes in this file? The docs give a size target, and it's stricter than most people's files. Aim for under two hundred lines per CLAUDE.md. Their reasoning is exactly our whole episode, longer files eat more context and, past a point, the model adheres to them less. A wall of two hundred rules isn't followed more carefully than twenty, it's followed less.
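
If you'd like a quick check you can run outside a session, here's a sketch. The two-hundred-line target is the one just mentioned. The four-characters-per-token ratio is a crude assumption of mine, so trust the context command for real token counts.

```python
LINE_TARGET = 200  # aim for fewer lines than this per CLAUDE.md


def audit(claude_md_text: str, chars_per_token: int = 4) -> dict[str, object]:
    """Rough size report for one CLAUDE.md file's text."""
    lines = claude_md_text.splitlines()
    return {
        "lines": len(lines),
        "approx_tokens": len(claude_md_text) // chars_per_token,
        "over_target": len(lines) >= LINE_TARGET,
    }


# Hypothetical bloated file: 600 short lines.
print(audit("rule\n" * 600))
```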

What belongs in there is the stuff Claude should hold in every single session. Your build and test commands. Your conventions. The project layout. The architectural decisions and the always-do-this rules. And be specific, because specificity is what makes them work. "Use two-space indentation," not "format code properly." "Run the test command before committing," not "test your changes." "API handlers live in this directory," not "keep files organized." Vague instructions cost the same tokens and buy you nothing.

What does not belong is anything procedural or anything that only matters for one corner of the codebase. A multi-step workflow you run occasionally, that's a skill, the kind we built a few episodes back, loaded on demand when the task matches. Something that only applies to your API layer, that's a path-scoped rule. The docs even list "move instructions out of CLAUDE.md and into skills" as a named cost-reduction strategy. The instinct to put everything in CLAUDE.md because it's always loaded is exactly backwards. Always-loaded is the expensive shelf, not the free one.

One honest caveat about what CLAUDE.md even is. It's delivered as a user message after the system prompt, not as enforced configuration. It's strong context, but it's context, Claude can still stray from it. So when you need a real guarantee, block this command, run that check before every commit, you don't write a politer CLAUDE.md line. You reach for a hook, the way we did in the hooks episode. CLAUDE.md is for what Claude should know. Hooks are for what must happen.

The pitfall: the file you can't see eating the window you can't see

Now the pitfall, the one you will actually hit, because it's sneaky in a specific way. A bloated, stale CLAUDE.md silently eats your window on every turn and quietly degrades the session, and you won't catch it in the terminal, because CLAUDE.md is invisible there. You never see it scroll by. It just sits in the context, costing you.

And it's worse than it sounds, because unlike auto-memory, which is capped, CLAUDE.md loads in full no matter how long it is. So a six-hundred-line CLAUDE.md, plus a couple of nested ones that loaded when you opened some files, plus a chatty MCP server you forgot you connected, can quietly burn tens of thousands of tokens before your first prompt. That pushes you toward lost-in-the-middle, triggers auto-compaction earlier, and taxes every message you send. The classic version of this is the file that's grown for six months, half its rules contradict the other half, and when two rules conflict the docs warn Claude may just pick one arbitrarily. So you're paying for instructions that are actively making it less predictable.

Here's how you catch it, and it ties the whole episode together. Run the context command. It prints each memory file by name with its token cost, so a fat CLAUDE.md shows up right there as a number you can't argue with. Run the slash-memory command to see exactly which files loaded and whether something stale is still in the set. The docs even have a troubleshooting entry titled, literally, my CLAUDE.md is too large, which tells you how common this is.

And the fix is everything we've covered, now pointed at one file. Trim CLAUDE.md down to essentials, under two hundred lines. Move the workflow-specific stuff into skills that load on demand. Move the area-specific stuff into path-scoped rules that load only when their files are touched. Delete the contradictions. And remember the one that fools people, splitting it into at-imports does not help, the imported text still loads in full, it has to actually leave the always-loaded set to save you anything. Recognize it with the meter, fix it by moving instructions to where they load on demand. That's the loop.

The one idea underneath both halves

So that's both halves, and they really are one idea. The context window is the scarce thing. Everything you load into it, your history, your tool surfaces, and above all your CLAUDE.md, spends from a finite budget on every single turn, and a fuller window is a duller, slower, costlier Claude, not a smarter one. The whole craft is Anthropic's one sentence, the smallest set of high-signal tokens that gets the job done. The context command is how you see the budget. Compact, clear, and subagents are how you spend less of it as you work. And a lean, specific CLAUDE.md, with the rest pushed out to skills and path-scoped rules that load only when they're needed, is how you keep the baseline low before you even start. Get those two habits and your two-hour sessions stop going dumb.