
A subagent is a fresh Claude instance that does a noisy, self-contained job in its own context window and hands back only the summary. Learn to delegate codebase searches and code reviews, restrict each agent's tools and model, and avoid the blank-context pitfall that produces confident wrong answers.
A subagent is a fresh Claude instance that does a self-contained job in its own context window and hands back only a summary. This episode covers using them the Act I way: one-off delegation to keep your main session's context clean.
Why isolation matters. The context window fills fast and Claude's performance degrades as it does. From Anthropic's context-window walkthrough: a research subagent read ~6,100 tokens of files and returned a 420-token result. The reading never touched your main thread. The best-practices guide calls subagents one of the most powerful tools available because context is your fundamental constraint.
The Agent tool. As of Claude Code 2.1.63 the Task tool was renamed to Agent (old Task(...) references still work as aliases). Subagents cannot spawn other subagents, which is why the orchestrator pattern needs agent teams later.
Built-in subagents. Explore (read-only, Haiku, codebase search), Plan (read-only, used in plan mode), general-purpose (all tools, inherits your model), plus helpers statusline-setup and claude-code-guide. Explore and Plan skip CLAUDE.md and git status to stay fast.
Custom subagents. Markdown files with YAML frontmatter in .claude/agents/ (project, checked into git) or ~/.claude/agents/ (user). The fields that matter: name, description (drives automatic delegation), tools (allowlist; omit to inherit all), and model (sonnet/opus/haiku/full ID/inherit). Manage them with the /agents command. See Create custom subagents.
Proactive delegation. Put "use proactively" in the description field so Claude reaches for the agent on its own.
Worked examples. A codebase-search delegation that keeps file reads out of your window, and the official read-only code-reviewer (restricted to Read, Grep, Glob, Bash) that judges your diff with fresh eyes.
Subagent vs skill vs slash command. A skill runs in your main context; a subagent runs isolated and returns a summary; a slash command is a typed entry point. Offload noisy, self-contained work to a subagent.
The pitfall. Subagents start blank, with no conversation history. Vague delegation produces confident, wrong answers. Scope tightly and restate load-bearing constraints. Plus the costs: tokens still get spent, cold-start latency, and no mid-task steering (a subagent can't even ask you a clarifying question).
Last episode we wired up skills, reusable expertise that Claude loads only when the task matches. This episode is about the next rung on the same ladder: subagents, used as one-off delegation to keep your main session's context window clean.
Here's the whole idea in one sentence. A subagent is a fresh Claude instance that does a self-contained chunk of work in its own separate context window, and hands back only a summary. The noisy middle of the task, the dozens of file reads, the log dumps, the test output, the docs it fetched, all of that stays over there. You keep the conclusion.
The official docs put it plainly. Use a subagent when a side task would flood your main conversation with search results, logs, or file contents you won't reference again. The subagent does that work in its own context and returns only the summary. That's the value. Context isolation. Everything else in this episode is detail hanging off that one idea.
Before we go further, one boundary. Today we're using subagents the Act One way: you delegate a single noisy job, you read the summary, you move on. There's a bigger pattern coming later, where a lead agent dispatches whole waves of subagents that coordinate with each other. That's the orchestrator pattern, and it belongs to Act Two. We'll flag it as we go, but we are not teaching it here. One subagent, one job, clean context. That's the rung.
Why context isolation matters at all
To get why this is worth a whole episode, you have to remember the one constraint everything in Claude Code bends around. The context window fills up fast, and performance degrades as it fills. Anthropic's own best-practices guide opens on exactly this point. A single debugging session or codebase exploration can burn tens of thousands of tokens, and as the window fills, Claude starts forgetting earlier instructions and making more mistakes.
So the context window is your scarcest resource. Anything you can do to keep the signal high and the noise low directly improves the quality of the work. And the single biggest source of noise is research: reading lots of files to answer one question, where most of what you read you'll never look at again.
That's the job subagents were built for. The best-practices guide says it directly: since context is your fundamental constraint, subagents are one of the most powerful tools available. They run in separate context windows and report back summaries.
The numbers, because they make it concrete
Let me give you the actual numbers from Anthropic's context-window walkthrough, because they make the savings real. The scenario is simple. You say, use a subagent to research session timeout handling, then fix it.
Claude spins up a subagent with a fresh, separate context window. That subagent reads whatever files it needs to understand how session timeouts work. In the walkthrough, it reads about six thousand one hundred tokens worth of files. Then it writes up what it found and returns. And the result that lands back in your main conversation is four hundred and twenty tokens. Six thousand tokens of reading happened. You got a four hundred token answer.
That gap is the context savings. The verbose middle never touched your main thread. Anthropic frames the general case the same way: a subagent might explore extensively, using tens of thousands of tokens or more, but returns only a condensed, distilled summary, often one to two thousand tokens.
So the mental model is a research assistant. You send them into the library. They read forty books. They come back and tell you the one paragraph you needed. The forty books never pile up on your desk.
The tool that does this, and a name change worth knowing
Under the hood, the main session spawns a subagent through a built-in tool. And there's a recent rename you should know about, because half the blog posts out there use the old name. As of Claude Code version two point one point six three, the tool that was called Task is now called Agent. The old Task references still work as aliases in settings and agent definitions, so nothing breaks, but going forward the tool is Agent. If you read someone talking about "the Task tool," they mean today's Agent tool.
One hard rule that comes with this tool, and it shapes a lot of what follows: subagents cannot spawn other subagents. No nesting. A subagent is a leaf. This is deliberate, it prevents infinite recursion, and it's the reason that the bigger orchestrator pattern needs a different mechanism, agent teams, which we'll get to in a much later episode. For now, just hold onto it. One level deep. The main session delegates; the subagent does the work and reports; it can't go off and delegate further.
The built-in subagents you already have
You don't have to define anything to start using subagents, because Claude Code ships with several built in, and Claude reaches for them automatically when a task fits. Let me walk through the ones that matter.
The first is Explore. This is a fast, read-only agent built for searching and analyzing codebases. It runs on Haiku, the fast cheap model, and it's denied access to the Write and Edit tools, so it physically cannot change your code. Its whole job is file discovery and code search. When Claude needs to understand your codebase without changing anything, it sends Explore. And there's a nice detail here: when Claude invokes Explore, it specifies a thoroughness level. Quick for a targeted lookup, medium for balanced exploration, or very thorough for a comprehensive sweep.
The second is Plan. If you remember the plan-mode episode, this is the agent behind it. When you're in plan mode and Claude needs to research your codebase before proposing a plan, it delegates that research to the Plan subagent. Also read-only, no Write or Edit. The difference from Explore is that Plan inherits the model from your main conversation rather than always using Haiku.
The third is general-purpose. This is the capable one, for complex multi-step tasks that need both exploration and action. It inherits your main model, and it has access to all tools. Claude reaches for general-purpose when the task needs both reading and modifying, complex reasoning over what it finds, or several dependent steps.
There are a couple of small helper agents too, invoked automatically so you rarely think about them. One is called statusline-setup, which runs on Sonnet when you configure your status line. The other is claude-code-guide, which runs on Haiku when you ask questions about Claude Code itself.
Here's a behavioral detail worth holding onto. Explore and Plan deliberately skip your project memory file and your git status to stay fast and cheap. Every other subagent, built-in or custom, loads both. The docs are explicit that Explore and Plan are the only two that skip them, and there's no setting to change which agents do.
How you actually invoke a subagent
There are three ways to trigger a subagent, and they escalate from a gentle suggestion to a session-wide default.
The first is just natural language. There's no special syntax. You name the subagent in your prompt and Claude usually delegates. Use the test-runner subagent to fix failing tests. Have the code-reviewer subagent look at my recent changes. Claude still gets the final say on whether to delegate, but naming it is a strong nudge.
The second is an at-mention, and this one's a guarantee. You type the at sign and pick the subagent from the menu, the same way you at-mention a file. That ensures that specific subagent runs, rather than leaving the choice to Claude. One thing to understand: your full message still goes to Claude, and Claude writes the actual task prompt for the subagent. The at-mention controls which subagent runs, not the words it receives.
The third is session-wide, where the whole session takes on a subagent's identity. You launch with the agent flag and a name, and the main thread itself adopts that subagent's system prompt, tool restrictions, and model. You can also set a default agent in your project settings file. That mode is more of an Act Two flavor, so for today just know it exists.
Writing your own subagent
The built-ins cover a lot, but the real power is defining your own. A custom subagent is a markdown file with a small block of YAML configuration at the top, called frontmatter, followed by the system prompt as plain markdown. That prompt body becomes the subagent's instructions.
Here's the shape. At the top, between two lines of three dashes, you set a name, a description, the tools it's allowed to use, and the model. Then below, you write the prompt: you are a code reviewer, when invoked analyze the code and give specific actionable feedback. That's it.
One isolation detail to internalize: the subagent receives only that prompt you wrote, plus basic environment details like the working directory. It does not get the full Claude Code system prompt. It's a lean, focused worker with exactly the instructions you gave it.
Where do these files live? Two locations matter for you. Project subagents go in a dot-claude slash agents folder inside your repo. Those are meant to be checked into version control so your whole team shares them. User subagents go in the same agents folder under your home dot-claude directory, and those follow you across every project. If a project subagent and a user subagent share a name, the project one wins. There's a fuller precedence chain above those two, with managed organization settings at the top and plugins at the bottom, but project-over-user is the part you'll touch.
The four fields that matter
The frontmatter supports a long list of fields, but only two are actually required: name and description. For an Act One workflow, four fields carry the weight.
Name is the unique identifier, lowercase with hyphens. Description is the one that drives automatic delegation; it tells Claude when to hand work to this agent. Tools controls what the agent can do. And model picks which model it runs on.
There's more in there. Fields for preloading skills into the agent, for giving it persistent memory across sessions, for running it in an isolated git worktree, for scoping it to its own MCP servers. Those are real and we'll meet several of them in later episodes. For one-off delegation, name, description, tools, and model are your kit.
Let's take tools and model in turn, because they're where the real control lives.
Restricting tools, one agent at a time
The tools field is an allowlist. List some tools, and the subagent can use only those. Leave it out, and the subagent inherits every tool the main conversation has. So a read-only researcher might list Read, Grep, Glob, and Bash, and nothing else. It can search and read and run commands, but it cannot edit or write files, and it can't touch any MCP tools. That restriction isn't a polite request, it's enforced. The agent simply doesn't have the capability.
There's also a denylist field, disallowedTools, which subtracts from whatever the agent would otherwise have. So you could say, inherit everything except Write and Edit, and you get an agent that keeps Bash and MCP and the rest, minus the ability to change files.
This matters for the pitfall later, so file it away: a handful of tools are never available to a subagent no matter what you list, because they depend on the live session. One of them is the tool Claude uses to ask you a question. A subagent literally cannot ask you a clarifying question, because it doesn't have the tool to do it.
Picking a model per agent
The model field takes an alias, sonnet, opus, or haiku, or a full model identifier, or the special value inherit, which means use whatever the main conversation is using. If you leave it off, it defaults to inherit.
This is a genuine cost lever, and it's worth being deliberate about. High-volume grunt work, searching files, scanning logs, the stuff where you just need a competent reader, can run on Haiku, which is fast and cheap. Save the expensive model for the work that needs real reasoning. The built-in Explore agent already does this for you by defaulting to Haiku. When you write your own search-heavy agent, consider doing the same.
Managing all of this from inside the session
You don't have to hand-write these files. There's a slash command, slash agents, that opens a management interface, and the docs call it the recommended way to create and manage subagents. It has a Running tab that shows live subagents and lets you stop them, and a Library tab where you can see every available subagent, create new ones with guided setup or have Claude generate one for you, edit their configuration and tool access, and delete the ones you've made.
One operational note that'll save you confusion. Subagents you create through the slash agents interface take effect immediately. But if you create or edit a subagent file directly on disk, you have to restart your session for it to load, because subagents are read at session start.
How automatic delegation actually fires
So how does Claude decide to delegate on its own? It looks at the task in your request, the current context, and the description field in each subagent's configuration. That description field is the trigger. Write a vague description and Claude won't know when to reach for the agent. Write a sharp one and it will.
And there's a specific phrasing Anthropic recommends to encourage this. Include words like "use proactively" in the description. You see it all over the official examples. The code-reviewer's description reads, expert code review specialist, proactively reviews code for quality, security, and maintainability, use immediately after writing or modifying code. The debugger's reads, debugging specialist for errors, test failures, and unexpected behavior, use proactively when encountering any issues.
A small accuracy note, because older write-ups got loud about this. The current docs use the calm lowercase phrase, use proactively, not the shouty all-caps versions that floated around community posts a while back. The mechanism is the same either way: a strong, specific description with a clear "when to use this" trigger makes Claude pick up the agent without you asking.
The first worked example: delegating a codebase search
Let's make this real with the canonical example. You're about to add a feature and you need to understand how the existing code works, but you don't want forty file reads cluttering your thread.
Straight from the best-practices guide: use subagents to investigate how our authentication system handles token refresh, and whether we have any existing OAuth utilities I should reuse. The subagent goes off, reads the relevant files, and reports back its findings, without any of that reading landing in your main conversation.
Put it on our default stack and it gets even more concrete. Say you're on a TypeScript Next.js app with a Postgres-backed session store. You could say: use a subagent to find every place we read the session timeout environment variable and how the middleware refreshes the Postgres session, then report just the file paths and the refresh flow. You get back a tight summary. The grep noise stays gone.
The same shape works for test output, which is one of the most useful tricks here. Use a subagent to run the test suite and report only the failing tests with their error messages. Instead of a two-thousand-line test dump filling your window, you get the three failures you actually care about. That's the isolate-high-volume-output pattern, and it's worth building a habit around.
The second worked example: a code-review subagent
The other canonical one is a code reviewer, and it shows off tool restriction beautifully. The official version is a markdown file whose frontmatter names it code-reviewer, gives it that proactive description, restricts it to Read, Grep, Glob, and Bash, so no editing, and sets the model to inherit.
The prompt body tells it: you are a senior code reviewer. When invoked, run git diff to see recent changes, focus on the modified files, and begin reviewing immediately. Then it lays out a checklist: clear readable code, well-named functions and variables, no duplication, proper error handling, no exposed secrets or API keys, input validation, test coverage, performance. And it tells the agent to organize feedback by priority, critical issues that must be fixed, warnings that should be fixed, and softer suggestions, with specific examples of how to fix each one.
The best-practices guide gives a security-focused cousin of this, a security-reviewer set to the opus model, restricted to those same read-only tools, told to look for injection vulnerabilities, auth flaws, secrets in code, and insecure data handling. You invoke it with a line as simple as: use a subagent to review this code for security issues.
There's a reason a fresh subagent makes a good reviewer, and it's the same reason isolation is the theme of this episode. A reviewer running in a fresh context sees only the diff and the criteria you give it, not the reasoning that produced the change. So it judges the result on its own terms, instead of being biased toward code it just wrote. There's even a built-in slash code-review skill that does exactly this, reviewing your current diff for bugs in a fresh subagent and returning the findings. Worth knowing: this fresh-eyes reviewer is the seed of a much bigger review-and-fix loop we'll build in Act Two. Today it's just one-off delegation.
Running several at once
You can also run subagents in parallel when the work splits cleanly. Research the authentication, database, and API modules in parallel using separate subagents. Each one explores its own area, and then Claude stitches the findings together. The win is wall-clock time. Three searches go out at once, three summaries come back, and the total time is roughly one search instead of three.
But parallelism has real limits, and this is where people overdo it. The overhead is genuine. Every subagent starts from a blank slate and spends its first few turns rebuilding local understanding. So the math matters. Parallelizing four tasks that each take two minutes saves you real time. Parallelizing four tasks that each take thirty seconds costs more than it saves, because the startup overhead swamps the work. Don't parallelize trivial jobs.
On concurrency, the practical sweet spot people land on is three to five subagents at a time. Beyond that, you're juggling so many returned summaries that the synthesis overhead eats the time savings, and on lower API tiers, extra subagents past roughly five just queue up and wait. The docs don't publish a hard cap; they warn instead that running many subagents that each return detailed results can itself consume significant context. Which is the irony to keep in mind: the tool that saves context can spend it if you fan out too wide and every worker hands back a fat report.
There's also a foreground-versus-background distinction. A foreground subagent blocks your session until it's done and passes permission prompts through to you. A background subagent runs while you keep working, uses the permissions already granted, and auto-denies anything that would otherwise prompt. You can background a running task with control-B. For today, foreground is the default mental model.
When to reach for a subagent instead of a skill or a slash command
Now the question you'll actually face. You've got slash commands, you've got skills from last episode, and now subagents. When do you pick which?
The cleanest cut is about context. A skill runs in your main conversation. It injects expertise or a workflow into the thread you're already in. A subagent runs in a separate, isolated context and hands back only a summary. So the deciding question isn't what the work is, it's where you want it to happen. If you want the work in front of you, with extra knowledge loaded into the current conversation, that's a skill. If you want the work offloaded into a side process so its noise never touches your thread, that's a subagent. The docs say it directly: consider a skill instead when you want reusable prompts or workflows that run in the main conversation context rather than isolated subagent context.
A slash command is a different axis entirely. It's a typed entry point you trigger yourself. It's about invocation, about giving yourself a clean way to kick something off, not about context isolation. And worth a quick accuracy note: in twenty twenty-six Anthropic merged the command and skill shapes, so a skill file and a command file can both produce the same slash-name trigger. The point is, slash commands answer "how do I fire this off," while subagents answer "should this run in its own context."
So, the rule of thumb. A typed entry point you invoke by hand: slash command. Reusable expertise or a workflow that should run in your current conversation: skill. A noisy or self-contained job whose middle you don't want polluting your thread, that can come back as a tidy conclusion: subagent. And for a quick throwaway question about something already in your conversation, there's an even lighter tool, slash b-t-w, which sees your full context, has no tools, and discards the answer instead of adding it to history.
The honest costs
Subagents aren't free, and it's worth being clear-eyed about the tradeoffs.
They consume tokens. The work still happens, it just happens elsewhere. A subagent can burn tens of thousands of tokens internally even though only a few hundred to a couple thousand come back. So the savings are about your main context window's cleanliness, not about doing less total work.
They add latency. Because a subagent starts fresh, it needs time to gather context before it can do anything useful. That cold start is the price of isolation. If you're making a quick targeted change, or if you're deep in an iterative back-and-forth, the main conversation is faster, because it already has the context loaded.
And you can't steer them mid-task. Once a subagent launches, it runs to completion on the task prompt Claude wrote. You can't nudge it the way you'd hit escape and redirect the main session. Remember that missing tool from earlier, the one that lets Claude ask you a question? Subagents don't have it. So a subagent can't pause to ask you for clarification. A background one that hits a wall just auto-denies and keeps going, possibly toward a worse answer.
The pitfall you'll actually hit
Which brings us to the single biggest pitfall, the one that'll bite you in week one. A subagent starts with a blank context. It does not see your conversation history. It does not see the files Claude already read. It does not see the skills you already invoked. The only thing it knows is the task prompt that gets handed to it.
So if you say, go fix the auth thing we just discussed, the subagent has no idea what "we just discussed" was. That context lives in your thread, not in its head. Vague delegation across that isolation boundary produces confidently wrong results.
The fix is to scope the delegation tightly and restate anything load-bearing. The docs give a clean example: if a rule matters, like ignore the vendor directory, you have to put it in the prompt you give Claude when delegating, because the subagent won't infer it. Spell out the constraint. Name the files. Say what "done" looks like.
Here's the recognition signal, so you catch it in the moment. If a subagent comes back with an answer that's confidently wrong and ignores some obvious project constraint, your delegation prompt almost certainly didn't carry that constraint across the boundary. The agent didn't fail. The handoff did.
Two smaller versions of the same mistake. One is over-delegation: reaching for a subagent on trivial or tightly-coupled work, where the cold-start cost and the lost shared context make it slower and worse than just doing it inline. The other is the flip side, the infinite exploration, asking a subagent to "investigate" something without scoping it, so it reads hundreds of files chasing a vague target. The cure for both is the same discipline: scope it narrowly, or don't delegate it.
Where this leaves you on the ladder
So here's the rung you've climbed. You can now take a noisy, self-contained job, hand it to a fresh agent with a tight prompt and a restricted toolset, and get back a clean conclusion while your main thread stays uncluttered. You can route the cheap grunt work to Haiku. You can stand up a read-only reviewer that judges your diff with fresh eyes. That's one-off delegation, and it's a real power-user habit.
The moment you start having a lead agent dispatch coordinated waves of workers that hand results to each other, you've crossed into the orchestrator pattern. That needs agent teams, because subagents can't nest, and it lives in Act Two. We named it; now we leave it.
Next episode we step into MCP servers, connecting Claude to external tools and data. And subagents will show up there again, because you can scope a subagent to its own set of MCP servers, keeping a whole tool's worth of context out of your main thread. Same theme, one more turn of the screw. Keep the main session clean, and let the workers carry the noise.