OCDevel Claude Code Podcast
The podcast for developers who live in Claude Code. A fast news segment on the latest Claude Code releases with a hands-on tutorial that levels up your agentic coding. The news covers what actually shipped across Claude Code and the wider Anthropic stack - new versions, models, pricing, plus the MCP servers, skills, and hooks worth your time. Then the tutorial climbs a single ladder across the series: from driving one Claude session by hand in your terminal, to power-user tooling (custom slash commands, subagents, MCP), to multi-agent fleets, to autonomous review-and-fix loops, to a full pipeline where you file a GitHub issue from your phone and Claude implements the feature, opens the PR, runs the tests, and ships to production while you're on the beach. Claude as the senior engineer on your one-person team. One copyable workflow and one real pitfall per episode - every command, flag, and setting named exactly as it appears in the tool. For working developers who want to stop typing every keystroke and start directing. AI-generated podcast by OCDevel.

The orchestrator pattern: promote one Claude Code session to dispatch waves of subagents

12h ago

Stop hand-wiring parallel sessions and let one Claude become the dispatcher: it spins up waves of subagents that work in parallel and report back. Your first session that runs a team instead of a task, plus how to keep the roughly fifteen-times token bill from running away with you.

Show Notes

The first rung of running a fleet instead of a session: promote one Claude Code session to a lead that dispatches waves of subagents, which work in parallel and report back.

The tutorial. The orchestrator-worker pattern, drawn from Anthropic's multi-agent research system writeup (Opus lead plus Sonnet workers beat single-agent Opus by ~90%, at roughly 15x the tokens of a chat, with effort scaled to query complexity). How it maps onto Claude Code today: the Agent tool (renamed from Task in v2.1.63) spawns workers in their own context windows that return only a summary; the two-level limit (subagents can't spawn subagents, so "waves" are batches); foreground vs background workers and Ctrl+B. Writing a custom subagent in your project's agents folder, with the frontmatter that turns earlier episodes' cost levers into per-worker dials: model (Sonnet/Haiku workers under an Opus lead), maxTurns, effort, tools, skills, mcpServers, and isolation: worktree (the callback to last episode's worktrees). A worked fan-out migration: Explore to map files, partition by file ownership, complete delegation prompts, structured returns, and a synthesis-and-test stage, plus the packaged /batch skill (5-30 worktree subagents, a PR each). Where it scales next: agent teams and dynamic workflows.

The pitfall: token blowup from over-orchestrating, with the blank-context worker, file collisions, and the lead losing the thread underneath it. How to recognize each on /usage and /context, and how to bound it. The rule: orchestrate for breadth and independence, stay single-agent for depth and coupling.

News. Claude Code 2.1.162 (June 3): a waitingFor field in the agents JSON, Read deny rules now hide files from Glob/Grep, and Windows path-matching fixes (changelog). API changes June 2: no billing on zero-output refusals and a max_tokens cap on the advisor tool (release notes).

Earlier episodes referenced: subagents, skills, CLAUDE.md, context windows, MCP servers, cost and rate-limit engineering, and parallel sessions with git worktrees.

Transcript

The news

Light week on the Claude Code beat, but there's one real release to flag. Claude Code 2.1.162 shipped yesterday, on June third, and as of this morning nothing newer has landed on either the changelog or the GitHub releases page. So that's the build to be on.

The change I'd actually act on first: if you run agents in the background, the agents view now tells you why one is stuck. Ask for the agents list as JSON and you'll see a new "waiting for" field on each session, naming exactly what it's blocked on, which is usually a permission prompt. Before this, a parked agent and a busy agent looked identical from the outside. Now you can filter your fleet down to the ones sitting on a dialog box and go unstick them. It's a small field, but it's a real difference once you're running more than two or three sessions at once, and that's exactly where today's episode is heading.

The other change in that build is a quiet security tightening, the kind of thing you want to know about even though nobody put it in a headline. Read deny rules now also hide files from the search tools. So if your settings deny reading, say, your environment file, Claude can no longer stumble onto its contents through Glob or Grep either. Denied files used to leak through search; that gap is closed. Windows users get a matching fix: permission rules now match backslash-style paths and case variants, so your deny rules actually fire instead of being silently bypassed.

A couple of smaller paper-cuts in the same release. Clicking a slash command in the autocomplete menu now fills it into your prompt instead of running it on the spot, so you press Enter to fire it. Remote Control moved from a one-time startup message to a persistent footer pill with the session link, so it's easier to grab mid-run. And there's a fix for a silent hang on startup when your config directory was read-only; Claude now boots with an in-memory config and tells you what went wrong instead of just freezing. The action item is short: check your version, confirm you're on 2.1.162, and if you script your agents, go look at that waiting-for field.

On the API side, two changes landed June second that matter if you drive Claude programmatically, through the Agent SDK or print mode. First, you're no longer billed for a request that comes back as a pure refusal with no output. A refused call that generates zero output tokens is now free, where it used to still cost you. If you run high-volume agent or eval workloads that trip a guardrail now and then, you can stop building cost workarounds for that case. Second, the advisor tool now takes a max-tokens parameter, so you can cap the advisor model's output per call and cut both latency and token cost when you don't need a full-length answer. Both of those are on the platform release notes.

And one piece of industry context, not a feature you'll touch today. Anthropic added a tiered Services Track and a Partner Hub to its Claude Partner Network on June third. It's enterprise-facing, about certified consultancies and where they rank. The number worth hearing is the scale: since that program opened in March, more than forty thousand firms have applied and over ten thousand consultants have earned Claude certification. It matters mostly if you hire or work at a shop building on Claude. Otherwise, file it and move on. That's the board. Onto the main event.

The orchestrator pattern

Last episode you stopped driving one Claude Code session and started running several at once, each in its own git worktree, each on its own branch, so they wouldn't collide. You were still the one wiring them up, though. You opened the worktrees, you handed each session its task, you walked between them. That's parallelism by hand. Today we promote one Claude to do that coordinating work for you. One session becomes the dispatcher, and it spins up waves of subagents that go off, work in parallel, and report back. This is the first real taste of running a fleet instead of a session, and it's where the second act of this show actually starts.

The shape has a name, and it predates Claude Code. Anthropic calls it the orchestrator-worker pattern, and the clearest writeup is their engineering post on how they built their multi-agent research system. A lead agent looks at the job, works out a strategy, and then spins up subagents to chase different parts of it at the same time. Each worker runs on its own, with its own tools and its own context window, does its slice, and hands its findings back. The lead pulls those findings together. That's the whole idea. One brain decomposes the work, routes it, and synthesizes the results; several hands do the parallel labor.

I want to give you the headline numbers from that post up front, because they set the stakes for everything that follows, including the pitfall at the end. On Anthropic's internal research eval, a multi-agent setup with Claude Opus as the lead and Claude Sonnet as the workers beat a single Opus agent by about ninety percent. Hold onto that model split, Opus leading and Sonnet doing the legwork, because it maps directly onto a setting you'll write yourself in a few minutes.

The same post is blunt about the cost, though. Think of a single chat turn as one unit of token spend. A single agent doing tool calls uses roughly four times that. A multi-agent system uses about fifteen times the tokens of a plain chat. And they found that token usage by itself explained about eighty percent of the difference in how well the system performed. Their own one-line summary is worth repeating: multi-agent systems work mainly because they help you spend enough tokens to solve the problem. So the lift is real and the bill is real, and those two facts are the same fact. That tension is the spine of this whole episode.
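
To make those ratios concrete, here is a small arithmetic sketch. The multipliers are the rough figures quoted above; the 2,000-token baseline and the function names are hypothetical, chosen only for illustration.

```python
# Relative token spend, with one plain chat turn as the unit (rough figures quoted above).
SINGLE_AGENT_UNITS = 4    # "roughly four times" a chat
MULTI_AGENT_UNITS = 15    # "about fifteen times" a chat


def estimated_tokens(chat_tokens: int, multiplier: int) -> int:
    """Scale a baseline chat-sized token count by one of the rough multipliers."""
    return chat_tokens * multiplier


def multi_vs_single_agent() -> float:
    """How much more a multi-agent run spends than a single tool-using agent."""
    return MULTI_AGENT_UNITS / SINGLE_AGENT_UNITS


if __name__ == "__main__":
    baseline = 2_000  # hypothetical size of one chat exchange, in tokens
    print(estimated_tokens(baseline, SINGLE_AGENT_UNITS))  # 8000
    print(estimated_tokens(baseline, MULTI_AGENT_UNITS))   # 30000
    print(multi_vs_single_agent())                          # 3.75
```

The useful number is the last one: going from a single tool-using agent to a fleet is a bit under a fourfold jump, not fifteenfold, because the single agent was already spending four units.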

There's one more number from that research worth carrying, because it's the antidote to over-orchestrating. They wrote effort-scaling rules right into the lead agent's prompt. Simple fact-finding gets one agent and a handful of tool calls. A direct comparison might get two to four subagents with ten to fifteen calls each. Only broad, open-ended work earns a larger fan-out. The point of telling the lead these numbers is to stop it from throwing five agents at a question that wanted one. We'll come back to that idea, because knowing when not to orchestrate is most of the skill here.

What this actually is inside Claude Code

So how does the pattern show up in the tool you already use every day? The orchestrator isn't a special mode you switch on. It's just your main session, given permission to delegate. When the lead hits a chunk of work that suits a worker, it spawns one using the Agent tool. The worker runs in its own fresh context window, does the job, and returns only its final message to the parent. Your main session's context grows by that summary alone, not by all the files and logs and searches the worker churned through to produce it.

Now, a naming note, because the tool moved and most blog posts and even last episode are slightly out of date here. The tool that spawns a subagent used to be called the Task tool. It was renamed to the Agent tool back in version 2.1.63, earlier this year. The old Task references still work as aliases, so nothing breaks, but when you see the agent at work today it'll say it's calling the Agent tool. So when Anthropic's engineering post and the older docs say "Task," picture the Agent tool. Same mechanism we built back in the subagents episode, new name.

That fresh context window is the feature and the footgun at once, and it's the exact tradeoff from the subagents episode, just at a larger scale. Let me be precise about what a worker knows when it wakes up, because the pitfall later hangs entirely on this. A subagent starts with its own system prompt, plus the delegation message the lead writes when it hands off the task. It gets your memory hierarchy, your CLAUDE.md files at the user, project, and local levels. It gets a snapshot of git status taken when the parent session started. And it gets any skills you named for it. That's it.

Here's what it does not get. It does not see your conversation history. It does not see the files Claude already read in the main session, or the earlier tool results, or the skills you invoked earlier. The docs put it as plainly as it can be put: the only channel from parent to subagent is the prompt string the lead passes through the Agent tool. So anything the orchestrator forgets to write into that delegation prompt simply does not exist for the worker. Burn that into memory now. It is the single most load-bearing fact in this episode.

There's a structural limit that shapes the whole architecture, and it surprises people. Subagents cannot spawn their own subagents. The tree is exactly two levels deep: the lead, and its workers, and no grandchildren. The docs say it directly, and one of the built-in agents exists partly to stop infinite nesting. So when I say the lead dispatches "waves" of subagents, I mean it fires a batch, they come back, it fires another batch, and so on. It is not a deep recursive hierarchy of agents spawning agents. It's one coordinator running rounds.

How do those workers actually run at the same time? Claude Code splits subagents into foreground and background. A foreground subagent blocks your main conversation until it finishes, and its permission prompts pass through to you. A background subagent runs concurrently while you keep working; it runs with whatever permissions you've already granted, and it auto-denies anything that would otherwise pop a prompt. That last detail matters: a background worker that hits a clarifying question doesn't wait for you, it gets a no and keeps going. Claude usually picks foreground or background per task, but you can ask for background explicitly, or press Control-B to push a running task into the background, and the slash-tasks command lists everything running in the background of the current session. If you want to switch the whole behavior off, there's an environment variable that disables background tasks entirely.

The way you actually trigger a parallel fan-out is plainer than you'd guess: you ask for it in words. Tell the lead to research the authentication, database, and API modules in parallel using separate subagents, and Claude spawns one worker per area, lets them explore independently, and then synthesizes what they found. The docs add one honest caveat that's easy to forget under deadline pressure: this only works well when the research paths don't actually depend on each other. If module B's answer changes how you'd look at module A, splitting them sends two workers down the wrong roads at once. Independence is the precondition, not a nice-to-have, and the moment your slices start needing to talk to each other mid-flight you've outgrown plain subagents and want one of the heavier modes we'll reach at the end.

Why run them in their own windows at all, instead of just doing everything in the main session? Compression. Each worker's noisy exploration, the dead-end searches, the files it opened and discarded, all of that stays inside the worker's context window and never comes home. Only the distilled summary returns. So the parallel structure isn't only about speed, it's about keeping the lead's context clean while several workers each chew through more material than would ever fit in one window. That's the same "push noisy work into a subagent" idea from the context-windows episode, now running several times over at once.

The pieces you already have

You don't start from nothing here, because Claude Code ships with built-in subagents the orchestrator reaches for on its own. There's Explore, a fast, read-only agent that runs on Haiku and can't write or edit anything; it's for finding files and searching the codebase, and you can ask it for a quick, medium, or very thorough sweep. There's Plan, the read-only research agent that gathers context in plan mode and, not coincidentally, exists partly to prevent that infinite nesting we just talked about. And there's general-purpose, which gets all the tools and inherits the main model for multi-step jobs that need both searching and doing.

Two commands are worth keeping straight because they look almost the same and do different things. The slash-agents command opens a manager with a running tab, where you watch and stop live subagents, and a library tab, where you create, edit, and delete your own custom agents, or have Claude generate one for you. Separately, typing claude agents, as a subcommand rather than a slash command, opens a research-preview agent view for dispatching and monitoring background sessions, and each session it dispatches moves into its own worktree automatically. That second one is the direct descendant of last episode: Claude managing the worktrees for you instead of you running git worktree add by hand.

Writing your own worker

Now to the part you'll actually build. A custom subagent is just a Markdown file with a little block of settings at the top, and the body of the file becomes that agent's system prompt. You put these files in an agents folder inside your project's dot-claude directory, and because they live there, they get checked into version control and your whole team shares them. That project folder is the natural home for an orchestrator and its workers.

Picture a small one. At the top you give it a name, lowercase with hyphens, and a description that tells Claude when to hand this agent work. That description is what drives automatic delegation, so it's worth writing well; if it says "use this for code review," Claude will route review-shaped tasks to it without you asking. Below the settings, you write the system prompt in plain prose: you are a code reviewer, when invoked analyze the code, give specific actionable feedback on quality and security. Only the name and the description are required. Everything else is a dial.

And those dials are where the cost discipline from earlier turns into something you can set. The tools field is an allowlist; leave it off and the worker inherits everything, or list just Read, Grep, and Glob to pin a fan-out worker to read-only so it can't accidentally write. The model field takes Opus, Sonnet, Haiku, or a full model name, and it defaults to inherit. This is exactly where you put your workers on Sonnet or Haiku while the lead stays on Opus, the Opus-leads-Sonnet-works split from the research, written in one line. There's a max-turns setting that caps how many agentic turns a worker takes, which is the per-subagent version of the max-turns flag we met in the cost episode, and it's your seatbelt against a worker that loops forever. There's an effort setting, low through max, which is the thinking-budget lever from that same episode, now scoped to a single agent.

Two more settings reach straight back into earlier episodes. There's a skills field that preloads a full skill into the worker when it starts, so the expertise we packaged back in the skills episode rides along into the fan-out. And there's an MCP-servers setting that scopes a server to just this worker, which means you can keep a heavy MCP server out of your main context entirely and define it only on the one agent that needs it, a nice answer to the context-flooding problem from the MCP episode.

The setting that ties this episode to the last one is called isolation, and you set it to worktree. When you do, the subagent runs in a temporary git worktree, an isolated copy of the repo branched, by default, off your default branch rather than the parent's current commit. Its edits land in that worktree, not in your working copy, and if the worker makes no changes the worktree gets cleaned up automatically. That's the bridge from last episode's manual worktree commands to Claude spinning up and tearing down worktrees for its workers on its own.

One practical gotcha on these files. Subagents are loaded when the session starts. If you create one through the slash-agents command it takes effect right away, but if you write the Markdown file by hand on disk, you have to restart the session for Claude to pick it up. And on precedence, when the same agent name exists in more than one place, a project-level agent in your dot-claude folder beats a personal one in your home directory, so the version your team shares wins over your local copy. Don't burn ten minutes wondering why your edit isn't taking; you probably just need to restart, or you've got a name collision.

You've got three ways to actually invoke one of these, escalating in how much you force the issue. The loosest is natural language: name the agent in your prompt, like "use the test-runner subagent to fix the failing tests," and Claude decides whether to delegate. Firmer is an at-mention of the agent by name, which guarantees that specific worker runs for that task instead of leaving it to Claude's judgment. And firmest, you can launch the whole session as that agent by passing its name on the command line, which replaces the default system prompt, tools, and model with the agent's for the entire run. That last form is how you'd stand up a dedicated orchestrator as your main session rather than your everyday Claude.

There's a related control worth knowing when you build an actual coordinator, because by default a main session can spawn any worker it likes. In the agent's settings you can list the Agent tool with specific worker names in parentheses, and that turns it into an allowlist: the coordinator can only spawn those named workers, and it doesn't even see the others. Write the Agent tool with no parentheses and it can spawn anything; leave the Agent tool out of its tools entirely and it can't spawn workers at all. You can also deny specific subagents globally from your settings. This is how you keep a coordinator on rails, so it dispatches the three workers you designed for the job and can't wander off recruiting agents you never meant it to use.

A worked fan-out

Let me make this concrete with the job orchestration is genuinely good at: a wide, mechanical, multi-file change. Say you're migrating every API route handler in a Next.js app off an old database helper and onto a new typed Postgres client. Dozens of files, the same edit in each, mostly independent. That's the sweet spot, because the work is broad and the slices barely depend on each other.

The lead's first move is cheap reconnaissance. It sends Explore, the read-only Haiku agent, to enumerate every file that calls the old helper. That costs almost nothing and keeps the noisy search out of the lead's own window. Then the lead partitions that file list into slices so that no two workers ever touch the same file. This is the partition-by-file-ownership rule straight from the worktrees episode, and it applies here even without worktrees, because two agents editing the same file will clobber each other's work just as surely as two humans would.
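
The partition step is simple enough to sketch. This is not how Claude Code does it internally; it is one minimal way to guarantee disjoint slices, assuming a round-robin split is acceptable. The function name and the example paths are hypothetical.

```python
def partition_by_ownership(files: list[str], workers: int) -> list[list[str]]:
    """Split files into at most `workers` disjoint slices (round-robin).

    Duplicates are dropped first so a path can never land in two slices.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    unique = sorted(set(files))
    slices = [unique[i::workers] for i in range(workers)]
    return [s for s in slices if s]  # drop empty slices when files < workers


if __name__ == "__main__":
    found = [
        "app/api/users/route.ts",
        "app/api/orders/route.ts",
        "app/api/auth/route.ts",
        "app/api/users/route.ts",  # duplicate hit from the search
    ]
    for owned in partition_by_ownership(found, workers=2):
        print(owned)
```

In a real codebase you might prefer to slice by directory so related files stay with one worker; the property that matters is the same either way: every file has exactly one owner.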

Now the lead spawns one worker per slice, and here's the discipline that separates a working orchestrator from a token bonfire: every delegation prompt is complete on its own. Each worker gets its objective, the exact files it owns, the before-and-after pattern to apply, the tools it's allowed, and the precise format to report back in. Remember, the prompt string is the only channel. If there's a rule like "leave the vendor directory alone" or "we keep auth tokens in http-only cookies," and the lead doesn't write it into the delegation, the worker has never heard of it. So you spell it out, every time, per worker.
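
One way to hold yourself to that discipline is to treat the delegation as a template with required fields. The sketch below is an illustration, not a Claude Code API: the `Delegation` class, its field names, and the report format are all hypothetical. It only shows the idea that every piece the worker needs is rendered into the one prompt string.

```python
from dataclasses import dataclass, field


@dataclass
class Delegation:
    objective: str
    files: list[str]
    pattern: str                 # the before-and-after edit to apply
    allowed_tools: list[str]
    constraints: list[str] = field(default_factory=list)
    report_format: str = "One line per file: <path> | <status> | <notes>"

    def render(self) -> str:
        """Build the single prompt string the worker will see."""
        if not self.files:
            raise ValueError("a worker must own at least one file")
        rules = [f"- {rule}" for rule in self.constraints] or ["- none stated"]
        lines = [
            f"Objective: {self.objective}",
            "Files you own (touch nothing else):",
            *[f"- {path}" for path in self.files],
            f"Change to apply: {self.pattern}",
            f"Allowed tools: {', '.join(self.allowed_tools)}",
            "Constraints:",
            *rules,
            f"Report format: {self.report_format}",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    prompt = Delegation(
        objective="Migrate route handlers to the new typed Postgres client",
        files=["app/api/users/route.ts"],
        pattern="replace calls to the old database helper with the typed client",
        allowed_tools=["Read", "Edit", "Grep", "Glob"],
        constraints=["leave the vendor directory alone"],
    ).render()
    print(prompt)
```

The empty-files check is the point of the design: a delegation that cannot name what the worker owns is not ready to send.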

The worker file itself is small. You name it something like migration-worker, describe it as migrating a bounded set of files one slice at a time, give it Read, Edit, Grep, Glob, and maybe Bash so it can run that slice's tests, set its model to Sonnet, and optionally set isolation to worktree so it edits its own checkout. Its system prompt tells it to migrate only the files in its task prompt, run the tests for each, and if a file's tests fail, revert that file and report it as needing manual review rather than leaving the build broken. Then it returns a tidy list: each file, its status, and any notes.
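
Here is a sketch of what that worker file might look like, wrapped in a few lines of Python that write it into the project's agents folder. The frontmatter keys are the ones named in this episode; the exact value spellings (lowercase `sonnet`, a comma-separated `tools` list) and the `write_worker` helper are assumptions to check against the current Claude Code docs before you rely on them.

```python
from pathlib import Path

# Frontmatter keys come from the episode; value spellings are assumptions.
WORKER_FILE = """\
---
name: migration-worker
description: Migrates a bounded set of files, one slice at a time.
tools: Read, Edit, Grep, Glob, Bash
model: sonnet
isolation: worktree
---

You are a migration worker. Migrate only the files listed in your task prompt.
Run the tests for each file you change. If a file's tests fail, revert that
file and report it as needing manual review rather than leaving the build broken.
Finish with a list of each file, its status, and any notes.
"""


def write_worker(project_root: Path) -> Path:
    """Write the worker definition into the project's .claude/agents folder."""
    target = project_root / ".claude" / "agents" / "migration-worker.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(WORKER_FILE, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(write_worker(Path(".")))
```

Because this writes the file by hand on disk, the restart rule from earlier applies: the running session won't see the new worker until you start a fresh one.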

That structured return is not a nicety, it's the mechanism. Because only a worker's final message comes back to the lead, a worker that returns three paragraphs of prose hands the lead something it can't cleanly reconcile across ten slices. A worker that returns a clean list of file, status, and notes hands the lead something it can actually aggregate. This is the orchestrator-side version of "subagents return summaries" from the cost episode: design the output format, don't leave it to chance.
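
To see why the format matters, here is a sketch of the aggregation a lead effectively has to do. It assumes each worker returns lines shaped like `path | status | notes`; that shape, the `needs-review` label, and both function names are hypothetical choices for this example, not anything Claude Code prescribes.

```python
from collections import Counter


def parse_report(report: str) -> list[dict[str, str]]:
    """Parse lines of the form '<path> | <status> | <notes>'."""
    rows = []
    for line in report.splitlines():
        if not line.strip():
            continue
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            raise ValueError(f"malformed report line: {line!r}")
        path, status, notes = parts
        rows.append({"path": path, "status": status, "notes": notes})
    return rows


def summarize(reports: list[str]) -> tuple[Counter, list[str]]:
    """Return status counts and the paths flagged for manual review."""
    rows = [row for report in reports for row in parse_report(report)]
    counts = Counter(row["status"] for row in rows)
    needs_human = [row["path"] for row in rows if row["status"] == "needs-review"]
    return counts, needs_human


if __name__ == "__main__":
    reports = [
        "a.ts | migrated | \nb.ts | needs-review | tests failed",
        "c.ts | migrated | ok",
    ]
    counts, needs_human = summarize(reports)
    print(dict(counts))
    print(needs_human)
```

Three paragraphs of prose per worker give you nothing to count. A fixed line shape turns ten reports into one tally and one short list for a human.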

The last stage is synthesis, and it's where the lead earns its keep. It collects the slice reports, runs the full test suite once, or better, delegates that to a dedicated test-runner agent that returns only the failures, and gives you one consolidated picture: here's what changed, here are the four files that need a human, here's the test result. If you want belt-and-suspenders, a final verification worker re-reads the changed files for correctness, which is the same move Anthropic's system makes with its dedicated citation agent that does a last pass purely to attribute claims. Orchestration is rarely just fan-out; the good versions bookend the parallel work with a cheap exploration step at the front and a synthesis-and-verify step at the back.

It's worth naming the two ways workers relate to each other in a flow like this, because the migration uses both. Fanning out is the parallel part: independent workers, each on its own slice, all reporting back to the lead, which then synthesizes. Chaining is sequential: one worker finds the problems and returns, and the lead feeds that result into a second worker that fixes them, so context flows forward one link at a time. A real orchestrator mixes the two. You fan out the independent edits, then chain into a synthesis-and-test stage that depends on all of them being done. Knowing which relationship a given step needs, parallel because the slices are independent, or sequential because this step needs the last one's output, is most of the design work.
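
The two relationships can be sketched in plain Python, with ordinary functions standing in for workers. Nothing here calls Claude; `migrate_slice`, `synthesize`, and `run` are hypothetical stand-ins that only show the control flow: independent slices run side by side, and the dependent stage waits for all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_slice(files: list[str]) -> dict[str, str]:
    """Stand-in for one worker: pretend every file in the slice migrated."""
    return {path: "migrated" for path in files}


def synthesize(results: list[dict[str, str]]) -> dict[str, str]:
    """Stand-in for the chained stage: merge every slice report into one."""
    merged: dict[str, str] = {}
    for result in results:
        merged.update(result)
    return merged


def run(slices: list[list[str]]) -> dict[str, str]:
    # Fan-out: slices are independent, so they can run side by side.
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        results = list(pool.map(migrate_slice, slices))
    # Chain: synthesis starts only after every slice has reported back.
    return synthesize(results)


if __name__ == "__main__":
    print(run([["a.ts"], ["b.ts", "c.ts"]]))
```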

And here's the payoff: you don't have to hand-roll all of that. Claude Code ships a skill called slash-batch that does exactly this shape for you. The docs describe it as splitting one large change into five to thirty worktree-isolated subagents that each open their own pull request. That's the whole pattern, subagents plus worktrees plus a PR per slice, packaged behind one command. When your migration is big and clean, reach for that before you build your own orchestrator from parts.

The pitfall, and it's the one you'll hit

Every episode gets one pitfall you'll actually run into, and for orchestration it's token blowup, with a small family of failure modes living underneath it. Let's take the headline first. A multi-agent run costs roughly fifteen times what a chat does. Run ten workers and you're burning quota about ten times as fast as a single session would. Agent teams, the heavier cousin we'll get to in a second, can use around seven times the tokens of a standard session when the teammates are planning. The point is that orchestration multiplies every cost lever from the cost episode at once: output tokens cost five times what input tokens do, the thinking budget bills as output, and now you're paying all of it across a fleet.

You recognize this one by watching your meters. The slash-usage and slash-context numbers spike fast, and helpfully, slash-usage now breaks your spend down and attributes it to skills, subagents, plugins, and MCP servers as percentages, with keys to flip between the last day and the last week. When subagents are eating your budget, that screen tells you so directly. You bound it with the dials we just set: workers on Sonnet or Haiku instead of Opus, a max-turns cap so no worker runs away, a lower effort setting, and summary-only returns so the results stay small. And above all the gating question from the research: is this task valuable enough to justify spending an order of magnitude more? If it's a quick targeted change, a single agent is cheaper and clearer, full stop.

The second failure mode is the blank-context worker that's confidently wrong, and it's the direct descendant of the subagents-episode pitfall. Because the prompt string is the only channel, a worker that wasn't told a constraint will cheerfully produce output that violates it, and sound completely sure doing it. Anthropic saw the exact thing: when the lead's delegation was vague, their subagents misread the task or ran the same searches as each other, so they duplicated work, left gaps, or missed what mattered. You recognize it when two workers come back having done the same thing, or when a result looks plausible but quietly breaks a rule you never wrote down. You bound it by writing complete task specs, restating the critical CLAUDE.md rules right there in the delegation prompt even though the worker technically loaded them, and, for a side task that genuinely needs your full session context, reaching for a fork instead, which I'll explain in a moment.

Third, collisions. Two workers editing the same file overwrite each other, and the heavier agent-teams mode does not put teammates in separate worktrees, so you have to keep their files disjoint yourself. You recognize this as lost edits, merge chaos, or a change that passed in one place and then vanished. The fix is the one we already used: partition by file ownership, or set isolation to worktree so each worker edits its own copy and the lead collects the branches afterward.

Fourth, and this one's sneaky: the orchestrator loses the thread. The docs warn about it directly. When many subagents each return a detailed result, all those summaries pile into your main conversation and eat the lead's own context. The coordinator starts forgetting earlier results, repeating itself, or quietly compacting in the middle of the run. You recognize it when the lead re-delegates something it already got back, or starts doing the work itself instead of waiting. You bound it by keeping the worker returns terse and structured, and by preferring fewer fat slices over many tiny ones, so the lead reconciles five reports, not fifty.

There's a related quirk in the heavier agent-teams mode that's worth previewing here, because it's the same family of failure. Sometimes the lead starts implementing tasks itself instead of waiting for its teammates to finish, and sometimes it decides the team is done before all the tasks actually are. The documented fix is almost comically direct: you tell it, in writing, to wait for its teammates to complete their tasks before proceeding. It's a good reminder that a lot of orchestration tuning is just telling the coordinator, explicitly, to behave like a coordinator and not roll up its sleeves and do the work it was supposed to hand out.

And the last one is knowing when not to orchestrate at all, which honestly is most of the skill. A single agent wins when the task needs back-and-forth refinement, when the phases share a lot of context, like planning that flows into implementation that flows into testing, or when it's just a small change. Anthropic's framing is clean: multi-agent underperforms when the agents have to share context or depend heavily on each other. They specifically called out coding as harder to split this way than open-ended research, because code changes are so often coupled. So the rule of thumb to leave with: orchestrate for breadth and independence, stay single-agent for depth and coupling. Three focused workers beat five scattered ones every time.

Where this goes next

Step back and the orchestrator's real job comes into focus. It is not the thing that does the work. It is a router. It decomposes the job, sends each bounded piece to the right specialist by description, scales its effort to the complexity instead of fanning out reflexively, and then synthesizes. Anthropic's principles line up with that: teach the lead to delegate by giving each worker a complete spec, scale effort to the query so you don't send five agents after one fact, and start wide before you narrow, explore first, then aim your workers.

A few more of their principles are worth carrying, because they're cheap to apply and they're the difference between a coordinator that works and one that flails. The first they call thinking like your agents: build a mental model by actually watching a worker run through its prompt step by step, because once you've seen where it gets confused, you know what to add to the delegation. The second is to guide the lead's thinking, using extended thinking as a controllable scratchpad where it plans the decomposition before it spawns anything, so the split is deliberate rather than reflexive. And the third is almost funny but real: let the agents help fix themselves. Claude is good at diagnosing its own failed run and suggesting how to rewrite the prompt that caused it, so when a fan-out goes sideways, one of your better moves is to ask the lead what went wrong and how it would word the delegation differently next time.

Subagents are the entry point to a ladder of parallelism, and it's worth naming the rungs so you know where today sits. What we built, the lead coordinating workers inside one conversation where they report only back to the lead and never to each other, is the cheapest rung, because only summaries come home.

Above it sit agent teams, an experimental mode you turn on with a flag, where several full sessions run with a lead and teammates. The difference that makes it more powerful, and more expensive, is the coordination machinery: the teammates share a task list with pending, in-progress, and done states and dependencies between tasks, they claim work off it with file-locking so two don't grab the same item, and they have a mailbox where they message each other directly instead of routing everything through the lead. Each teammate is a separate Claude instance, so this costs real money, and the guidance is to start with three to five teammates and roughly five or six tasks each, with no nested teams and one team at a time. Nicely, you can reuse a subagent definition you already wrote as a teammate just by naming it, so the workers from today carry forward. That's the fleet endgame this show keeps pointing at, and we'll spend real time there later in the act.

And above even that, in research preview, are dynamic workflows, where orchestration moves into a script that runs many subagents and cross-checks their results. They're built for jobs too big to coordinate a turn at a time, think a five-hundred-file migration, and the script runs outside the conversation so the lead's window never fills with summaries.
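To make the agent-teams task list concrete, here is a minimal sketch of the idea: tasks with pending, in-progress, and done states, dependencies between them, and a claim step that two teammates can't win at once. It is not the real implementation. The `TaskBoard` class and its methods are hypothetical, and an in-process `threading.Lock` stands in for the file-locking that separate sessions would need.

```python
import threading
from typing import Dict, Iterable, Optional


class TaskBoard:
    """Illustrative shared task list; names and shape are assumptions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for file-locking
        self._tasks: Dict[str, dict] = {}

    def add(self, task_id: str, depends_on: Iterable[str] = ()) -> None:
        with self._lock:
            self._tasks[task_id] = {
                "state": "pending",
                "owner": None,
                "depends_on": tuple(depends_on),
            }

    def claim(self, teammate: str) -> Optional[str]:
        """Atomically claim the first pending task whose dependencies are done."""
        with self._lock:
            for task_id, task in self._tasks.items():
                ready = all(
                    self._tasks[dep]["state"] == "done"
                    for dep in task["depends_on"]
                )
                if task["state"] == "pending" and ready:
                    task["state"] = "in_progress"
                    task["owner"] = teammate
                    return task_id
            return None  # nothing claimable right now

    def complete(self, task_id: str, teammate: str) -> None:
        with self._lock:
            task = self._tasks[task_id]
            if task["state"] != "in_progress" or task["owner"] != teammate:
                raise ValueError(f"{teammate} does not hold {task_id}")
            task["state"] = "done"

    def all_done(self) -> bool:
        """The check a lead should pass before declaring the team finished."""
        with self._lock:
            return all(t["state"] == "done" for t in self._tasks.values())
```

Note that `all_done` is the thing the lead skips when it decides the team is finished too early; checking it explicitly is the code-shaped version of telling the lead to wait for its teammates.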

One more tool worth a mention, because it's the escape hatch for the blank-context problem. A fork is a subagent that inherits your entire conversation so far instead of starting empty, same history, same tools, same model, so you can hand it a side task without re-explaining everything. Its tool calls still stay out of your window, only its result returns, and it reuses the parent's prompt cache, so it's cheaper than spinning up a cold worker. When starting fresh would hurt because the context matters, a fork is the lightweight orchestrator move.

Now, everything today happened inside the terminal, with Claude deciding when to spawn workers. The next rung is taking the orchestrator out of the interactive session entirely and driving it from a script, with print mode and the Claude Agent SDK, where you define your agents in code, use an actual function to pick Opus for the strict reviewer and Sonnet for everything else, and let multiple subagents run concurrently in a pipeline you wrote. That's next episode. For now, the move to practice is the one from the middle of this episode: send Explore to map the work, partition it so nobody collides, write each worker a complete prompt, put the workers on a cheaper model, and make every one of them report back in a format you can actually stitch together. That's the orchestrator pattern, and it's your first session that runs a team instead of a task.
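Two of those steps are easy to check mechanically before you spawn anything: that the partition has no overlaps, and that each role lands on the model you meant. The sketch below assumes a plain mapping of worker name to file paths. `find_collisions` and `pick_model` are hypothetical helpers, and the strings "opus" and "sonnet" are placeholder labels here, not a claim about what any SDK accepts.

```python
from typing import Dict, List


def find_collisions(slices: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file claimed by more than one worker to those workers."""
    owners: Dict[str, List[str]] = {}
    for worker, files in slices.items():
        for path in set(files):  # ignore duplicates within one slice
            owners.setdefault(path, []).append(worker)
    return {
        path: sorted(workers)
        for path, workers in owners.items()
        if len(workers) > 1
    }


def pick_model(role: str) -> str:
    """Strict reviewer gets the stronger model; everyone else the cheaper one."""
    return "opus" if role == "reviewer" else "sonnet"


# Hypothetical partition produced after the Explore pass.
plan = {
    "auth-worker": ["src/auth/login.py", "src/shared/config.py"],
    "billing-worker": ["src/billing/invoice.py", "src/shared/config.py"],
}
collisions = find_collisions(plan)
```

With this plan, `collisions` flags `src/shared/config.py` as claimed by both workers, which is your cue to reassign that file to one slice or handle it in the lead before fanning out.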