
The two highest-leverage habits in a single Claude Code session: make it interview you and plan the whole change in writing before it touches a file, then make it tear the diff apart, cold, before anything gets committed. Both are free, and they cover each other's blind spots.
The last big habit for driving a single Claude Code session well by hand: front-load the thinking, then back-load the review. Two workflows, not commands, built from primitives you already have.
Ultraplan. Plan mode as the substrate (shift-tab into the read-only state, the approval gate you can edit and send back), then the moves that turn it into a workflow: let Claude interview you to lock requirements before it guesses, write the plan to a file so it survives a context reset, and have it critique its own riskiest assumptions before you approve. Spend a large thinking budget where being wrong is costly, skip the ceremony on trivial changes, and remember thinking tokens bill as output (callback to the cost episode). Codify the ritual as a custom slash command with read-only allowed-tools. Sources: Claude Code common workflows, best practices, slash commands, and managing cost.
Ultrareview. Review the diff, not your memory of watching it happen: git diff against main, automated gates first (typecheck, lint, tests, build), then human-and-model judgment on the logic and security bugs no check sees, the untenanted query and the secret in a log line. Use the built-in /security-review and the claude-code-security-review action (mind the prompt-injection caveat on fork PRs). Wire the mechanical floor into hooks so a failing typecheck can't be committed, and write the "before every commit" list into your CLAUDE.md.
The pitfall: review theater. A session that wrote the code rubber-stamps its own work with vague praise and zero findings. Recognize it by the absence of specifics; fix it by reviewing the diff cold, in a cleared context or a subagent that never saw the code written, and by forcing a why-is-this-correct justification per change. That cold-diff reviewer is the doorway to the next episode's review-and-fix loop.
News. Opus 4.8 fast mode reportedly got around 2.5x faster at roughly a third the old price (announcement); Claude Code 2.1.161 (June 2) now carries OpenTelemetry resource attributes through as labels and adds a done/total counter to the agents view (changelog); and a Strava MCP connector lands as the connector list keeps filling in.
Earlier episodes referenced: permissions and plan mode, custom slash commands and hooks, skills, subagents, MCP servers, context windows and CLAUDE.md, cost and rate-limit engineering, and parallel sessions with git worktrees.
What shipped this week
Quiet midweek on the Claude Code release front, so the news is short, and I'm only running the items that are real and fresh. Freshest first.
The one most likely to change your day is the new pricing on Opus four point eight's fast mode. Anthropic's Opus 4.8 announcement keeps the standard token rates flat, but fast mode reportedly got both quicker and a lot cheaper. The reported numbers are roughly two and a half times faster output, at about a third of what fast mode used to cost. The old fast mode billed at around six times the standard rate; the new figure is closer to two times standard. If you'd written fast mode off as a luxury you only reach for when you're truly blocked, that math just flipped.
The thing to remember about fast mode is that it runs the same Opus model with faster output. It does not quietly drop you to a smaller, dumber model. So now that the premium is smaller, it's reasonable to just leave it on for interactive work, the back-and-forth where you're sitting there watching the cursor. The action is tiny. Turn it on with the fast toggle on an Opus four point eight session and see whether the speed is worth the new, lower premium for the way you actually work. If you live in long agentic runs where latency compounds, this is the rare pricing change that makes the faster option the cheaper-feeling one.
Next, the latest Claude Code release, version two point one point one six one, went out yesterday, on June second. Two things in it are worth an actual setting change rather than a shrug. The first is for anyone who exports metrics. The OpenTelemetry data Claude Code emits now carries your resource attributes through as labels, which means the usage and cost dashboards we set up a couple of episodes back can finally slice by team and by repository instead of dumping everything into one undifferentiated bucket. If you export metrics, set the resource-attributes variable with a team tag and a repo tag, and your existing dashboard gets a lot more useful for free.
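As a rough sketch of that setting, assuming the variable in question is the standard OpenTelemetry OTEL_RESOURCE_ATTRIBUTES environment variable, which takes comma-separated key=value pairs. The team and repo values are placeholders, not names from the changelog.

```python
import os

def resource_attributes(attrs: dict) -> str:
    """Format attributes as a comma-separated key=value string."""
    for key, value in attrs.items():
        # Commas, equals signs, and spaces would need escaping,
        # so this sketch simply rejects them.
        if any(ch in key + value for ch in ",= "):
            raise ValueError(f"unsupported character in {key!r}={value!r}")
    return ",".join(f"{key}={value}" for key, value in attrs.items())

# Hypothetical tags; substitute your own team and repository names.
tags = {"team": "platform", "repo": "dashboard-web"}
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = resource_attributes(tags)
print(os.environ["OTEL_RESOURCE_ATTRIBUTES"])  # team=platform,repo=dashboard-web
```

Setting the variable from Python only affects processes launched from that script. In day-to-day use you would more likely export the same string from your shell profile or your managed settings.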
The second change helps when you fan work out across several agents at once, which is exactly where we've been heading these last few episodes. The agents view now shows a done-over-total count up front, before you drill into any single agent. So a wave of parallel work finally has something like a progress bar instead of a wall of identical spinners, and you can tell at a glance whether the fleet is making progress or stuck. Small change, but if you've run several worktrees side by side you know how blind that view used to feel. Both changes are listed in the Claude Code changelog. To get them, upgrade, then use the version flag to confirm you're on two point one point one six one or later.
Last, a quick read on the wider tooling. The list of Model Context Protocol connectors keeps growing on the consumer side, with a Strava connector landing this week so you can pull workout data into Claude in plain conversation. On its own that's not a developer story. But it's a fair signal that the connector lineup we talked about in the MCP episode is filling in fast, and that's the part that matters when you're deciding what to wire into your own setup. If you want to see what's connected in your own setup right now, run claude mcp list and read the output, and prune anything you're not using while you're there, because every connected server spends context whether you call it or not.
That's the news. Nothing earth-shaking shipped in the last day, and that's worth saying plainly rather than padding the segment with stale items, so we'll keep it honest and move on.
Plan hard before Claude writes a line, review harder after
Today is about two habits that sit on opposite ends of a single session, and they're the two highest-leverage things you can do without any extra tooling at all. I call them ultraplan and ultrareview. They aren't buttons you press or commands that ship in the box. They're workflows, deliberate ways of using primitives you already have, from the permissions and plan-mode episode, the slash-commands and hooks episode, and the cost episode. Ultraplan is making Claude think the whole change through, in writing, before it touches a file. Ultrareview is making Claude tear that work apart afterward, in the same session, before any of it reaches a pull request.
Here's the reason both matter so much. A model that starts editing immediately is cheap to start and expensive to finish. It writes confident code against a half-formed picture of your codebase, you catch the wrong assumption three files later, and now you're unwinding edits instead of writing them. Front-loading the thinking is the single biggest lever on how much rework you do. And the back-end review is what stops a plausible-looking diff from quietly shipping a bug. Plan well and skip the review and you ship polished mistakes. Review well but never plan and you spend the review cleaning up avoidable mess. You want both ends of the session covered.
What plan mode actually does
Start with the primitive underneath ultraplan, which is plan mode. We met it back in the permissions episode, so I'll keep this to the mechanics that matter for the workflow. Plan mode is a read-only state. Claude can read files, search the repo, run safe inspection commands like git status and git log, and build up a picture of what it's about to do. What it cannot do is edit, write, commit, or push. The edit and write tools are off, and anything that would change your working tree is blocked.
You get into it from the keyboard. Tap shift and tab together to cycle through the permission modes; one of those modes is plan mode, and you'll see a small pause icon sitting in the status line when you're in it. That little indicator is the thing to glance at before you trust that Claude is only looking, not touching. When Claude is done exploring, it stops and presents a plan for you to approve, and only when you approve does it leave plan mode and start making changes. Anthropic documents the flow in the common workflows guide.
One detail worth knowing at the approval moment: the plan Claude shows you isn't take-it-or-leave-it. You can send it back with feedback and it revises, you can edit the plan yourself before you accept, and you can keep iterating until it's right. The approval gate is a conversation, not a yes-no button. That's the hinge the whole workflow turns on, so treat it like the most important moment in the session, because it is. Everything cheap to fix is on the near side of that gate; everything expensive to fix is on the far side.
So out of the box, plan mode already gives you a research phase and an approval gate. That's the skeleton. Ultraplan is what you do to put muscle on it.
Make Claude ask before it plans
There's a step that belongs in front of planning, and most people skip it: let Claude interview you first. The failure mode of a fast model is that it fills every gap in your request with a confident guess and then plans around the guess. You ask for "activity logging on the dashboard," and it silently decides you mean page views, kept forever, shown in real time, when you actually meant API calls, kept thirty days, in a batch report. The plan looks great. It's a great plan for the wrong feature.
The fix costs you one sentence. Before it plans, tell Claude to ask you the questions it needs answered first, the decisions and the edge cases, instead of assuming. It'll come back with the things you hadn't pinned down: what to track, how long to keep it, who can see it, whether there are privacy rules in play, real-time or batch. You answer in a minute, and now the plan is built on locked requirements instead of guesses. This is the same logic as planning itself, pushed one step earlier: surface the buried assumption while it's still a question, because a question is free to change and a finished implementation is not. Slowing down here is how you speed up everything after.
A good ultraplan flow, then, has three beats before any code: interview, to lock the requirements; plan, to lock the approach; critique, to stress the plan. Only then do you exit to act.
Turn the plan into an artifact you can argue with
The first move that turns plain plan mode into ultraplan is to stop treating the plan as a throwaway message and start treating it as a document. Ask Claude to write the plan to a file in the repo; if plan mode's read-only rules block that write, make saving the file the very first step after you approve. Call it PLAN dot M D or a spec, whatever you like. The point is that a plan living in a file survives a context reset, it can be edited by hand, it can be reviewed line by line, and it can be handed to a fresh session later without re-deriving everything.
Once the plan is a file, you work it like a draft instead of accepting the first version. Read it. Where it's vague, push back. A plan that says "update the dashboard component" is too soft to constrain the act phase; a plan that says "add an activity_log table with a tenant id and a created-at index, a logging route that validates the session, and a server component that reads the last thirty days" actually pins Claude down. Vague plans produce vague, drifting implementations, so the time you spend sharpening the plan is the cheapest time in the whole session.
Then ask Claude to critique its own plan before you approve it. A simple prompt does a lot of work here: tell it to list the three riskiest assumptions in the plan, the parts most likely to be wrong, and what it would check to be sure. The model is genuinely good at poking holes in a plan when you explicitly ask it to switch from author to skeptic. You're cheap-failing on paper, where a mistake costs a sentence, instead of in code, where the same mistake costs a rewrite. Anthropic makes this same point in its Claude Code best practices: the plan-then-act split exists precisely so you catch the wrong direction before it becomes wrong code.
Spend your thinking budget where it pays off
The second move is to turn up how hard Claude thinks, but only during planning. This is where the "ultra" in ultraplan comes from. You can nudge reasoning depth with plain language in your prompt, the words "think," "think hard," and "ultrathink," which historically asked the model for progressively larger thinking budgets. On current versions the more durable knob is the effort level; the default here is Opus four point eight running at high effort, which is already a lot of reasoning before it answers.
Here's the part to internalize from the cost episode: thinking tokens bill as output tokens, and output is the expensive side of the meter, roughly five times the price of input. So thinking isn't free, and that's exactly why you want to spend it deliberately. Planning a real feature is the right place to burn a big thinking budget, because every reasoning token there saves you a multiple of that in rework. Asking for maximum thinking to rename a variable is the wrong place; you're paying premium tokens to over-engineer a thirty-second task. The discipline is simple: think hard on the plan, think normally on the trivia.
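To put rough numbers on that, here's a back-of-the-envelope sketch. The per-token prices are placeholders chosen only to keep the five-to-one output-to-input ratio; they are not Anthropic's actual rates, and the token counts are invented for illustration.

```python
# Placeholder prices in dollars per million tokens (assumed, not real rates).
INPUT_PRICE = 1.00
OUTPUT_PRICE = 5.00  # roughly five times input

def turn_cost(input_tokens: int, thinking_tokens: int, answer_tokens: int) -> float:
    """Estimate one turn's cost; thinking tokens are billed at the output rate."""
    output_tokens = thinking_tokens + answer_tokens
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Hypothetical planning turn: heavy thinking on a real feature.
plan = turn_cost(input_tokens=40_000, thinking_tokens=30_000, answer_tokens=2_000)
# Hypothetical rename: the same thinking budget spent on a trivial task.
rename = turn_cost(input_tokens=5_000, thinking_tokens=30_000, answer_tokens=200)

print(f"plan:   ${plan:.3f}")
print(f"rename: ${rename:.3f}")
```

With these made-up numbers the rename costs nearly as much as the planning turn, and almost all of that is thinking. That's the pattern to watch for, whatever the real prices are on your plan.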
A useful way to picture the effort level is as a behavioral dial, not a hard cap. When you set it high, you're telling Claude to try hard, to consider alternatives and check its own reasoning before committing. On a genuinely tricky problem it'll think more; on a simple one it won't waste the budget, because newer models adapt the depth to the difficulty rather than spending a fixed amount every time. So the move isn't to crank everything to maximum. It's to reach for more thinking when the cost of being wrong is high, an architectural decision, a tricky migration, a security-sensitive route, and to leave it modest when you're doing something the model could do in its sleep.
There's a nice second effect. Plan mode is read-only, so all that heavy thinking happens against a clean, focused context window instead of one already cluttered with half-written files and tool output. That ties straight back to the context-window episode. A planning session that reads what it needs and reasons in one place gives you a better plan per token than a long, messy session that's been editing for an hour and is starting to forget its own earlier decisions. If you've ever watched a long session get slower and dumber as it fills up, this is the antidote: do the expensive reasoning early, while the window is still mostly empty and the model is still sharp.
And know when not to bother. Ultraplan is for changes with real surface area, a feature touching several files, a refactor, anything where the wrong approach is costly to unwind. A one-line fix or a rename does not need an interview, a written plan, and a self-critique; that ceremony costs more in thinking tokens and your time than the task is worth. Over-planning a trivial change is its own small failure, the planning equivalent of a meeting that should have been a message. Match the ritual to the stakes.
Codify the plan habit as a command
If you're doing ultraplan more than once, stop retyping it. This is the slash-commands episode coming back as substrate. Save your planning ritual as a custom slash command, a markdown file in the commands folder under your project's dot-claude directory, committed so the whole team gets it. The body is just the prompt you'd type anyway: research the codebase read-only, do not write any code, produce a numbered plan as a checklist, write it to a plan file, list the riskiest assumptions, and stop for my approval.
A few frontmatter fields make it sharper. A one-line description shows up in the command picker. An argument hint tells you what to pass. And the allowed-tools list lets you restrict the command to read-only tools, so the planning command literally can't reach for edit or write even if the conversation drifts. The arguments placeholder drops whatever you typed after the command into the prompt, so you can write something like "slash plan, add rate limiting to the login route" and it threads the feature description straight in. The full set of fields is in the slash commands docs. Five minutes to write, and ultraplan stops being a thing you remember to do and becomes a thing you type.
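Here's a minimal sketch of what such a command file might contain, written out with a short Python script so it can be generated and checked. The file name plan.md, the wording of the prompt, and the exact tool list are assumptions for illustration; check the slash commands docs for the frontmatter syntax on your version. This sketch keeps the tool list strictly read-only, so saving the plan to a file is left as a step you take after approval.

```python
from pathlib import Path

# Contents of a hypothetical .claude/commands/plan.md
PLAN_COMMAND = """\
---
description: Research read-only and produce a numbered plan for approval
argument-hint: <feature description>
allowed-tools: Read, Grep, Glob, Bash(git status:*), Bash(git log:*)
---
Plan this change: $ARGUMENTS

Research the codebase read-only. Do not write any code.
Produce a numbered plan as a checklist.
List the three riskiest assumptions and how you would check each one.
Then stop and wait for my approval.
"""

def install(project_root: Path) -> Path:
    """Write the command file into the project's .claude/commands folder."""
    target = project_root / ".claude" / "commands" / "plan.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(PLAN_COMMAND, encoding="utf-8")
    return target
```

Calling install(Path(".")) from the repo root would create the file. After that, typing slash plan followed by a feature description should thread that description in where the arguments placeholder sits.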
The way to find your command is to look backward. Skim your last week of sessions and find the paragraph you keep retyping, the planning preamble or the review checklist you paste in every time. That repeated paragraph is a slash command that hasn't been written yet. Pull it into a file, give it a one-line description, run it once, tweak the wording, commit. The whole loop is under five minutes, and the payoff compounds every time you'd otherwise have typed it again. This is the same move we made with skills back in the skills episode, just lighter weight: a skill is for expertise Claude loads when a task matches, a slash command is for a workflow you trigger by name.
After the code, the diff is the unit of review
Flip to the other end of the session. Claude has written the code. This is where ultrareview lives, and the most important mindset shift is this: review the diff, not your memory of watching it happen. You sat there while Claude made the edits, you nodded along, and that feeling of having seen it is exactly what makes you skip the actual reading. Don't. Run a plain git diff to see the working changes, and a git diff against your main branch to see the whole feature as one unit, before anything gets committed.
When you read that diff, or when you ask Claude to read it, you're hunting for specific things, not a general vibe. Hardcoded values that should be environment variables. Debug logging or a temporary shortcut left in. A database query that forgot to scope to the current tenant. A new route with no test. Log lines that print an email address or a whole request body. Anthropic's own review guidance is blunt about where to aim: don't waste the review re-flagging lint, formatting, or type errors, because your continuous-integration pipeline already catches those. Spend the human-and-model attention on logic, security, and the things automated checks can't see.
Take the tenant-scoping example, because it's the kind of bug that passes every test and still ends careers. In a multi-tenant Postgres app, a query that fetches activity rows without a where-clause on the tenant id returns everyone's data, not just the current account's. The types are fine. The tests probably pass, because your fixtures only have one tenant. The linter is happy. It's a logic-and-security bug, invisible to every automated check, and it's exactly the kind of thing a careful read of the diff catches and nothing else does. That's why the diff review isn't optional even when the build is green.
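Here's a small, self-contained illustration of that bug. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the activity_log table, its columns, and the tenant names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity_log (tenant_id TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO activity_log VALUES (?, ?)",
    [("acme", "login"), ("acme", "export"), ("globex", "login")],
)

def recent_activity_unscoped(db: sqlite3.Connection) -> list:
    # BUG: no tenant filter, so every account's rows come back.
    return db.execute("SELECT tenant_id, action FROM activity_log").fetchall()

def recent_activity(db: sqlite3.Connection, tenant_id: str) -> list:
    # Fixed: scope to the current tenant with a bound parameter.
    return db.execute(
        "SELECT tenant_id, action FROM activity_log WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()

print(len(recent_activity_unscoped(conn)))  # 3, includes the globex row
print(len(recent_activity(conn, "acme")))   # 2, acme only
```

Delete the globex row from the fixture and both functions return the same two rows, which is exactly how a single-tenant test suite lets the unscoped version through.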
Which means the review actually starts with the boring automated gates, run first. Your typecheck, your linter, your tests, and a production build. The TypeScript compiler in no-emit mode tells you the types hold. The test run tells you behavior didn't regress. The build tells you it'll actually ship. Get those green before you spend any judgment on the diff, so that when you and Claude finally read the code, you're reading it for the mistakes machines can't find. Running them first also sharpens the review: a diff that already compiles and passes tests has nothing trivial left to distract you, so the logic and the security are all that's standing.
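The gates-first ordering can be sketched as a tiny runner that stops at the first failure. The commands in GATES are assumptions for a typical TypeScript project, not something the docs prescribe; substitute whatever your repo actually uses.

```python
import subprocess
from typing import Optional

# Hypothetical commands for a TypeScript project; substitute your own scripts.
GATES = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npm", "run", "lint"]),
    ("test", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]

def run_gates(gates) -> Optional[str]:
    """Run each gate in order; return the name of the first failure, or None."""
    for name, command in gates:
        result = subprocess.run(command)
        if result.returncode != 0:
            return name  # stop here: later gates are not worth running yet
    return None
```

Calling run_gates(GATES) returns None when everything is green, and that's the signal to start reading the diff for the things machines can't find.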
A review command and the built-in security pass
Just like planning, your review ritual wants to be a command. Write a review slash command that forces structure instead of inviting a thumbs-up: tell it to go through the diff against main, and for each change, state what it does, why it's correct, and one way it could be wrong. Make it call out new routes without tests, untenanted queries, and anything logged that shouldn't be. The trick is forcing concrete output. A review that's allowed to say "looks good" will say "looks good." A review that has to produce a finding per file produces findings.
For the security half, you don't have to write your own. Claude Code ships a security-review command that runs a focused security pass over your pending changes, reading the code the way a human reviewer would, following data flow across files to catch things a pattern-matching scanner misses, like broken access control, an auth bypass, or unsafe handling of untrusted input. There's a matching GitHub Action, Anthropic's claude-code-security-review, that does the same thing on a pull request and posts findings as comments. One caveat from Anthropic's own readme worth repeating: that action is not hardened against prompt injection, so if your repo takes pull requests from outside contributors, require approval before it runs on a fork. We'll come back to prompt injection properly in the autonomous-safety episode; for now, just know the gate exists.
Gate it with hooks so review can't be skipped
The weakness of any habit is the day you're tired and skip it. So wire the floor of your review into hooks, from the hooks episode, and take the discipline out of your hands. A pre-tool-use hook on the Bash tool can watch for the git commit command and run your typecheck and linter first; if either fails, the hook exits with code two and the commit is blocked before it happens. A second hook on git push can run the test suite the same way. Exit code two means stop, and Claude doesn't get to commit broken code no matter how the conversation is going.
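Here's a sketch of what the commit gate might look like as a hook script. It assumes the hook receives the pending tool call as JSON on standard input, with the tool name and the Bash command inside it, and that it's registered as a pre-tool-use hook on the Bash tool. The check commands are placeholders for your own typecheck and lint scripts; confirm the exact input format in the hooks docs for your version.

```python
import json
import subprocess
import sys

# Placeholder checks; swap in your project's real typecheck and lint commands.
CHECKS = [["npx", "tsc", "--noEmit"], ["npm", "run", "lint"]]

def is_git_commit(payload: dict) -> bool:
    """True when the pending tool call is a Bash command that runs git commit."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return "git commit" in command  # crude substring match, good enough for a sketch

def decide(payload: dict, checks=CHECKS, run=subprocess.run) -> int:
    """Return 0 to allow the tool call, 2 to block it."""
    if not is_git_commit(payload):
        return 0
    for check in checks:
        if run(check).returncode != 0:
            print(f"blocked: {' '.join(check)} failed", file=sys.stderr)
            return 2  # exit code two means stop
    return 0

def main() -> int:
    return decide(json.load(sys.stdin))

# When saved as a hook script, end the file with: sys.exit(main())
```

The substring match will miss unusual invocations such as a commit run through a git alias, so treat this as the floor for the common case rather than an airtight gate.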
That's the real division of labor in ultrareview. Hooks enforce the non-negotiable, mechanical floor, the stuff that should never, ever slip through, like a failing typecheck reaching main. The review command and the security pass handle the judgment layer on top, the logic and security questions a passing test suite says nothing about. Hooks make the floor automatic so your attention is free for the ceiling. You can read how to wire both in the hooks docs.
Write the habit into CLAUDE.md so it survives the session
There's one more place to anchor both halves so they don't depend on you remembering, and that's your CLAUDE dot M D file, the project memory we covered in the context-window episode. The hooks enforce the floor on the way out; CLAUDE dot M D shapes how Claude behaves on the way through. Put your key commands there so Claude knows how to typecheck, test, and build without being told each time. Put a short "before every commit" list there, run the typecheck, run the tests, read the diff against main, and Claude will tend to do it on its own as part of the work rather than as a separate request.
Keep it lean, though, and that warning is the same one from the context-window episode. A CLAUDE dot M D file that's grown to hundreds of lines of style rules eats your window on every single turn and quietly makes the model slower and more forgetful, the opposite of what you wanted. Stick to the things that actually change behavior: the commands, the review floor, the handful of conventions Claude keeps getting wrong. Run the memory command once in a while to see what it's actually carrying, and delete anything the model already does without being asked. The goal is a short, sharp file that encodes the plan-and-review discipline as the default, not a wall of text that buries it.
One full pass, start to finish
Let me put both halves together on a concrete feature, so you can see the whole arc. Say you're adding activity logging to the dashboard of a Next.js app backed by Postgres, the kind of stack most of us actually ship.
You start in plan mode. Shift-tab into it, point Claude at the feature, and let it read the existing schema, the dashboard components, and how sessions are handled. It comes back with a plan, and you have it write that plan to a file. The first draft proposes a logging table but says nothing about retention or tenant scoping, so you push back, and the revised plan adds a tenant id, a thirty-day retention story, and an index on the timestamp. You ask it for the three riskiest assumptions; it flags that it's guessing about how auth middleware exposes the current user, so it adds a step to verify that first. Now the plan is tight. You approve it, and only now does Claude leave plan mode and start writing.
It implements in the act phase, running your typecheck and tests as it goes because your CLAUDE dot M D file tells it to. When it's done, you don't just accept it. You run a git diff against main and read the whole feature as one change. You run your review command, which forces a finding per file, and it catches two things the green test suite said nothing about. First, the logging route never checks the session, so anyone could write rows for any tenant, which is exactly the assumption the plan flagged as risky. Second, the read query forgot its tenant filter, so the dashboard would show one account's activity to another account. Both pass every test, because the tests only ever exercised one tenant. You fix them. You run the security-review command for a second angle, and it confirms the access-control hole is closed. Then you commit, and your hook on git commit runs the typecheck and lint one last time before letting it through.
Walk back through what each habit caught. Pushing back on the first draft pinned down retention and tenant scoping before a line was written, the kind of gap an up-front interview would have surfaced even sooner. The plan, stress-tested, flagged the auth assumption as the riskiest part. The act phase stayed on course because the plan was tight. The diff review caught the two tenant bugs that no automated check could see. And the hook guaranteed the mechanical floor regardless of how the session went. No single step would have caught all of that. The point of ultraplan and ultrareview is that the two ends of the session cover each other's blind spots.
The pitfall: review theater
Here's the failure you'll actually hit, and it's sneaky because it looks like success. When the same session that wrote the code reviews the code, it tends to rubber-stamp its own work. The model has already told itself a story about why this implementation is correct, and when it reviews, it reads the diff through that story and nods along. You ask for a review and you get a paragraph of vague praise, "this looks solid, the logic is clean," with no line numbers, no edge cases, no actual findings. That's review theater. The motions of a review with none of the value, and it's worse than no review because it feels like cover.
You recognize it by the absence of specifics. A real review names files and lines and says what could break. A theatrical one speaks in adjectives. The moment your review output is all confidence and no findings, distrust it.
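One crude way to automate that smell test, offered as a heuristic sketch rather than a real detector: count how many file-and-line references a review contains. The regular expression and the sample review texts are made up for illustration.

```python
import re

# Matches references like "lib/activity.ts:42": a path with an extension, then a line number.
FILE_LINE = re.compile(r"\b[\w./-]+\.\w+:\d+\b")

def specific_findings(review_text: str) -> int:
    """Count file:line references in a review."""
    return len(FILE_LINE.findall(review_text))

def looks_like_theater(review_text: str) -> bool:
    """Flag a review that contains no file:line references at all."""
    return specific_findings(review_text) == 0

vague = "This looks solid, the logic is clean."
concrete = (
    "app/api/log/route.ts:18 never checks the session; "
    "lib/activity.ts:42 has no tenant filter."
)

print(looks_like_theater(vague))     # True
print(looks_like_theater(concrete))  # False
```

A review can cite line numbers and still be wrong, so this only catches the laziest case. It's a tripwire for all-adjective output, not a substitute for reading the findings.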
The fix is to break the model out of its own framing. The cleanest way is to review with a cold context. Clear the session, or better, hand the diff to a subagent, from the subagents episode, whose entire job is to review a diff it has never seen written. A reviewer that only sees the final diff, with none of the author's reasoning attached, is dramatically more thorough than one that watched itself write the code, because it isn't deferring to a story it already believes. You can also just force the issue in your prompt: make the review justify why each line is correct, not whether it is. "Why" is much harder to fake than "looks fine."
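Here's a minimal sketch of assembling a cold review request, where the prompt carries only the diff and the instructions, with none of the authoring conversation attached. The base branch name main and the prompt wording are assumptions; how you hand the prompt to a fresh session or a subagent depends on your setup.

```python
import subprocess

REVIEW_INSTRUCTIONS = (
    "You did not write this code. Review the diff below.\n"
    "For each changed file, state what the change does, why it is correct, "
    "and one way it could be wrong. Cite file names and line numbers.\n"
)

def diff_against(base: str = "main") -> str:
    """Return the working tree's diff relative to the base branch."""
    result = subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    )
    return result.stdout

def cold_review_prompt(diff_text: str) -> str:
    """Build a prompt that contains the diff and nothing about how it was written."""
    return (
        f"{REVIEW_INSTRUCTIONS}\n"
        f"--- DIFF START ---\n{diff_text}\n--- DIFF END ---\n"
    )
```

The design choice is the omission: cold_review_prompt(diff_against()) gives the reviewer the what and deliberately withholds the author's why, so it has to reconstruct the justification itself.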
Now, notice the seam I'm leaving on purpose. A subagent reviewing a cold diff in your own session is the doorway to the next rung, a second agent that critiques while the first repairs, in a real review-and-fix loop. That's its own episode. Today the whole loop still lives in one session with you driving, and that's the right place to build the habit before you automate it.
One smaller pitfall on the plan side, quickly, so you can spot it. Plan mode is a strong behavioral constraint, not a hardware lock. On a very long, cluttered session, the read-only instruction can lose force as it gets buried under tokens, and occasionally an edit slips through that shouldn't have. Recognize it the same way you recognize anything: by checking, not trusting. Glance at the pause icon, and after you exit plan mode, run a quick git status and git diff before the first real command, so any edit that leaked during planning shows up immediately. Keeping plan sessions short and starting fresh for the act phase keeps that drift from ever building up. The point isn't paranoia, it's the same habit again: in a single long session, verify what happened instead of trusting that the mode held the whole way through.
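That post-planning check is easy to script. This sketch assumes the working tree was clean before you entered plan mode, so anything git reports afterward appeared during planning; the function names are my own.

```python
import subprocess

def changed_paths(porcelain_output: str) -> list:
    """Extract paths from `git status --porcelain` output, one entry per line."""
    paths = []
    for line in porcelain_output.splitlines():
        if line.strip():
            # Each line is a two-character status code, a space, then the path.
            paths.append(line[3:])
    return paths

def leaked_edits() -> list:
    """Return files that are modified or untracked in the working tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    return changed_paths(result.stdout)
```

If leaked_edits() comes back with anything other than the plan file you asked for, read a git diff of those paths before the first real command of the act phase.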
Where this leaves you
Both of these are free. No new server, no pipeline, no second machine. Plan mode, a thinking budget you already pay for, a couple of custom commands, the built-in security pass, and two hooks. What they buy you is a session that catches its own mistakes at both ends, the wrong direction before it's code and the wrong code before it's committed.
If you only take one thing from this episode, take the shape: interview, plan, critique, then act, then review the diff cold, with the boring gates wired into hooks so the floor is never your problem. Everything else is detail you can add later. The shape is what makes a single session trustworthy, and a trustworthy single session is the unit everything in this show gets built out of from here.
That's the last big habit of driving one session by hand really well. From here the move is to stop being the one who has to remember to review at all, and let a second agent do it on its own. But you build that loop out of exactly these pieces, so get them solid first.