OCDevel Claude Code Podcast
The podcast for developers who live in Claude Code. A fast news segment on the latest Claude Code releases, followed by a hands-on tutorial that levels up your agentic coding. The news covers what actually shipped across Claude Code and the wider Anthropic stack - new versions, models, pricing, plus the MCP servers, skills, and hooks worth your time. Then the tutorial climbs a single ladder across the series: from driving one Claude session by hand in your terminal, to power-user tooling (custom slash commands, subagents, MCP), to multi-agent fleets, to autonomous review-and-fix loops, to a full pipeline where you file a GitHub issue from your phone and Claude implements the feature, opens the PR, runs the tests, and ships to production while you're on the beach. Claude as the senior engineer on your one-person team. One copyable workflow and one real pitfall per episode - every command, flag, and setting named exactly as it appears in the tool. For working developers who want to stop typing every keystroke and start directing. AI-generated podcast by OCDevel.

Headless Claude Code: drive claude -p and the Agent SDK from your scripts

17h ago

Take Claude Code out of the terminal and into your scripts. Print mode and structured JSON, the Claude Agent SDK in TypeScript and Python, chaining sessions, and the permission-and-cost discipline that keeps an unattended run from deleting your repo or running up an API bill once the June 15 billing change lands.

Show Notes

The Act II pivot from driving one Claude Code session by hand to calling it from a script: same agent, same loop, but you pre-decide what's allowed in code before the run ever starts.

The tutorial. Print mode (claude -p) as a Unix citizen — piping stdin (and the 10MB cap), the --bare flag for deterministic CI runs, and structured output via --output-format json (the result, session_id, total_cost_usd, and subtype fields), stream-json with the init and api_retry events, and --json-schema for typed data instead of prose. The run-bounding flags — --max-turns, --max-budget-usd, --model/--fallback-model, --allowedTools/--permission-mode — and chaining turns with --resume/--session-id/--fork-session. Why a model refusal can't be caught from the exit code. Copyable patterns: a commit-message generator (and the space-before-* permission footgun), a stdin-fed typo linter that needs no Bash permission, and a locked-down CI run. Then the Claude Agent SDK (renamed from the Claude Code SDK in September 2025): query() and the options that mirror the CLI flags, custom in-process tools, the Python ClaudeSDKClient, hooks and subagents in code, and the can_use_tool permission callback. Full reference in the headless docs and the migration guide.

The pitfalls. --dangerously-skip-permissions in an unattended run — how to recognize the silent-success failure, and the least-privilege allowlist that replaces it — and the June 15, 2026 billing change that moves Agent SDK and claude -p usage to a separate metered credit pool, plus how to watch total_cost_usd and bound it.

News. Claude Code 2.1.166 (June 6): a fallbackModel setting (up to three), thinking-off controls, a "*" deny-all glob, and a cross-session permission-escalation fix; latest is 2.1.167 (changelog). 2.1.163 added additionalContext from Stop hooks, /plugin list, and version-pinning settings. And Claude Opus 4.1 is deprecated, retiring on the API August 5, 2026 (release notes).

Earlier episodes referenced: CLAUDE.md and --resume, permissions and plan mode, custom slash commands and hooks, skills, subagents and the orchestrator pattern, MCP servers, cost and rate-limit engineering and evals, ultraplan/ultrareview, and parallel sessions with git worktrees.

Transcript

This week in Claude Code

This week moved fast on Claude Code, with several releases landing across just two days.

The headline is version 2.1.166, out June sixth. The biggest addition is a new fallback-model setting. You can now list up to three fallback models, tried in order, for when your primary model is overloaded or simply unavailable, and the fallback flag finally takes effect in interactive sessions, not just print mode. Claude Code will also retry a turn once on the fallback when the API returns an unexpected error. If you've ever had a session die in the middle of an Anthropic load spike, this is the fix. Drop a fallback-model into your settings and you degrade gracefully instead of failing hard. The details are in the changelog.

That same release expands your thinking controls. Setting max thinking tokens to zero, or passing thinking disabled, now genuinely turns thinking off, even on models that think by default. That's a real latency and cost lever for the times you don't need the reasoning. There's also a new glob in deny rules: a single asterisk in the tool-name position denies every tool at once, which makes locking a session down much simpler. And there's a quiet but important security fix. Messages relayed between Claude sessions no longer carry your user authority, so one session can't escalate permissions inside another. The current version as I record this is 2.1.167, a bug-fix patch on top.

Going back a couple of days, version 2.1.163 added a few things worth flipping on. Stop and SubagentStop hooks can now return extra context to feed Claude without it being treated as a hook error, so your end-of-session hook can cleanly nudge the model instead of looking like a failure. There's a new plugin-list command with enabled and disabled filters. And for organizations, new required-minimum and required-maximum version settings let Claude Code refuse to start when the installed version falls outside an approved range.

On the API side, June fifth brought a deprecation notice. Claude Opus 4.1 is being retired on the API on August fifth, 2026, with Opus 4.8 as the recommended migration, per the release notes. If any of your scripts or configs pin the old Opus 4.1 model id, grep for it now and move over before the cutoff so nothing breaks while you're not looking.

Running Claude Code without your hands on the keyboard

For the last dozen episodes you've been the human inside the loop. You drive a session, you read the diff, you approve the risky step, you press enter. Today we pull you out of the middle of that loop and turn Claude Code into something you can call from a script. Same agent, same tools, same loop. The only thing that changes is who's standing in the middle of it.

Anthropic describes that loop the same way for every one of their agents. Gather context, take action, verify the work, repeat. Interactive Claude Code runs exactly that loop, and so does headless Claude Code. The difference is that interactively, a human approves each risky step as it comes up. Headless, you've pre-decided what's allowed, in code, before the run ever starts. Hold onto that sentence, because every flag and every option we touch today is really about making that one pre-decision safely.

Here's something that surprised me when I went digging in the docs. The headless page and the Agent SDK page describe the same product. The command-line reference literally defines print mode as "query via the SDK, then exit." Print mode isn't a lightweight cousin of the SDK. It is the SDK, wearing a command-line costume. So we'll start with the command line, because it's the fastest way in, and then peel back to the SDK underneath it.

Print mode, where Claude becomes a Unix citizen

The entry point is print mode. You add dash p, lowercase, to any claude command, and instead of dropping you into the interactive interface, it answers your prompt, prints the result, and exits. The docs' own example is about as plain as it gets: claude, dash p, and then in quotes, "what does the auth module do." That's the whole thing. You get text back on standard out.

Because it exits cleanly and writes to standard out, print mode behaves like any other Unix tool, and that means the entire shell is suddenly available to you. You can pipe a file in. Cat a build error into print mode with a prompt like "concisely explain the root cause of this build error," and redirect the answer into a file. You can use command substitution, pipe a git diff straight into a prompt, chain the result through grep and jq. The model becomes one more stage in a pipeline.

One limit to know up front. As of version 2.1.128, standard input is capped at ten megabytes. Go over it and the process exits with a clear error and a non-zero status. If you're feeding something huge, write it to a file and point the prompt at the path instead of piping the whole thing in. Worth remembering before you pipe a giant log file and sit there wondering why it died on you.
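A minimal Python sketch of that guard, for scripts that cannot know the input size in advance. The helper name plan_input and the spill file name are hypothetical, and the cap is read conservatively as 10,000,000 bytes because the exact byte count behind "ten megabytes" is an assumption here.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Conservative reading of the ten-megabyte stdin cap (assumption: the exact
# byte count is not stated, so this stays under the smaller interpretation).
STDIN_CAP_BYTES = 10_000_000


def plan_input(
    payload: bytes, prompt: str, spill_dir: Optional[str] = None
) -> Tuple[List[str], Optional[bytes]]:
    """Return (argv, stdin_bytes) for a print-mode call.

    Small payloads are piped on stdin. Oversized payloads are written to a
    file and the prompt is pointed at the path instead.
    """
    if len(payload) <= STDIN_CAP_BYTES:
        return ["claude", "-p", prompt], payload
    spill = Path(spill_dir or tempfile.gettempdir()) / "claude-input.txt"
    spill.write_bytes(payload)
    return ["claude", "-p", f"{prompt} The input is in the file {spill}"], None
```

The returned pair is meant to be handed to something like subprocess.run(argv, input=stdin_bytes, capture_output=True), so the size decision stays separate from the call itself.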

Making CI runs deterministic with bare mode

There's a flag a lot of people haven't met yet, and it matters more than its size suggests. It's called bare mode. Run claude with the bare flag alongside print mode, and it skips auto-discovery of hooks, skills, plugins, MCP servers, automatic memory, and your CLAUDE.md. Faster startup, yes. But the real prize is determinism.

Think about what regular print mode actually loads. It pulls in the same context an interactive session would, including a teammate's hook sitting in their home-directory config, or an MCP server declared in the project's dot-mcp file. On your laptop, that might be fine. In CI, on a fresh runner, or on a colleague's machine, it's a different environment every single time, and your script behaves differently for reasons you can't see from where you're sitting. Bare mode reads none of that, so you get the same result on every machine.

In bare mode Claude has only three tools: run a shell command, read a file, edit a file. You hand it everything else explicitly. Extra instructions go in through the append-system-prompt flag, configuration through the settings flag, servers through the mcp-config flag. And because bare mode skips the keychain and the OAuth reads, your authentication has to come from an environment variable or a small key helper. The docs are blunt about this: bare mode is the recommended way to run scripted and SDK calls, and it'll become the default for print mode in a future release. So you may as well adopt it now. This is all spelled out in the headless docs.
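Here is a rough Python sketch of what "hand it everything explicitly" looks like when you assemble the call yourself. The function name bare_run is hypothetical, the flag spellings are written out from the spoken names above, and ANTHROPIC_API_KEY is assumed to be the environment variable carrying the credentials.

```python
from typing import Dict, List, Mapping, Tuple


def bare_run(
    prompt: str,
    extra_instructions: str,
    settings_path: str,
    mcp_config_path: str,
    base_env: Mapping[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for a bare print-mode call.

    Bare mode skips auto-discovery, so instructions, settings, and MCP
    servers are all passed explicitly, and auth comes from the environment.
    """
    # Assumption: the API key is supplied through ANTHROPIC_API_KEY.
    if "ANTHROPIC_API_KEY" not in base_env:
        raise RuntimeError("bare mode needs credentials in the environment")
    argv = [
        "claude", "--bare", "-p", prompt,
        "--append-system-prompt", extra_instructions,
        "--settings", settings_path,
        "--mcp-config", mcp_config_path,
    ]
    return argv, dict(base_env)
```

Failing before the call when the key is missing keeps a CI job from starting a run that can only die at authentication.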

Getting structured data out instead of prose

Now the single most important habit for scripting Claude, and the source of a pitfall I'll keep coming back to. By default, print mode gives you plain text. That's fine for a human reading the terminal, and a disaster for a script, because you end up writing regular expressions against English prose that changes its wording on every run.

Instead, reach for the output-format flag. It takes three values. Text is the default. Json gives you a structured object, the answer plus a pile of metadata. And stream-json gives you newline-delimited json, one object per line, arriving in real time.

The json envelope is where the good stuff lives. You get a result field, which holds the actual text answer. That's the one you usually want. You get a session id. You get the total cost in US dollars for that single invocation, along with a per-model cost breakdown, so a script can track spend per call without ever opening the usage dashboard. You get the duration, the number of turns, an is-error flag, and a subtype that tells you exactly how the run ended. You pull the answer out with a one-liner: pipe the json into jq, ask for dot result, done. Capture dot session id into a shell variable and you've got a handle to continue the conversation later.
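The same extraction works from Python when jq is not in the picture. This is a small sketch: RunSummary and parse_envelope are hypothetical names, and SAMPLE is a hand-written envelope in the shape described above, not captured tool output.

```python
import json
from dataclasses import dataclass


@dataclass
class RunSummary:
    answer: str
    session_id: str
    cost_usd: float
    subtype: str
    is_error: bool


def parse_envelope(stdout: str) -> RunSummary:
    """Pull the fields a script usually needs out of --output-format json."""
    envelope = json.loads(stdout)
    return RunSummary(
        answer=envelope.get("result", ""),
        session_id=envelope["session_id"],
        cost_usd=float(envelope.get("total_cost_usd", 0.0)),
        subtype=envelope.get("subtype", ""),
        is_error=bool(envelope.get("is_error", False)),
    )


# Hand-written sample for illustration only.
SAMPLE = (
    '{"result": "The auth module issues session tokens.", '
    '"session_id": "abc-123", "total_cost_usd": 0.0132, '
    '"subtype": "success", "is_error": false}'
)
```

Keeping the session id and the cost on one object means the same parse step feeds both the follow-up call and the spend log.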

If what you want is typed data rather than prose, there's an even cleaner option that's newer than a lot of people realize. It's the json-schema flag. You hand Claude a schema describing the shape you want back, say an object holding an array of function names, and the structured answer comes back in a dedicated structured-output field. No parsing prose, no hoping the model formatted its list the same way twice. You either get the object you asked for, or you get a schema violation you can detect mechanically. That distinction matters more than it looks, and we'll lean on it in a minute. The full flag list lives in the headless reference.
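As a sketch of how that mechanical check might look, here is the function-names example in Python. The key name structured_output is an assumption based on the "structured-output field" described above, and schema_argv and extract_functions are hypothetical helpers.

```python
import json
from typing import List

# Schema for the example in the text: an object holding an array of names.
FUNCTION_LIST_SCHEMA = {
    "type": "object",
    "properties": {"functions": {"type": "array", "items": {"type": "string"}}},
    "required": ["functions"],
}


def schema_argv(prompt: str) -> List[str]:
    """Build a print-mode call that asks for typed data instead of prose."""
    return [
        "claude", "-p", prompt,
        "--output-format", "json",
        "--json-schema", json.dumps(FUNCTION_LIST_SCHEMA),
    ]


def extract_functions(envelope: dict) -> List[str]:
    """Return the function names, or raise ValueError on a schema violation.

    Assumption: the typed answer arrives under the key "structured_output".
    """
    payload = envelope.get("structured_output")
    if not isinstance(payload, dict):
        raise ValueError("no structured output in the envelope")
    names = payload.get("functions")
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("structured output does not match the schema")
    return names
```

A refusal written as prose lands in the ValueError branch, which is easier to act on than searching the text for an apology.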

Streaming, and the events that tell you what's happening

When you want to watch a long run unfold rather than wait for the end, switch to stream-json. Each line is a JSON event as it happens. Pair it with the verbose flag and the include-partial-messages flag, and you can even stream the answer token by token, filtering on the text deltas with jq if you want that live-typing effect in your own tool.

Two events are worth knowing by name. The first is the init event, the very first line, which reports the model, the tools, the MCP servers, and the plugins that loaded, including a plugin-errors array. That array is your hook for failing a CI job when a plugin didn't load the way you expected. The second is the api-retry event, emitted right before a retryable failure gets retried. It carries the attempt number, the delay, an error status code, and an error category. Those categories include rate limited, overloaded, authentication failed, and one called billing error. Circle that last one in your mind. After June fifteenth, it's the event you'll be watching for, and we'll get there before the end.
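A rough sketch of a watcher for those two events follows. The event field names are assumptions, not confirmed by anything above: the sketch supposes each event carries a "subtype" of "init" or "api_retry", that the init event holds a "plugin_errors" list, and that the retry event holds its category in "error". Check them against a real stream before relying on this.

```python
import json
from typing import Iterable


def scan_stream(lines: Iterable[str]) -> dict:
    """Collect plugin load errors and billing retries from stream-json lines.

    Field names ("subtype", "plugin_errors", "error") are assumed shapes.
    """
    report = {"plugin_errors": [], "billing_retries": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        subtype = event.get("subtype")
        if subtype == "init":
            report["plugin_errors"].extend(event.get("plugin_errors", []))
        elif subtype == "api_retry" and event.get("error") == "billing_error":
            report["billing_retries"] += 1
    return report
```

A CI wrapper could fail the job when plugin_errors is non-empty and raise an alert when billing_retries is above zero.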

The flags that put walls around a run

A handful of print-mode flags exist specifically to bound an unattended run, and they're the bones of safe automation.

Max-turns caps how many turns the agent takes, and when it hits the ceiling it exits with an error rather than grinding on forever. By default there's no limit at all, which is exactly what you don't want in a script that might loop. More directly, max-budget-usd sets a hard dollar ceiling. Pass it five dollars and the run stops when the estimated API cost reaches five dollars, ending with a subtype that says it ran out of budget. For a batch job, that one flag is the difference between a known maximum and an open tab you find out about later.

Then the model controls. The model flag picks your model by alias or by full name. The fallback-model flag, the one I mentioned in the news, auto-falls-back when your default is overloaded. Note the history: before 2.1.166 it took effect only in print mode and background sessions and was ignored in interactive ones, and as of this week's release it applies in interactive sessions too. For batch work you'll often pin a cheaper model like Sonnet and let Opus stay home.

And then the permission flags, which are really the heart of all this. Allowed-tools lists the tools that run without prompting. Disallowed-tools denies them, with a subtle twist worth pausing on. Name a bare tool, and you remove it from the model's context entirely. But write a scoped rule, like deny any Bash command starting with rm, and the tool stays available while only the matching calls get blocked. There's also a separate tools flag that restricts which built-in tools exist at all, which is a different question from allowed-tools. One decides what's auto-approved; the other decides what's even on the table. We'll put all of these to work in the pitfall section.
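To make the run-bounding flags concrete, here is a sketch that assembles them into one call. The function name bounded_argv is hypothetical, and joining the allowlist with commas into a single value is an assumption about how the flag accepts multiple rules.

```python
from typing import List, Sequence


def bounded_argv(
    prompt: str,
    *,
    max_turns: int,
    max_budget_usd: float,
    model: str,
    fallback_model: str,
    allowed_tools: Sequence[str],
    permission_mode: str,
) -> List[str]:
    """Build a print-mode call with a turn cap, a dollar cap, and an allowlist."""
    return [
        "claude", "-p", prompt,
        "--output-format", "json",
        "--max-turns", str(max_turns),
        "--max-budget-usd", f"{max_budget_usd:.2f}",
        "--model", model,
        "--fallback-model", fallback_model,
        # Assumption: several rules are passed as one comma-separated value.
        "--allowedTools", ",".join(allowed_tools),
        "--permission-mode", permission_mode,
    ]
```

Making every bound a required keyword argument means a caller cannot forget the ceiling. Leaving one off is a TypeError instead of an unbounded run.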

One more pair that trips people up: system-prompt versus append-system-prompt. The append flavor adds your instructions on top of Claude's default prompt, preserving its built-in tool guidance, its safety instructions, and its coding conventions. The plain system-prompt flavor replaces the default entirely, which means you've thrown away all of that and you now own whatever the task still needs. The docs are clear: reach for replace only when you're building a non-coding agent in a pipeline that no human watches. The rest of the time, append, and keep the safety rails Anthropic wrote for you.

Chaining turns across separate processes

Each print-mode call is a fresh process, so how do you hold a multi-step conversation? Through the session primitives you already met back in the very first episode, the same resume mechanic, now driving a script instead of your fingers.

Continue, the dash c flag, picks up the most recent conversation in the current directory. Resume, dash r, takes a specific session id or name. You can even pin a session id up front with the session-id flag, handing it a UUID you chose rather than capturing one after the fact. And fork-session, when you resume, mints a brand-new session id instead of mutating the original, so you can branch off a run without trampling the version you came from.

The canonical recipe is two lines. Start a review with output-format json, and capture the session id into a shell variable. Then call print mode again with resume and that variable, saying "continue that review." Even though it's a completely new process, resume reloads the entire prior transcript from disk, every file it read, every bit of analysis it did, so the word "that" still resolves to the right thing. It's the resume button from episode one, except now a shell script is the one pressing it.
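The two-line recipe translates to Python roughly like this. The function names are hypothetical, and the run parameter exists only so the sketch can be exercised with a stand-in for subprocess.run on a machine without the CLI installed.

```python
import json
import subprocess


def start_review(prompt: str, run=subprocess.run) -> str:
    """Start a session in print mode and return its session id."""
    done = run(
        ["claude", "-p", prompt, "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(done.stdout)["session_id"]


def continue_review(session_id: str, prompt: str, run=subprocess.run) -> str:
    """Resume that session from a brand-new process and return the answer."""
    done = run(
        ["claude", "-p", prompt, "--resume", session_id, "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(done.stdout)["result"]
```

Calling start_review and then continue_review(session_id, "Continue that review") mirrors the shell version: the only state passed between the two processes is the id.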

Why you can't trust the exit code alone

Here's a trap that bites people writing their first robust script. The Unix contract holds for hard failures. Standard input over the cap, max-turns exceeded, those give you a non-zero exit, so a simple error check catches them. But a model refusal, where Claude reads your task and politely declines in prose, is still a successful run at the process level. Exit zero. Subtype success. You cannot tell from the exit code that the agent didn't actually do the thing you asked.

So don't branch on the exit code alone for anything that matters. Use output-format json and branch on the is-error field and the subtype, which distinguishes a clean success from out-of-turns, out-of-budget, or an error during execution. Better still, force a json schema, so a refusal yields an empty or invalid structured output that your code can catch, instead of you grepping the prose for the word "sorry." And for a CI preflight, there's a cheap check: claude auth status exits zero if you're logged in and one if you're not, so you can fail fast before the real work even starts.
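A small sketch of that branching logic, assuming the envelope keys is_error and subtype. The function name run_succeeded is hypothetical. Because the exact strings for the non-success subtypes are not given here, anything other than "success" is treated as a failure.

```python
import json
from typing import Tuple


def run_succeeded(returncode: int, stdout: str) -> Tuple[bool, str]:
    """Decide whether a print-mode run worked, using more than the exit code."""
    if returncode != 0:
        return False, f"process exited with status {returncode}"
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        return False, "stdout was not a JSON envelope"
    if not isinstance(envelope, dict):
        return False, "stdout was not a JSON object"
    if envelope.get("is_error"):
        return False, "is_error flag set"
    subtype = envelope.get("subtype")
    if subtype != "success":
        return False, f"run ended with subtype {subtype!r}"
    return True, "ok"
```

This catches hard failures and bounded-out runs. It does not catch a polite refusal, which still reports success, so a schema check on the payload remains the stronger test.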

Scripting patterns you'll actually copy

Let's make this concrete with a few workflows you can lift straight into a project.

First, generating a commit message. Run print mode with a prompt like "look at my staged changes and write an appropriate commit," and allow it just the handful of git commands it needs: git diff, git log, git status, git commit. And here's a genuine footgun in the permission syntax, so listen closely, because it's the kind of thing that silently widens your blast radius. When you write an allow rule like Bash, git diff, space, asterisk, that trailing space before the asterisk actually matters. With the space, it matches any command that starts with "git diff." Without the space, "git diff" run together with the asterisk would also match git diff-index and other commands you never meant to allow. One missing space quietly broadens what the agent can run. That detail is called out directly in the headless docs.
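You can see the effect of that one space with ordinary glob matching. This sketch uses Python's fnmatchcase as a stand-in for the permission matcher, which is an assumption: Claude Code's real rule matching may differ in other respects, and the helper name rule_matches is hypothetical.

```python
from fnmatch import fnmatchcase


def rule_matches(rule_pattern: str, command: str) -> bool:
    """Approximate a Bash allow rule with a case-sensitive glob match."""
    return fnmatchcase(command, rule_pattern)


WITH_SPACE = "git diff *"  # the intended rule
NO_SPACE = "git diff*"     # one missing space

for command in ("git diff --staged", "git diff-index HEAD"):
    print(
        f"{command!r}: with space={rule_matches(WITH_SPACE, command)}, "
        f"no space={rule_matches(NO_SPACE, command)}"
    )
```

Under this approximation both patterns admit git diff --staged, but only the pattern without the space admits git diff-index HEAD.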

Second, and this is my favorite because it's so cheap: a project typo linter, wired right into your package file as a script. Pipe git diff against main into print mode with a prompt that says, in effect, "you're a typo linter, report each typo as filename and line number, and nothing else." Notice the move here. Because you piped the diff in through standard input, Claude needs no Bash permission at all to read it. The data arrives as input; you never had to hand it a tool to go fetch it. Fewer tools, smaller blast radius. That least-privilege instinct, feed the data in rather than grant the tool, is genuinely the whole game in headless work.

Third, a permissioned CI run with no human approver. Pass permission-mode accept-edits, and Claude will auto-approve file writes and benign filesystem commands like mkdir and mv, but other shell commands and network calls still need an explicit allow rule, or the run aborts the moment one is attempted. There's a stricter mode called dont-ask, which denies anything not explicitly allowed or already in the built-in read-only set, and the docs flag it as the one for locked-down CI. In an unattended run there is nobody to click approve, so that allowlist, plus your permission mode, is your security boundary. That's the whole boundary. There isn't another one.

One limitation to call out, because it connects right back to earlier episodes. The slash-command skills you invoke by name interactively, like slash code-review, are interactive only. In print mode you cannot type slash code-review; you describe the task in prose instead. Skills still load as auto-invoked capabilities, and the SDK can expose them, but the slash-name invocation simply doesn't exist headless. There are a few exceptions that ship dedicated headless equivalents, like running ultrareview non-interactively against a target, which is a nice callback to the ultraplan and ultrareview episode, but the general rule holds: the interactive slash menu isn't there when no one's watching.

The Claude Agent SDK

Everything so far has been the command line. Now let's open it up, because for a real application you'll want the library that lives underneath print mode.

First, a naming thing that'll save you some confusion. In late September 2025, Anthropic renamed the Claude Code SDK to the Claude Agent SDK. The reason is the interesting part. The same harness that powers Claude Code turned out to power almost all of their agent loops, research, support, legal, video, not just coding. So the Python package moved from claude-code-sdk to claude-agent-sdk, the npm package moved out of the claude-code namespace into claude-agent-sdk, and the options class went from ClaudeCodeOptions to ClaudeAgentOptions. If you stumble on an older tutorial using the code names, that's your translation table. The details are in the migration guide.

What the SDK actually gives you is the load-bearing point of this whole episode. It is not a thin wrapper around a single model call. It hands you the same tools, the same agent loop, and the same context management that power Claude Code, programmable in Python and TypeScript. The tool-use loop, the automatic context compaction, the built-in file and shell tools, all of it runs inside your own process, on your own files.

Contrast that with the raw Messages API, the lower-level Anthropic client. With the client, you make one model call and you implement the tool loop yourself. You check whether the model wants to use a tool, you execute it, you feed the result back, you call again, around and around until it's done. Maximum control, and a lot of plumbing you have to maintain. With the Agent SDK, you write one line, async-for over a query, and Claude handles the tools autonomously inside the loop. Anthropic walks through exactly this contrast in their writeup, building agents with the Claude Agent SDK.

So when do you use which? The docs lay out a simple grid. Reach for the raw Messages API when you want a single call and your own tool execution, full control. Reach for print mode on the command line for interactive development, one-off tasks, and quick shell automation. Reach for the Agent SDK as a library for CI pipelines, custom applications, and production automation, where you want the full agent loop running on your own files and infrastructure, with the session state sitting as plain files on your own disk. The docs note that many teams use both: the command line for daily development, the SDK in production.

The SDK in TypeScript

In TypeScript you install the package under the at-anthropic-ai scope, claude-agent-sdk, and the binary for Claude Code ships bundled as an optional dependency, so you don't install the CLI separately. The main entry point is a function called query, and it returns something you can async-iterate. Each message streams out as the agent works. You pass it a prompt and an options object, you loop over the messages, and you pick out the result when it arrives.

That options object is where all those command-line flags reappear as fields. Model. Allowed-tools. Disallowed-tools. Permission-mode. Max-turns. The working directory. Max-budget-usd. Fallback-model. Mcp-servers. Hooks. System-prompt. And a permission-mode type whose values are the same modes we discussed: default, accept-edits, bypass-permissions, plan, dont-ask, and auto. If you learned the flags, you already know the options.

Two things the SDK gives you that the bare command line can't. First, custom in-process tools. You define a tool as a function right there in your code, give it a name, a description, and a schema for its arguments, and register it through a small in-process server helper, with no separate MCP subprocess to manage. The model sees it under the mcp double-underscore naming convention and can call it like any other tool. Second, live control of a running agent. The query object exposes methods to interrupt it mid-run, change its permission mode on the fly, swap the model, or stream new input in while it works. You're not just firing and forgetting. You can steer it in flight.

The SDK in Python

Python is the same shape with two flavors. You install claude-agent-sdk, and you need Python 3.10 or newer. If pip tells you there's no matching distribution, your interpreter is too old; that's the tell, not a broken package.

There are two entry points. The query function is the one-shot: async-for over it and you get a fresh session each call. ClaudeSDKClient is the stateful one, for a continuous conversation that remembers itself. You open it inside an async-with block, send a query, read the response, then send another query that still has the context from the first. Ask "what's the population of that city" after "what's the capital of France," and "that city" resolves correctly, because the client held the thread. It also exposes connect, interrupt, set-permission-mode, and set-model methods for the same live control the TypeScript side has.
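Here is a hedged sketch of both entry points, written from the description above rather than copied from the reference. It assumes the claude-agent-sdk package exports query, ClaudeSDKClient, ClaudeAgentOptions, and ResultMessage, that options accept allowed_tools and max_turns, and that the client reads replies through receive_response. Verify each name against the SDK docs before using it. The function names and the tool list are illustrative.

```python
from typing import List


def review_options() -> dict:
    """Keyword arguments for ClaudeAgentOptions, mirroring the CLI flags."""
    return {
        "allowed_tools": ["Read", "Grep", "Glob"],  # illustrative read-only set
        "max_turns": 5,
    }


async def one_shot(prompt: str) -> str:
    """Fresh session per call: iterate the stream and keep the final result."""
    # Imported lazily so this file loads even where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

    answer = ""
    options = ClaudeAgentOptions(**review_options())
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, ResultMessage):
            answer = message.result or ""
    return answer


async def conversation(prompts: List[str]) -> List[str]:
    """Stateful client: later prompts can refer back to earlier ones."""
    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient, ResultMessage

    answers = []
    async with ClaudeSDKClient(options=ClaudeAgentOptions(**review_options())) as client:
        for prompt in prompts:
            await client.query(prompt)
            async for message in client.receive_response():
                if isinstance(message, ResultMessage):
                    answers.append(message.result or "")
    return answers
```

Either coroutine runs under asyncio.run, for example asyncio.run(conversation(["What's the capital of France?", "What's the population of that city?"])).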

Custom tools use a tool decorator plus a create-server helper, the same idea as TypeScript, just spelled in Python. And the SDK exposes the full Claude Code feature set in code. Hooks on events like pre-tool-use and post-tool-use. Subagents, defined as agent definitions and invoked through the Agent tool we built the orchestrator out of last week. Skills. Your CLAUDE.md memory. You scope what loads with the setting-sources option, and setting it to an empty or narrow list is the SDK's version of bare mode, a hermetic run that ignores stray config it would otherwise pick up.

The one SDK feature I want to single out is the permission callback, can-use-tool. It's a function that runs on every single tool call and returns either allow, optionally rewriting the arguments first, or deny, with a message explaining why. That is the SDK's real answer to "there's no human to approve." You write the approval logic as code. Deny any write under a system path. Allow reads anywhere. Rewrite a risky command into a safe one before it runs. The boundary stops being a person clicking a button and becomes a function you can unit-test, which is a much better place for a boundary to live.
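Because the policy is just a function, it can be written and tested before it is wired into anything. The sketch below keeps the decision logic in plain dictionaries. The tool names (Write, Edit, Bash), the input keys (file_path, command), the list of system roots, and the returned dictionary shape are all assumptions for illustration, and adapting the return value to the SDK's actual callback signature and result types is left out.

```python
import posixpath
import shlex

SYSTEM_ROOTS = ("/etc", "/usr", "/bin", "/var")  # illustrative, not exhaustive
WRITE_TOOLS = {"Write", "Edit"}                  # assumed tool names


def _under_system_root(path: str) -> bool:
    # normpath collapses ".." segments; relative paths are not resolved here.
    norm = posixpath.normpath(path)
    return any(norm == root or norm.startswith(root + "/") for root in SYSTEM_ROOTS)


def decide(tool_name: str, tool_input: dict) -> dict:
    """Allow, deny, or rewrite a single tool call."""
    if tool_name in WRITE_TOOLS and _under_system_root(tool_input.get("file_path", "")):
        return {"behavior": "deny", "message": "writes under system paths are blocked"}
    if tool_name == "Bash":
        try:
            tokens = shlex.split(tool_input.get("command", ""))
        except ValueError:
            return {"behavior": "deny", "message": "command could not be parsed"}
        if tokens[:2] == ["git", "push"] and "--force" in tokens:
            # Rewrite the risky command into a safer one before it runs.
            safer = ["--force-with-lease" if t == "--force" else t for t in tokens]
            return {
                "behavior": "allow",
                "updated_input": {**tool_input, "command": shlex.join(safer)},
            }
    return {"behavior": "allow", "updated_input": tool_input}
```

Each branch corresponds to one of the three moves described above: deny a write under a system path, allow reads anywhere, and rewrite a risky command.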

Pitfall one, reaching for skip-permissions

Now the two ways people get burned with all of this, and how to recognize each before it costs you.

The first is the dangerous one, and it's tempting precisely because it makes the immediate error go away. In a headless run there's no human to approve tool calls, so people reach for the dangerously-skip-permissions flag, which is the same as setting permission mode to bypass. It skips every prompt. And in an unattended context, that flag removes the only thing standing between the model and your filesystem, your shell, and your network.

Here's why that's worse than it sounds. Combine skip-permissions with the Bash tool, and a misread instruction, or a prompt-injection payload hiding in a file or a diff or a web page the agent happens to read, can run arbitrary commands. Delete files. Curl your secrets out to someone else's server. Force-push over a branch. There's no approver standing there, and there's no undo. We'll spend a whole future episode on prompt-injection defense and sandboxing, but the seed of it is right here.

How do you recognize you're in this hole? That's the genuinely nasty part. The run looks like it worked. In json output it still reports subtype success. There's no prompt and no warning at runtime. You only find out afterward, when you see file writes, git operations, or network calls in the history that you never intended to happen. The silence is the symptom.

The fix is the least-privilege discipline we've been building all episode. Never skip permissions in CI. Instead, write a tight allowlist that names exactly the tools and command prefixes the job needs, minding that space-before-the-asterisk footgun while you do it. Set a baseline permission mode like dont-ask. Feed data in through standard input instead of granting a tool to fetch it. Cap the run with max-turns and max-budget-usd. In the SDK, deny by rule inside the permission callback, and log every action with a post-tool-use hook, which, by the way, is exactly the audit-log pattern the autonomous-safety episode is going to build on. And run bare, with narrow setting sources, so a stray hook in someone's home config can't quietly widen the agent's powers behind your back. The rule to carry out the door: in headless, allowlist what you need, and never blanket-skip. Skip-permissions belongs only in a throwaway sandbox you don't mind destroying, never against a real repo with real credentials attached.

Pitfall two, the June fifteenth billing change

The second pitfall isn't about safety, it's about your bill, and it's a direct callback to the cost-engineering episode. Starting June fifteenth, 2026, Agent SDK and print-mode usage on the subscription plans draws from a new, separate monthly Agent SDK credit, distinct from your interactive usage limits. This is laid out in Anthropic's support article on using the SDK with your plan.

Read carefully what's in and what's out, because the split is the whole point. The new separate credit covers your personal Agent SDK projects, non-interactive print mode, the Claude Code GitHub Action, and third-party apps authenticating through the SDK. What stays on your normal subscription limits is interactive Claude Code, web conversations, and Cowork. So the exact thing this episode teaches, headless and SDK runs, is the precise thing that just moved billing buckets. That's not a coincidence you can ignore.

The amounts, as published, are a monthly Agent SDK credit of twenty dollars on Pro, a hundred dollars on Max five-x, two hundred on Max twenty-x, with the team and enterprise tiers landing on similar numbers. And here's the trap baked into it. When that monthly credit runs out, additional usage flows to pay-as-you-go credits at standard API rates, but only if you've turned usage credits on. The credit doesn't roll over from month to month, and it can't be shared or pooled across your teammates. It's a small fixed allowance, per person, that resets and vanishes.

Why does this bite in headless specifically? Because interactive use is one human at a keyboard, naturally rate-limited by your own typing speed. A script is not. A CI matrix can fan out hundreds of print-mode calls in a few minutes, and a long SDK agent loop can churn through turns completely unattended overnight. Before June fifteenth that all drew on your subscription and felt free. After, it eats a small fixed monthly credit, twenty dollars on Pro, and then bills at full API rates beyond that. A nightly batch that felt free can hand you a real API bill by morning.

Recognizing it uses the same instrumentation we've been building this whole episode. In stream-json, watch for that api-retry event with the error category billing-error. Watch for runs that used to complete now aborting once the credit's exhausted. And above all, log the total-cost-in-dollars field from every json result and sum it. That running total is your real spend, and the average cost per call multiplied by your expected call volume is your forecast, no dashboard required.
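The arithmetic is simple enough to sketch. The credit amounts below are the ones quoted in this episode, while the plan keys and the function name monthly_projection are hypothetical. The sample costs would come from the total_cost_usd field of a handful of real runs.

```python
from typing import Dict, Sequence

# Monthly Agent SDK credit per plan, as quoted above (keys are illustrative).
PLAN_CREDIT_USD = {"pro": 20.0, "max_5x": 100.0, "max_20x": 200.0}


def monthly_projection(
    sample_costs: Sequence[float], calls_per_month: int, plan: str
) -> Dict[str, float]:
    """Project monthly spend from sampled per-call costs.

    Anything above the plan credit is what would bill at API rates,
    assuming usage credits are turned on.
    """
    mean_cost = sum(sample_costs) / len(sample_costs)
    projected = mean_cost * calls_per_month
    overage = max(0.0, projected - PLAN_CREDIT_USD[plan])
    return {
        "mean_cost_usd": mean_cost,
        "projected_usd": projected,
        "overage_usd": overage,
    }
```

For example, samples averaging twenty cents at three hundred calls a month project to about sixty dollars, which on Pro leaves roughly forty dollars beyond the credit.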

Bounding it is a checklist you already hold all the pieces for. Put max-budget-usd on every unattended call, a hard ceiling that ends the run cleanly. Pin a cheaper model like Sonnet for batch work instead of reaching for Opus on every call. Cap the turns. Use the prompt-cache-friendly flags, bare mode and the one that excludes the per-machine system-prompt sections, so repeated calls across machines reuse the cache instead of paying for it fresh each time. And run a promptfoo eval over print mode with json output before you ship the batch job, measuring your cost-per-task on a small sample instead of discovering it live in production. That's exactly the setup we built back in the cost episode, now earning its keep. One last decision, and make it on purpose rather than by accident: leaving usage credits off means the job halts at the cap, which is predictable and gives you no surprise bill; turning them on means the job finishes but can run up real charges. Pick per workload. Don't just default into whichever one happens to be set.

Where this goes next

Step back and notice what you can do now that you couldn't an hour ago. You can call Claude from a shell script or a Python program, get structured data back, chain turns across separate processes, bound both the cost and the blast radius, and define your own tools and your own approval logic in code. That's the agent loop with you standing outside it instead of inside it.

Everything from here builds on exactly that. The Claude Code GitHub Action, which the headless docs literally list as the next step, is this same machinery wired up to at-mention Claude on an issue or a pull request, and it's billed out of that same new credit pool we just talked about. Review-and-fix loops, label-driven runs, auto-PR workflows, they're all scripts and SDK calls with a trigger bolted on the front. And the permission discipline you practiced today, allowlist what you need, feed the data in, cap the run, log every action, is the on-ramp to the autonomous-run-safety episode, where boxing the agent in stops being a side note and becomes the entire topic. You've stopped pressing enter. Next, we start deciding what gets to press it for you.