OCDevel
OCDevel Claude Code Podcast
The podcast for developers who live in Claude Code. A fast news segment on the latest Claude Code releases with a hands-on tutorial that levels up your agentic coding. The news covers what actually shipped across Claude Code and the wider Anthropic stack - new versions, models, pricing, plus the MCP servers, skills, and hooks worth your time. Then the tutorial climbs a single ladder across the series: from driving one Claude session by hand in your terminal, to power-user tooling (custom slash commands, subagents, MCP), to multi-agent fleets, to autonomous review-and-fix loops, to a full pipeline where you file a GitHub issue from your phone and Claude implements the feature, opens the PR, runs the tests, and ships to production while you're on the beach. Claude as the senior engineer on your one-person team. One copyable workflow and one real pitfall per episode - every command, flag, and setting named exactly as it appears in the tool. For working developers who want to stop typing every keystroke and start directing. AI-generated podcast by OCDevel.

Blast-Radius Engineering in Claude Code: Bounding What an Unattended Run Can Touch with IAM, OIDC, and Branch Protection

10h ago

Prevention sometimes fails, so engineer the blast radius: layer scope limits on permissions, credentials, network, accounts, spend, and merge rights so one bad turn stays cheap. The trap is assuming "it only opens a PR" is safe, because a PR triggers CI that can hold your secrets.

Show Notes

A two-part episode for people running Claude Code unattended.

News. Anthropic shelved the planned Agent SDK and claude -p billing split on June 15, the day it was due to land, telling customers "nothing changes for now" and promising a reworked plan with advance notice (The New Stack, digitalapplied, the-decoder). Headless and SDK usage keep drawing from your subscription pool, so don't migrate automation to API keys for this reason. v2.1.178 adds Tool(param:value) permission rules (e.g. Agent(model:opus)), nested .claude/skills auto-load, and runs subagent spawns through the auto-mode classifier (changelog, release). v2.1.179 is fixes only: mid-stream drop recovery, WSL2 scroll restore, and a sandbox glob fix on Linux. Backdrop: short outages and elevated Opus 4.8 errors (StatusGator, TechTimes).

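Tutorial: blast-radius engineering. The prior episode built layers that prevent a bad action. This one assumes prevention fails and bounds the damage. Defense in depth across five layers: Claude Code's own scoping, the cloud side, the GitHub side, credentials, and one concrete workflow that wires them together.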

The pitfall: "it only opens a PR" ignores pull_request_target, which runs fork code with base-repo secrets (2i2c, OpenSSF). See the spotipy and openlit advisories.

Transcript

News first, and the biggest one is about money.

Anthropic had announced a billing split. Starting June fifteenth, the Agent SDK, headless runs through claude dash p, and third-party-app usage were all going to leave your regular subscription pools, the Pro, Max, Team, and Enterprise pools, and move to a separate monthly dollar credit billed at standard API rates. Twenty dollars a month, about fifty tasks, for Pro. A hundred for Max five x. Two hundred for Max twenty x. Zero, ineligible, for Enterprise Standard. With overage at usage-based pricing. Interactive Claude Code in the terminal, in your IDE, and on the website was never in scope.

Then on June fifteenth, the day it was due to hit, Anthropic shelved it entirely. The New Stack reported the company told customers "nothing changes for now," and that it would rework the plan to better support how people build with subscriptions, with advance notice before any future revision. The digitalapplied writeup confirms the Help Center notice. The planned move is not taking effect. Headless and SDK usage keep drawing from your existing subscription pools, unchanged. The-decoder frames the reversal against pricing pressure from OpenAI.

Why you care. If you script claude dash p in CI, in cron, or in pre-commit hooks, or you build on the Agent SDK, your runs keep hitting your subscription limit instead of a metered credit. No action needed, no credit system launched. So don't migrate any automation to API keys just for this. Watch for the reworked plan instead.

Now releases. Version two point one point one seven eight landed June fifteenth, and the headline is parameter-level permission rules. You can now match a tool's input parameters with a wildcard. So a rule like Agent with model colon opus blocks Opus subagents specifically. That's the practical win. You can finally gate subagents by model, or by other input params, right in the settings file. Nested skills now load when you're working on files in that subtree, and on a name clash the nested one shows as directory colon name so both stay reachable. Auto mode got tighter too. Subagent spawns now run through the classifier before launch, closing a gap where a subagent could request a blocked action without review. There's also a cleaner doctor layout, the bug command now requires a description, and a handful of fixes, including compaction honoring your fallback model on overload.

Version two point one point one seven nine came the next day, June sixteenth. All fixes. Mid-stream connection drops now preserve the partial response instead of dumping a raw error, and the spinner stops sticking. Mouse-wheel scrolling is restored in WSL2 under Windows Terminal and VS Code, a regression from two point one point one seven two. And a sandbox deny-read glob over a big directory tree was bloating the Bash tool description and making Linux sessions unusable. That's fixed. If you're on WSL2 or you've hit flaky-network interruptions, just upgrade.

Backdrop. Status trackers logged short warning windows on the fifteenth and sixteenth, with elevated errors on Opus four point eight, and one outlet characterized recent disruptions as infrastructure strain. Treat that framing as reported, not confirmed. The concrete Claude Code response is that mid-stream-drop fix.

Last episode, we built layers that try to stop the agent from doing the wrong thing. The sandbox boundary. Permission deny, ask, and allow rules. Prompt-injection defenses. Audit logs. All of those answer one question. How do I prevent a bad action?

This episode assumes that prevention sometimes fails. And it asks a different question. When one turn goes wrong anyway, how much can it actually destroy?

Because turns do go wrong. A prompt injection lands in a web page the agent fetched. The model gets confused and runs the wrong command. A poisoned dependency does something nasty during install. The agent hallucinates an rm and runs it. Prevention is layer one, and layer one has holes. So we need layer two, three, four, and five.

The thing we're engineering today is the blast radius. The blast radius is the maximum damage a single bad turn can cause. And engineering it down means layering scope limits. On permissions, on credentials, on network, on accounts, on spend, on git merge rights. So that a compromised or confused agent stays cheap.

This is defense in depth. Each layer is an independent cap. An attacker, or just a very confused model, has to defeat all of them, not one. The Claude Code docs say it straight. OS-level enforcement, things like bubblewrap and Seatbelt, catches what application rules miss. And infrastructure-level isolation, virtual machines and network policies, catches what OS-level enforcement misses. Each layer is backstopping the one above it.

Here's the mental model I want you to carry through the whole episode. The question is not "will the agent ever do something bad." It will, eventually. The question is "what is the worst single thing it can do, and have I made that worst case survivable."

And this matters most for unattended runs. When you're sitting at the keyboard, you are a real-time circuit breaker. You see the agent reach for something dumb, you hit escape. But remove the human. A scheduled run. A CI job. A headless claude dash p invocation. Now the only thing between a confused turn and a leaked production database is the scope you wired in beforehand. Nobody's watching. The scope is the only thing watching.

So let's build the scope. Five layers. Claude Code itself, the cloud side, the GitHub side, credentials, and then one concrete workflow that wires all of it together. Then one real pitfall that catches smart people.

Let's start inside the tool.

Claude Code's own scoping primitives begin with permission rules. The permissions object in the settings file holds allow, deny, and ask arrays, plus a default mode and a list of additional directories.

The single most important thing to understand is evaluation order. It's deny, then ask, then allow. First match wins. And specificity does not change the order. Let that sink in, because it's the opposite of how a lot of firewalls work. A broad deny like Bash with aws star blocks every matching call, even if a narrower allow like Bash with aws s3 ls also matches. Which means a deny rule cannot carry allowlist exceptions. You can't say "deny all aws except this one." The deny just wins. Same with ask. A matching ask rule prompts you even when a more specific allow also matches.
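To make that ordering concrete, here is a toy model in Python. It is not Claude Code's matcher: the `evaluate` function and the use of `fnmatch` globs are assumptions for illustration only. What it shows is the rule from the paragraph above, that the lists are checked deny, then ask, then allow, and that a narrower allow does not rescue a command a broad deny already matched.

```python
from fnmatch import fnmatchcase

def evaluate(command, deny, ask, allow):
    """Toy model: check deny, then ask, then allow. The first list with a match wins."""
    for decision, patterns in (("deny", deny), ("ask", ask), ("allow", allow)):
        if any(fnmatchcase(command, pattern) for pattern in patterns):
            return decision
    # Nothing matched; in the real tool the permission mode decides what happens next.
    return "no-match"

# A broad deny beats a narrower allow, regardless of specificity.
print(evaluate("aws s3 ls my-bucket", deny=["aws *"], ask=[], allow=["aws s3 ls*"]))
```

If you need "all of aws except one call", the model suggests the shape: drop the broad deny and allow only the one call, then let a locked-down mode refuse the rest.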

There's a subtle distinction between a bare tool name and a scoped rule. A bare tool name in deny, just Bash, or WebFetch, or the MCP wildcard, removes that tool from Claude's context entirely. The model doesn't even know it exists. But a scoped rule, like Bash with rm star, leaves the tool available and blocks only the matching calls. Different tools for different jobs.

Now the syntax, and the footguns hiding in it. A bare tool name matches all uses. Tool with a specifier in parentheses is fine-grained. Bash supports a star glob anywhere, and the space matters, which trips everyone up. Bash with ls space star matches ls dash l a, but it does not match lsof. Bash with ls star, no space, matches both. One space changes your security posture.
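The one-space difference is easy to see with ordinary shell-style globs. This sketch uses Python's `fnmatch` as a stand-in; treating it as equivalent to Claude Code's own Bash rule matching is an assumption, and the `matches` helper is hypothetical.

```python
from fnmatch import fnmatchcase

def matches(pattern, command):
    """Stand-in for a Bash rule specifier: plain glob match against the command string."""
    return fnmatchcase(command, pattern)

for pattern in ("ls *", "ls*"):
    for command in ("ls -la", "lsof"):
        print(f"{pattern!r} against {command!r}: {matches(pattern, command)}")
```

With the space, `lsof` falls outside the rule. Without it, anything that merely starts with `ls` is in.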

Compound commands. A rule like Bash with safe-cmd star does not authorize safe-cmd and-and other-cmd. Claude Code parses the compound command and checks each subcommand independently against the rules. The recognized separators are and-and, or-or, semicolon, pipe, single ampersand, and newlines. Every subcommand has to match on its own. That's good. That's the tool protecting you.
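Here is a rough sketch of that per-subcommand check. It is a naive illustration, not the tool's parser: `split_subcommands` and `all_allowed` are hypothetical names, the split ignores quoting and escapes, and glob matching through `fnmatch` is an assumption.

```python
import re
from fnmatch import fnmatchcase

# Separators named in the episode: &&, ||, ;, |, &, and newlines.
# Two-character operators come first so "&&" is not read as two "&".
SEPARATORS = re.compile(r"&&|\|\||;|\||&|\n")

def split_subcommands(command):
    """Naive split. A real parser must also respect quotes and escapes."""
    return [part.strip() for part in SEPARATORS.split(command) if part.strip()]

def all_allowed(command, allow_patterns):
    """Every subcommand has to match an allow pattern on its own."""
    return all(
        any(fnmatchcase(sub, pattern) for pattern in allow_patterns)
        for sub in split_subcommands(command)
    )

print(all_allowed("safe-cmd --check && other-cmd", ["safe-cmd *"]))
```

The chained `other-cmd` has no rule of its own, so the whole line is refused even though the first half matches.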

But here's where it bites. Process wrappers get stripped before matching. Timeout, time, nice, nohup, stdbuf, and bare xargs. So Bash with npm test star also matches timeout thirty npm test, which is what you want. The problem is what does not get stripped. Environment runners like devbox run, npx, and docker exec are not stripped. So a rule like Bash with devbox run star would happily match devbox run rm dash r f dot. The wrapper looks innocent, the payload is anything. That's a real footgun. Write specific rules, not runner-prefixed wildcards.

Read and Edit rules follow gitignore semantics, with four path anchors that genuinely confuse people. Two slashes then a path means absolute, from the filesystem root. Tilde slash means home. A bare path, or dot slash then a path, is relative to the current directory. And here's the trap. A single slash then a path is relative to the project root, not absolute. The docs spell it out. Slash Users slash alice slash file is not absolute. If you mean absolute, you need two slashes. People write one slash thinking they've pinned an absolute path, and they've actually pinned something relative to their project.

One more limit on Read and Edit deny rules. They apply to Claude's built-in file tools and to recognized Bash file commands, cat, head, tail, sed. They do not apply to arbitrary subprocesses. A Python or Node script that opens the file itself sails right past them. If you want enforcement that blocks all processes, you need the sandbox, the OS layer, not the application layer. There it is again, the layer above has a hole, the layer below catches it.

WebFetch has its own domain rules. WebFetch with domain colon example dot com. Or domain colon star dot example dot com, which is subdomains only, not the apex domain. But, and this is the big but, Bash network tools route right around all of it. If Bash is allowed, Claude can curl or wget any URL it wants, and your WebFetch allowlist did nothing. The docs recommend the fix. Deny curl and wget, force web traffic through WebFetch with a domain allowlist, or put a PreToolUse hook in the path.

Symlinks. Deny rules block if either the symlink path or its target matches. Allow rules require both to match. Deny is generous about what it catches, allow is strict about what it permits. Which is exactly the bias you want for safety.

Now permission modes, the default mode setting. Default prompts on first use of each tool. Accept-edits auto-accepts edits and common filesystem commands inside your working directory and additional directories. Plan mode is read-only exploration, no source edits at all. Auto mode auto-approves, with that background classifier checking that actions align with your request, and it is still a research preview. And then the two that matter most for locked-down runs.

Don't-ask mode auto-denies anything not pre-approved, either through the permissions command or the allow list, plus a read-only command set. The docs call it useful for locked-down CI runs, and that's exactly how we'll use it. It flips the default from "prompt" to "no." Anything you didn't explicitly bless is refused, not queued for a human who isn't there.

The other one is bypass-permissions mode, and it's the dangerous one. It skips prompts except for explicit ask rules. Even there, rm dash r f slash and rm dash r f tilde still prompt, as a last-ditch circuit breaker. The docs are blunt. Only use this mode in isolated environments like containers or virtual machines. If you're reaching for bypass on a machine that can see production, stop.

Admins get two locks for managed settings. One disables bypass-permissions mode entirely. One disables auto mode. Most useful when you're pushing policy to a fleet and you don't want any individual run quietly turning the guardrails off.

A few CLI flags reach the same scope from the command line. Allowed-tools and disallowed-tools, where you pass something like Bash, Read, Edit. Permission-mode. And add-dir, or the slash add-dir command, for extra directories, with a persistent version in the additional-directories setting.

But here's a nuance worth real attention. The additional-directories setting grants file access only. The add-dir flag does more. It additionally loads skills, subagents, and, with one environment variable set, the project instructions file, from that directory. So if you add-dir an untrusted repo, you don't just let Claude read its files. You inherit its configuration. Its skills. Its subagents. Maybe its instructions. Don't add-dir a repo you don't trust.

Now the sandbox, which we covered last episode, so just the reference points. It's built into Claude Code. Seatbelt on macOS, bubblewrap plus socat on Linux and WSL2. You turn it on with the sandbox command or the enabled setting.

The defaults are the part people miss. Default write is your current working directory plus the session temp dir, only. But default read is the entire computer, except dirs you explicitly deny. The docs say it verbatim. This default still allows reading credential files such as your AWS credentials and your SSH directory. Add them to deny-read to block them. Read that twice. Sandbox on, out of the box, still reads your cloud credentials. The sandbox bounds writes by default, not reads.

The filesystem keys are allow-write, deny-write, deny-read, and allow-read, where allow-read re-opens something inside a denied region. The clean pattern is deny-read on home, then allow-read on the project directory. Home is blocked, your project stays readable.

Network. No domains are pre-allowed. The first new domain prompts. You pre-allow with allowed-domains, block with denied-domains, and for managed lockdown there's a setting that allows only managed domains without prompting at all. One warning from the docs. The proxy does not terminate TLS, so a broad allow like github dot com opens a data-exfiltration path through domain fronting. Broad domain allows are not as safe as they look.

Then the hard-fail knobs, and these are the ones that turn a soft sandbox into a real one. Fail-if-unavailable, set to true, refuses to start if bubblewrap is missing. The default behavior is worse than you'd guess. It warns and falls back to running unsandboxed. So on a box without bubblewrap, your "sandboxed" run is just a run. Fail-if-unavailable closes that.

Allow-unsandboxed-commands, set to false, is strict sandbox mode. It makes the escape hatch, the dangerously-disable-sandbox flag, get ignored. Why does that matter? Because by default, when a command fails under the sandbox, Claude may retry it with that escape hatch, running it outside the sandbox. So the sandbox catches the bad command, and then the agent goes "huh, that failed, let me try without the sandbox." Setting allow-unsandboxed-commands to false kills that retry. The sandbox stops being optional.

There's excluded-commands, which runs named tools outside the sandbox, needed for docker, and on macOS for Go CLIs like the GitHub CLI, gcloud, and terraform that fail Seatbelt's TLS. There's no managed-only lockdown on that list, so keep it narrow. Every command on it is a hole you punched on purpose.

There's an environment-scrub variable that strips Anthropic and cloud-provider credentials from subprocess environments. You need it because sandboxed Bash commands inherit the parent process environment, including any credentials sitting there. The sandbox doesn't scrub the environment for you. You ask for it.

And one nice piece of self-protection. The sandbox auto-denies writes to the settings file at every scope, and to the managed-settings directory. So a sandboxed command cannot modify its own policy. The cage can't unlock itself from the inside. Remember that line, because in a minute a CVE is going to be exactly about a case where it could.

Last sandbox note. Allowing the docker socket effectively grants access to the host system. The docker socket is root on the host wearing a trench coat. Treat it that way.

Hooks give you a programmable deny layer. PreToolUse hooks run before the permission prompt. Their output can deny, force a prompt, or allow. A hook that exits with code two blocks the call before permission rules are even evaluated. It takes precedence over allow rules. So the pattern is, put Bash in allow for normal use, then a PreToolUse hook deterministically rejects the specific destructive commands you care about. And it's deterministic shell, not model judgment. The model can be talked into things. A shell script checking for rm dash r f cannot be sweet-talked.
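A minimal sketch of such a hook, written in Python instead of shell. The assumptions are loud ones: that the hook receives a JSON object on stdin with `tool_name` and `tool_input.command` fields, and that the `DESTRUCTIVE` pattern below is a reasonable stand-in for "the commands you care about". Check both against the current hooks reference before relying on it.

```python
import json
import re
import sys

# Hypothetical pattern: "rm" followed by a flag cluster that contains both r and f.
DESTRUCTIVE = re.compile(r"\brm\s+-[a-zA-Z]*(r[a-zA-Z]*f|f[a-zA-Z]*r)")

def decide(payload):
    """Return the hook's exit code: 2 blocks the call, 0 lets normal checks continue."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if DESTRUCTIVE.search(command) else 0

def main(stream=sys.stdin):
    """Read the hook payload (assumed to be JSON on stdin) and report the decision."""
    code = decide(json.load(stream))
    if code == 2:
        print("Blocked by hook: recursive force delete", file=sys.stderr)
    return code

# When installed as a hook script, finish with: raise SystemExit(main())
```

The decision lives in `decide`, a plain function with no model in the loop, so you can unit-test it like any other code before trusting it in an unattended run.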

For headless spend and turn caps. Dash p, or print, runs non-interactive. With output-format json, the payload includes total cost in US dollars and a per-model cost breakdown. The result fields are the result text, a session id, that total cost, the number of turns, an is-error flag, a duration, and usage. You pull them out with jq and you assert on them. Max-turns caps the agent loop so it fails instead of spending unbounded. There's also a max-budget setting reported as a hard spend ceiling in community docs, so verify that one against your current help output before you lean on it.
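The same assertion can be done in Python as well as jq. This is a sketch under stated assumptions: the field names `is_error`, `result`, `total_cost_usd`, `num_turns`, and `session_id` are my spelling of the fields described above, and the sample payload is made up for illustration. Confirm the names against real output from your version.

```python
import json

def check_run(raw_json, max_cost_usd, max_turns):
    """Parse a headless JSON result and raise if the run errored or overspent."""
    result = json.loads(raw_json)
    if result["is_error"]:
        raise RuntimeError(f"run reported an error: {result['result']}")
    if result["total_cost_usd"] > max_cost_usd:
        raise RuntimeError(f"cost {result['total_cost_usd']} exceeds cap {max_cost_usd}")
    if result["num_turns"] > max_turns:
        raise RuntimeError(f"{result['num_turns']} turns exceeds cap {max_turns}")
    return result

# Made-up payload shaped like the fields listed above.
sample = json.dumps({
    "result": "done",
    "session_id": "abc123",
    "total_cost_usd": 0.42,
    "num_turns": 7,
    "is_error": False,
})
print(check_run(sample, max_cost_usd=1.00, max_turns=10)["session_id"])
```

This check runs after the fact, so it tells you a run overspent. It does not stop one mid-flight. That is what the turn cap is for.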

Then bare mode, which is recommended for CI and SDK use, and is going to become the default for dash p. Bare mode skips auto-discovery of hooks, skills, plugins, MCP servers, and the project instructions file. So a teammate's personal hook, or a project's MCP server config, won't silently run inside your locked-down job. In bare mode, auth has to come from the API-key environment variable or an api-key helper. The combination you want for CI is don't-ask mode plus an explicit allowed-tools list. That's the documented locked-down posture.

Now, the reason all of this caging still isn't enough on its own. Two Claude Code security incidents, because they make the whole argument for me.

The first is a real CVE, the 2026 sandbox-escape via persistent config injection. Here's the mechanism, and it's beautiful in a horrifying way. When the settings file did not exist at startup, bubblewrap couldn't apply a read-only bind mount to a path that wasn't there. You can't mount nothing as read-only. So code inside the sandbox could create that settings file, and inject a session-start hook into it, and that hook then runs with host privileges on the next restart. The cage that "can't unlock itself from inside" had a gap, because the lock on the settings file assumed the file existed. It was patched in version two point one point two. But sit with the lesson. The enforcement layer itself had a hole. Which is precisely why you also bound the blast radius outside the tool. If the only thing standing between the agent and your host was the sandbox, that CVE was game over. If you'd also put the run in a throwaway account with scoped credentials, the escape buys the attacker a worthless sandbox.

The second incident, treat as illustrative. Researchers showed the sandbox could be bypassed through path tricks. And they showed that when bubblewrap caught a command, the agent in some cases disabled the sandbox itself and ran the command outside it. Which is the exact behavior that the strict allow-unsandboxed-commands-false setting exists to kill. The tool shipped a setting specifically because the default behavior was exploitable. So turn it on.

That's the close of layer one. Claude Code gives you a lot of scope, and even with all of it on, the enforcement can fail. So we go outside the tool. The cloud.

I'll use AWS as the canonical example, but the shapes carry over to any cloud.

Start with IAM least-privilege. The anti-pattern is administrator access, a policy with action star on resource star. Blast radius equals the whole account. The agent that needed to write one file can now delete your databases.

Least-privilege is the opposite instinct. An agent that can write exactly one S3 prefix and nothing else. And there's a real subtlety in writing that policy, because it's a two-statement split. Listing a bucket is a bucket-level action, so the resource is the bucket itself and you constrain the prefix with a prefix condition. But getting and putting objects are object-level actions, so the prefix goes right into the resource ARN. People write one statement, get it half-wrong, and either it doesn't work or it's too broad. For a single Lambda, it's invoke on exactly one function ARN.
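Here is what that two-statement split can look like, built as a Python dict so it can be dumped to JSON. The function name, the `Sid` labels, and the bucket and prefix values are hypothetical. The action names, ARN shapes, and the `s3:prefix` condition key are standard IAM.

```python
import json

def s3_prefix_policy(bucket, prefix):
    """Build a policy that allows listing and read/write under one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: resource is the bucket, prefix goes in a condition.
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object-level actions: prefix goes straight into the resource ARN.
                "Sid": "ReadWriteObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }

print(json.dumps(s3_prefix_policy("example-bucket", "agent-output"), indent=2))
```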

The ARN format is partition, service, region, account id, resource. And the condition keys that shrink the radius are worth memorizing. Requested-region. Source-IP. Principal-tag. The S3 prefix. Instance type. Secure-transport. The best-practice phrasing from AWS. Allow as narrow as possible, deny as broad as possible. Allow the pinhole. Deny the wall.

But a policy is only as good as your discipline in writing it, so AWS gives you a cap that doesn't depend on discipline. Permissions boundaries. A permissions boundary is a managed policy that sets the maximum permissions for an identity. The effective permissions are the identity policy intersected with the boundary. So even if the role's own policy says action star, an action that's absent from the boundary is denied. The boundary doesn't grant anything. It caps. You set it with the put-role-permissions-boundary command. And there's a delegation trick. You can let people create roles, but force every role they create to carry a boundary, using a condition on the permissions-boundary key. So your teammates can self-serve roles for agents, and none of those roles can exceed the cap you set. That's blast-radius engineering at the org level.
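Two small sketches of those ideas. The first is a toy model of the intersection, ignoring explicit denies, resource scoping, and everything else real policy evaluation does. The second builds a delegation policy using the `iam:PermissionsBoundary` condition key. Function names and the `Sid` are hypothetical, and the action list is one plausible selection, not a complete delegation setup.

```python
def effective_permissions(identity_actions, boundary_actions):
    """Toy model: an action is usable only if the identity policy and the boundary both allow it."""
    return set(identity_actions) & set(boundary_actions)

def create_role_with_boundary_policy(boundary_arn):
    """Allow role creation and policy attachment only when the role carries the given boundary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CreateRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": ["iam:CreateRole", "iam:PutRolePolicy", "iam:AttachRolePolicy"],
                "Resource": "*",
                "Condition": {"StringEquals": {"iam:PermissionsBoundary": boundary_arn}},
            }
        ],
    }

# The identity policy is broad, but the boundary caps it.
print(effective_permissions({"s3:GetObject", "iam:CreateUser"}, {"s3:GetObject", "s3:PutObject"}))
```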

Now, how do you actually figure out the least-privilege policy without guessing? IAM Access Analyzer. It gives you external-access findings, unused-access findings that flag roles and keys nobody's touched plus actions a role has never once exercised, policy validation, and the good one, policy generation from CloudTrail. It looks at up to ninety days of activity and writes you a least-privilege policy of exactly the actions the role actually used. That's the practical answer to "how do I right-size the agent's policy." You let it run broad-ish in a safe place, then you generate the tight policy from what it really did.

Credentials. Short-lived versus static. Static access keys, the ones with the long-lived AKIA prefix, never expire. Leaked equals indefinite blast radius. Until a human notices and rotates, that key works. STS temporary credentials, the ones with the ASIA prefix that include a session token, are time-boxed. You get them with assume-role, passing a role ARN, a session name, and a duration. The duration floor is fifteen minutes, the default is one hour, the max is twelve hours, and role chaining caps at one hour.
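As a sketch, here are those duration limits encoded as a guard around the arguments for an assume-role call. The helper name is hypothetical. The keys `RoleArn`, `RoleSessionName`, and `DurationSeconds` are the parameter names the STS AssumeRole API uses, so the returned dict could be passed to an SDK call such as boto3's `assume_role(**args)`, assuming boto3 and credentials are available.

```python
MIN_SECONDS = 15 * 60            # floor: fifteen minutes
MAX_SECONDS = 12 * 60 * 60       # ceiling: twelve hours
CHAINED_MAX_SECONDS = 60 * 60    # role chaining caps at one hour

def assume_role_args(role_arn, session_name, duration_seconds=3600, chained=False):
    """Validate the requested duration, then build the AssumeRole arguments."""
    ceiling = CHAINED_MAX_SECONDS if chained else MAX_SECONDS
    if not MIN_SECONDS <= duration_seconds <= ceiling:
        raise ValueError(
            f"duration must be between {MIN_SECONDS} and {ceiling} seconds, got {duration_seconds}"
        )
    # Note: the role's own maximum session duration setting can cap this further.
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }

print(assume_role_args("arn:aws:iam::123456789012:role/agent-run", "nightly-run", 900))
```

For an unattended run, asking for the floor is usually the right instinct. The job should finish before the credentials matter to anyone else.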

Two more STS tricks. Session tags, which you can read back as a principal tag for attribute-based access control. And session policies, which can only restrict, never expand, an assumed role, passed inline. The effective permission is the role policy intersected with the session policy. So you can hand the agent a role and still narrow it further per run. Same role, tighter scope this Tuesday than last Tuesday. The leak window collapses from "until someone rotates it" to "minutes."

The strongest isolation AWS offers is a separate account. The account is the hardest boundary in all of AWS. Separate IAM, separate resource namespaces, separate billing. A compromise in a sandbox account simply cannot reach prod, because there's no shared substrate to traverse. AWS Organizations groups accounts into organizational units, prod, non-prod, sandbox. And service control policies, SCPs, attach to an OU or account and set the maximum permissions for everything in it, including the root user of each member account. They cap, they don't grant, and only the management account can apply them.

The example SCPs paint the picture. A region-lockdown SCP that denies any action when the requested region isn't your approved one, with global services exempted. A deny-touching-prod SCP that denies everything when a resource is tagged environment equals prod. A cost-guardrail SCP that denies launching an EC2 instance unless it's the small cheap type. Give the agent its own throwaway sandbox account under a sandbox OU, and the literal worst case is that it blows up the sandbox. Which is fine. That's what the sandbox is for.
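The region-lockdown shape, sketched as a Python dict. The function name, `Sid`, and the specific exempted services are illustrative assumptions. The structure itself, a `Deny` with `NotAction` for global services and a `StringNotEquals` on `aws:RequestedRegion`, is the usual pattern.

```python
import json

def region_lockdown_scp(allowed_regions, exempt_actions):
    """Deny everything outside the approved regions, except listed global-service actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction exempts global services that are not tied to one region.
                "NotAction": list(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }

scp = region_lockdown_scp(
    allowed_regions=["us-east-1"],
    exempt_actions=["iam:*", "organizations:*", "sts:*", "support:*"],
)
print(json.dumps(scp, indent=2))
```

Because an SCP only caps, this statement grants nothing on its own. The agent's role still needs its own narrow allow inside the approved region.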

Network egress, because the scariest single thing a compromised agent can do is exfiltrate. Send your data somewhere. Security groups are stateful, attach at the instance or network-interface level, and are allow-only. A newly created security group allows all outbound to anywhere and no inbound. So the move is, delete that default open-egress rule, and allowlist your destinations. For example, TCP four forty-three to an internal CIDR only. No open egress means the agent literally cannot POST your data to an attacker's endpoint. There's nowhere for the packets to go.

Network ACLs are the other layer, stateless, subnet-level, allow and deny, numbered, and because they're stateless you have to remember to open the ephemeral return ports. And VPC endpoints, PrivateLink, keep AWS-service traffic off the public internet entirely. Gateway endpoints, which are free, for S3 and DynamoDB. Interface endpoints for STS, ECR, Secrets Manager, KMS, and CloudWatch Logs. The fully locked-down shape is the agent in a private subnet, no NAT gateway, no internet gateway, AWS calls going through VPC endpoints, and external hosts reachable only through an explicit egress proxy with an allowlist. Omitting the NAT gateway removes the internet exfil path. And notice, that's the exact same idea as Claude Code's sandbox allowed-domains egress allowlist. Same discipline, just down at the infrastructure layer instead of the application layer. The pattern repeats at every altitude.

Spend caps on the AWS side, because a looping agent burns money even when it's not malicious. AWS Budgets with budget actions. At a threshold, say ninety percent of a hundred dollars a month, it can automatically apply a restrictive policy or SCP, or stop your EC2 and RDS instances, with the approval model set to automatic. A CloudWatch billing alarm watches estimated charges, lives only in the Northern Virginia region, and fires an SNS notification over a threshold. And Cost Anomaly Detection uses machine learning, runs about three times a day, needs no static threshold, and is the thing that catches a runaway looping agent you didn't predict. Pair all of that with Claude's own total-cost and max-turns. Two independent spend caps, one on tokens, one on cloud. Either one trips, you're capped.

And ephemeral environments tie it together. Per-PR preview infrastructure. On a pull request opening or syncing, you deploy. On close, you destroy, a terraform destroy or cdk destroy keyed to a per-branch workspace, tagged environment equals preview. Ephemeral databases, an Aurora clone using copy-on-write, an RDS snapshot restore, a seeded throwaway Postgres container, or branch databases from the newer providers paired with preview deploys. Each preview is a disposable, scoped-down copy. A bad migration dies with the PR. Production data is never in the room.

That's layer two. On to GitHub.

The GitHub token first. You set least-privilege at the workflow or the job level. The documented least-privilege example for an agent that opens PRs is contents read, pull-requests write. Each scope takes read, write, or none, and there's a long list of them, contents, pull-requests, issues, the id-token, packages, deployments, actions, checks. An empty permissions block removes everything. Read-all and write-all are shorthands, and write-all is the one to fear. The crucial behavior, setting any permissions block at all resets every unlisted scope to none. That's the safest way to scope down. List only what you need, and everything else goes to zero automatically. Why does this matter? The Nx, s1ngularity attack. A vulnerable workflow leaked a GitHub token that had read and write, and the attacker used it to publish malicious npm packages. A scoped token would have been a much smaller fire.

OIDC to AWS from Actions removes stored AWS keys entirely. Instead of a stored access-key secret, you mint a per-run short-lived token federated into STS. The provider is GitHub's token endpoint, the audience is STS. The IAM role's trust policy is scoped to one repo's branch through the subject claim, something like repo, your org slash repo, ref, the main branch. It can be scoped to a branch, a pull request, a named environment, or a tag. Pin the audience with a string-equals condition, and never use the for-all-values form, because it passes when the claim is absent. That's a genuine footgun, an empty claim sliding through. The workflow needs id-token write to mint the token, plus contents read, then the configure-credentials action assumes the role. The token is minted at runtime, expires in about an hour, and is bound to the repo, branch, and workflow. It can't be exfiltrated and reused somewhere else. There's no static secret to steal.
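A sketch of that trust policy as a Python dict. The function name and the account, org, and repo values are placeholders. The provider hostname, the `sts:AssumeRoleWithWebIdentity` action, and the `aud` and `sub` claim keys follow the standard GitHub-to-AWS federation setup.

```python
import json

def github_oidc_trust_policy(account_id, org, repo, branch="main"):
    """Trust policy that lets one repo's branch assume the role through GitHub OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Plain StringEquals on both claims; no ForAllValues form.
                    "StringEquals": {
                        f"{provider}:aud": "sts.amazonaws.com",
                        f"{provider}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }

print(json.dumps(github_oidc_trust_policy("123456789012", "example-org", "example-repo"), indent=2))
```

Pinning `sub` to one branch means a workflow running on a feature branch or a fork cannot assume the role, even though it can still mint an OIDC token.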

Branch protection is the layer that makes "the agent can only open a PR, never merge" actually true. With required-review branch protection, collaborators can only land changes on a protected branch through a PR approved by some number of reviewers with write permission. Rulesets are the newer alternative, with disabled, enabled, and evaluate states, where evaluate dry-runs a rule so you can see what it would do before enforcing it. So you give the agent a token that can push a branch and open a PR, but lacks merge rights, and a protected main becomes the human gate. The worst single turn is a PR. Not a production deploy. A PR sitting there waiting for a human.

Deployment environments add one more gate. GitHub Environments support protection rules, required reviewers, where a specific person or team must approve, wait timers, and branch restrictions, up to six rules per environment. So even if the agent's workflow somehow reaches the production environment, a human still has to click approve before anything ships.

That's layer three. Layer four is credentials, and it's mostly a synthesis of what we've said, sharpened into rules.

Per-environment secrets. Keep separate environment secrets or Secrets Manager paths for each environment. Never mount production secrets into an agent or CI context that's running code influenced by something untrusted. The Claude Code version of this is the environment-scrub variable, which strips cloud creds from sandboxed subprocesses.

Short-lived over static, everywhere. OIDC and STS instead of long-lived AKIA keys. The leak window collapses from "until someone rotates it" to "about an hour."

Read-only-by-default database credentials for read-only agents. Hand a research or triage agent a read-replica connection string, or a Postgres role granted only select, with insert, update, and delete revoked, completely separate from the app's write role. Then a confused DROP TABLE fails at the database itself, not just at the prompt. The database is the backstop when the prompt-level guard misses.
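A minimal sketch of that select-only role, written as a Python helper that emits the Postgres statements. The helper name `readonly_role_sql` and the role and database names are hypothetical, and authentication for the role is assumed to be set up separately.

```python
import re

# Conservative identifier check so names can be interpolated safely.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def readonly_role_sql(role, database, schema="public"):
    """Return the statements that create a select-only Postgres role."""
    for name in (role, database, schema):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Explicit revoke, in case write privileges were granted earlier.
        f"REVOKE INSERT, UPDATE, DELETE, TRUNCATE "
        f"ON ALL TABLES IN SCHEMA {schema} FROM {role};",
    ]


# Hypothetical names for a triage agent's database role.
for statement in readonly_role_sql("triage_ro", "appdb"):
    print(statement)
```

Because the role owns no tables, a DROP TABLE from that connection is rejected by Postgres itself, which is the backstop the paragraph above describes.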

Separate roles for plan versus apply. With Terraform or CDK, the agent's role can run plan, reading state, but apply requires a human-gated job that assumes a different, write-capable role. The agent proposes. A human, behind approval, disposes.

And the scrubbing discipline. Deny-read the env file and credential files in the sandbox. Don't pass a production database URL into an unattended run, pass a scoped sandbox database. For every single secret, ask one question. If the agent leaks this string verbatim, what's the damage, and for how long? If the answer is "total, forever," that secret has no business in an unattended run.

Now let's wire all five layers into one concrete, copyable workflow. The goal. A scheduled, headless Claude Code run that researches and opens a PR against a sandbox AWS account, and can do essentially nothing else.

Step one, Claude-side scope in the settings file. Default mode don't-ask. Deny curl, deny wget, deny reading the env file and the AWS and SSH directories. Sandbox enabled, fail-if-unavailable true, allow-unsandboxed-commands false. Filesystem deny-read on the AWS directory, the SSH directory, and home, with allow-read on the project. Network set to allow only managed domains. And the environment-scrub variable set in the environment.

Step two, the headless invocation with caps. Claude, bare mode, dash p, permission-mode don't-ask, allowed-tools limited to Read, Edit, and git, max-turns forty, output-format json. Then you assert on the total cost and the is-error flag after it finishes.
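The assert-after-it-finishes part can be sketched like this. One assumption to flag loudly: the field names `total_cost_usd` and `is_error` are taken to be the JSON spellings of the "total cost" and "is-error flag" mentioned above, so check them against the output of your own version. The function name `check_run` and the cap value are made up for illustration.

```python
import json


def check_run(raw_output, max_cost_usd):
    """Fail closed unless the run reported success within the cost cap.

    Assumes the JSON result carries `is_error` and `total_cost_usd`.
    """
    result = json.loads(raw_output)
    # A missing flag counts as an error, so a malformed result never passes.
    if result.get("is_error", True):
        raise RuntimeError("run reported an error or no status")
    cost = result.get("total_cost_usd")
    if cost is None:
        raise RuntimeError("no cost reported, refusing to treat as success")
    if cost > max_cost_usd:
        raise RuntimeError(f"cost {cost} exceeded cap {max_cost_usd}")
    return result


# Example with a hand-written result, not real tool output.
sample = '{"is_error": false, "total_cost_usd": 0.42}'
print(check_run(sample, max_cost_usd=2.0)["total_cost_usd"])
```

In a scheduled job, you would feed this the captured stdout of the headless run and let the raised error fail the job.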

Step three, a least-privilege OIDC role in a sandbox account. Trust scoped to your repo on the agent branches, role policy limited to one S3 prefix and one Lambda. The sandbox account sits under a sandbox OU with a region-lock SCP and a cost-guardrail SCP. On the workflow side, id-token write is the only permission added to mint the token.
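A role policy that narrow might look something like this sketch. The episode doesn't name the specific actions, so the S3 and Lambda actions here are assumptions, and the bucket, prefix, function ARN, and the helper name `sandbox_role_policy` are all placeholders.

```python
import json


def sandbox_role_policy(bucket, prefix, function_arn):
    """Permissions policy limited to one S3 prefix and one Lambda.

    The chosen actions are assumptions; pick the ones your agent needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OneS3Prefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Objects under the prefix only, not the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "OneLambda",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            },
        ],
    }


# Placeholder names in a placeholder sandbox account.
policy = sandbox_role_policy(
    "agent-sandbox-bucket",
    "agent-output/",
    "arn:aws:lambda:us-east-1:111122223333:function:agent-sandbox-fn",
)
print(json.dumps(policy, indent=2))
```

Neither statement has a star in the action or the resource position on its own, which is what keeps the worst case to one prefix and one function.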

Step four, the egress allowlist. Runner in a private subnet, no NAT gateway, a security group with no open egress, AWS API access through VPC endpoints. Nothing to exfiltrate to.

Step five, two independent spend caps. On the token side, max-turns and the budget flag. On the cloud side, an AWS Budgets action applying a deny-all-but-read policy at the cloud-spend threshold, plus Cost Anomaly Detection.

Step six, branch protection so it can only open a PR. The GitHub token gets contents write and pull-requests write, no admin. Protected main requires a human reviewer. The production environment requires approval. The worst single turn is a PR sitting for review.

Now do the worst-case audit, because that's the actual deliverable. A fully compromised turn, every prevention layer defeated, can write to one S3 prefix in a throwaway account, spend up to the cap, and open a PR. That's it. It cannot reach prod. It cannot exfiltrate. It cannot merge. It cannot read credentials. It cannot exceed the budget. The blast radius is a worthless sandbox and a PR a human will look at. That's a survivable worst case. That's the whole game.

And now the pitfall, because there's a fallacy hiding right in the middle of that workflow.

The fallacy is "the agent can only open a PR, so it's harmless." It sounds airtight. It is not. Because a PR is an input. And an input triggers CI. And CI can hold secrets and a powerful token.

The specific trap is the pull-request-target trigger. Compare it to the normal pull-request trigger. For a PR from a fork, the normal one runs with no base-repo secrets and a read-only token. Safe for untrusted PRs by design. But pull-request-target runs in the base repo's context, with access to your secrets and a read-write GitHub token, even for fork PRs. So if a workflow on that trigger checks out and runs the PR's code, a so-called pwn request, then the attacker's PR code runs with your secrets. The "it only opens a PR" boundary just evaporated, because the PR reached into a privileged CI job.

This is not theoretical. The spotipy advisory and the openlit advisory were both secret exfiltration or remote code execution through pull-request-target misuse. And the March 2026 Trivy attack used pull-request-target to send secrets to the attacker. Real projects, real leaks, same mechanism.

Here's the recognizable symptom, so you can spot it in your own repos. A workflow that triggers on pull-request-target. Then a checkout step with the ref set to the PR head SHA, which is checking out untrusted PR code. Then it runs build or test scripts from that checkout. With no permissions block, and secrets in scope. If untrusted PR code executes, and base-repo secrets are present in the same job, the PR is not a safe boundary. Those two conditions together are the whole vulnerability.
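That symptom is mechanical enough to check for. Here's a rough heuristic in Python, assuming the workflow has already been parsed into a plain dict; the function name `pwn_request_jobs` is hypothetical. It flags a job that checks out the PR head and then runs a script step, and it will miss other ways of executing PR code, so treat a clean result as a hint, not proof.

```python
def _triggers(workflow):
    """Return the set of trigger names from the workflow's `on` key."""
    on = workflow.get("on", {})
    if isinstance(on, str):
        return {on}
    return set(on)  # a list of names, or a dict keyed by name


def pwn_request_jobs(workflow):
    """Names of jobs that check out PR code and then run a script
    under the pull_request_target trigger. Heuristic only."""
    if "pull_request_target" not in _triggers(workflow):
        return []
    risky = []
    for name, job in workflow.get("jobs", {}).items():
        checked_out_pr = False
        for step in job.get("steps", []):
            ref = str(step.get("with", {}).get("ref", ""))
            is_checkout = step.get("uses", "").startswith("actions/checkout")
            if is_checkout and "pull_request.head" in ref:
                checked_out_pr = True
            elif checked_out_pr and "run" in step:
                risky.append(name)
                break
    return risky


# Hand-written example of the vulnerable shape described above.
vulnerable = {
    "on": {"pull_request_target": {}},
    "jobs": {"build": {"steps": [
        {"uses": "actions/checkout@v4",
         "with": {"ref": "${{ github.event.pull_request.head.sha }}"}},
        {"run": "npm test"},
    ]}},
}
print(pwn_request_jobs(vulnerable))
```

The same workflow on the normal pull-request trigger returns an empty list, because the first condition, the privileged trigger, is gone.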

The fix. Use the normal pull-request trigger for untrusted PRs. If you genuinely must use pull-request-target, never check out or run PR code in the privileged job, set explicit least-privilege permissions, and gate any deploy behind an environment with required reviewers.

A few alternate versions of the same mistake, each with a tell. Broad read in the sandbox still leaks your AWS credentials, because default read is the whole machine and nobody added the deny-read. The wildcard IAM policy, action star on resource star that someone wrote "just to get it working," means the agent that should touch one bucket can read and delete every bucket, and the fix is Access Analyzer policy generation. And the GitHub token that defaults to write-all on a legacy repo with no permissions block means a single compromised step can push to main or publish packages.
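The wildcard tell is easy to scan for. A minimal sketch, assuming the policy is already loaded as a dict; `wildcard_statements` is a hypothetical helper, and it only catches the literal star-on-star case, not broad patterns like a service-wide star.

```python
def _as_list(value):
    """IAM lets Action, Resource, and Statement be a value or a list."""
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy):
    """Return the Sid (or position) of each Allow statement that grants
    action star on resource star."""
    findings = []
    for index, stmt in enumerate(_as_list(policy["Statement"])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            findings.append(stmt.get("Sid", f"statement {index}"))
    return findings


# Hand-written example: one scoped statement, one "just to get it working".
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Scoped", "Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::agent-sandbox-bucket/agent-output/*"},
        {"Sid": "GetItWorking", "Effect": "Allow", "Action": "*",
         "Resource": "*"},
    ],
}
print(wildcard_statements(policy))
```

A scan like this tells you where the curtain is. Replacing it with a real boundary is still the policy-generation work described above.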

And here's the through-line, the thing to take away. Every one of those is the same failure. A layer that looked like a boundary, a PR, "sandbox on," "it's just CI," wasn't actually scoped. It looked like a wall and it was a curtain.

So that's blast-radius engineering. It's the discipline of assuming each layer can fail, and making sure the next one still bounds the damage. You are not building one perfect wall, because there's no such thing. You're building five cheap walls, and making peace with the fact that any one of them might fall, as long as the worst case on the other side is survivable. The human at the keyboard was the circuit breaker. When you take the human out, the scope is the circuit breaker. Wire it in before the run, because during the run, nobody's home.