OCDevel Claude Code Podcast

The podcast for developers who live in Claude Code. A fast news segment on the latest Claude Code releases with a hands-on tutorial that levels up your agentic coding. The news covers what actually shipped across Claude Code and the wider Anthropic stack - new versions, models, pricing, plus the MCP servers, skills, and hooks worth your time. Then the tutorial climbs a single ladder across the series: from driving one Claude session by hand in your terminal, to power-user tooling (custom slash commands, subagents, MCP), to multi-agent fleets, to autonomous review-and-fix loops, to a full pipeline where you file a GitHub issue from your phone and Claude implements the feature, opens the PR, runs the tests, and ships to production while you're on the beach. Claude as the senior engineer on your one-person team. One copyable workflow and one real pitfall per episode - every command, flag, and setting named exactly as it appears in the tool. For working developers who want to stop typing every keystroke and start directing. AI-generated podcast by OCDevel.

Generated with OCDevel Podcaster

This show was made with OCDevel Podcaster: turn any topic or text into an AI-narrated podcast episode that drops right into your feed.

Fleet observability for parallel Claude Code: built-in OpenTelemetry, Grafana cost dashboards, and per-agent spend attribution

1d ago

Wire every parallel Claude Code session, subagent, and headless run into one OpenTelemetry feed and watch per-agent token burn and dollar cost on a Grafana dashboard, with stall detection that catches a looping agent before it drains the budget. The one trap: the same opt-in log flags that give you attribution can ship raw prompts and pasted credentials to your observability backend, and one shared API key collapses every worktree's cost into a single lying bucket.

Show Notes

This week's Claude Code releases, then a full tutorial on fleet observability and cost dashboards for parallel runs.

News (2026-06-26 to 2026-07-03, from the Claude Code changelog):

Claude Sonnet 5 is now the default (v2.1.197, Jun 30): native 1M-token context, promo pricing of $2/$10 per Mtok (input/output) through Aug 31. Shows in /model. (Some trackers misdate it to Jul 1 - the docs changelog is authoritative.)
Background subagents by default (v2.1.198, Jul 1): agents keep running and notify you when they finish; background agents launched with claude agents now commit, push, and open a draft PR from a worktree; the Notification hook fires agent_needs_input/agent_completed; the Explore agent inherits the session model (capped at Opus); new /dataviz skill; the /agents wizard was removed.
Stacked skills + retry controls (v2.1.199, Jul 2): /skill-a /skill-b stacks up to 5 leading skills; CLAUDE_CODE_RETRY_WATCHDOG raises the default retry count for transient errors to 300; fixes for subagents reporting usage-limit errors as success and for a Linux daemon that killed all agents every ~50s after an unclean shutdown.
Org defaults + durability (v2.1.196, Jun 29): admin org default model, stream idle watchdog on by default, background sessions survive restarts.

Tutorial - Fleet observability and cost dashboards:

Native OpenTelemetry, opt-in via CLAUDE_CODE_ENABLE_TELEMETRY=1 - Monitoring docs. Metrics like claude_code.cost.usage, claude_code.token.usage, claude_code.active_time.total; log events; beta trace spans (claude_code.interaction root).
Reference stack: ColeMurray/claude-code-otel (Collector to Prometheus + Loki to Grafana, six-section dashboard).
Cost tools: /cost slash command (Manage costs); ccusage (npx ccusage@latest, ccusage blocks --live); org-level Usage and Cost API and the per-user Claude Code Analytics API.
Headless mode: claude -p --output-format json returns total_cost_usd, num_turns, session_id; guard with --max-turns and --max-budget-usd.
AWS worked example: Analyzing Claude Code usage with CloudWatch and OpenTelemetry.
Pitfall: OTEL_LOG_USER_PROMPTS=1 / OTEL_LOG_TOOL_DETAILS=1 leak prompt text and credentials (Elastic Security Labs); one shared API key breaks per-worktree attribution.

Transcript

Welcome back. Let's run the news for the week of June 26th through July 3rd, straight from the Claude Code changelog. Four releases landed, and the headline is a whole new brain under the hood.

Version two point one point one ninety-seven, shipped June 30th, makes Claude Sonnet 5 the default model in Claude Code. This is the big one. Sonnet 5 ships with a native one-million-token context window, not a beta flag you have to trip, and promotional pricing of two dollars per million input tokens and ten dollars per million output tokens, running through August 31st. TechCrunch framed it as a midsize model with large gains over Sonnet 4.6 on multi-step reasoning, tool use, coding, and agentic workflows run end to end. Why do you care? It's your new default, it shows up in the slash-model menu, and that promo price genuinely changes the math on parallel and background runs, which is exactly what today's tutorial is about. Update to one ninety-seven or later. One caveat worth naming: some third-party trackers date Sonnet 5 to July 1st and offset the version numbers by one. The docs changelog is authoritative, so it's June 30th, version one ninety-seven.

Version one ninety-eight, July 1st, is a fleet release through and through. Subagents now run in the background by default, so Claude keeps working while they run and pings you when they finish. Background agents launched from the agents command now commit, push, and open a draft pull request when they finish code work in a worktree, instead of stopping to ask. Those sessions fire the Notification hook with agent-needs-input and agent-completed events, which is real wiring for the observability we're about to build. The built-in Explore agent now inherits your main session's model, capped at Opus, instead of running on Haiku. Claude in Chrome went generally available. There's a new dataviz skill for chart and dashboard design with a runnable color-palette validator. And the old agents wizard is gone; you manage subagents now by asking Claude or editing the agents directory directly.

Version one ninety-nine, July 2nd, is about not falling over at scale. You can now stack slash-skills, so calling skill-a, then skill-b, then "do XYZ" loads up to five leading skills at once. The retry watchdog raises the default retry count for transient errors to three hundred and lifts the old cap on max retries. And a pile of fleet-reliability bugs got squashed: a Linux daemon that killed every running agent every fifty seconds after an unclean shutdown, a stop command silently undone by a respawn race, and subagents that reported usage-limit errors as successful results. That last one matters a lot if you plan to trust your dashboards.

Finally, version one ninety-six, June 29th, added organization default models set in the console, a streaming idle watchdog on by default that aborts and retries after five minutes of silence, and durable background sessions that survive the process being stopped, restarted, or updated. Update, and let's talk observability.

Okay. By now, across this arc, you've built a small machine. You've got parallel sessions running in git worktrees. You've got the orchestrator pattern, a lead agent dispatching waves of subagents. You've got headless runs driving claude dash p and the Agent SDK. You've got the Claude Code GitHub Action, the at-claude one, with auto-pull-request workflows and label-driven implement runs. And you wrapped all of it in autonomous-run safety and blast-radius engineering, sandboxing, prompt-injection defense, audit logs, IAM and OIDC and branch protection. Here's the honest problem with everything we've built. You have a fleet now, and you can't see it. Five, ten, twenty agents burning tokens across worktrees and CI jobs, and no single pane of glass telling you which one is stuck, which one is looping, and which one just quietly spent nine dollars. Today we fix that. Today we build the data faucet, the dashboard, and the alarm, and I'll show you the one pitfall that turns your shiny observability stack into a liability.

Let's start with the faucet, because Claude Code ships one for free. There is native OpenTelemetry support built right into the tool. OpenTelemetry, OTel for short, is the vendor-neutral standard for emitting metrics, logs, and traces, and Claude Code speaks it out of the box. It's opt-in, and the master switch is a single environment variable: claude code enable telemetry, set to one. Flip that, and Claude Code starts emitting three kinds of signal. Metrics, which are numeric counters and gauges. Structured log events, which are timestamped records with attributes. And, behind a second beta flag, trace spans, which show causality, what caused what. The reference for all of this is the monitoring page in the Claude Code docs, the monitoring-usage page. Read it once end to end; it's the source of truth for every name I'm about to say.

Once telemetry is on, you tell Claude Code where to send each signal type with three exporter variables. There's the metrics exporter, which accepts console, otlp, prometheus, or none. There's the logs exporter, which accepts console, otlp, or none. And there's the traces exporter, console, otlp, or none. Console just prints to your terminal, which is great for a first smoke test; you literally see the metrics scroll by. The otlp value means "ship it over the OpenTelemetry protocol to a collector," which is what you want in production. So for a real setup you'd set the metrics exporter and the logs exporter both to otlp.
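As a minimal sketch, assuming you launch Claude Code from a Python wrapper, the switches above reduce to three environment variables. The variable names are the documented ones (CLAUDE_CODE_ENABLE_TELEMETRY, OTEL_METRICS_EXPORTER, OTEL_LOGS_EXPORTER); the helper name telemetry_env and its validation sets are hypothetical, built from the values listed in this episode.

```python
import os
from typing import Mapping, Optional

# Exporter values as listed in the episode (hypothetical validation tables).
METRICS_EXPORTERS = {"console", "otlp", "prometheus", "none"}
LOGS_EXPORTERS = {"console", "otlp", "none"}


def telemetry_env(
    metrics: str = "otlp",
    logs: str = "otlp",
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of an environment with Claude Code telemetry switched on."""
    if metrics not in METRICS_EXPORTERS:
        raise ValueError(f"unsupported metrics exporter: {metrics}")
    if logs not in LOGS_EXPORTERS:
        raise ValueError(f"unsupported logs exporter: {logs}")
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_ENABLE_TELEMETRY"] = "1"  # master opt-in switch
    env["OTEL_METRICS_EXPORTER"] = metrics
    env["OTEL_LOGS_EXPORTER"] = logs
    return env


if __name__ == "__main__":
    # First smoke test: print metrics and events to the terminal instead of shipping them.
    smoke = telemetry_env(metrics="console", logs="console", base={})
    for key in sorted(smoke):
        print(f"{key}={smoke[key]}")
```

You would pass the result straight to a launch, for example subprocess.run(["claude"], env=telemetry_env()). Copying os.environ rather than mutating it keeps the telemetry settings scoped to that one child process.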

Traces are special, and they're where the orchestrator finally becomes visible. To get the enhanced tracing beta you set two flags together: claude code enable telemetry equals one, and claude code enhanced telemetry beta equals one, and then you point the traces exporter at otlp. What you get is a tree. Every user prompt becomes a root span named claude code dot interaction. Underneath it hang child spans: claude code dot llm request for each API call, and claude code dot tool for each tool execution, which itself has sub-spans, one called tool dot blocked on user, for the time spent waiting on your approval, and one called tool dot execution, for the actual work. There's also a claude code dot hook span. So a single prompt links cleanly to every API request and every tool run it triggered. Hold onto that tree; it's the backbone of subagent attribution later.
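To see why that tree matters, here is a minimal sketch that rebuilds the parent/child structure from a flat list of exported spans. The record shape (dicts with name, span_id, parent_id) is a hypothetical simplification of what a trace backend returns, and the underscore spellings of the span names are our assumption based on the names read above.

```python
from collections import defaultdict


def build_tree(spans):
    """Index spans by parent id so each root span can be walked depth-first."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent_id"] is None:
            roots.append(span)
        else:
            children[span["parent_id"]].append(span)
    return roots, children


def render(spans):
    """Return an indented outline, one line per span, preserving child order."""
    roots, children = build_tree(spans)
    lines = []

    def walk(span, depth):
        lines.append("  " * depth + span["name"])
        for child in children[span["span_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical spans for a single user prompt.
spans = [
    {"name": "claude_code.interaction", "span_id": "a", "parent_id": None},
    {"name": "claude_code.llm_request", "span_id": "b", "parent_id": "a"},
    {"name": "claude_code.tool", "span_id": "c", "parent_id": "a"},
    {"name": "tool.blocked_on_user", "span_id": "d", "parent_id": "c"},
    {"name": "tool.execution", "span_id": "e", "parent_id": "c"},
]
print("\n".join(render(spans)))
```

The gap between the blocked-on-user and execution children is exactly how you would tell "waiting on a human" apart from "doing work" inside one tool call.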

Now, where does the data physically go, on the wire? That's the OTLP endpoint configuration. The protocol variable, otlp protocol, takes grpc, http slash json, or http slash protobuf. The endpoint variable, otlp endpoint, defaults to localhost port 4317, which is the gRPC port; if you switch to HTTP the conventional port is 4318. And the headers variable, otlp headers, is how you authenticate, typically something like "Authorization equals Bearer, then your token." There are signal-specific overrides too, so you can send metrics to one endpoint and logs to another using the per-signal endpoint variables. And for mutual-TLS setups there are certificate variables, node extra CA certs and a client cert variable among them, so you can present a client certificate to a locked-down collector.
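A small sketch of the endpoint rules just described: gRPC conventionally uses port 4317, both HTTP encodings use 4318, and the headers variable carries the bearer token. The variable names are the standard OpenTelemetry ones; the function name, the localhost default, and the plain-http scheme (sensible only for a local collector) are assumptions for illustration.

```python
from typing import Optional

VALID_PROTOCOLS = {"grpc", "http/json", "http/protobuf"}


def otlp_env(
    protocol: str,
    token: Optional[str] = None,
    host: str = "localhost",
    scheme: str = "http",
) -> dict:
    """Build the OTLP exporter variables for one collector."""
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unknown OTLP protocol: {protocol}")
    # gRPC conventionally listens on 4317; both HTTP encodings share 4318.
    port = 4317 if protocol == "grpc" else 4318
    env = {
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{scheme}://{host}:{port}",
    }
    if token:
        # The "Authorization=Bearer <token>" shape described above.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Bearer {token}"
    return env
```

For a remote, locked-down collector you would pass scheme="https" and the collector's hostname, and layer the mTLS certificate variables on top.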

Two more knobs before we talk about what's actually in the stream: export intervals and temporality. The metric export interval defaults to sixty thousand milliseconds, so once a minute. The logs export interval defaults to five thousand milliseconds, so every five seconds, because you want events to feel near-real-time. And temporality: there's a metrics temporality preference variable that takes delta or cumulative. This one bites people. Prometheus wants cumulative, so if you're feeding Prometheus and your counters look wrong, check that you set cumulative.
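A toy illustration of why a temporality mismatch makes counters "look wrong": a delta exporter sends the increment since the last export, while Prometheus expects a running total. If raw deltas were read as counter values, every drop would look like a counter reset. The increments below are made up.

```python
from itertools import accumulate


def delta_to_cumulative(deltas):
    """Turn per-export increments into running totals, the shape Prometheus counters expect."""
    return list(accumulate(deltas))


def apparent_resets(values):
    """Count decreases; Prometheus treats a falling counter as a reset."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


# Hypothetical token.usage increments from three one-minute exports.
deltas = [1200, 3000, 500]
cumulative = delta_to_cumulative(deltas)
print(cumulative)  # [1200, 4200, 4700]
print(apparent_resets(deltas), apparent_resets(cumulative))  # 1 0
```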

Here's the part to actually memorize, the metrics Claude Code emits, by exact name, because your dashboard queries these strings. There's claude code dot session dot count, the number of sessions started. There's claude code dot lines of code dot count, lines modified, with an attribute marking added versus removed. There's claude code dot pull request dot count and claude code dot commit dot count, PRs and commits created. There's claude code dot cost dot usage, measured in US dollars, the estimated session cost. There's claude code dot token dot usage, measured in tokens, with a type attribute that splits input, output, cache-read, and cache-creation, which matters because cached tokens are cheap and you want to see your cache hit rate. There's claude code dot code edit tool dot decision, which records accept or reject on proposed edits. And there's one you should fall in love with: claude code dot active time dot total, measured in seconds of genuinely active time, distinct from wall-clock time. That gap between active time and wall-clock is your single best stall detector, and we'll come back to it hard.
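Because token usage is split by type, cache efficiency is one division away. A minimal sketch, assuming you have already summed the counter per type; the type keys (input, output, cacheRead, cacheCreation) are assumed spellings of the attribute values, and "hit rate" is defined here as cache-read tokens over all input-side tokens.

```python
def cache_hit_rate(tokens_by_type: dict) -> float:
    """Share of input-side tokens served from the prompt cache."""
    cache_read = tokens_by_type.get("cacheRead", 0)
    input_side = (
        tokens_by_type.get("input", 0)
        + cache_read
        + tokens_by_type.get("cacheCreation", 0)
    )
    # Output tokens are excluded: they are never served from the cache.
    return cache_read / input_side if input_side else 0.0


usage = {"input": 20_000, "output": 8_000, "cacheRead": 150_000, "cacheCreation": 30_000}
print(f"{cache_hit_rate(usage):.0%}")  # 150k of 200k input-side tokens -> 75%
```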

Every one of those metrics and events carries a standard set of attributes, and these are the labels you'll group and filter by. Session id. App version. App entrypoint, which tells you how the session was launched, values like cli, sdk-cli, sdk-ts, sdk-py, or claude-vscode, so you can separate your interactive work from your headless fleet in one query. Organization id. User account uuid. User id. User email, populated on OAuth sessions. And terminal type. On top of those, you can attach any custom attributes you want with the resource attributes variable, a comma-separated list of key equals value pairs. Remember that one; it's the whole trick for the fleet angle.

Because attributes are labels, and labels have cost, Claude Code gives you cardinality controls, and it ships sane defaults. Including session id in metrics is on by default. Including version is off by default. Including account uuid is on by default. Including entrypoint is off by default. Cardinality just means how many distinct values a label can take; session id has effectively unlimited distinct values, which is powerful for attribution but expensive in a metrics database, so keep that trade-off in the back of your mind. We'll pay that bill in the pitfall section.
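A back-of-envelope sketch of why session id is the expensive label: a metrics store keeps one time series per unique combination of label values, so an upper bound is the metric count times the product of each label's distinct-value count. Real counts are lower because labels are correlated (a session belongs to one user), and all numbers and label names here are illustrative.

```python
from math import prod


def worst_case_series(metric_count: int, label_cardinalities: dict) -> int:
    """Upper bound on time series: every metric times every label-value combination."""
    return metric_count * prod(label_cardinalities.values())


base_labels = {"user.account_uuid": 50, "terminal.type": 4}
print(worst_case_series(8, base_labels))  # 1600
print(worst_case_series(8, {**base_labels, "session.id": 1_000}))  # 1600000
```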

Now the log events, which are richer than metrics because they carry context. The big one is claude code dot user prompt, with attributes for prompt length, the prompt text itself which is gated behind a flag, and a command name. There's claude code dot api request, and this event is a goldmine: model, cost in USD, duration in milliseconds, input tokens, output tokens, cache-read tokens, cache-creation tokens, a request id, an effort attribute, and a field called query source. Query source is the magic word. Its value is repl-main-thread for your foreground work, or compact for context compaction, or the name of a subagent when the work came from one. There's claude code dot api error, with a status code and an attempt number. There's claude code dot api retries exhausted, which fires when Claude gave up after all its retries. There's claude code dot tool result, with the tool name, whether it succeeded, and a duration. There's claude code dot tool decision. And there's claude code dot compaction, with a trigger that's either auto or manual, plus pre-tokens and post-tokens showing how much context got squeezed out.

Here's why query source matters so much for people running the orchestrator pattern. Query source, combined with two span attributes on the llm-request span, agent id and parent agent id, lets you attribute spend to a specific subagent. That's the thing you couldn't do before. When the orchestrator we built dispatches a wave of children, each child's API cost now carries its own identity. You can finally answer "which of my subagents is expensive," not just "my session cost this much."
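A minimal sketch of that attribution, assuming you have exported claude_code.api_request events as dicts. The attribute keys query_source and cost_usd are assumed spellings of the fields named above, and the subagent name test-runner is invented.

```python
from collections import Counter


def spend_by_source(events):
    """Sum estimated USD per query_source: main thread, compaction, or a subagent name."""
    totals = Counter()
    for event in events:
        if event.get("name") != "claude_code.api_request":
            continue  # only API-request events carry cost
        attrs = event.get("attributes", {})
        totals[attrs.get("query_source", "unknown")] += float(attrs.get("cost_usd", 0))
    return totals


events = [
    {"name": "claude_code.api_request", "attributes": {"query_source": "repl-main-thread", "cost_usd": "0.25"}},
    {"name": "claude_code.api_request", "attributes": {"query_source": "test-runner", "cost_usd": "1.50"}},
    {"name": "claude_code.api_request", "attributes": {"query_source": "test-runner", "cost_usd": "0.50"}},
    {"name": "claude_code.api_request", "attributes": {"query_source": "compact", "cost_usd": "0.25"}},
    {"name": "claude_code.tool_result", "attributes": {"tool_name": "Bash"}},
]
for source, usd in spend_by_source(events).most_common():
    print(f"{source:18} ${usd:.2f}")
```

Grouping by agent id instead of query source is the same loop with a different key, which is how you would separate two instances of the same subagent type.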

One more thing on the collection side, for teams. Enterprise admins don't have to beg every developer to set these variables. Admins can enforce the telemetry configuration for all users through managed settings, the managed-settings JSON file, so an entire fleet ships to one central collector without anybody opting in individually. And a genuinely important privacy fact: by default, your file contents and your code are not in these metrics or events. Prompt text only enters this stream if you explicitly enable prompt logging, and even then it goes only to your own OTel endpoint; this telemetry is never sent back to Anthropic. Hold that thought, because the pitfall is going to complicate it.

So that's the faucet. Now, where does the water land? Let's talk about the canonical pipeline, because there's a well-worn path here and you shouldn't reinvent it. The flow is: Claude Code emits OTLP, which flows into an OpenTelemetry Collector listening for incoming OTLP on port 4317 for gRPC and 4318 for HTTP. The collector fans the signals out: metrics go to Prometheus, events go to Loki, and Grafana sits on top querying both. That's the whole shape. Collector in the middle, Prometheus and Loki as stores, Grafana as the face.

You don't have to wire that by hand, either. There's a reference community stack, Cole Murray's claude-code-otel project on GitHub. It's a docker-compose that stands up the OTel Collector, Prometheus on port 9090, Loki on port 3100, and Grafana on port 3000, all pre-wired. And it ships a Grafana dashboard with six sections, which is a nice tour of what's even worth showing. Section one, Overview: active sessions, aggregate cost, tokens, code modifications. Section two, Cost and Usage: per-model spend, token breakdown by type, API request counts. Section three, Tool Usage and Performance: tool adoption and success percentages. Section four, Performance and Errors: latency and error rate. Section five, User Activity and Productivity: lines of code, commits, pull requests. And section six, Event Logs: live tool events and API errors streaming by. Every panel in there queries the exact metric names I read you a minute ago, which is why the names are worth knowing; the dashboard is just PromQL queries against Prometheus and LogQL queries against Loki, keyed on those strings. One wrinkle: Prometheus-style exporters typically rewrite the dots as underscores and may append unit or total suffixes, so check the stored series name before you copy a query.
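Behind each panel is a query. As a sketch, here is how you might run one PromQL query against Prometheus's standard /api/v1/query endpoint and read back per-model values. The series name claude_code_cost_usage_USD_total is an assumption about how the collector translates claude_code.cost.usage; verify the real name in Prometheus's metric browser before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed series name after OTel -> Prometheus translation; check your own setup.
QUERY = "sum by (model) (increase(claude_code_cost_usage_USD_total[1h]))"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> dict:
    """Map each series' model label to its value from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return {
        item["metric"].get("model", "unknown"): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def fetch(base: str = "http://localhost:9090", promql: str = QUERY) -> dict:
    """Run the query against a live Prometheus (port 9090 in the reference stack)."""
    with urlopen(query_url(base, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Prometheus returns instant-vector sample values as strings, which is why parse_vector converts them with float().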

Grafana and Prometheus are the open-source default, but they're not your only target. Because Claude Code speaks plain OTLP, any first-class OpenTelemetry backend works: SigNoz, Honeycomb, Datadog, New Relic, Elastic, or a self-hosted Grafana paired with VictoriaMetrics if you want a leaner metrics store. There's also a drop-in wrapper alternative called claude telemetry, by TechNickAI, which swaps the claude command for one called claudia and logs to Logfire, Sentry, Honeycomb, or Datadog, if you'd rather not manage a collector at all. And Anthropic's own docs list partner turnkey dashboards, the ones where you basically paste a token and go: Grafana Cloud, Datadog, Honeycomb, CloudZero, and Vantage. Pick your poison; the emitter side doesn't change.

Now let's get specific about cost, because OTel is general-purpose and Claude Code also gives you cost tools that are purpose-built. The first is the slash-cost command, interactive, right inside a session. It prints total cost in US dollars, total API duration, wall-clock duration, code changes, and usage broken down by model. There's a wrinkle depending on how you pay. For API-key users, slash-cost shows the full dollar breakdown. For subscription users, Pro or Max, the dollar figure is effectively hidden: it reminds you that your cost isn't per-token under a subscription, but it still shows you token counts. The habit to build: run slash-cost during a long refactor or a max-effort session, right when you suspect the meter's running hot. The reference is the manage-costs page in the docs.

The second cost tool is ccusage, and yes, this is the community CLI, not part of the news, it lives here in the tutorial. It's MIT-licensed, around sixteen thousand eight hundred stars, by the developer ryoppippi. The lovely thing about ccusage is it needs no API key at all; it parses your local session logs. You run it with npx ccusage at latest, or bunx ccusage if you're on Bun. Its data source is the session JSONL files under your Claude projects directory, and it respects the Claude config directory variable if you've moved that. Costs are estimated from LiteLLM's pricing database. The commands are daily, weekly, monthly, and session for those rollups; there's a blocks command that reports against Claude's five-hour billing windows; there's a statusline command; and there's an mcp command. The flags that matter: json for machine output, breakdown for per-model splits, since and until for date ranges, and the fleet-relevant pair, instances and project, which group usage by workspace. And the one to demo live: ccusage blocks, dash dash live, gives you a real-time dashboard showing your active session's progress, your token burn rate, and a projected cost for the current five-hour window. Same caveat as slash-cost, though: Max and Pro users see the would-be API cost, what it would have cost pay-per-token, not the actual amount billed against your subscription.
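ccusage does its own projection internally; as a rough illustration of the idea only (not ccusage's actual algorithm), a linear projection for the current five-hour window scales the spend so far by the fraction of the window elapsed. The inputs are invented.

```python
WINDOW_MINUTES = 5 * 60  # one five-hour billing window


def project_block_cost(cost_so_far: float, minutes_elapsed: float) -> float:
    """Linear projection: assume the burn rate so far holds for the whole window."""
    if minutes_elapsed <= 0 or minutes_elapsed >= WINDOW_MINUTES:
        return cost_so_far  # nothing meaningful to extrapolate
    return cost_so_far * WINDOW_MINUTES / minutes_elapsed


print(project_block_cost(6.0, 90))  # $6 in the first 90 minutes -> 20.0 for the window
```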

The third cost surface is for org admins, and it's programmatic: the Anthropic Console plus the Admin Usage and Cost API. This one requires an Admin API key, the kind with the sk-ant-admin prefix, and only an Org Admin can mint one; individual accounts can't. There are two report endpoints. The usage report, at the organizations usage-report messages path, takes a start time, an end time, a bucket width of one minute, one hour, or one day, and a group-by that accepts model, workspace id, api key id, or service tier. And the cost report, at the organizations cost-report path, is daily granularity only, grouped by workspace id or by description, returning amounts as US-dollar decimal strings. Data freshness on both is about five minutes.
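A minimal sketch of a usage-report request with Python's standard library. The base URL, the x-api-key and anthropic-version headers, and the group_by[] query syntax reflect our reading of Anthropic's Admin API docs, so verify them against the current reference; the ANTHROPIC_ADMIN_KEY variable name is hypothetical, and the key must be an sk-ant-admin key.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://api.anthropic.com/v1/organizations/usage_report/messages"


def usage_request(start: str, end: str, bucket: str = "1d",
                  group_by=("model",), key=None) -> Request:
    """Build (but do not send) a usage-report request."""
    if bucket not in {"1m", "1h", "1d"}:
        raise ValueError("bucket_width must be 1m, 1h, or 1d")
    params = [("starting_at", start), ("ending_at", end), ("bucket_width", bucket)]
    params += [("group_by[]", field) for field in group_by]  # repeated array param
    return Request(
        f"{BASE}?{urlencode(params)}",
        headers={
            "x-api-key": key or os.environ["ANTHROPIC_ADMIN_KEY"],  # hypothetical env var
            "anthropic-version": "2023-06-01",
        },
    )


def send(req: Request) -> dict:
    """Send the request and decode the JSON report."""
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Grouping by api_key_id or workspace_id here is exactly the dimension that the shared-key pitfall later in the episode collapses.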

But for our purposes, the one to reach for is the dedicated Claude Code Analytics API, at the organizations usage-report claude-code path. This is per-user and per-day: one record equals one user's Claude Code activity for one day. Each record returns sessions, lines of code, commits, pull requests, tool usage, and per-user token counts with estimated cost broken out by model. It's free with Admin API access, and it only returns data older than one hour. Anthropic explicitly recommends this endpoint over trying to split cost across many API keys, and there's a reason for that recommendation that lands squarely in our pitfall later. And one clarifying fact people always ask: headless runs cost exactly the same per token as interactive ones. Headless doesn't change pricing; it just means you're the one scraping the cost number instead of reading it off a screen.

Which brings us to the cheapest telemetry of all, the logs your headless and programmatic runs already produce. When you run claude dash p with a prompt and add output-format json, headless print mode, you get back a structured blob. It carries a type and subtype, the total cost in USD, an is-error flag, a duration in milliseconds, a separate API duration, the number of turns, the result text, a session id, and a usage object with input tokens, output tokens, cache-read input tokens, and cache-creation input tokens. If you want it live instead of at the end, use output-format stream-json, which emits newline-delimited events you can tail as they happen. The fleet pattern writes itself: every cron run, every CI run, pipes that JSON through jq, plucks out total cost, number of turns, duration, and session id, and ships those four values as a metric or a log line. And here's the robustness detail, every result subtype carries total cost, usage, number of turns, and session id, even the error ones, so you can attribute spend and even resume a session that failed.
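The jq pluck described above, sketched in Python so the same code also handles stream-json. The field names (type, subtype, is_error, total_cost_usd, num_turns, duration_ms, session_id) are the ones listed for the result object; the sample payloads in the test are invented.

```python
import json

FIELDS = ("total_cost_usd", "num_turns", "duration_ms", "session_id")


def summarize_result(result: dict) -> dict:
    """Keep only the values worth shipping as a metric or log line."""
    summary = {field: result.get(field) for field in FIELDS}
    summary["is_error"] = bool(result.get("is_error"))
    summary["subtype"] = result.get("subtype")
    return summary


def final_result(stream_lines):
    """For stream-json output, return the last event whose type is 'result'."""
    last = None
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if event.get("type") == "result":
            last = event
    return last
```

For a finished json run you would call summarize_result(json.loads(stdout)); for a live stream you would feed the process's stdout lines to final_result. Because error subtypes still carry cost and session id, the summary is useful even when the run failed.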

For unattended runs, two cost-control flags are your seatbelts, and they tie straight back to the safety episodes. The max-turns flag is a hard cap on agentic tool-use turns, and it is the single most important runaway guard you have; note it counts tool-use turns only, not conversational back-and-forth. And the max-budget-usd flag caps the dollar spend for the run outright. These are the same guards from the autonomous-run-safety and blast-radius episodes, except now we're going to make them observable instead of just protective. Also worth knowing: the session JSONL under your Claude projects directory, in the per-project-hash folders, is the exact same substrate ccusage reads, so a fleet can collect and batch-parse those files centrally. And your GitHub Actions run logs plus uploaded artifacts are a completely legitimate telemetry source; the auto-pull-request and label-driven workflows from earlier episodes can echo that cost JSON into the job summary or upload it as an artifact for later harvest.
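A minimal sketch of an unattended launch with both seatbelts fastened. The flags are the ones named above (-p, --output-format, --max-turns, --max-budget-usd); the wrapper functions and their default limits are hypothetical.

```python
import json
import subprocess


def headless_argv(prompt: str, max_turns: int = 10, max_budget_usd: float = 2.0) -> list:
    """Build a claude -p invocation that always carries both runaway guards."""
    if max_turns < 1 or max_budget_usd <= 0:
        raise ValueError("both guards must be positive")
    return [
        "claude", "-p", prompt,
        "--output-format", "json",
        "--max-turns", str(max_turns),
        "--max-budget-usd", f"{max_budget_usd:.2f}",
    ]


def run_guarded(prompt: str, **limits) -> dict:
    """Run one headless job and return its parsed result object."""
    # No check=True: a run that hits a guard still prints a result worth parsing.
    proc = subprocess.run(headless_argv(prompt, **limits), capture_output=True, text=True)
    return json.loads(proc.stdout)
```

Refusing to build the command without positive limits is the point of the design: no code path can launch an unguarded agent.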

Now let's assemble the fleet angle, because this is the payoff, aggregating across parallel agents, worktrees, and CI runs. Session id is a standard attribute, on by default. But the real move is injecting your own tags with the resource attributes variable. Picture a worktree working on an auth refactor. Right before you launch the agent, you export resource attributes set to something like worktree equals auth-refactor, branch equals whatever git branch dash dash show-current returns, issue equals 1234, agent equals orchestrator-child-3. Now every metric and event from that session carries those four labels. And a Grafana or Prometheus panel can group claude code dot token dot usage and claude code dot cost dot usage by worktree, or by branch, or by agent, and suddenly you're looking at per-agent token burn across your whole fleet, sliced however you tagged it. And remember, subagent-level attribution is even richer in traces, native, through agent name, agent id, parent agent id, and query source. So between resource-attribute tags on the metrics side and those span attributes on the trace side, the orchestrator's children are finally, individually visible.
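A minimal sketch of per-launch tagging. The keys (worktree, branch, issue, agent) are just the example labels from above, not reserved names. Values are percent-encoded because OTEL_RESOURCE_ATTRIBUTES is a comma-separated key=value list, so a comma, equals sign, or space in a branch name would otherwise corrupt it; this assumes the SDK percent-decodes values as the OpenTelemetry spec describes.

```python
import subprocess
from urllib.parse import quote


def current_branch(cwd: str = ".") -> str:
    """Same as running `git branch --show-current` in the worktree."""
    return subprocess.run(
        ["git", "branch", "--show-current"],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout.strip()


def resource_attributes(tags: dict) -> str:
    """Encode tags as comma-separated, percent-encoded key=value pairs."""
    return ",".join(
        f"{quote(str(key), safe='')}={quote(str(value), safe='/')}"
        for key, value in tags.items()
    )


print(resource_attributes({
    "worktree": "auth-refactor",
    "branch": "feat/auth",
    "issue": 1234,
    "agent": "orchestrator-child-3",
}))
```

In a launcher you would set env["OTEL_RESOURCE_ATTRIBUTES"] = resource_attributes({..., "branch": current_branch(path)}) just before starting each agent.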

Stall detection is where this all earns its keep, so let me make it concrete. The tell is claude code dot active time dot total versus wall-clock time. If wall time keeps climbing but active time flatlines, the agent is stuck, plain and simple, sitting there consuming a session slot and producing nothing. In headless JSON, the equivalent signal is the number of turns spiking, or hitting your max-turns cap, together with a rising duration and no pull request or commit coming out the other end. That pattern, lots of turns, growing time, zero output, is a looping agent. There's even a known hang bug where an agent stops producing output and never returns control, tracked as issue two eight four eight two in the anthropics claude-code repo, and the way you catch it is exactly this: alert when a session is open but active time isn't increasing. And the compaction event, with its pre-tokens and post-tokens, reveals context thrash, an agent compacting over and over because it keeps blowing its context window.
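The active-time-versus-wall-clock rule as a minimal sketch. Each sample is a (wall-clock seconds, cumulative active seconds) pair for one session, as you might derive from claude_code.active_time.total; the ten-minute window and five-second threshold are placeholder values you would tune.

```python
def is_stalled(samples, window_s: float = 600, min_active_s: float = 5) -> bool:
    """True if wall time advanced by at least window_s while active time barely moved."""
    if len(samples) < 2:
        return False
    end_wall, end_active = samples[-1]
    # Compare against the most recent sample that is at least window_s older.
    for wall, active in reversed(samples[:-1]):
        if end_wall - wall >= window_s:
            return end_active - active < min_active_s
    return False  # not enough history yet to judge


# Healthy for five minutes, then active time flatlines.
stuck = [(0, 0), (300, 120), (600, 240), (900, 242), (1200, 243)]
print(is_stalled(stuck))
```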

For runaway spend specifically, you set alarms. In Grafana, you alert on the summed increase of claude code dot cost dot usage over a one-hour window, per team, and fire when it crosses budget. Or you push the whole thing into AWS Budgets and CloudWatch alarms. The AWS blog on this gives two concrete, copyable alarm ideas: fire when a single user's hourly spend exceeds twice their own daily average, which catches an individual gone rogue, and fire when a team's daily spend exceeds a defined budget, with five hundred dollars a day as the worked example. Those two alarms alone will catch most disasters before they finish happening.
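The two alarm rules sketched as plain functions, so you can unit-test thresholds before encoding them in CloudWatch or Grafana. The baseline argument is whatever average you compare against (the AWS example uses the user's own daily average); the function names and sample figures are hypothetical, while the $500-a-day budget is the blog's worked example.

```python
TEAM_DAILY_BUDGET_USD = 500.0  # the worked example's team budget

# A Grafana/Prometheus equivalent might look like (series name assumed):
#   sum by (team) (increase(claude_code_cost_usage_USD_total[1h])) > <hourly budget>


def user_spike(hourly_spend: float, baseline: float, factor: float = 2.0) -> bool:
    """Fire when a user's last-hour spend exceeds factor times their own baseline."""
    # A user with no history yet (baseline 0) never fires this rule.
    return baseline > 0 and hourly_spend > factor * baseline


def team_over_budget(daily_spend: float, budget: float = TEAM_DAILY_BUDGET_USD) -> bool:
    """Fire when a team's spend for the day crosses its budget."""
    return daily_spend > budget
```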

Now the pitfall, and this is a two-faced one, so stay with me, because both faces cut. The sharpest edge is a content leak through opt-in log flags. By default, and this is the safe default, Claude Code records only prompt length, a number, and never ships the prompt text. But there's a flag, log user prompts set to one, that ships the full prompt text to your observability backend. And there are siblings: a log tool details flag that ships tool names and arguments, the full bash command strings, and file paths; plus raw-API-body logging that captures even more. The moment you turn any of those on, your Grafana, your Loki, your Datadog starts storing messy raw prompts, internal URLs, partial credentials somebody pasted by accident, and vulnerability descriptions, exactly the sensitive stuff flagged by Elastic Security Labs and by General Analysis. Here's how you recognize whether you've already done this to yourself: search your logs backend for claude code dot user prompt records and check whether the prompt attribute is populated, not just prompt length. If the prompt field has text in it, you're storing content, full stop. The mitigation is two-part. Keep those flags off, that's the default and you should respect it. And if you need any of them on, use the OpenTelemetry Collector as your redaction control point; the collector has attributes and transform processors that can strip the prompt field and the user email field before anything fans out to storage. One choke point, one redaction rule, applied to every agent at once.
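In production this redaction belongs in the collector's attributes or transform processor. As an illustration of the policy itself, here is a sketch that applies the same rule to a log record represented as a dict, plus the audit check described above; it is handy for unit-testing what should survive. The attribute keys prompt, prompt_length, and user.email are assumed spellings.

```python
SENSITIVE_KEYS = {"prompt", "user.email"}


def redact(record: dict) -> dict:
    """Return a copy of a log record with sensitive attributes removed."""
    attrs = {
        key: value
        for key, value in record.get("attributes", {}).items()
        if key not in SENSITIVE_KEYS
    }
    return {**record, "attributes": attrs}


def stores_content(record: dict) -> bool:
    """Audit check: a user_prompt event whose prompt text is populated."""
    return (
        record.get("name") == "claude_code.user_prompt"
        and bool(record.get("attributes", {}).get("prompt"))
    )
```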

The second face of that same pitfall is the tension between cardinality, PII, and attribution, and it's a genuine trilemma, you can't have all three cheaply. User email gets emitted on OAuth sessions, and it's both personally identifiable information and a high-cardinality label that can inflate your metrics bill. Session id, on by default, is high cardinality by its very nature. So the tempting move is to turn those off, save money, dodge the PII exposure. But the instant you turn them off, you lose per-developer and per-session attribution, the exact thing you built this system to get. That's the trade, and you have to make it deliberately, not by accident.

And there's a classic fleet trap hiding right next to it, the one Anthropic was steering you around when it recommended the per-user Analytics API. When many parallel agents and worktrees all share one API key, your per-session cost attribution is simply wrong. The Usage and Cost API groups spend by api key id and by workspace id. So a single shared key collapses every worktree's spend into one bucket, and when the bill spikes you cannot tell which worktree burned it. Two fixes. Either give each agent or worktree its own API key or its own workspace, so the billing dimension itself carries the identity. Or, tag your OTel resource attributes per launch, the worktree-and-branch-and-agent tagging we did earlier, so the dashboard carries attribution even though the billing key doesn't. Pick one, but pick one, because a shared key with no tags means your beautiful dashboard is lying to you about where the money went.
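If you go the tagging route, one way to reconcile a shared key's real bill with your dashboard is to split it in proportion to the OTel-estimated cost per worktree. This is a minimal sketch of our own proportional-allocation approach, not an Anthropic feature; the OTel figures are estimates, and the worktree names and amounts are invented.

```python
def allocate_bill(billed_usd: float, otel_cost_by_worktree: dict) -> dict:
    """Split one shared key's billed total across worktrees by their share of OTel-estimated cost."""
    total = sum(otel_cost_by_worktree.values())
    if total <= 0:
        return {}  # no tagged telemetry to allocate against
    return {
        worktree: billed_usd * cost / total
        for worktree, cost in otel_cost_by_worktree.items()
    }


print(allocate_bill(100.0, {"auth-refactor": 3.0, "ui-polish": 1.0}))
```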

Let me close with the concrete AWS example, because it makes the whole thing real, and it happens to answer "is all this observability going to cost me a fortune?" You can ship OTLP straight to CloudWatch with no ADOT sidecar at all, using a bearer token. The setup is six exports. Enable telemetry. Set the metrics exporter to otlp. Set the protocol to http slash json. Set the endpoint to the CloudWatch monitoring URL for your AWS region. Set the headers to Authorization, Bearer, your token. And set resource attributes to tag the user id, from whoami, and a team id. That bearer token is a CloudWatch metrics API key, and it ties to an IAM user carrying the CloudWatch API key access managed policy, which is what lets a tool outside AWS push metrics without pulling in the whole AWS SDK. If you'd rather use gRPC, the alternative is running an ADOT collector as a sidecar. Either way works.

And here's the number that should set your mind at ease. The AWS blog estimates that for two hundred developers, at roughly twenty sessions a day, across seven metrics, the telemetry ingestion itself costs about fourteen cents a month. Fourteen cents. Even at a hundred times that volume, it stays under fifteen dollars a month. So fleet observability is effectively free; the spend you're watching for is the agents, never the dashboard. Contrast that with what you're actually hunting: a single max-effort Opus refactor session can run several dollars on its own, and a looping headless agent that runs all the way to a forty-turn max-turns cap when you expected six turns is precisely where the max-budget-usd flag earns its keep, capping the dollars even when the turn cap is set generously.

So tie it back to the arc, and you'll see we didn't build anything new so much as light up what we already had. The collector redaction point extends the cost and rate-limit engineering episode. The max-turns and max-budget-usd flags are the autonomous-run-safety and blast-radius guards from before, now made observable instead of merely defensive. Per-subagent attribution, through query source and the agent id span attributes, is how you finally see the orchestrator's children as individuals. And the headless JSON scrape is the telemetry tap for every GitHub Action and cron run in your pipeline. You had a fleet. Now you can see it, and you can put a price tag on every agent in it. Next time, we start pointing that fleet at harder targets. See you then.