Claude Code Agent Teams on a Real Production Chore
This is the companion post to the video below. The post is the reference card; the video is the walkthrough. Also on Substack.
What this is
Claude Code agent teams shipped as an experimental feature about a week before I recorded this. I wanted to know how it behaved on something real, so I pointed it at a job I actually had to do: bring scality/mountpoint-s3-csi-driver, a fork of the AWS Mountpoint S3 CSI driver, back to feature parity with upstream after a gap, and fix a known bug (missing securityContext on mount pods) along the way.
We hadn’t rebased the fork since AWS’s v2.0. The diff was large. There were features I’d want, features I’d skip (EKS, S3 Express One Zone - Scality’s S3 implementation doesn’t need those), and bugs to triage. Classic intensive chore.
“I tried Claude Code agent teams for like 10 minutes and I was very surprised by how much I can scale myself.” ▶ 00:00
Agent teams vs. subagents - the one-line version
Subagents report back to the main agent and don’t talk to each other. Agent teams share a task list and message each other directly, each in its own context window. You can also message any teammate yourself.
That’s the whole architectural difference. The cost difference is real: each teammate is a full Claude Code instance, so tokens scale linearly with team size.
What I actually did
Half a day of my own reading first - release notes, PRs, issues on the upstream - before delegating. Then I spawned a team with four analyzers running in parallel: feature filter, security context analyzer, upstream analyzer, divergence analyzer. A team lead synthesized.
“I wanted to understand what I am delegating, so I can direct it in the right direction.” ▶ ~05:00
“I don’t call it vibe coding. I call it intentional AI coding.” ▶ ~26:30
The output was a 4-phase roadmap: critical security bug, 5 high-value fixes, ~30 developer-days of work across 74 analyzed commits. After the first pass I asked it to re-render the markdown reports as an HTML site so I could read them in a browser, then ran a second verification pass on Opus in tmux mode to challenge the findings against the actual code.
Chapter map:
- 00:00 - Real production use (not a demo)
- 00:45 - Project context: CSI driver for S3
- 08:30 - Spawning agent teams & delegation
- 14:20 - Reviewing output + token/cost reality check
- 18:40 - Turning findings into an action plan
- 30:00 - Tmux teammate mode (watch agents live)
- 40:50 - Results, scaling reflections & next steps
Three settings worth knowing
These are the three things I’d want someone to have in front of them before trying this.
1. Enable it
Agent teams are disabled by default. Flip the env var in ~/.claude/settings.json:
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
Needs Claude Code v2.1.32 or later. Check with claude --version.
2. Display mode - in-process vs. tmux split panes
Two ways to watch teammates work:
- In-process (default-ish): all teammates live in your main terminal. Shift+Down cycles through them, Enter jumps into a teammate’s session, Ctrl+T toggles the task list. Works in any terminal, zero setup.
- Split panes: each teammate gets its own pane via tmux or iTerm2. You see everyone at once. The mode I switched to halfway through the video to actually watch a 19-teammate Opus pass spawn.
Override in ~/.claude/settings.json:
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "tmux"
}
Or per-session: claude --teammate-mode in-process.
Default is "auto" - split panes if you’re already inside a tmux session, in-process otherwise. For tmux mode in iTerm2, install the it2 CLI and enable the Python API in iTerm2 settings. tmux -CC from iTerm2 is the suggested entrypoint.
“I think it should display all the agents on the right side of the screen automatically once the agents start.” ▶ ~34:55
It does. It’s the most fun part of the feature.
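If you already have a settings file, pasting either snippet over it would drop whatever else is in there. Here is a minimal Python sketch of merging the two keys from this section into an existing file instead. The function name, the EXISTING_VAR key, and the temp-file demo are my own illustration, not part of Claude Code; the only real names are the env var and teammateMode shown above.

```python
import json
import tempfile
from pathlib import Path

# The three mode values mentioned in this post.
VALID_MODES = {"auto", "in-process", "tmux"}


def enable_agent_teams(settings_path: Path, teammate_mode: str = "auto") -> dict:
    """Merge the agent-teams keys into a settings file, keeping other keys."""
    if teammate_mode not in VALID_MODES:
        raise ValueError(f"unknown teammate mode: {teammate_mode!r}")

    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Add the flag without discarding env vars that are already set.
    env = settings.setdefault("env", {})
    env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] = "1"
    settings["teammateMode"] = teammate_mode

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Demo against a throwaway file, so nothing in ~/.claude is touched.
# EXISTING_VAR is a hypothetical pre-existing entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "settings.json"
    demo_path.write_text('{"env": {"EXISTING_VAR": "keep-me"}}', encoding="utf-8")
    result = enable_agent_teams(demo_path, teammate_mode="tmux")
    on_disk = json.loads(demo_path.read_text(encoding="utf-8"))

print(json.dumps(result, indent=2))
```

To apply it for real, you would point it at Path.home() / ".claude" / "settings.json". Back the file up first.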
3. Pick the model per teammate
Teammates don’t inherit the lead’s /model selection. They use whatever you set under /config → “Default teammate model”, or whatever you specify in the prompt.
If you want the team on Opus, say so in the spawn prompt:
Use agent teams to spawn one teammate per independent task, maximizing parallelism with an upper limit of 10 concurrent teammates. If there are more tasks than that, queue the rest - new teammates spawn as active ones finish and slots open up. Where tasks have dependencies, allow communication among teammates so dependent agents wait for prerequisites to finish, and the team lead coordinates handoffs between them. All teammates should use the Opus 4.6 model. Each teammate should verify their output by comparing it against the actual code…
A note on this prompt
- “Use agent teams to spawn…” - without this phrasing, Claude Code often won’t engage the agent teams feature at all. Being explicit is the trigger.
- “limit of 10 concurrent teammates” - sweet spot for most tasks I’ve run. Past ten, coordination overhead and permission prompts start eating the gains. New agents still spawn for queued tasks once a slot frees up.
- “allow communication among teammates” - teammates can message each other and the lead directly. Worth it whenever tasks have ordering dependencies; without it the lead has to broker every handoff.
- “All teammates should use the Opus 4.6 model” - specified inline here, but you can also set it via /config → Default teammate model so you don’t have to repeat it in every prompt.
- “verify their output by comparing it against the actual code” - keeps agents grounded in the codebase instead of their own summaries. The most effective hallucination guard I’ve found for this kind of work.
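The "upper limit of 10, queue the rest" behavior the prompt asks for is ordinary bounded concurrency. The sketch below shows that pattern with a semaphore. It is only an analogy for what the prompt requests, not how Claude Code schedules teammates internally, and run_team, teammate, and the sleep that stands in for work are all made up for illustration.

```python
import asyncio


async def run_team(tasks: list[str], max_teammates: int = 10) -> tuple[list[str], int]:
    """Run every task, never more than max_teammates at the same time."""
    slots = asyncio.Semaphore(max_teammates)
    active = 0
    peak = 0
    finished: list[str] = []

    async def teammate(task: str) -> None:
        nonlocal active, peak
        async with slots:  # queued tasks wait here until a slot frees up
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real work
            finished.append(task)
            active -= 1

    await asyncio.gather(*(teammate(task) for task in tasks))
    return finished, peak


# 25 tasks, 10 slots: the other 15 start as earlier ones finish.
finished, peak = asyncio.run(run_team([f"task-{i}" for i in range(25)]))
print(f"{len(finished)} tasks done, at most {peak} running at once")
```

With agent teams you write none of this yourself; the point is only that the cap limits how many teammates are alive at once, not how many tasks get done.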
What the cost actually looked like
Token usage scales linearly with team size - each teammate is a full instance. The first pass with four Sonnet analyzers and a lead landed around 350k tokens total - roughly $8 at API rates. The second pass on Opus with ~19 teammates re-reading large portions of both codebases for verification was substantially heavier - somewhere in the $50–$100 range if you were paying API token prices.
If the chore would have taken me a focused day, that math is fine. For routine work, a single session is cheaper and faster. The actual lesson isn’t “agent teams are cheap” - it’s that you can run a small Sonnet team to plan, then spend the Opus budget only where verification actually matters.
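For a rough feel of what "scales linearly" means in dollars, here is back-of-the-envelope arithmetic using only the figures quoted above. The even split across instances is my simplifying assumption (the lead and the analyzers almost certainly did not use equal shares), and linear_estimate is a hypothetical helper, so treat the results as order-of-magnitude only.

```python
# Figures quoted in this post; everything derived from them is an estimate.
FIRST_PASS_TOKENS = 350_000           # "around 350k tokens total"
FIRST_PASS_COST_USD = 8.0             # "roughly $8 at API rates"
FIRST_PASS_INSTANCES = 5              # four analyzers plus the lead
SECOND_PASS_COST_USD = (50.0, 100.0)  # the "$50-$100 range"
SECOND_PASS_TEAMMATES = 19            # "~19 teammates", lead not counted

# Blended price implied by the first pass, in USD per million tokens.
blended_rate = FIRST_PASS_COST_USD / FIRST_PASS_TOKENS * 1_000_000

# Assumption: spend is split evenly across instances.
cost_per_instance = FIRST_PASS_COST_USD / FIRST_PASS_INSTANCES


def linear_estimate(team_size: int) -> float:
    """Cost of a same-shaped run if spend grows linearly with team size."""
    return cost_per_instance * team_size


second_pass_per_teammate = tuple(
    cost / SECOND_PASS_TEAMMATES for cost in SECOND_PASS_COST_USD
)

print(f"implied blended rate: ${blended_rate:.2f} per million tokens")
print(f"first pass, per instance: ${cost_per_instance:.2f}")
print(f"same workload with 10 instances: ${linear_estimate(10):.2f}")
low, high = second_pass_per_teammate
print(f"second pass, per teammate: ${low:.2f} to ${high:.2f}")
```

The per-teammate figure for the Opus pass comes out higher than for the first pass, which is consistent with a heavier model re-reading more code rather than with team size alone.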
“There are 74 commits analyzed… That’s like a lot of work.” ▶ ~24:30
What I’d do differently next time
- Pre-approve permissions. Every teammate is its own Claude Code session, so the same permission prompts surface from each one. Adding allowlists to .claude/settings.json before spawning saves a lot of clicking.
- Specify model in the spawn prompt. Don’t rely on teammates inheriting the lead’s /model - it doesn’t propagate.
- Pull the verification pass into the same run. I split “analyze” and “verify against code” into two prompts. One prompt with explicit verification-as-a-task would have been cleaner.
- Save the workflow as a skill. This is a chore I’ll repeat. Next time, a skill triggers the team with the right prompt structure instead of me re-typing it.
- Try agent teams on feature work and bug fixes, not just analysis.
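For the permissions item, this is roughly what pre-approving could look like: a sketch that adds allow rules to a settings dict without duplicating ones already there. The rule strings are my guess at a read-only set for an analysis team and are not taken from the run in the video. The permissions.allow key and the Bash(...) rule syntax are how I understand Claude Code's settings format, so check both against the current docs before relying on them.

```python
import json

# Hypothetical read-only rules for an analysis team; adjust to your repo.
ALLOW_RULES = ["Read", "Grep", "Glob", "Bash(git log:*)", "Bash(git diff:*)"]


def with_allowlist(settings: dict, rules: list[str]) -> dict:
    """Return a copy of settings with rules appended to permissions.allow."""
    merged = dict(settings)
    permissions = dict(merged.get("permissions", {}))
    allow = list(permissions.get("allow", []))
    for rule in rules:
        if rule not in allow:  # skip rules that are already approved
            allow.append(rule)
    permissions["allow"] = allow
    merged["permissions"] = permissions
    return merged


# Start from a settings dict that already approves one rule.
existing = {"permissions": {"allow": ["Read"]}}
updated = with_allowlist(existing, ALLOW_RULES)
print(json.dumps(updated, indent=2))
```

The printed JSON is what would go into the project's .claude/settings.json. Keeping the list read-only means a teammate that wants to edit or push still has to ask.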
Bottom line
Agent teams are worth the token bill when the work has parallel structure and you’d otherwise be context-switching across files yourself. Research, review, and divergence analysis are the sweet spot. The output I got back was something I’d actually use - not a vague summary, a 4-phase plan with intent, system impact, and dependencies for each item. The bug got fixed. The rebase has happened and new versions have been released.
It scales the work, not the understanding. You still need to know what you’re delegating. The agents are fast. The judgment is still yours.
Links
- Claude Code agent teams docs
- Subagents docs - for the comparison
- scality/mountpoint-s3-csi-driver
- awslabs/mountpoint-s3-csi-driver - upstream
- awslabs/mountpoint-s3 - the underlying FUSE client both drivers use
- it2 CLI - for iTerm2 split-pane mode