One agent is useful. Multiple agents working together are a system. Today we go from single-agent tool use to delegation, handoffs, and parallel dispatch.
office-hours.dev · 1900 Broadway, 2nd Floor · Oakland, CA
Returning faces, new faces. What shipped from Session 01. Quick show of hands on multi-agent experience.
Why a single agent breaks at scale: context limits, mixed concerns, compounding errors. When delegation pays for itself.
The four shapes most multi-agent systems collapse into. Which to reach for, and when.
One Sonnet orchestrator dispatches three Haiku reviewers in parallel — security, style, correctness. Fan out, fan in, aggregate. Clone the repo and follow along.
Pick a build project based on your skill level and what you care about. We help you scope it.
You build. We float. Ask anything. Change direction. Go deeper. The room is yours.
Quick demos of what people built. 2 minutes each. No pressure, but encouraged.
Quick show of hands. No wrong answers.
Tool definitions, function calling, the basic loop from Session 01.
Parent agent dispatches a child with a scoped task and a fresh context.
Fan-out to multiple workers, fan back in with aggregated results.
Running in the real world, handling real traffic, recovering from real failures.
We'll use these words a lot in the next three hours. If any of them feel fuzzy, come back to this slide.
A language model plus tools it can call. From Session 01. The base unit of everything we build today.
A function the model can invoke, declared with a name, description, and typed input schema. The model requests, your code executes.
Everything a model sees on one call — system prompt, messages, tool results. Each subagent gets its own fresh one.
An agent invoked by another agent, usually with a scoped task and isolated context. Exposed as a tool to its parent.
The parent agent that plans, dispatches subagents, and aggregates their outputs. Does no domain work itself — only routing.
The structured data passed when one agent invokes another. Typed inputs in, typed outputs out. The contract between agents.
Dispatch N independent subagents in parallel (fan-out), collect all N results (fan-in), then aggregate. The PR Review Crew pattern.
The software that runs the agent loop for you — message plumbing, tool dispatch, retries, state. Examples: Claude Code, Claude Agent SDK, OpenAI Agents SDK, LangGraph.
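The glossary terms above map straight onto code. A minimal sketch of a tool declaration in the Anthropic tools format (name, description, typed input schema) plus the dispatch step your code owns; the `get_weather` function and its behavior are hypothetical stand-ins:

```python
# A tool declaration: name, description, and typed input schema.
# The model sees this; your code executes the matching function.
get_weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# The model requests; your code executes. Stubbed with a fixed reply here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # hypothetical implementation

TOOL_HANDLERS = {"get_weather": get_weather}

def dispatch(tool_name: str, tool_input: dict) -> str:
    """Route a model-issued tool call to the function that implements it."""
    return TOOL_HANDLERS[tool_name](**tool_input)

print(dispatch("get_weather", {"city": "Oakland"}))  # → Sunny in Oakland
```

Everything today builds on this shape: a subagent is just a handler in that dict whose body runs another agent loop.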
One context window holds everything.
One role, doing all the jobs.
Sequential: think, act, think, act.
One failure cascades into total failure.
Hard to specialize. Prompts bloat.
Isolated contexts, scoped to each subtask.
Specialists with narrow responsibilities.
Parallel where independent, sequential where ordered.
One subagent fails; the supervisor retries or routes around it.
Each agent is small, readable, testable.
The key insight: An agent system is just an agent that treats other agents as tools. Same tool-calling API surface from Session 01 — pointed inward. You already know how to build this.
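That insight fits in a few lines. A sketch, with the model call stubbed out, of a subagent as nothing more than a tool handler that runs its own loop on a fresh context (all names here are illustrative):

```python
def call_model(system_prompt: str, messages: list) -> str:
    # Stub standing in for a real API call; returns a canned reply.
    return f"[{system_prompt}] reviewed {len(messages)} message(s)"

def run_subagent(system_prompt: str, task: str) -> str:
    """A child agent: fresh context containing only its scoped task."""
    messages = [{"role": "user", "content": task}]  # no parent history
    return call_model(system_prompt, messages)

# To the parent, the subagent is just another tool handler.
PARENT_TOOLS = {
    "security_review": lambda task: run_subagent("security reviewer", task),
}

result = PARENT_TOOLS["security_review"]("Review this diff for injection risks")
```

Same tool-calling surface as Session 01; the handler just happens to start a second agent.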
The parent agent invokes a child with a scoped task and a fresh context. The child doesn't see the parent's history — only what it needs.
Structured inputs and outputs between agents. Pydantic, JSON Schema, typed messages — anything that turns "text in, text out" into a real contract.
Who decides when a subtask is done, what to retry, when to escalate. This is the code the orchestrator owns and subagents don't touch.
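The three pieces above, delegation, typed handoffs, and orchestrator-owned control flow, can be sketched with stdlib dataclasses (Pydantic works the same way; every name below is illustrative, and the reviewer is a stub for a model call):

```python
from dataclasses import dataclass

@dataclass
class ReviewRequest:      # typed input: the handoff contract
    diff: str
    focus: str

@dataclass
class ReviewResult:       # typed output
    verdict: str          # "pass" | "fail" | "error"
    notes: str

def reviewer(req: ReviewRequest) -> ReviewResult:
    # Stub subagent; a real one would run its own agent loop.
    return ReviewResult(verdict="pass", notes=f"checked {req.focus}")

def orchestrate(req: ReviewRequest, max_retries: int = 2) -> ReviewResult:
    """Control flow lives here: the orchestrator decides retry vs escalate."""
    for _attempt in range(max_retries + 1):
        result = reviewer(req)
        if result.verdict != "error":
            return result                              # accept and move on
    return ReviewResult("escalated", "needs a human")  # escalation path

out = orchestrate(ReviewRequest(diff="+ eval(user_input)", focus="security"))
```

The subagent never sees `max_retries`; retry and escalation policy stay in the orchestrator.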
Four shapes most multi-agent systems collapse into. Each has its own strengths and failure modes. Real systems mix and match.
One planner dispatches specialists in parallel and aggregates their outputs. Hub and spokes. Fan-out, fan-in.
Use when: tasks cleanly split into independent subtasks.
Each agent transforms the input and hands off to the next. Sequential, typed stages. The inbox triage pattern.
Use when: work has a natural order — classify → act → confirm.
A classifier decides which specialist gets the task. Only one path executes. Cheap, deterministic dispatch.
Use when: input types are heterogeneous but each needs one specialist.
An executor produces output; a critic reviews and sends feedback. Repeat until the critic accepts. Quality over latency.
Use when: correctness matters more than speed — code, long-form writing, plans.
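To make one of these shapes concrete, here is a minimal sketch of the pipeline pattern (classify → act → confirm); every stage is a stub standing in for an agent call, and the triage logic is invented for illustration:

```python
def classify(email: str) -> dict:
    # Stage 1: stub classifier; a real stage would be an agent call.
    label = "billing" if "invoice" in email.lower() else "general"
    return {"email": email, "label": label}

def act(msg: dict) -> dict:
    # Stage 2: draft a response based on the classification.
    msg["draft"] = f"Auto-reply for a {msg['label']} question"
    return msg

def confirm(msg: dict) -> dict:
    # Stage 3: final check before anything leaves the system.
    msg["approved"] = len(msg["draft"]) > 0
    return msg

def pipeline(email: str) -> dict:
    """Each stage transforms the input and hands off to the next."""
    result = classify(email)
    for stage in (act, confirm):
        result = stage(result)
    return result

out = pipeline("Where is my invoice?")
```

The other three shapes are rearrangements of the same parts: run stages concurrently and you have orchestrator-worker; make stage 1 pick exactly one stage 2 and you have a router; loop act/confirm until confirm accepts and you have evaluator-optimizer.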
The PR Review Crew you'll see in the demo is Orchestrator-Worker. Once you see one pattern clearly, the others are small variations on the same mental model.
Four runtimes — software that runs the agent loop for you, handling message plumbing, tool dispatch, parallel calls, retries, and state. Same mental model across all of them; pick based on how much control you need.
Claude Code subagents. Define an agent in markdown, give it tools, reference it from a parent. The CLI handles the loop.
Claude Agent SDK. OpenAI Agents SDK. LangGraph. You define agents, tools, and handoffs in code. The SDK runs the loop with typed messages.
Why this matters today: You don't have to build the orchestrator loop from scratch. Pick a runtime that matches your control-vs-ergonomics preference, wire in your tools and prompts, and you're coordinating agents in an afternoon.
Same four-step shape as a tool call from Session 01. The tool just happens to be another agent.
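Those four steps, sketched with the model stubbed out (a real loop would call the API; here the "tool" behind `style_reviewer` is itself an agent, and all names are illustrative):

```python
def fake_model(messages, tools):
    # Stub "model": requests a tool call first, then finishes once it
    # sees a tool result in the transcript.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "text", "text": "done"}
    return {"type": "tool_use", "name": "style_reviewer",
            "input": {"diff": "+ x=1"}}

def style_reviewer(diff: str) -> str:
    # The tool is another agent; stubbed here.
    return "2 nits, no blockers"

TOOLS = {"style_reviewer": style_reviewer}

def agent_loop(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = fake_model(messages, TOOLS)        # 1–2: model sees context, decides
        if reply["type"] != "tool_use":
            return reply["text"]                   # model is finished
        output = TOOLS[reply["name"]](**reply["input"])       # 3: execute
        messages.append({"role": "tool", "content": output})  # 4: return result

print(agent_loop("review this PR"))  # → done
```

Swap `fake_model` for a real API call and `style_reviewer` for a subagent and you have the whole system.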
The orchestrator never reviews code directly. It plans, routes, and aggregates. Specialists do the work. Each runs in its own context, sees only what it needs, and returns structured output.
A working reference implementation of the orchestrator-worker pattern. One Sonnet orchestrator. Three Haiku reviewers. Parallel dispatch, structured handoffs, per-run cost tracking.
Clone the repo, flip MOCK=true to see the orchestration flow end-to-end with no API calls.
Flip it to false and the cost tracker prints Sonnet + Haiku spend for every run — plus what the same work would cost all-Sonnet or all-Opus.
The SUBAGENT_TOOLS declaration. The parallel dispatch. The schemas.py handoff contract. The cost report.
Subagent system prompts, the aggregation logic, and a fourth specialist of your choice — performance, accessibility, whatever matters to your team.
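The dispatch step at the heart of the demo can be sketched in a few lines of asyncio. This illustrates the fan-out/fan-in shape, not the repo's actual code; the three reviewer stubs stand in for Haiku calls:

```python
import asyncio

async def reviewer(focus: str, diff: str) -> dict:
    # Stub for one Haiku subagent; a real one would await an API call.
    await asyncio.sleep(0)               # yield, as a network call would
    return {"focus": focus, "verdict": "pass"}

async def review_pr(diff: str) -> dict:
    # Fan-out: dispatch all three reviewers concurrently.
    results = await asyncio.gather(
        reviewer("security", diff),
        reviewer("style", diff),
        reviewer("correctness", diff),
    )
    # Fan-in: the orchestrator aggregates their structured outputs.
    return {
        "approved": all(r["verdict"] == "pass" for r in results),
        "reports": results,
    }

report = asyncio.run(review_pr("+ print('hello')"))
```

Because the reviewers are independent, total latency is roughly one reviewer's latency, not three; that is the whole argument for fan-out.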
Declarative. Markdown files define subagents. Parent agent references them. No orchestrator code to write. Fastest path from idea to working system.
Best for: dev workflows, research crews
Claude Agent SDK or OpenAI Agents SDK. Define agents and typed handoffs in code. SDK runs the loop, handles retries, tracks state. Good middle ground.
Best for: product features, APIs
LangGraph or your own orchestrator. Full control of routing, state, and retries. More code, more flexibility. The PR Review Crew lives here.
Best for: production systems with specific needs
Pick one. Build it in the next 2 hours. We're here to help.
One orchestrator plus three researcher subagents, each looking at a different angle of the same topic. Aggregate into a single report.
Tools: Claude Code · No code required
Take your Session 01 briefing agent and split it into parallel specialists — news, weather, calendar, markets — each writing its own section.
Tools: Claude Desktop + MCP · Light config
Clone aiofficehours/pr-review. Tune the three subagent prompts for a codebase you know. Add a fourth specialist — performance, accessibility, tests, your call.
TODO(you): prompts, SUBAGENT_TOOLS
Tools: Python + Anthropic SDK · Some coding
Chain three agents: classifier, responder, escalator. Each hands off to the next with structured context. Point it at a real inbox.
Tools: Agent SDK + Gmail MCP · Some coding
Build your own orchestrator with the Claude Agent SDK or OpenAI Agents SDK. Typed handoffs, parallel dispatch, retries. The full loop, your way.
Tools: Agent SDK + your editor · Full code
Build an MCP server that exposes an internal agent team as a single tool. Any MCP client calls review_pr and the orchestrator fans out under the hood.
Tools: Python or Node + MCP SDK · Full code
No skill gate. Pick a real trip you actually want to take. Wire up agents that coordinate across flights, hotels, calendar, weather, maps, and your budget — then let them hand off to each other. Start in Claude Desktop with MCP servers; graduate to the PR Review Crew pattern if you want parallel dispatch and a supervisor loop. Bring back a plan you'd actually book.
Tools: your call · Any skill level
If your project needs API keys (Anthropic, OpenAI, Google, etc.), we'll provision them on-site through Keyfree. No credential management required. You use the capability without seeing the key.
keyfree.dev
github.com/aiofficehours/pr-review
Clone it, make dev, read the orchestrator
docs.anthropic.com/en/docs/agents-and-tools
Claude Agent SDK · Claude Code subagents
Blog: Building effective agents
OpenAI Agents SDK
LangGraph (graph-based orchestration)
MCP servers: mcp.so, github.com/modelcontextprotocol/servers
office-hours.dev/session02
Recording posted after the session
Built something? Show the room. 2 minutes each. No slides needed. Just share your screen and tell us what you made.
Not required. But you'll be surprised how much you built in 2 hours.
What happens when agents take irreversible actions? Build an agent that makes a real transaction, inspect the receipt chain, and see what happens when something goes wrong. May 7, same time.
May 7 · Trust, Safety & Letting Agents Spend Money
May 21 · Debugging Agents in Production
June 4 · Agents Meet the Physical World
office-hours.dev
hlos.ai
day-zero.dev