## TL;DR
The Claude Agent SDK is Anthropic's toolkit for building agents that use the same loop Claude Code runs on — tool use, permissions, hooks, subagents, context management, streaming. If you've ever wrapped messages.create in a while loop and bolted on your own tool routing, the SDK replaces that code with primitives that are already battle-tested. This post walks through what the SDK gives you, when to use it, and how I'd structure a first production agent.
## What the SDK actually is
The Claude Agent SDK is a library (TypeScript and Python) that exposes the primitives Claude Code uses internally:
- The agent loop — plan, act, observe, repeat until the task is done
- Tool definitions — typed tools with schemas, handlers, and permission gates
- Hooks — pre/post intercepts, the same three types Claude Code exposes
- Subagents — spawn isolated child agents with their own context and tools
- Memory and context compaction — automatic summarization as conversations grow
- Streaming — token-by-token output for responsive UIs
If you've built agent frameworks before, this isn't revolutionary. What's useful is that it codifies Anthropic's view of how a Claude-backed agent should behave — and that view is the same one driving Claude Code's reliability.
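To make the loop concrete, here's a minimal sketch of plan-act-observe in TypeScript. This is an illustration of the pattern, not the SDK's internals — `callModel` stands in for a real model call, and the types are invented for the example:

```typescript
// One model turn either requests a tool or ends with a final answer.
type ModelTurn =
  | { type: 'tool_call'; name: string; input: string }
  | { type: 'final'; text: string };

type ToolHandler = (input: string) => string;

function runAgentLoop(
  callModel: (history: string[]) => ModelTurn,
  tools: Record<string, ToolHandler>,
  task: string,
  maxTurns = 10,
): string {
  const history: string[] = [`task: ${task}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const action = callModel(history);                // plan
    if (action.type === 'final') return action.text;  // done
    const handler = tools[action.name];
    if (!handler) throw new Error(`unknown tool: ${action.name}`);
    const result = handler(action.input);             // act
    history.push(`tool ${action.name} -> ${result}`); // observe, then repeat
  }
  throw new Error('max turns exceeded');
}
```

Everything the SDK adds — validation, hooks, compaction, subagents — hangs off this skeleton.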
## When to use it (and when not to)
Use the Agent SDK when:
- You want agent behavior embedded in your own product (chatbot, autonomous worker, CI automation)
- You're building something Claude Code can't reach — a webhook handler, a Slack bot, a Kubernetes operator
- You need to customize tool availability per-user or per-tenant
- You want the agent loop but your own UI or invocation pattern
Don't use the Agent SDK if:
- You're a developer who wants to code faster — that's Claude Code, not the SDK
- Your use case is a single `messages.create` call with a static prompt — use the base Anthropic SDK
- You only need retrieval over a doc set and no real tool use — a vanilla RAG setup is simpler
The SDK pays for itself when your agent needs to do things — write files, query APIs, run commands, coordinate subtasks. If it only needs to say things, you're overpaying in complexity.
## The core loop, in ~30 lines
A minimal production-shaped agent in TypeScript:
```typescript
import { ClaudeAgent, tool } from '@anthropic-ai/claude-agent-sdk';
import { z } from 'zod';

const agent = new ClaudeAgent({
  model: 'claude-opus-4-7',
  systemPrompt: 'You are a support triage agent. Classify issues and open tickets.',
  tools: [
    tool({
      name: 'open_ticket',
      description: 'Open a support ticket in Linear.',
      input: z.object({
        title: z.string(),
        severity: z.enum(['low', 'medium', 'high']),
        body: z.string(),
      }),
      handler: async ({ title, severity, body }) => {
        // `linear` is assumed to be a configured Linear API client in scope
        const ticket = await linear.createIssue({ title, body, priority: severity });
        return { ticket_id: ticket.id, url: ticket.url };
      },
    }),
  ],
});

for await (const event of agent.run('User reports the dashboard is blank after login.')) {
  if (event.type === 'message') console.log(event.text);
  if (event.type === 'tool_call') console.log('→', event.name, event.input);
}
```
Three things worth noting:
- Typed tools. Each tool has a Zod schema (or Pydantic in Python). The SDK validates the model's tool calls against the schema before your handler runs. Invalid calls are rejected and the agent retries — your code never sees garbage input.
- Streaming events. `agent.run` returns an async iterator. You get `message`, `tool_call`, `tool_result`, `thinking`, and `end` events in real time. Wire them to a UI, a log, or a Slack thread.
- One agent, many runs. Once configured, you can call `agent.run(...)` repeatedly. Each run is isolated unless you explicitly pass a conversation id.
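The validate-before-handler gate is worth internalizing. Here's a hand-rolled approximation of what schema validation buys you — no Zod, just a manual check standing in for the schema, with all names invented for the example:

```typescript
type Severity = 'low' | 'medium' | 'high';
interface TicketInput { title: string; severity: Severity; body: string }

// Stand-in for the schema check: returns a typed object or null.
function validateTicketInput(raw: unknown): TicketInput | null {
  const o = raw as Partial<TicketInput> | null;
  if (!o || typeof o.title !== 'string' || typeof o.body !== 'string') return null;
  if (!['low', 'medium', 'high'].includes(o.severity as string)) return null;
  return o as TicketInput;
}

// The handler only ever runs on valid input; invalid calls come back as a
// structured rejection the model can read and retry against.
function gateToolCall(raw: unknown, handler: (input: TicketInput) => string): string {
  const input = validateTicketInput(raw);
  if (!input) return 'error: invalid tool input, retry with a corrected payload';
  return handler(input);
}
```

The payoff is the type on `handler`: past the gate, your code works with `TicketInput`, never `unknown`.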
## Hooks — the same mechanism as Claude Code
The SDK exposes PreToolUse, PostToolUse, and Notification hooks with the same semantics as Claude Code:
```typescript
agent.addHook('PreToolUse', async ({ tool, input }) => {
  // `isBusinessHours` is your own predicate, defined elsewhere
  if (tool === 'open_ticket' && input.severity === 'high' && !isBusinessHours()) {
    return { decision: 'deny', reason: 'High-severity tickets require human approval outside business hours.' };
  }
  return { decision: 'approve' };
});

agent.addHook('PostToolUse', async ({ tool, result }) => {
  // `auditLog` is your own persistence layer
  await auditLog.write({ tool, result, ts: Date.now() });
});
```
This is the same pattern I use in Claude Code — system-level guardrails the model can't bypass. The rule is identical: prompts are suggestions, hooks are contracts.
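The contract semantics are easy to sketch: every registered hook runs before the tool, and a single deny wins. This is an illustration of the pattern, not the SDK's dispatch code — all names here are invented:

```typescript
type HookDecision = { decision: 'approve' } | { decision: 'deny'; reason: string };
type PreToolUseHook = (ctx: { tool: string; input: unknown }) => HookDecision;

function runPreToolUseHooks(
  hooks: PreToolUseHook[],
  ctx: { tool: string; input: unknown },
): HookDecision {
  for (const hook of hooks) {
    const result = hook(ctx);
    if (result.decision === 'deny') return result; // first deny short-circuits
  }
  return { decision: 'approve' };
}
```

Because the gate sits outside the model's turn, no amount of prompt injection in the conversation can flip a deny into an approve.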
## Subagents — when one agent isn't enough
If your task has sub-parts that benefit from isolated context (different tools, different system prompt, different model), spawn a subagent:
```typescript
// `webSearchTool` is a tool definition you've built elsewhere
const researcher = agent.subagent({
  systemPrompt: 'Research mode. Use web_search, return structured findings.',
  tools: [webSearchTool],
  model: 'claude-haiku-4-5-20251001',
});

const result = await agent.runWithSubagent(
  'Find the three most reliable sources on <topic> and summarize them.',
  researcher
);
```
Subagents don't see the parent's conversation, so they can't drift. They come back with only the information the parent needs. For cost, the standard pattern is a parent on Opus and subagents on Haiku for cheap throughput.
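The isolation property is structural, and a small sketch shows why. The child gets a fresh history containing only the delegated task, and only its result flows back to the parent — an illustration with invented names, not the SDK's implementation:

```typescript
interface AgentContext { systemPrompt: string; history: string[] }

function delegate(
  parent: AgentContext,
  childSystemPrompt: string,
  task: string,
  runChild: (ctx: AgentContext) => string,
): string {
  // Fresh context: the child never sees parent.history.
  const child: AgentContext = { systemPrompt: childSystemPrompt, history: [task] };
  const findings = runChild(child);
  parent.history.push(`subagent findings: ${findings}`); // only the result returns
  return findings;
}
```

The parent's context grows by one summary line per delegation instead of absorbing the child's entire tool transcript — which is also why subagents are the cheap way to do wide exploration.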
## Context management
A long-running agent accumulates tokens. The SDK ships automatic compaction — when context approaches the limit, older turns are summarized into a compressed form, keeping recent activity verbatim.
```typescript
const agent = new ClaudeAgent({
  model: 'claude-opus-4-7',
  // compaction defaults to 'auto'
  contextWindowStrategy: {
    compactAt: 0.8, // trigger at 80% of the window
    keepRecent: 10, // keep the last 10 messages verbatim
  },
});
```
Don't disable this unless you have a reason. Un-compacted agents degrade in predictable ways — losing instructions from early turns, hallucinating past decisions, "forgetting" the goal.
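The mechanism itself is simple to sketch: past a threshold, older turns collapse into one summary while the most recent turns survive verbatim. This is an illustration of the strategy, not the SDK's code — `summarize` stands in for a model-generated summary, and the 4-chars-per-token estimate is a rough heuristic:

```typescript
function compact(
  messages: string[],
  opts: { window: number; compactAt: number; keepRecent: number },
  summarize: (older: string[]) => string, // stand-in for a model summary call
): string[] {
  const estimatedTokens = messages.join(' ').length / 4; // rough heuristic
  if (estimatedTokens < opts.window * opts.compactAt) return messages;
  const cut = Math.max(messages.length - opts.keepRecent, 0);
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  return [summarize(older), ...recent]; // one summary + recent turns verbatim
}
```

The failure modes listed above all come from what `summarize` drops — which is why keeping standing instructions in the system prompt, outside the compactable history, matters.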
## Prompt caching — the single biggest cost win
Any Agent SDK app should enable prompt caching. A large system prompt plus tool schemas plus CLAUDE.md-style context is billed at roughly 10% of the base input token price on cache hits. Your second and third calls in a conversation become dramatically cheaper.
```typescript
const agent = new ClaudeAgent({
  model: 'claude-opus-4-7',
  systemPrompt: [
    { type: 'text', text: loadLongSystemPrompt(), cache_control: { type: 'ephemeral' } },
  ],
  tools, // tool definitions are cache-eligible too
});
```
In practice this is the difference between "agent SDK is expensive" and "agent SDK is cheaper than hiring another human for the task." Measure cache hit rate. Aim for >70% on conversation turn 2+.
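Measuring the hit rate is a one-function job. The field names below follow the usage block the Anthropic Messages API returns (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); verify them against your SDK version before relying on this:

```typescript
interface Usage {
  input_tokens: number;                // uncached input
  cache_creation_input_tokens: number; // written to cache this request
  cache_read_input_tokens: number;     // served from cache (the cheap path)
}

// Fraction of all input tokens that were served from cache.
function cacheHitRate(requests: Usage[]): number {
  let read = 0;
  let total = 0;
  for (const u of requests) {
    read += u.cache_read_input_tokens;
    total += u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  }
  return total === 0 ? 0 : read / total;
}
```

Log this per conversation; if turn 2+ sits below the 70% target, something is invalidating your cache — usually a prefix that changes between calls.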
## A production checklist
When I ship an Agent SDK service to production, these are the non-negotiables:
- Hooks for every destructive tool. Anything that writes, deletes, or charges money has a PreToolUse gate.
- Audit log via PostToolUse. Every tool call persisted with input, output, ts, and agent session id.
- Timeouts at every level. Per tool call, per agent run, per session. Agents can loop — don't let them loop forever.
- Prompt caching. See above. Measure, iterate.
- Model tier mapping. Opus for hard tasks, Sonnet for the default loop, Haiku for classification and subagents.
- Structured errors. When a tool fails, return a structured error the agent can reason about, not a raw exception message.
- Session resumability. Store conversation state so an interrupted run can resume instead of restart.
- Per-user permission scoping. Don't ship a single god-agent with all tools available to every caller.
Miss any of these and you'll learn about it the expensive way.
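The timeout item is the one people skip most often, and it's a few lines. A generic wrapper, usable around any tool call or run — a sketch, with the label argument invented for the example:

```typescript
// Rejects if `promise` doesn't settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrap every layer with its own budget — for example, tool calls at seconds, runs at minutes, sessions at hours — so a stuck tool fails fast instead of burning the whole run's budget.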
## Where this fits in the bigger picture
The Agent SDK is the "roll-your-own" arm of the same framework that Claude Code sits on. Claude Code is the end-user product, the SDK is the primitive. If you understand how Claude Code's hooks and subagent model work, the SDK feels instantly familiar — because it's the same thing, exposed as a library.
If you're planning to build an agent into a product and want a second set of eyes before shipping, I run workshops on exactly this. me@jakubkontra.com