Claude Code CLI as agent runtime: a pattern instead of a custom framework
Use the claude CLI as a subprocess for an agent runtime — subscription pricing, native tools, prompt caching, model swap without re-engineering.
Claude Code CLI as agent runtime: why we did not write a custom framework
When you build an AI agent that must read and write files, call tools, keep context across steps, the dominant option in 2025-2026 was: write a custom framework on top of an LLM SDK. Thousands of teams invested months in this direction. We chose an alternative path: we use the Claude Code CLI as a subprocess of our agent. The benefits proved substantial: subscription cost, native tools, free prompt caching, model swap with a flag.
This article describes the pattern, the reasons for the choice, and the pitfalls identified in operation.
TL;DR
- A custom agent on top of an SDK has to reimplement: tool registration, context management, prompt caching, error recovery, multi-turn loop.
- The
claudeCLI already contains all these, reliably, behind a stable interface. - Called as a subprocess with input on stdin and captured output, it becomes a complete agent runtime.
- Key benefits: subscription instead of per-token cost, native tool integration, model swap with a flag, delegated context scrolling.
- Pitfalls: long session handling, output formatting for parsing, sandbox security.
The “agent on top of SDK” problem
Building a custom agent on top of an LLM SDK seems straightforward, but it hides considerable complexity:
Tool registration and execution. You must map tool names to functions, serialize/deserialize arguments, handle execution errors, map results back to the model format.
Multi-turn loop. An agent does: model responds, calls tool, receives result, model responds again. This loop must be written with proper error handling, stop on token limit, stop on iteration limit, recover on rate limit.
Context management. As the conversation grows, you must decide what to keep, summarize, archive. This logic is non-trivial.
Prompt caching. Recent models offer caching for large system prompts. You must correctly mark cacheable blocks, monitor hit rate, manage invalidation.
Rate limiting and retry. Exponential backoff, distinguishing temporary vs permanent errors, fallback to alternate model.
File system access. If the agent reads/writes files, you need sandbox, permission management, path traversal protection.
Conservative estimate: 3-6 months of work to build a robust custom agent. Plus continuous maintenance.
The solution: claude CLI as subprocess
Anthropic distributes claude as a complete CLI: local install, authentication via the user’s account, ability to read and write files, internal tool calling, automatic context management. The CLI is already a complete agent, not just a wrapper around the API.
Our pattern:
import subprocess
def run_claude_session(task_prompt: str, working_dir: str) -> str:
proc = subprocess.run(
["claude", "--non-interactive", "--allow-tools", "Read,Write,Bash"],
input=task_prompt,
capture_output=True,
text=True,
cwd=working_dir,
timeout=600,
)
if proc.returncode != 0:
raise ClaudeRuntimeError(proc.stderr)
return proc.stdout
Our application-level call does not see the Claude API directly. It sees a subprocess receiving a task, working with the file system, returning a result.
Practical benefits
1. Subscription pricing instead of per-token cost. A Claude Pro / Max account has a fixed monthly cost and allows intensive use. For an agent doing tens or hundreds of invocations per day, the monthly cost is predictable and usually lower than the per-token bill. For design-phase work, where each invocation can consume many tokens, this saving is significant.
2. Native tools. The CLI already has tools for: Read, Write, Edit, Bash, Glob, Grep, WebFetch. We do not have to reimplement them. Our application receives the final output; the coordination work happened inside the CLI.
3. Automatic context management. The CLI handles context compaction near the limit. We do not write custom logic to decide what to keep.
4. Prompt caching. The CLI uses prompt caching for system prompt and tools. The economic benefit appears automatically, without manual configuration.
5. Model swap. claude --model sonnet-X changes the model. The application does not change. For cost-aware routing, we pass the model as a parameter.
6. Native audit log. The CLI writes session history to jsonl on disk. For audit, we read these files.
7. Updates without re-engineering. When Anthropic improves the CLI (new tools, performance, fixes), the application benefits immediately. Our team does not maintain CLI-level code.
How it fits in our architecture
In the agent-orchestrator architecture described in previous articles, an orchestrator receives a request, builds a plan, asks for approval, and after approval executes. The step-by-step execution phase is where the CLI appears.
User: requests via Telegram → orchestrator (premium model via API) →
produces plan → asks for approval via Telegram →
user approves →
orchestrator starts a `claude` subprocess with the instruction "execute step 1" →
Claude CLI works: reads files, writes configurations, calls shell, reports →
orchestrator reads output, validates, moves to step 2 →
...
The orchestrator is the brain; the CLI is the executor. Communication is through stdin/stdout/file system.
Identified pitfalls
Pitfall 1: long sessions exceeding context. A long claude session consumes context. If the step to execute is very large, the CLI can hit the limit. Solution: decompose the plan into smaller steps, each step in a separate CLI session. Communication between steps via the file system and parameters passed to the next session.
Pitfall 2: parsing output for the application. The CLI produces natural text in the user’s language. The application wants structures. Solution: ask the CLI to write structured output to a specific file (e.g., /tmp/claude_result.json), and the application reads the file. Or use --output-format json mode if available.
Pitfall 3: sandbox security. The CLI has file system access. For an agent running on our infrastructure, that is OK. For an agent serving external clients, it is mandatory to run in an isolated container per request, with volumes mounted only to relevant directories.
Pitfall 4: non-deterministic errors. Two claude invocations with identical input can produce different output (the language model has stochasticity). For testing, use --temperature 0 and check only invariants (was file X created? Yes/No).
Pitfall 5: dependency on CLI version. A CLI update can change behavior. Solution: pin the CLI version in deploy, test before upgrade, include the version in the audit log.
Pitfall 6: pricing tier limits. Subscription accounts have daily usage limits. If your agent needs large volumes, either scale across multiple accounts, or use API directly. Multiple accounts are OK only if the provider’s terms of service allow.
Comparison with existing frameworks
Frameworks like LangChain, LlamaIndex, AutoGen, CrewAI have merit for certain cases (research, fast prototyping, specific scenarios). For a production ops agent, our observations after evaluation:
- Frameworks add an abstraction layer that complicates debugging
- Their design decisions do not always align with our requirements (e.g., parallel vs serial tool calling)
- Framework updates can break the custom agent
- Documentation tends to be premature (or retroactive) on old versions
Using the CLI directly removes this layer. The application communicates directly with the official Anthropic CLI. Fewer moving parts.
When the pattern does not fit
The claude CLI runtime pattern is not universal:
- If you run exclusively on a non-Claude model, the CLI does not apply (use a similar CLI if it exists, or a direct API adapter).
- If per-request latency must be under 1 second, subprocess startup time can be prohibitive. For slower requests (5-30 sec), it is insignificant.
- If you have a strict on-prem requirement with no cloud calls, the Claude CLI calls Anthropic’s cloud API. For full on-prem, use a local model and matching CLI or a direct adapter.
In other cases, the pattern captures more of what is needed for an agent than most frameworks.
Conclusion
Building an AI agent on top of a mature CLI (Claude Code, or equivalents for other models) reduces development time from months to days. Give up the fantasy of controlling every aspect of the LLM loop and benefit from the work of the team that built the CLI. Your application focuses on what is unique — orchestration, business logic, specific integrations — instead of reinventing tool calling infrastructure.
This is the pattern we recommend to clients who do not have strict requirements to avoid Anthropic. For clients with such requirements, BYO-LLM with minimal adapters is the alternative.
Related articles
- BYO-LLM pattern with minimal adapters
- Cost-aware LLM routing
- The propose-then-act architecture for AI agents
- Pillar IRIS — the CAI Technology orchestrator agent
External sources
- Claude Code documentation — official reference for CLI and native tools
- Anthropic Messages API reference — for cases when you use API directly
- Anthropic prompt caching — caching used natively by CLI
- Python subprocess documentation — standard pattern for calling CLIs from an application
- “Constitutional AI” — Anthropic, arXiv 2212.08073 — design context of the models used by the CLI
Next step
If your team is evaluating an AI agent architecture and is curious whether the CLI-as-runtime pattern fits, we offer a 30-minute technical consultation at no cost.