In-depth analysis based on v2.1.88 source code. Comprehensive analysis of Claude Code architecture, covering Prompt caching, multi-agent coordination, security permissions, and performance optimization.
Claude Code Engineering Architecture Whitepaper
An in-depth analysis based on the v2.1.88 source code
Document Version: 1.0
Date: 2026-04-01
Table of Contents
- Executive Summary
- Architecture Overview
- Core Technology Stack
- Prompt Caching Architecture
- Tool System Design
- Multi-Agent Coordination
- Memory System Implementation
- Security and Permission Architecture
- Performance Optimization
- Appendix: Special Features
Executive Summary
Claude Code is Anthropic's production-grade AI programming assistant, and its architecture is a useful engineering reference point for AI agent systems in 2026. This document analyzes its core innovations based on the leaked v2.1.88 source code (~510,000 lines of TypeScript across 1,884 files).
Key Insights:
- Excellent AI application layer development is essentially fine-grained exploitation of API caching systems
- Architectural evolution from "monolithic thinking" to "cluster concurrent collaboration"
- Dynamic permission design using small AI to regulate large AI
Architecture Overview
2.1 System Scale
| Metric | Value |
|---|---|
| Source Files | 1,884 |
| Lines of Code | ~510,000 |
| Top-level Modules | ~40 |
| Built-in Tools | 40+ |
| Core Module Size | 33 MB |
2.2 Dual-Mode Architecture
┌─────────────────────────────────────────────────────────┐
│ Claude Code │
├─────────────────────────┬───────────────────────────────┤
│ REPL Interactive │ Headless/SDK Mode │
├─────────────────────────┼───────────────────────────────┤
│ • React + Ink Terminal │ • QueryEngine Class │
│ • Human Developer Focus │ • JSON Stream Output │
│ • Real-time Thinking │ • Embeddable in IDE/Cursor │
│ • Tool Progress/Diff │ • CI/CD Integration │
└─────────────────────────┴───────────────────────────────┘
2.3 Directory Structure
src/
├── main.tsx # Entry point (concurrent startup)
├── commands/ # CLI commands (87 modules)
├── components/ # React components (103)
├── services/
│ ├── api/
│ │ └── claude.ts # Core API (3,419 lines)
│ ├── mcp/ # Model Context Protocol
│ └── analytics/ # Telemetry & GrowthBook
├── tools/ # 40+ tool implementations
├── screens/
│ └── REPL.tsx # Main UI (5,005 lines)
├── state/ # Zustand-style Store
├── memdir/ # Memory system
├── coordinator/ # Coordinator pattern
├── tasks/ # Multi-agent tasks
├── buddy/ # Tamagotchi easter egg
└── utils/
    ├── permissions/ # Permission system
    ├── swarm/ # Agent swarm
    └── undercover.ts # Undercover mode
Core Technology Stack
3.1 Technology Selection Decisions
| Layer | Technology | Rationale |
|---|---|---|
| Language | TypeScript | Type safety, maintainability for large projects |
| Runtime | Bun | Fast startup and execution; cited as 3-4x faster than Node.js, though the gain depends on workload |
| CLI Framework | Commander | Mature command-line parsing |
| UI Layer | React + Ink | Declarative terminal rendering for complex streaming states |
| State Management | Custom Store | Zustand-style, lightweight |
3.2 Why Use React for CLI Tools?
The answer is in screens/REPL.tsx (5,005 lines):
In scenarios with LLM streaming output + multi-tool concurrent execution, terminal UI state management is extremely complex:
- Simultaneously rendering thought process
- Tool call progress bars
- Code diff preview
- Multi-agent status monitoring
Declarative React with a minimal store is well suited to these high-frequency partial refreshes: the UI is described as a function of state, and only the parts whose state changed are redrawn.
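To make the "minimal store" idea concrete, here is a hedged sketch of a Zustand-style store. The names createStore, getState, setState, and subscribe are illustrative assumptions, not identifiers taken from the Claude Code source.

```typescript
// Hypothetical minimal store; not the actual Claude Code implementation.
type Listener<T> = (state: T) => void;

interface Store<T> {
  getState(): T;
  setState(patch: Partial<T>): void;
  subscribe(listener: Listener<T>): () => void;
}

function createStore<T extends object>(initial: T): Store<T> {
  let state = initial;
  const listeners = new Set<Listener<T>>();
  return {
    getState: () => state,
    setState(patch) {
      // Shallow-merge the patch, then notify every subscriber once.
      state = { ...state, ...patch };
      listeners.forEach((l) => l(state));
    },
    subscribe(listener) {
      listeners.add(listener);
      // The returned function removes the listener again.
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Usage: a UI layer would subscribe and re-render when state changes.
const uiStore = createStore({ thinking: "", toolProgress: 0 });
const seenProgress: number[] = [];
const unsubscribe = uiStore.subscribe((s) => seenProgress.push(s.toolProgress));
uiStore.setState({ toolProgress: 50 });
uiStore.setState({ thinking: "reading files" });
unsubscribe();
uiStore.setState({ toolProgress: 100 }); // no listener is called here
```

In a React + Ink UI, a component would typically subscribe through a hook and read only the slice it renders, so a streaming token update does not redraw the diff preview.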
Prompt Caching Architecture
4.1 Background: Anthropic Prompt Cache Mechanism
- Uses prefix matching
- Cache hit rate directly determines API costs
- Small prompt changes cause cache misses
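Why small changes are costly under prefix matching can be shown with a toy model. The sketch below treats the prompt as a list of segments and counts how many leading segments are unchanged; the real cache works on tokens on the API side, so the function and segment names here are assumptions for illustration only.

```typescript
// Illustrative model of prefix matching; not the actual cache implementation.
function cachedPrefixLength(previous: string[], current: string[]): number {
  let i = 0;
  while (i < previous.length && i < current.length && previous[i] === current[i]) {
    i++;
  }
  return i; // number of leading segments that could be reused
}

const firstRequest = ["identity", "safety rules", "tool guide", "cwd=/repo/a"];
const changedTail = ["identity", "safety rules", "tool guide", "cwd=/repo/b"];
const changedHead = ["identity v2", "safety rules", "tool guide", "cwd=/repo/a"];

// Only the last segment differs, so three segments are reusable.
const reusedWhenTailChanges = cachedPrefixLength(firstRequest, changedTail);
// The first segment differs, so nothing after it can be reused either.
const reusedWhenHeadChanges = cachedPrefixLength(firstRequest, changedHead);
```

This is the reason volatile content is best placed as late in the prompt as possible.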
4.2 Segmented Caching Architecture
System Prompt
├── Static Segment ← Global cache
│ ├── Model identity intro
│ ├── System safety rules
│ ├── Code style limits
│ └── Basic tool usage guide
│
├── Dynamic Boundary ← Hard-coded marker
│ SYSTEM_PROMPT_DYNAMIC_BOUNDARY
│
└── Dynamic Segment ← Session-level cache/no cache
    ├── Current Working Directory (CWD)
    ├── Git status
    ├── MCP instructions
    └── User config
4.3 Cache Optimization Strategies
| Optimization | Implementation | Effect |
|---|---|---|
| Deterministic Sorting | Tool descriptions sorted alphabetically | Prevents order changes breaking cache |
| Hash Path Mapping | Config files use content hash as path | Avoids UUID randomness |
| State Externalization | Agent list moved to Attachments | Reduces cache-creation tokens by 10.2% |
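The first two strategies in the table can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the config/ path layout, and the choice of FNV-1a as the hash are all hypothetical and are not taken from the source.

```typescript
// Hypothetical helpers; names, path layout, and hash choice are assumptions.
interface ToolDescription {
  name: string;
  description: string;
}

function sortTools(tools: ToolDescription[]): ToolDescription[] {
  // Compare code units directly so the order does not depend on locale.
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function contentPath(content: string): string {
  // The same content always maps to the same path, unlike a random UUID.
  return `config/${fnv1a(content)}.json`;
}

const orderA = sortTools([
  { name: "Read", description: "read a file" },
  { name: "Bash", description: "run a command" },
  { name: "Edit", description: "edit a file" },
]).map((t) => t.name).join(",");

const orderB = sortTools([
  { name: "Edit", description: "edit a file" },
  { name: "Read", description: "read a file" },
  { name: "Bash", description: "run a command" },
]).map((t) => t.name).join(",");
```

Both registration orders serialize to the same tool list, so the prompt prefix stays byte-identical across sessions.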
4.4 Core Code Logic
// services/api/claude.ts
function systemPromptSection() {
// Static segment: globally cacheable
return asSystemPrompt([
"You are Claude Code...",
systemSafetyRules,
codeStyleLimits,
toolUsageGuide,
]);
}
// Dynamic boundary hard-coded
const DYNAMIC_BOUNDARY = "SYSTEM_PROMPT_DYNAMIC_BOUNDARY";
function dynamicPromptSection() {
// Dynamic segment: high-frequency changes
return [
getCurrentWorkingDir(),
getGitStatus(),
getMcpInstructions(),
getUserConfig(),
];
}
Tool System Design
5.1 Factory Pattern Architecture
All tools implement the base Tool interface:
interface Tool {
name: string;
description: string;
inputSchema: ToolInputJSONSchema;
// Permission check
checkPermissions(context: ToolPermissionContext): PermissionResult;
// Input validation
validateInput(input: unknown): ValidationResult;
// Concurrency safety marker
isConcurrencySafe(): boolean;
// Execution
execute(input: unknown, context: ToolContext): Promise<ToolResult>;
}
5.2 ToolSearch Deferred Loading
Problem: Packing 40+ tool descriptions into the initial prompt makes token costs unacceptable.
Solution:
// Non-core tools marked for deferred loading
const nonCoreTools = {
defer_loading: true, // Not exposed in initial Prompt
searchable: true, // Discoverable via ToolSearch
};
// Model only knows ToolSearch exists
const toolSearchSchema = {
name: "ToolSearch",
description: "Search and load additional tools",
parameters: {
query: "Tool keyword to search"
}
};
Flow:
- Model needs additional capability → calls ToolSearch
- Dynamically loads target tool description
- Model then calls that tool directly
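The flow above can be sketched as a small registry. This is a hedged illustration: the ToolRegistry class, its method names, the keyword matching, and the example tool names are assumptions, not the real ToolSearch implementation.

```typescript
// Hypothetical registry; class, field, and tool names are assumptions.
interface ToolEntry {
  name: string;
  description: string;
  deferLoading: boolean;
}

class ToolRegistry {
  private readonly tools: ToolEntry[];
  private readonly loaded = new Set<string>();

  constructor(tools: ToolEntry[]) {
    this.tools = tools;
  }

  // Names of the tools whose descriptions would be in the prompt right now.
  visibleTools(): string[] {
    return this.tools
      .filter((t) => !t.deferLoading || this.loaded.has(t.name))
      .map((t) => t.name);
  }

  // What a ToolSearch call would do: match by keyword, then expose the hits.
  search(query: string): string[] {
    const q = query.toLowerCase();
    const hits = this.tools.filter(
      (t) =>
        t.deferLoading &&
        (t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q)),
    );
    hits.forEach((t) => this.loaded.add(t.name));
    return hits.map((t) => t.name);
  }
}

const registry = new ToolRegistry([
  { name: "Read", description: "Read a file", deferLoading: false },
  { name: "Bash", description: "Run a shell command", deferLoading: false },
  { name: "NotebookEdit", description: "Edit a notebook cell", deferLoading: true },
  { name: "WebFetch", description: "Fetch a URL", deferLoading: true },
]);

const visibleBefore = registry.visibleTools().join(",");
const searchHits = registry.search("notebook").join(",");
const visibleAfter = registry.visibleTools().join(",");
```

Only the tool that matched the query becomes visible; the other deferred tool still costs no prompt tokens.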
5.3 StreamingToolExecutor Concurrent Execution
Tool Call Request
│
▼
┌─────────────────────┐
│ ToolOrchestration │
│ (toolOrchestration.ts) │
└─────────────────────┘
│
├─── Concurrent ───┬─── Sequential ───┐
│ │ │
▼ ▼ ▼
Read File A Edit File X Read File C
Read File B → Edit File X Write File D
Web Search (same file) (dependency)
(no dependency)
Concurrency Safety Determination:
- isConcurrencySafe() === true: Can execute simultaneously
- Operating on same resource: Forced sequential
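One simple way to realize this rule is to group tool calls into batches. The sketch below is an assumption about how such partitioning could work, not a copy of toolOrchestration.ts: consecutive concurrency-safe calls share a batch, and every unsafe call acts as a barrier that runs alone.

```typescript
// Hypothetical scheduler sketch; the real partitioning logic may differ.
interface ToolCall {
  tool: string;
  target: string;
  concurrencySafe: boolean;
}

// Consecutive safe calls form one parallel batch; each unsafe call gets a
// batch of its own, so it runs alone and in its original order.
function partition(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    if (call.concurrencySafe) {
      current.push(call);
      continue;
    }
    if (current.length > 0) {
      batches.push(current);
      current = [];
    }
    batches.push([call]);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const batchSizes = partition([
  { tool: "Read", target: "A", concurrencySafe: true },
  { tool: "Read", target: "B", concurrencySafe: true },
  { tool: "Edit", target: "X", concurrencySafe: false },
  { tool: "Edit", target: "X", concurrencySafe: false },
  { tool: "Read", target: "C", concurrencySafe: true },
]).map((batch) => batch.length);
```

The two reads run together, the two edits to the same file run one after another, and the final read waits for both edits. A finer-grained scheduler could track resources per call instead of treating every unsafe call as a global barrier.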
5.4 Large Result Set Management
const MAX_RESULT_CHARS = 100000; // 100KB budget
function handleLargeResult(result: string): ToolResult {
if (result.length > MAX_RESULT_CHARS) {
// Truncate and persist to temp file
const tempFile = persistToTempFile(result);
return {
summary: result.slice(0, 1000) + "...",
fullResultLocation: tempFile,
size: result.length,
};
}
return { content: result };
}
Multi-Agent Coordination
6.1 Fatal Flaws of Monolithic Agents
When executing complex tasks (cross-file bug investigation):
- Model repeatedly reads wrong files
- Attempts wrong commands
- Garbage context rapidly pollutes main conversation
- The model loses coherence or forgets its initial goal
6.2 Coordinator Pattern
Architecture Refactoring:
┌─────────────────────────────────────────┐
│ Coordinator │
│ ┌─────────────────────────────────┐ │
│ │ Available Tools: │ │
│ │ • Agent (spawn sub-agent) │ │
│ │ • SendMessage │ │
│ │ • TaskStop │ │
│ │ │ │
│ │ Responsibility: Workflow │ │
│ │ Research → Synthesis → │ │
│ │ Implementation → Verification │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
│
│ Fork Sub-Agent
▼
┌─────────────────────────────────────────┐
│ Worker │
│ • Carries specific tool descriptions │
│ • Executes in isolated context │
│ • Shares parent Prompt Cache │
│ • Destroy after use, only keep summary │
└─────────────────────────────────────────┘
6.3 Fork Inheritance Mechanism
// Child agent inherits parent cache
const childAgent = forkAgent(parentContext, {
inheritCache: true, // Share Prompt Cache, save costs
isolatedContext: true, // Isolate subsequent operations
returnFormat: 'synthesis', // Only return distilled summary
});
// Return results via XML format
const result = await childAgent.run(task);
/*
<task-notification>
<synthesis>
Key finding: Issue in auth.ts line 145
Related files: src/auth.ts, src/middleware.ts
Suggested fix: ...
</synthesis>
</task-notification>
*/
6.4 Swarm Cluster Mode (utils/swarm/)
in_process_teammate task type:
Main Process (Leader)
│
├── Parallel Wake ──┬─── Teammate A ───┐
│ ├─── Teammate B ───┤
│ └─── Teammate C ───┘
│
▼
Leader Permission Bridge (permissionSync.ts)
│
└── Unified handling of all permission prompts
Engineering Challenges & Solutions:
| Challenge | Solution |
|---|---|
| Permission prompt conflicts | Leader permission bridge, unified interception |
| UI rendering chaos | iTerm2 AppleScript automatic pane splitting |
This marks the shift from "monolithic thinking" to "cluster concurrent collaboration".
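The permission bridge idea can be sketched as a single queue owned by the leader. Everything in this block is hypothetical (class name, method names, decision values); the source only states that permissionSync.ts routes all permission prompts through the leader.

```typescript
// Hypothetical leader-side bridge; not the actual permissionSync.ts.
interface PermissionRequest {
  teammate: string;
  action: string;
}
type Decision = "allow" | "deny";

class LeaderPermissionBridge {
  private readonly queue: PermissionRequest[] = [];

  // Teammates never prompt the user themselves; they enqueue here.
  request(req: PermissionRequest): void {
    this.queue.push(req);
  }

  // The leader drains the queue one prompt at a time, in arrival order.
  drain(ask: (req: PermissionRequest) => Decision): string[] {
    const results: string[] = [];
    while (this.queue.length > 0) {
      const req = this.queue.shift() as PermissionRequest;
      results.push(`${req.teammate}:${req.action}:${ask(req)}`);
    }
    return results;
  }
}

const bridge = new LeaderPermissionBridge();
bridge.request({ teammate: "A", action: "git push" });
bridge.request({ teammate: "B", action: "npm test" });

// Stand-in for the user's answer: deny anything that starts with "git".
const decisions = bridge.drain((req) => (req.action.startsWith("git") ? "deny" : "allow"));
```

Because only the leader calls the prompt function, two teammates can never show overlapping prompts, which is the conflict the table describes.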
Memory System Implementation
7.1 Retro Architecture Design
An unexpected choice: memory is stored entirely on the local file system rather than in a vector database.
memdir/
├── MEMORY.md # High-level index (≤ 200 lines / 25KB)
├── user_role.md
├── feedback_testing.md
├── project_deadline.md
└── reference_api_keys.md
Memory Types (Frontmatter format):
---
name: user_role
description: User role and preferences
type: user
---
User is senior backend engineer, new to frontend...
| Type | Purpose |
|---|---|
| user | User role, preferences |
| feedback | User feedback and corrections |
| project | Project context, decisions |
| reference | External system pointers |
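Reading such a memory file can be sketched with a small parser. This is a minimal illustration that handles only the three flat fields shown above; real frontmatter is YAML and would normally go through a YAML library. The function and type names are assumptions.

```typescript
// Minimal parser for the flat name/description/type fields; names are hypothetical.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface MemoryFile {
  name: string;
  description: string;
  type: MemoryType;
  body: string;
}

const MEMORY_TYPES: MemoryType[] = ["user", "feedback", "project", "reference"];

function parseMemoryFile(text: string): MemoryFile | null {
  // Frontmatter sits between two "---" lines at the top of the file.
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // Reject files with a missing name or an unknown memory type.
  const type = MEMORY_TYPES.find((t) => t === fields["type"]);
  if (!type || !fields["name"]) return null;
  return {
    name: fields["name"],
    description: fields["description"] || "",
    type,
    body: match[2].trim(),
  };
}

const sampleMemory = [
  "---",
  "name: user_role",
  "description: User role and preferences",
  "type: user",
  "---",
  "User is senior backend engineer, new to frontend...",
].join("\n");

const parsedMemory = parseMemoryFile(sampleMemory);
```

Plain files with typed frontmatter can be listed, grepped, and edited by hand, which is part of the appeal over an opaque embedding index.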
7.2 KAIROS Assistant Mode (Unreleased)
Long-running (Daemon) Mode:
Daytime Runtime
│
└── Log append mode
    └── logs/2026/04/2026-04-01.md
Nighttime/Idle
│
└── DreamTask (dreaming agent)
    ├── Read daily logs
    ├── Summarize and distill
    └── Extract to structured long-term memory
Advantages:
- Avoids the recall-rate problems of vector retrieval
- Points toward edge AI that is always online and learns continuously
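The daily log layout shown in the diagram (logs/2026/04/2026-04-01.md) can be derived from a date. The helpers below are a hypothetical sketch: the function names, the use of UTC, and the entry format are assumptions, since KAIROS is unreleased and the source shows only the path.

```typescript
// Hypothetical helpers mirroring the logs/YYYY/MM/YYYY-MM-DD.md layout.
function dailyLogPath(date: Date): string {
  // UTC is an assumption; a real implementation might use local time.
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `logs/${yyyy}/${mm}/${yyyy}-${mm}-${dd}.md`;
}

// Append-only: a new entry never rewrites earlier lines of the day's log.
function appendEntry(logText: string, timeUtc: string, entry: string): string {
  return `${logText}- ${timeUtc} ${entry}\n`;
}

const logPath = dailyLogPath(new Date(Date.UTC(2026, 3, 1))); // month index 3 is April
let dayLog = "";
dayLog = appendEntry(dayLog, "09:15", "User prefers pnpm over npm");
dayLog = appendEntry(dayLog, "14:02", "Fixed flaky auth test");
```

Keeping the daytime path append-only leaves the expensive summarization to the idle-time DreamTask.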
Security and Permission Architecture
8.1 Multi-Layer Protection System
┌─────────────────────────────────────────┐
│ Layer 1: Sandbox Isolation │
│ @anthropic-ai/sandbox-runtime │
│ • File access restrictions │
│ • Network access control │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Layer 2: Dangerous Operation Blocking │
│ • git push --force │
│ • rm -rf / │
│ • Sensitive API calls │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Layer 3: YOLO Classifier │
│ Side query mechanism using small model │
│ to assess risk │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Fallback: Graceful Degradation │
│ Denial Tracking → Manual confirmation │
└─────────────────────────────────────────┘
8.2 YOLO Classifier Dynamic Permissions
Problem: Static rules can't handle complex scenarios.
Solution: Side Query mechanism
// yoloClassifier.ts
async function classifyCommand(
transcript: ConversationTranscript,
command: string
): Promise<'Allow' | 'Deny'> {
// Call smaller, cheaper LLM
const result = await smallLLM.complete({
prompt: `
Conversation context: ${transcript.summary}
About to execute: ${command}
Assess risk, reply Allow or Deny
`,
model: 'claude-haiku',
});
// Fail closed: anything other than an explicit "Allow" is treated as Deny
return result.trim() === 'Allow' ? 'Allow' : 'Deny';
}
Denial Tracking:
- Tracks frequency of automatic tool denials
- Exceeds threshold → gracefully degrades to manual confirmation
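Denial tracking can be sketched as a small counter. The class name, the threshold value, and the choice to count consecutive denials (rather than a rate over a time window) are assumptions; the source states only that frequent automatic denials trigger a fallback to manual confirmation.

```typescript
// Hypothetical tracker; threshold and streak semantics are assumptions.
type PermissionMode = "auto" | "manual";

class DenialTracker {
  private consecutiveDenials = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  record(decision: "Allow" | "Deny"): void {
    // An allowed call resets the streak; a denial extends it.
    this.consecutiveDenials = decision === "Deny" ? this.consecutiveDenials + 1 : 0;
  }

  mode(): PermissionMode {
    // Once the streak reaches the threshold, hand control back to the user.
    return this.consecutiveDenials >= this.threshold ? "manual" : "auto";
  }
}

const tracker = new DenialTracker(3);
tracker.record("Deny");
tracker.record("Deny");
const modeAfterTwo = tracker.mode();
tracker.record("Deny");
const modeAfterThree = tracker.mode();
tracker.record("Allow");
const modeAfterAllow = tracker.mode();
```

Falling back to manual confirmation keeps the agent from looping on commands the classifier will keep rejecting.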
8.3 Undercover Mode (utils/undercover.ts)
Target Scenario: Anthropic employees working on open/public repositories
Activation Conditions:
- CLAUDE_CODE_UNDERCOVER=1: forced on
- Or auto-detection: non-internal repo → auto-on
- No force-OFF option
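The activation rules can be written as a single decision function. This is a sketch under stated assumptions: the function name is hypothetical, and the repository check is passed in as a boolean because the source does not show how internal repositories are detected.

```typescript
// Hypothetical decision function; how "internal repo" is detected is not shown
// in the source, so it is supplied by the caller.
function isUndercover(
  env: Record<string, string | undefined>,
  isInternalRepo: boolean,
): boolean {
  // Explicit opt-in always wins.
  if (env["CLAUDE_CODE_UNDERCOVER"] === "1") return true;
  // No environment value can turn the mode off; only the repo check decides.
  return !isInternalRepo;
}

const forcedOn = isUndercover({ CLAUDE_CODE_UNDERCOVER: "1" }, true);
const autoOnPublicRepo = isUndercover({}, false);
const cannotForceOff = isUndercover({ CLAUDE_CODE_UNDERCOVER: "0" }, false);
const offInInternalRepo = isUndercover({}, true);
```

The asymmetry is deliberate in this reading: a mistaken "on" only hides attribution, while a mistaken "off" could leak internal names into a public repository.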
Constraints:
## UNDERCOVER MODE — CRITICAL
- Prohibit exposing model identity
- Prohibit internal codenames (Capybara, Tengu...)
- Prohibit version numbers (opus-4-7, sonnet-4-8)
- Prohibit "Claude Code" text
- Prohibit Co-Authored-By lines
- Write commits in human developer style
Performance Optimization
9.1 Startup Optimization
Concurrent strategy in main.tsx:
// 1. Mark entry time
profileCheckpoint('main_tsx_entry');
// 2. Parallel MDM config reading
startMdmRawRead(); // Subprocess
// 3. Parallel keychain prefetch
startKeychainPrefetch(); // OAuth + legacy API key
// 4. Main module loading (~135ms)
// Above I/O operations run in parallel with this
Effect: Avoids 65ms of synchronous sequential blocking
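The saving can be checked with a simple timing model. The 135 ms and 65 ms figures come from the text above; how the 65 ms splits between the MDM read and the keychain prefetch is an assumption made only so the arithmetic is concrete.

```typescript
// Simplified timing model; the 40/25 ms split of the I/O work is an assumption.
interface StartupTask {
  name: string;
  ms: number;
}

function sequentialMs(tasks: StartupTask[]): number {
  return tasks.reduce((sum, t) => sum + t.ms, 0);
}

function overlappedMs(background: StartupTask[], foreground: StartupTask): number {
  // Background I/O starts first and runs while the foreground work proceeds,
  // so wall-clock time is whichever side finishes last.
  const slowestBackground = Math.max(0, ...background.map((t) => t.ms));
  return Math.max(slowestBackground, foreground.ms);
}

const ioTasks: StartupTask[] = [
  { name: "mdmRawRead", ms: 40 },
  { name: "keychainPrefetch", ms: 25 },
];
const moduleLoad: StartupTask = { name: "mainModuleLoad", ms: 135 };

const blockingTotal = sequentialMs([...ioTasks, moduleLoad]);
const overlappedTotal = overlappedMs(ioTasks, moduleLoad);
const savedMs = blockingTotal - overlappedTotal;
```

Under these assumptions the I/O finishes well inside the 135 ms module load, so its full 65 ms disappears from the critical path.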
9.2 Runtime Optimization
| Strategy | Implementation |
|---|---|
| Prompt Cache | Byte-level fine-tuning, segmented caching |
| Tool Deferred Loading | ToolSearch mechanism, reduces initial tokens |
| Concurrent Execution | StreamingToolExecutor partitioned scheduling |
| Result Truncation | Large results persisted to file |
| Fork Reuse | Child agents share parent cache |
Appendix: Special Features
A.1 Buddy System Tamagotchi
// buddy/sprites.ts
const SPECIES = [
'duck', 'goose', 'blob', 'cat', 'dragon',
'octopus', 'owl', 'penguin', 'axolotl', 'capybara',
// ... 18 more
];
// Rarity weights in percent (sum to 100)
const RARITY = {
  common: 60,
  uncommon: 25,
  rare: 10,
  epic: 4,
  legendary: 1,
};
const ATTRIBUTES = [
'DEBUGGING', 'PATIENCE', 'CHAOS', 'WISDOM', 'SNARK'
];
const ACCESSORIES = [
'crown', 'tophat', 'propeller', 'halo', 'wizard', 'beanie'
];
// 1% chance shiny
Generation Algorithm: User ID + Mulberry32 PRNG → deterministic but "random" pet
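A deterministic generator of this kind can be sketched as follows. mulberry32 is the widely circulated 32-bit PRNG of that name; the string hash used for seeding, the order of the random draws, and the function names are assumptions, and the species list is limited to the ten names quoted above.

```typescript
// Sketch only: seeding hash and draw order are assumptions, not the real buddy code.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// 32-bit FNV-1a, used here only to turn a user ID into a numeric seed.
function hashString(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const BUDDY_SPECIES = [
  "duck", "goose", "blob", "cat", "dragon",
  "octopus", "owl", "penguin", "axolotl", "capybara",
];

const RARITY_WEIGHTS: [string, number][] = [
  ["common", 60], ["uncommon", 25], ["rare", 10], ["epic", 4], ["legendary", 1],
];

function pickRarity(roll: number): string {
  // roll is in [0, 1); walk the cumulative weights until it is covered.
  let remaining = roll * 100;
  for (const [rarity, weight] of RARITY_WEIGHTS) {
    if (remaining < weight) return rarity;
    remaining -= weight;
  }
  return "common"; // not reached for roll < 1
}

function generateBuddy(userId: string): { species: string; rarity: string; shiny: boolean } {
  const rand = mulberry32(hashString(userId));
  const species = BUDDY_SPECIES[Math.floor(rand() * BUDDY_SPECIES.length)];
  const rarity = pickRarity(rand());
  const shiny = rand() < 0.01; // 1% chance
  return { species, rarity, shiny };
}

const buddyA = generateBuddy("user-123");
const buddyAgain = generateBuddy("user-123");
```

Seeding from the user ID means the same user always gets the same pet without any stored state.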
A.2 Internal Build Differences
| Feature | Internal Staff (USER_TYPE='ant') | External Users |
|---|---|---|
| Code Style | "Don't add features without permission" "Don't refactor without request" "Three similar lines beats premature abstraction" "Default to no comments" | "Get straight to the point" "Try the simplest approach" "Be concise" |
| Undercover Mode | Forced on | None |
| Internal Codenames | Visible | Filtered |
| Easter Eggs | All | Partial |
A.3 A Humorous Compliance-Scan Bypass
To keep internal model codenames (like Capybara) from being detected as literal strings, the name is assembled at runtime:
// Use String.fromCharCode dynamic assembly
const secretCodeName = String.fromCharCode(67, 97, 112, 121, 98, 97, 114, 97);
// Decodes to: "Capybara"
Conclusion
Claude Code's architecture design reveals the true barriers for LLM application layer startups:
- Extreme Token Cost Optimization — Fine-grained Prompt Cache operations
- Multi-State Machine Streaming Coordination — Coordinator and Fork mechanisms
- Balance Between Safety Intervention and UX — YOLO Classifier dynamic permissions
- Deep OS Integration — Terminal automation, permission bridging
Industry Insight:
The era of simply piecing together prompts, stacking vector databases, and wrapping simple loop shells is over. True competitiveness is built on deep understanding of systems engineering.
Document based on leaked source code analysis, for educational and research purposes only.