An in-depth analysis of the Claude Code architecture based on the v2.1.88 source code, covering prompt caching, multi-agent coordination, security and permissions, and performance optimization.

Claude Code Engineering Architecture Whitepaper

4/1/26 · About 6 min · AI Technology, Architecture Design, Claude Code, AI Agent, Source Code Analysis

In-depth analysis based on the v2.1.88 source code
Document Version: 1.0
Date: 2026-04-01

Executive Summary
Architecture Overview
Core Technology Stack
Prompt Caching Architecture
Tool System Design
Multi-Agent Coordination
Memory System Implementation
Security and Permission Architecture
Performance Optimization
Appendix: Special Features

Executive Summary

Claude Code is Anthropic's production-grade AI programming assistant, and its architecture is a useful engineering reference point for AI agent systems in 2026. This document provides an in-depth analysis of its core innovations based on the leaked v2.1.88 source code (~510,000 lines of TypeScript, 1,884 files).

Key Insights:

Strong AI application-layer engineering is largely a matter of fine-grained use of the API's caching system
The architecture has evolved from "monolithic thinking" to "cluster concurrent collaboration"
Permissions are dynamic: a small model regulates the large model

Architecture Overview

2.1 System Scale

Metric	Value
Source Files	1,884
Lines of Code	~510,000
Top-level Modules	~40
Built-in Tools	40+
Core Module Size	33 MB

2.2 Dual-Mode Architecture

┌─────────────────────────────────────────────────────────┐
│                    Claude Code                          │
├─────────────────────────┬───────────────────────────────┤
│     REPL Interactive     │        Headless/SDK Mode      │
├─────────────────────────┼───────────────────────────────┤
│ • React + Ink Terminal  │ • QueryEngine Class          │
│ • Human Developer Focus │ • JSON Stream Output         │
│ • Real-time Thinking    │ • Embeddable in IDE/Cursor   │
│ • Tool Progress/Diff    │ • CI/CD Integration          │
└─────────────────────────┴───────────────────────────────┘

2.3 Directory Structure

src/
├── main.tsx              # Entry point (concurrent startup)
├── commands/             # CLI commands (87 modules)
├── components/           # React components (103)
├── services/
│   ├── api/
│   │   └── claude.ts     # Core API (3,419 lines)
│   ├── mcp/              # Model Context Protocol
│   └── analytics/        # Telemetry & GrowthBook
├── tools/                # 40+ tool implementations
├── screens/
│   └── REPL.tsx          # Main UI (5,005 lines)
├── state/                # Zustand-style Store
├── memdir/               # Memory system
├── coordinator/          # Coordinator pattern
├── tasks/                # Multi-agent tasks
├── buddy/                # Tamagotchi easter egg
└── utils/
    ├── permissions/      # Permission system
    ├── swarm/            # Agent swarm
    └── undercover.ts     # Undercover mode

Core Technology Stack

3.1 Technology Selection Decisions

Layer	Technology	Rationale
Language	TypeScript	Type safety, maintainability for large projects
Runtime	Bun	Fast startup and execution; the 3-4x speedup over Node.js cited here is workload-dependent
CLI Framework	Commander	Mature command-line parsing
UI Layer	React + Ink	Declarative terminal rendering for complex streaming states
State Management	Custom Store	Zustand-style, lightweight

3.2 Why Use React for CLI Tools?

The answer is in screens/REPL.tsx (5,005 lines):

When LLM streaming output is combined with concurrent multi-tool execution, the terminal UI has to render several fast-changing things at once:
Thought process
Tool call progress bars
Code diff previews
Multi-agent status monitoring

Declarative React plus a minimal store is a good fit for these high-frequency partial refreshes: components describe what the screen should look like for a given state, and only the parts whose state changed are redrawn.
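
To make the "minimal store" idea concrete, here is a small sketch of a Zustand-style store. The names `createStore`, `UiState`, and the state fields are hypothetical and are not taken from the Claude Code source; the sketch only illustrates the subscribe-and-update pattern that a React + Ink component tree could build on.

```typescript
// Hypothetical sketch of a Zustand-style store; not the actual Claude Code implementation.
type Listener<T> = (state: T, previous: T) => void;

interface Store<T> {
  getState(): T;
  setState(patch: Partial<T> | ((state: T) => Partial<T>)): void;
  subscribe(listener: Listener<T>): () => void;
}

function createStore<T extends object>(initial: T): Store<T> {
  let state = initial;
  const listeners = new Set<Listener<T>>();
  return {
    getState: () => state,
    setState(patch) {
      // Accept either a partial object or an updater function
      const partial =
        typeof patch === "function"
          ? (patch as (state: T) => Partial<T>)(state)
          : patch;
      const previous = state;
      state = { ...previous, ...partial };
      listeners.forEach((listener) => listener(state, previous));
    },
    subscribe(listener) {
      listeners.add(listener);
      // Return an unsubscribe function
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Assumed UI state for illustration
interface UiState {
  thinking: string;
  toolProgress: Record<string, number>;
}

const uiStore = createStore<UiState>({ thinking: "", toolProgress: {} });
const seen: string[] = [];
const unsubscribe = uiStore.subscribe((s) => {
  seen.push(s.thinking);
});
uiStore.setState({ thinking: "Reading files..." });
uiStore.setState((s) => ({ toolProgress: { ...s.toolProgress, Read: 0.5 } }));
unsubscribe();
uiStore.setState({ thinking: "ignored by the unsubscribed listener" });
```

Each `setState` call notifies subscribers with the new and previous state, which is what lets a renderer refresh only the affected part of the terminal.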

Prompt Caching Architecture

4.1 Background: Anthropic Prompt Cache Mechanism

Uses prefix matching
Cache hit rate directly determines API costs
Any change inside the cached prefix, however small, causes a cache miss from that point on

4.2 Segmented Caching Architecture

System Prompt
├── Static Segment           ← Global cache
│   ├── Model identity intro
│   ├── System safety rules
│   ├── Code style limits
│   └── Basic tool usage guide
│
├── Dynamic Boundary          ← Hard-coded marker
│   SYSTEM_PROMPT_DYNAMIC_BOUNDARY
│
└── Dynamic Segment          ← Session-level cache/no cache
    ├── Current Working Directory (CWD)
    ├── Git status
    ├── MCP instructions
    └── User config

4.3 Cache Optimization Strategies

Optimization	Implementation	Effect
Deterministic Sorting	Tool descriptions sorted alphabetically	Prevents order changes breaking cache
Hash Path Mapping	Config files use content hash as path	Avoids UUID randomness
State Externalization	Agent list moved to Attachments	Reduces cache creation tokens by 10.2%
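
The first two rows of the table can be sketched as follows. This is a minimal illustration under stated assumptions: `sortToolsForPrompt`, `configPathFor`, and the `/tmp/config-...` path layout are hypothetical, and the FNV-1a hash is only a simple stand-in for whatever content hash the real code uses.

```typescript
// Hypothetical sketch: names, path layout, and hash function are illustrative only.
interface ToolDescription {
  name: string;
  description: string;
}

// Sort by name with a plain code-unit comparison so the serialized tool list
// is byte-identical across runs, regardless of registration order or locale.
function sortToolsForPrompt(tools: ToolDescription[]): ToolDescription[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// FNV-1a (32-bit) over UTF-16 code units; a stand-in for a real content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// The same content always maps to the same path, unlike a random UUID,
// so a path embedded in the prompt does not change between sessions.
function configPathFor(content: string): string {
  return `/tmp/config-${fnv1a(content)}.json`;
}

const orderA = sortToolsForPrompt([
  { name: "Read", description: "Read a file" },
  { name: "Bash", description: "Run a shell command" },
]);
const orderB = sortToolsForPrompt([
  { name: "Bash", description: "Run a shell command" },
  { name: "Read", description: "Read a file" },
]);
```

Both `orderA` and `orderB` serialize to the same string, so the prompt prefix containing the tool list stays stable.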

4.4 Core Code Logic

// services/api/claude.ts
function systemPromptSection() {
  // Static segment: globally cacheable
  return asSystemPrompt([
    "You are Claude Code...",
    systemSafetyRules,
    codeStyleLimits,
    toolUsageGuide,
  ]);
}

// Dynamic boundary hard-coded
const DYNAMIC_BOUNDARY = "SYSTEM_PROMPT_DYNAMIC_BOUNDARY";

function dynamicPromptSection() {
  // Dynamic segment: high-frequency changes
  return [
    getCurrentWorkingDir(),
    getGitStatus(),
    getMcpInstructions(),
    getUserConfig(),
  ];
}
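
One way the static/dynamic split could map onto an API request is sketched below. The block shape (`type`, `text`, `cache_control: { type: "ephemeral" }`) follows the public Anthropic Messages API `system` array; everything else (`buildSystemBlocks`, `SessionInfo`, the placeholder prompt text) is an assumption for illustration, and the source does not confirm that Claude Code builds its request this way.

```typescript
// Sketch only: helper names and prompt text are assumptions, not the real source.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface SessionInfo {
  cwd: string;
  gitStatus: string;
}

// Static segment: the same bytes for every session
const STATIC_SEGMENT = [
  "You are Claude Code...",
  "(system safety rules)",
  "(code style limits)",
  "(tool usage guide)",
].join("\n\n");

function buildSystemBlocks(session: SessionInfo): SystemTextBlock[] {
  return [
    // Cache breakpoint at the end of the static segment: the prefix up to here
    // can be served from the cache on later requests.
    { type: "text", text: STATIC_SEGMENT, cache_control: { type: "ephemeral" } },
    // The dynamic segment comes after the breakpoint, so changes here
    // do not invalidate the static prefix.
    { type: "text", text: `CWD: ${session.cwd}\nGit status: ${session.gitStatus}` },
  ];
}

const blocksA = buildSystemBlocks({ cwd: "/repo/a", gitStatus: "clean" });
const blocksB = buildSystemBlocks({ cwd: "/repo/b", gitStatus: "2 files modified" });
```

The two sessions differ only in their second block; the first block is byte-identical, which is the property prefix-based caching depends on.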

Tool System Design

5.1 Factory Pattern Architecture

All tools implement the base Tool interface:

interface Tool {
  name: string;
  description: string;
  inputSchema: ToolInputJSONSchema;

  // Permission check
  checkPermissions(context: ToolPermissionContext): PermissionResult;

  // Input validation
  validateInput(input: unknown): ValidationResult;

  // Concurrency safety marker
  isConcurrencySafe(): boolean;

  // Execution
  execute(input: unknown, context: ToolContext): Promise<ToolResult>;
}
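
As an illustration, a tool that satisfies this interface might look like the sketch below. The supporting types (`PermissionResult`, `ValidationResult`, and so on) are given assumed minimal shapes because the article does not show them, and `echoTool` is a hypothetical tool, not one of the built-in tools.

```typescript
// Assumed minimal shapes for the types referenced by the Tool interface.
type ToolInputJSONSchema = {
  type: "object";
  properties: Record<string, unknown>;
  required?: string[];
};
type ToolPermissionContext = { allowedDirectories: string[] };
type PermissionResult = { behavior: "allow" | "ask" | "deny"; reason?: string };
type ValidationResult = { ok: true } | { ok: false; message: string };
type ToolContext = { cwd: string };
type ToolResult = { content: string };

interface Tool {
  name: string;
  description: string;
  inputSchema: ToolInputJSONSchema;
  checkPermissions(context: ToolPermissionContext): PermissionResult;
  validateInput(input: unknown): ValidationResult;
  isConcurrencySafe(): boolean;
  execute(input: unknown, context: ToolContext): Promise<ToolResult>;
}

// Hypothetical read-only tool used only to show the shape of an implementation
const echoTool: Tool = {
  name: "Echo",
  description: "Returns the given text unchanged",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  // No side effects, so it is always allowed
  checkPermissions: () => ({ behavior: "allow" }),
  validateInput(input) {
    const text = (input as { text?: unknown } | null)?.text;
    return typeof text === "string"
      ? { ok: true }
      : { ok: false, message: "text must be a string" };
  },
  // Reads nothing and writes nothing, so it can run alongside other tools
  isConcurrencySafe: () => true,
  async execute(input) {
    return { content: (input as { text: string }).text };
  },
};
```

Keeping permission checks, validation, and the concurrency flag on the tool itself lets the orchestrator treat all tools uniformly.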

5.2 ToolSearch Deferred Loading

Problem: Putting 40+ tool descriptions into every prompt makes token costs unacceptable.

Solution:

// Non-core tools marked for deferred loading
const nonCoreTools = {
  defer_loading: true,  // Not exposed in initial Prompt
  searchable: true,     // Discoverable via ToolSearch
};

// Model only knows ToolSearch exists
const toolSearchSchema = {
  name: "ToolSearch",
  description: "Search and load additional tools",
  parameters: {
    query: "Tool keyword to search"
  }
};

Flow:

Model needs an additional capability → calls ToolSearch
The target tool's description is loaded dynamically
Subsequent calls use that tool directly
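
The flow above can be sketched with a small registry. This is a simplified illustration: `ToolRegistry`, its keyword matching, and the example tool entries are assumptions, not the actual ToolSearch implementation.

```typescript
// Hypothetical sketch of deferred tool loading; names and matching logic are assumptions.
interface ToolEntry {
  name: string;
  description: string;
  deferLoading: boolean;
}

class ToolRegistry {
  private readonly entries: ToolEntry[];
  private readonly loaded = new Set<string>();

  constructor(entries: ToolEntry[]) {
    this.entries = entries;
    // Core tools are exposed from the first prompt
    for (const entry of entries) {
      if (!entry.deferLoading) this.loaded.add(entry.name);
    }
  }

  // Tool descriptions currently visible to the model
  visibleTools(): ToolEntry[] {
    return this.entries.filter((entry) => this.loaded.has(entry.name));
  }

  // ToolSearch: find deferred tools by keyword and expose them for later calls
  search(query: string): ToolEntry[] {
    const needle = query.toLowerCase();
    const matches = this.entries.filter(
      (entry) =>
        entry.deferLoading &&
        (entry.name.toLowerCase().indexOf(needle) !== -1 ||
          entry.description.toLowerCase().indexOf(needle) !== -1),
    );
    matches.forEach((entry) => this.loaded.add(entry.name));
    return matches;
  }
}

// Illustrative entries only
const registry = new ToolRegistry([
  { name: "Read", description: "Read a file", deferLoading: false },
  { name: "NotebookEdit", description: "Edit Jupyter notebook cells", deferLoading: true },
  { name: "WebFetch", description: "Fetch a URL", deferLoading: true },
]);
const visibleBefore = registry.visibleTools().map((t) => t.name);
const searchHits = registry.search("notebook").map((t) => t.name);
const visibleAfter = registry.visibleTools().map((t) => t.name);
```

Only the searched-for tool becomes visible; the other deferred tool still costs no prompt tokens.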

5.3 StreamingToolExecutor Concurrent Execution

Tool Call Request
    │
    ▼
┌──────────────────────────┐
│   ToolOrchestration      │
│   (toolOrchestration.ts) │
└──────────────────────────┘
    │
    ├─── Concurrent ───┬─── Sequential ───┐
    │                   │                  │
    ▼                   ▼                  ▼
 Read File A        Edit File X      Read File C
 Read File B    →   Edit File X      Write File D
 Web Search         (same file)      (dependency)
 (no dependency)

Concurrency Safety Determination:

isConcurrencySafe() === true: Can execute simultaneously
Operating on same resource: Forced sequential
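
A minimal sketch of this partitioning rule follows. It assumes each call carries a `resource` key and a precomputed `concurrencySafe` flag; `planBatches` and `runBatches` are hypothetical names, and the real scheduling logic in toolOrchestration.ts may differ.

```typescript
// Hypothetical scheduling sketch; not the actual StreamingToolExecutor logic.
interface ToolCall {
  tool: string;
  resource: string; // e.g. the file path or URL the call touches
  concurrencySafe: boolean; // result of the tool's isConcurrencySafe()
}

// Group calls into batches: calls inside a batch may run concurrently,
// while batches run one after another in the original order.
function planBatches(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    const conflicts = current.some((queued) => queued.resource === call.resource);
    if (call.concurrencySafe && !conflicts) {
      current.push(call); // safe and independent: join the concurrent batch
      continue;
    }
    if (current.length > 0) batches.push(current);
    if (call.concurrencySafe) {
      current = [call]; // same resource as a queued call: start a new batch
    } else {
      batches.push([call]); // unsafe calls always run alone
      current = [];
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Execute batches in order; calls within one batch run in parallel.
async function runBatches(
  batches: ToolCall[][],
  run: (call: ToolCall) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of batches) {
    results.push(...(await Promise.all(batch.map(run))));
  }
  return results;
}

const plan = planBatches([
  { tool: "Read", resource: "A", concurrencySafe: true },
  { tool: "Read", resource: "B", concurrencySafe: true },
  { tool: "WebSearch", resource: "web", concurrencySafe: true },
  { tool: "Edit", resource: "X", concurrencySafe: false },
  { tool: "Edit", resource: "X", concurrencySafe: false },
  { tool: "Read", resource: "C", concurrencySafe: true },
  { tool: "Write", resource: "D", concurrencySafe: false },
]);
const batchSizes = plan.map((batch) => batch.length);
```

For the calls in the diagram, the three independent reads form one concurrent batch and every edit or write runs in a batch of its own.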

5.4 Large Result Set Management

const MAX_RESULT_CHARS = 100000;  // Budget in characters (about 100 KB of ASCII text)

function handleLargeResult(result: string): ToolResult {
  if (result.length > MAX_RESULT_CHARS) {
    // Persist the full result to a temp file and return only a short preview
    const tempFile = persistToTempFile(result);
    return {
      summary: result.slice(0, 1000) + "...",
      fullResultLocation: tempFile,
      size: result.length,
    };
  }
  return { content: result };
}

Multi-Agent Coordination

6.1 Fatal Flaws of Monolithic Agents

When executing complex tasks (cross-file bug investigation):

Model repeatedly reads wrong files
Attempts wrong commands
Garbage context rapidly pollutes main conversation
Leads to incoherent behavior or losing track of the initial goal

6.2 Coordinator Pattern

Architecture Refactoring:

┌─────────────────────────────────────────┐
│           Coordinator                    │
│  ┌─────────────────────────────────┐   │
│  │ Available Tools:                │   │
│  │ • Agent (spawn sub-agent)       │   │
│  │ • SendMessage                   │   │
│  │ • TaskStop                      │   │
│  │                                 │   │
│  │ Responsibility: Workflow        │   │
│  │ Research → Synthesis →          │   │
│  │ Implementation → Verification   │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
           │
           │ Fork Sub-Agent
           ▼
┌─────────────────────────────────────────┐
│           Worker                         │
│  • Carries specific tool descriptions   │
│  • Executes in isolated context         │
│  • Shares parent Prompt Cache           │
│  • Destroy after use, only keep summary │
└─────────────────────────────────────────┘

6.3 Fork Inheritance Mechanism

// Child agent inherits parent cache
const childAgent = forkAgent(parentContext, {
  inheritCache: true,        // Share Prompt Cache, save costs
  isolatedContext: true,     // Isolate subsequent operations
  returnFormat: 'synthesis', // Only return distilled summary
});

// Return results via XML format
const result = await childAgent.run(task);
/*
<task-notification>
  <synthesis>
    Key finding: Issue in auth.ts line 145
    Related files: src/auth.ts, src/middleware.ts
    Suggested fix: ...
  </synthesis>
</task-notification>
*/
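
The fork semantics can be illustrated with plain message arrays. In this sketch, `forkContext`, `mergeSynthesis`, and the `Message` shape are hypothetical; the only claim is the general idea that a child starting from the parent's exact message prefix can reuse the cached prefix, while its own exploratory messages never reach the parent.

```typescript
// Hypothetical sketch of fork semantics; not the actual forkAgent implementation.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AgentContext {
  messages: Message[];
}

// The child starts from a copy of the parent's messages, so its request begins
// with the same prefix (cache-friendly), while later appends stay isolated.
function forkContext(parent: AgentContext, task: string): AgentContext {
  return { messages: [...parent.messages, { role: "user", content: task }] };
}

// Only the distilled summary flows back into the parent conversation.
function mergeSynthesis(parent: AgentContext, synthesis: string): void {
  parent.messages.push({
    role: "user",
    content: `<task-notification><synthesis>${synthesis}</synthesis></task-notification>`,
  });
}

const parentCtx: AgentContext = {
  messages: [
    { role: "system", content: "You are Claude Code..." },
    { role: "user", content: "Find the login bug" },
  ],
};
const childCtx = forkContext(parentCtx, "Investigate src/auth.ts");
childCtx.messages.push({ role: "assistant", content: "(many exploratory tool calls)" });
mergeSynthesis(parentCtx, "Key finding: issue in auth.ts");
```

After the run, the parent holds three messages (its original two plus the synthesis), while the child's four-message history can be discarded.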

6.4 Swarm Cluster Mode (`utils/swarm/`)

in_process_teammate task type:

Main Process (Leader)
    │
    ├── Parallel Wake ──┬─── Teammate A ───┐
    │                   ├─── Teammate B ───┤
    │                   └─── Teammate C ───┘
    │
    ▼
Leader Permission Bridge (permissionSync.ts)
    │
    └── Unified handling of all permission prompts

Engineering Challenges & Solutions:

Challenge	Solution
Permission prompt conflicts	Leader permission bridge, unified interception
UI rendering chaos	iTerm2 AppleScript automatic pane splitting
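
The permission bridge idea can be sketched as a single queue owned by the leader. `LeaderPermissionBridge` and its methods are hypothetical names; the actual permissionSync.ts is not reproduced here, and this sketch only shows why funnelling requests through one place avoids overlapping prompts.

```typescript
// Hypothetical sketch; not the real permissionSync.ts.
interface PermissionRequest {
  teammate: string;
  action: string;
}

type Decision = "allow" | "deny";

class LeaderPermissionBridge {
  private readonly queue: PermissionRequest[] = [];

  // Teammates never prompt the user directly; they enqueue a request instead.
  request(teammate: string, action: string): void {
    this.queue.push({ teammate, action });
  }

  // The leader drains the queue one request at a time, so prompts never overlap.
  drain(decide: (request: PermissionRequest) => Decision): Map<string, Decision> {
    const results = new Map<string, Decision>();
    let next = this.queue.shift();
    while (next !== undefined) {
      results.set(`${next.teammate}:${next.action}`, decide(next));
      next = this.queue.shift();
    }
    return results;
  }
}

const bridge = new LeaderPermissionBridge();
bridge.request("Teammate A", "git commit");
bridge.request("Teammate B", "rm -rf build");
// Stand-in for the user's answer to each prompt
const decisions = bridge.drain((r) => (r.action.indexOf("rm") === 0 ? "deny" : "allow"));
```

In a real system the `decide` callback would be the interactive prompt; here it is a fixed rule so the example stays self-contained.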

This mode marks the shift from "monolithic thinking" to "cluster concurrent collaboration".

Memory System Implementation

7.1 Retro Architecture Design

Unexpected choice: the memory system is built entirely on the local file system, not on a vector database.

memdir/
├── MEMORY.md              # High-level index (≤ 200 lines / 25KB)
├── user_role.md
├── feedback_testing.md
├── project_deadline.md
└── reference_api_keys.md

Memory Types (Frontmatter format):

---
name: user_role
description: User role and preferences
type: user
---

User is senior backend engineer, new to frontend...

Type	Purpose
`user`	User role, preferences
`feedback`	User feedback and corrections
`project`	Project context, decisions
`reference`	External system pointers
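
Because the memories are plain files with frontmatter, reading one needs very little code. The parser below is a hypothetical sketch (`parseMemoryFile` and `MemoryFile` are illustrative names) that handles only the simple `key: value` frontmatter shown above, not full YAML.

```typescript
// Hypothetical parser sketch for the memory file format; handles simple key: value lines only.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface MemoryFile {
  name: string;
  description: string;
  type: MemoryType;
  body: string;
}

const MEMORY_TYPES: readonly string[] = ["user", "feedback", "project", "reference"];

function parseMemoryFile(text: string): MemoryFile | null {
  // Frontmatter is delimited by a leading and a closing "---" line
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) return null;
  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  const description = fields.get("description");
  const type = fields.get("type");
  // Reject files with missing fields or an unknown memory type
  if (!name || !description || !type || MEMORY_TYPES.indexOf(type) === -1) return null;
  return { name, description, type: type as MemoryType, body: (match[2] ?? "").trim() };
}

const sampleMemory = [
  "---",
  "name: user_role",
  "description: User role and preferences",
  "type: user",
  "---",
  "",
  "User is senior backend engineer, new to frontend...",
].join("\n");
const parsedMemory = parseMemoryFile(sampleMemory);
```

The `description` field is what makes an index file such as MEMORY.md possible: the agent can scan short descriptions and open only the files it needs.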

7.2 KAIROS Assistant Mode (Unreleased)

Long-running (Daemon) Mode:

Daytime Runtime
    │
    └── Log append mode
        └── logs/2026/04/2026-04-01.md

Nighttime/Idle
    │
    └── DreamTask (dreaming agent)
        ├── Read daily logs
        ├── Summarize and distill
        └── Extract to structured long-term memory
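
The date-partitioned log layout shown above is easy to derive from a timestamp. The helper below is a hypothetical sketch: `dailyLogPath` is an illustrative name, and the use of UTC is an assumption, since the article does not say which time zone the logs follow.

```typescript
// Hypothetical helper; the UTC choice is an assumption, not confirmed by the source.
function pad2(value: number): string {
  return value < 10 ? "0" + value : String(value);
}

// Builds paths of the form logs/YYYY/MM/YYYY-MM-DD.md
function dailyLogPath(date: Date): string {
  const year = String(date.getUTCFullYear());
  const month = pad2(date.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const day = pad2(date.getUTCDate());
  return `logs/${year}/${month}/${year}-${month}-${day}.md`;
}

const examplePath = dailyLogPath(new Date(Date.UTC(2026, 3, 1)));
// examplePath is "logs/2026/04/2026-04-01.md"
```

An append-only file per day keeps daytime writes cheap and gives the DreamTask a natural unit of work to summarize.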

Advantages:

Sidesteps the recall problems of vector retrieval
Represents edge AI evolving toward "always online, continuous learning"

Security and Permission Architecture

8.1 Multi-Layer Protection System

┌─────────────────────────────────────────┐
│ Layer 1: Sandbox Isolation              │
│ @anthropic-ai/sandbox-runtime           │
│ • File access restrictions              │
│ • Network access control                │
└─────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│ Layer 2: Dangerous Operation Blocking   │
│ • git push --force                      │
│ • rm -rf /                              │
│ • Sensitive API calls                   │
└─────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│ Layer 3: YOLO Classifier                │
│ Side query mechanism using small model  │
│ to assess risk                          │
└─────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│ Fallback: Graceful Degradation          │
│ Denial Tracking → Manual confirmation   │
└─────────────────────────────────────────┘

8.2 YOLO Classifier Dynamic Permissions

Problem: Static rules can't handle complex scenarios.

Solution: Side Query mechanism

// yoloClassifier.ts
async function classifyCommand(
  transcript: ConversationTranscript,
  command: string
): Promise<'Allow' | 'Deny'> {
  // Call smaller, cheaper LLM
  const result = await smallLLM.complete({
    prompt: `
      Conversation context: ${transcript.summary}
      About to execute: ${command}
      Assess risk, reply Allow or Deny
    `,
    model: 'claude-haiku',
  });
  // Fail closed: anything other than an explicit "Allow" is treated as Deny
  return result.trim() === 'Allow' ? 'Allow' : 'Deny';
}

Denial Tracking:

Tracks frequency of automatic tool denials
Exceeds threshold → gracefully degrades to manual confirmation
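
A minimal sketch of denial tracking is shown below. The class name, the default threshold of 3, and the choice to count consecutive denials (reset on any allow) are all assumptions; the article only states that denials are tracked and that exceeding a threshold falls back to manual confirmation.

```typescript
// Hypothetical sketch; threshold value and counting rule are assumptions.
class DenialTracker {
  private consecutiveDenials = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Record each classifier decision; an Allow resets the streak
  record(decision: "Allow" | "Deny"): void {
    this.consecutiveDenials = decision === "Deny" ? this.consecutiveDenials + 1 : 0;
  }

  // Once the classifier has denied too often in a row, stop auto-deciding
  // and hand the decision back to the user.
  shouldAskUser(): boolean {
    return this.consecutiveDenials >= this.threshold;
  }
}

const tracker = new DenialTracker(3);
tracker.record("Deny");
tracker.record("Deny");
const askAfterTwo = tracker.shouldAskUser();
tracker.record("Deny");
const askAfterThree = tracker.shouldAskUser();
```

Falling back to a human after repeated denials keeps a miscalibrated classifier from silently blocking the agent.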

8.3 Undercover Mode (`utils/undercover.ts`)

Target Scenario: Anthropic employees working on open/public repositories

Activation Conditions:

CLAUDE_CODE_UNDERCOVER=1 forced on
Or auto-detection: non-internal repo → auto-on
No force-OFF option

Constraints:

## UNDERCOVER MODE — CRITICAL

- Prohibit exposing model identity
- Prohibit internal codenames (Capybara, Tengu...)
- Prohibit version numbers (opus-4-7, sonnet-4-8)
- Prohibit "Claude Code" text
- Prohibit Co-Authored-By lines
- Write commits in human developer style

Performance Optimization

9.1 Startup Optimization

Concurrent strategy in main.tsx:

// 1. Mark entry time
profileCheckpoint('main_tsx_entry');

// 2. Parallel MDM config reading
startMdmRawRead();  // Subprocess

// 3. Parallel keychain prefetch
startKeychainPrefetch();  // OAuth + legacy API key

// 4. Main module loading (~135ms)
//    Above I/O operations run in parallel with this

Effect: Avoids 65ms of synchronous sequential blocking
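
The underlying pattern is "start early, await late": kick off the slow I/O without awaiting it, do the unavoidable work, and only then wait for the results. The sketch below uses stand-in functions (`readMdmConfig`, `prefetchKeychain`, `loadMainModules` are hypothetical and return fixed values) to show the ordering; it is not the code in main.tsx.

```typescript
// Hypothetical sketch of the startup ordering; functions are stand-ins with fixed results.
const startupEvents: string[] = [];

async function readMdmConfig(): Promise<string> {
  startupEvents.push("mdm:start");
  return "mdm-config";
}

async function prefetchKeychain(): Promise<string> {
  startupEvents.push("keychain:start");
  return "credentials";
}

async function loadMainModules(): Promise<string> {
  startupEvents.push("modules:start");
  return "modules";
}

async function startup(): Promise<string[]> {
  // Kick off I/O first, without awaiting, so it overlaps with module loading
  const mdm = readMdmConfig();
  const keychain = prefetchKeychain();
  // The unavoidable work runs while the prefetches are in flight
  const modules = await loadMainModules();
  // Collect the prefetched results only when they are actually needed
  return [modules, await mdm, await keychain];
}

const startupPromise = startup();
```

With sequential awaits the total time is the sum of all three steps; with this ordering it approaches the duration of the longest one.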

9.2 Runtime Optimization

Strategy	Implementation
Prompt Cache	Byte-level fine-tuning, segmented caching
Tool Deferred Loading	ToolSearch mechanism, reduces initial tokens
Concurrent Execution	StreamingToolExecutor partitioned scheduling
Result Truncation	Large results persisted to file
Fork Reuse	Child agents share parent cache

Appendix: Special Features

A.1 Buddy System Tamagotchi

// buddy/sprites.ts
const SPECIES = [
  'duck', 'goose', 'blob', 'cat', 'dragon',
  'octopus', 'owl', 'penguin', 'axolotl', 'capybara',
  // ... 18 more
];

// Rarity weights in percent (sum to 100)
const RARITY = {
  common: 60,
  uncommon: 25,
  rare: 10,
  epic: 4,
  legendary: 1,
};

const ATTRIBUTES = [
  'DEBUGGING', 'PATIENCE', 'CHAOS', 'WISDOM', 'SNARK'
];

const ACCESSORIES = [
  'crown', 'tophat', 'propeller', 'halo', 'wizard', 'beanie'
];

// 1% chance shiny

Generation Algorithm: User ID + Mulberry32 PRNG → deterministic but "random" pet
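
A sketch of how such deterministic generation could work is shown below. `mulberry32` is the well-known public-domain PRNG; the seeding scheme (an FNV-1a hash of the user ID), the shortened species list, and the selection logic in `generateBuddy` are assumptions for illustration, not the code in buddy/.

```typescript
// Sketch only: seeding and selection logic are assumptions, not the real buddy/ code.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// FNV-1a (32-bit) to turn a user ID string into a numeric seed (assumed scheme)
function hashUserId(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const BUDDY_SPECIES = ["duck", "goose", "blob", "cat", "dragon", "octopus", "owl", "penguin", "axolotl", "capybara"];
const RARITY_WEIGHTS: Array<[string, number]> = [
  ["common", 60], ["uncommon", 25], ["rare", 10], ["epic", 4], ["legendary", 1],
];

// Map a roll in [0, 1) onto the weighted rarity table
function pickRarity(roll: number): string {
  let remaining = roll * 100;
  for (const [name, weight] of RARITY_WEIGHTS) {
    if (remaining < weight) return name;
    remaining -= weight;
  }
  return "common"; // defensive fallback; not expected for rolls in [0, 1)
}

function generateBuddy(userId: string): { species: string; rarity: string; shiny: boolean } {
  const rand = mulberry32(hashUserId(userId));
  const species = BUDDY_SPECIES[Math.floor(rand() * BUDDY_SPECIES.length)] ?? "duck";
  const rarity = pickRarity(rand());
  const shiny = rand() < 0.01; // 1% chance
  return { species, rarity, shiny };
}

const buddyFirst = generateBuddy("user-123");
const buddyAgain = generateBuddy("user-123");
```

Because the PRNG is seeded only from the user ID, the same user always gets the same pet without any stored state.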

A.2 Internal Build Differences

Feature	Internal Staff (`USER_TYPE='ant'`)	External Users
Code Style	"Don't add features without permission"; "Don't refactor without request"; "Three similar lines beats premature abstraction"; "Default to no comments"	"Get straight to the point"; "Try the simplest approach"; "Be concise"
Undercover Mode	Forced on	None
Internal Codenames	Visible	Filtered
Easter Eggs	All	Partial

A.3 Humor: Bypassing Compliance Scanning

To prevent detection of internal model codenames (like Capybara):

// Use String.fromCharCode dynamic assembly
const secretCodeName = String.fromCharCode(67, 97, 112, 121, 98, 97, 114, 97);
// Decodes to: "Capybara"

Conclusion

Claude Code's architecture design reveals the true barriers for LLM application layer startups:

Extreme Token Cost Optimization — Fine-grained Prompt Cache operations
Multi-State Machine Streaming Coordination — Coordinator and Fork mechanisms
Balance Between Safety Intervention and UX — YOLO Classifier dynamic permissions
Deep OS Integration — Terminal automation, permission bridging

Industry Insight:

The era of simply piecing together prompts, stacking vector databases, and wrapping simple loop shells is over. True competitiveness is built on deep understanding of systems engineering.

Document based on leaked source code analysis, for educational and research purposes only.