Autonomous Agent Loop
24/7 background execution loop — pulls tasks from queue, spawns workers, monitors status, retries on failure, dead-letters when exhausted.
The agent loop lives in src/services/autonomous/agentLoop.ts — the single consumer that reads the task queue and manages the lifecycle of all worker processes.
Techniques & Principles
Why a Loop instead of a Message Queue?
Message queue systems like RabbitMQ/Kafka carry setup and maintenance overhead. The agent loop uses a JSON file as a persistent queue — zero external dependencies:
- Zero external deps — uses `fs.watch` to detect queue file changes
- File-based atomicity — single-file read/write under lock allows multiple processes to compete for consumption
- Self-debouncing — `ourWriteInProgress` flag prevents self-triggered re-reads
- Cross-platform — JSON files work everywhere, no broker installation needed
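The file-backed queue with self-debouncing can be sketched roughly as follows. This is a minimal illustration, not the code in taskQueue.ts: the `QueuedTask` shape, the temp-file-plus-rename write, the 100 ms flag reset, and watching the parent directory (so the watcher survives the file being replaced) are all assumptions. Only `fs.watch` and the `ourWriteInProgress` flag come from the description above.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical task shape, for illustration only.
interface QueuedTask { id: string; title: string; status: string }

// Set while this process writes, so its own watcher callback can skip the event.
let ourWriteInProgress = false;

function saveQueue(queuePath: string, tasks: QueuedTask[]): void {
  ourWriteInProgress = true;
  try {
    // Assumption: write a temp file, then rename, so readers never see a partial file.
    const tmpPath = `${queuePath}.${process.pid}.tmp`;
    fs.writeFileSync(tmpPath, JSON.stringify(tasks, null, 2));
    fs.renameSync(tmpPath, queuePath);
  } finally {
    // Watch events arrive asynchronously, so clear the flag a little later.
    setTimeout(() => { ourWriteInProgress = false; }, 100);
  }
}

function loadQueue(queuePath: string): QueuedTask[] {
  if (!fs.existsSync(queuePath)) return [];
  return JSON.parse(fs.readFileSync(queuePath, "utf8")) as QueuedTask[];
}

function watchQueue(
  queuePath: string,
  onExternalChange: (tasks: QueuedTask[]) => void,
): fs.FSWatcher {
  const base = path.basename(queuePath);
  // Watch the directory and filter by file name.
  return fs.watch(path.dirname(queuePath), (_event, filename) => {
    if (filename !== null && filename !== base) return; // some other file
    if (ourWriteInProgress) return;                     // our own write
    onExternalChange(loadQueue(queuePath));
  });
}

// Usage: round-trip one task through a throwaway queue file.
const demoPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "queue-")), "queue.json");
saveQueue(demoPath, [{ id: "t1", title: "demo task", status: "pending" }]);
console.log(loadQueue(demoPath).map((t) => t.id));
```

A caller would pass `watchQueue` a callback that merges externally added tasks into memory, and close the returned watcher on shutdown.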
Lease-Based Concurrency
The classic distributed worker problem: two workers see the same task and duplicate work. Leases solve this without a distributed lock:
Main Loop Queue File (.json) Worker Process
│ │ │
│ getNextTask() │ │
│─────────────────────────────►│ │
│◄─── task {id, status:pending}│ │
│ │ │
│ leaseTask(id, agentId) │ │
│─────────────────────────────►│ │
│ (atomic: check no lease │ │
│ → write leaseOwner + │ │
│ leaseExpiresAt) │ │
│◄─────── true (leased) ──────│ │
│ │ │
│ spawnWorker(prompt) │ │
│─────────────────────────────────────────────────────────────►│
│◄──── WorkerSession {id, pid} │ │
│ │ │
│ ═══ LOOP ═══ │ │
│ while running: │ │
│ checkWorker(sessionId) │ │
│ ────────────────────────────────────────────────────────►│
│ ◄─── "running" / "completed" / "failed" │
│ │ │
│ [completed] → releaseLease │ │
│ → markCompleted │ │
│ │ │
│ [failed] → markFailed │ │
│ → retryTask() │ │
│ → stopWorker() │ │
│ │ │
│ [timeout 30m] → stopWorker │ │
│ → releaseLease│ │
│ → retryTask() │ │
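The lease step in the diagram amounts to a check-then-write that must happen as one unit. Below is a minimal in-memory sketch under stated assumptions: the `LeasableTask` shape, the `LEASE_DURATION_MS` value, and the extra `now` parameter are hypothetical, and the real `leaseTask(id, agentId)` presumably performs the same check and write against the queue file while holding its lock.

```typescript
type TaskStatus =
  | "pending" | "in_progress" | "completed"
  | "failed" | "cancelled" | "dead_letter";

// Hypothetical task shape; leaseOwner and leaseExpiresAt are named in the diagram.
interface LeasableTask {
  id: string;
  status: TaskStatus;
  leaseOwner?: string;
  leaseExpiresAt?: number; // epoch milliseconds
}

// Assumption: lease length matches the 30 min worker timeout.
const LEASE_DURATION_MS = 30 * 60 * 1000;

// Returns true only for the first caller; later callers see the live lease.
function leaseTask(
  tasks: LeasableTask[],
  id: string,
  agentId: string,
  now: number = Date.now(),
): boolean {
  const task = tasks.find((t) => t.id === id);
  if (!task || task.status !== "pending") return false;
  const leaseIsLive =
    task.leaseOwner !== undefined && (task.leaseExpiresAt ?? 0) > now;
  if (leaseIsLive) return false;
  task.leaseOwner = agentId;
  task.leaseExpiresAt = now + LEASE_DURATION_MS;
  task.status = "in_progress";
  return true;
}

// Drops the lease fields once the worker has finished or been stopped.
function releaseLease(task: LeasableTask): void {
  delete task.leaseOwner;
  delete task.leaseExpiresAt;
}

const leaseDemo: LeasableTask[] = [{ id: "a", status: "pending" }];
const firstClaim = leaseTask(leaseDemo, "a", "agent-1", 1000);
const secondClaim = leaseTask(leaseDemo, "a", "agent-2", 1000);
console.log(firstClaim, secondClaim);
```

Because the expiry is stored with the task, a lease left behind by a crashed owner can later be recognized as stale without any coordination between processes.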
Retry with Exponential Backoff
Retry attempt   Backoff delay       Cumulative wait
─────────────   ─────────────────   ───────────────
1               base × 2¹ = 30s     30s
2               base × 2² = 60s     90s
3               base × 2³ = 120s    210s
4               base × 2⁴ = 240s    450s
5 (max)         base × 2⁵ = 480s    930s (~15 min)
After max retries → dead_letter queue
Dead-letter preserves: title, description, lastError, errorLog, retryCount
- Exponential backoff — base = 15s, factor = 2
- Max retries — 5 per task (default)
- Dead-letter queue — exhausted tasks are moved to `dead_letter` status with reason + error log
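The schedule in the table follows from delay = base × 2^attempt with base = 15s. The sketch below reproduces those numbers and the dead-letter cutoff. The `RetryableTask` shape, the `nextAttemptAt` field, and the `retryTask` signature are assumptions for illustration; the constants and the preserved fields (`lastError`, `errorLog`, `retryCount`) come from the text above.

```typescript
const BASE_DELAY_MS = 15 * 1000; // base = 15s
const BACKOFF_FACTOR = 2;        // factor = 2
const MAX_RETRIES = 5;           // default max retries per task

// Delay before retry number `attempt` (1-based): base × factor^attempt.
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempt);
}

// Hypothetical task shape; nextAttemptAt is an assumed field.
interface RetryableTask {
  id: string;
  status: "pending" | "failed" | "dead_letter";
  retryCount: number;
  nextAttemptAt?: number;
  lastError?: string;
  errorLog: string[];
}

// Requeues with backoff, or dead-letters once retries are exhausted.
function retryTask(task: RetryableTask, error: string, now: number = Date.now()): void {
  task.lastError = error;
  task.errorLog.push(error);
  if (task.retryCount >= MAX_RETRIES) {
    task.status = "dead_letter"; // kept for review with its error history
    return;
  }
  task.retryCount += 1;
  task.nextAttemptAt = now + backoffDelayMs(task.retryCount);
  task.status = "pending";
}

const delays = [1, 2, 3, 4, 5].map(backoffDelayMs);
const totalWaitMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays, totalWaitMs);
```

Storing the next eligible time on the task, rather than sleeping in the loop, would let other tasks keep flowing while one task waits out its backoff.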
Worker Lifecycle + Concurrent Cap
- MAX_CONCURRENT_WORKERS = 3 — prevents resource exhaustion
- Worker timeout = 30 min — long-running tasks are killed to free resources
- Worker poll = 10s — status checks via supervisor IPC
- Loop sleep = 5s — idle interval when no tasks or workers full
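One way to see how these four constants interact is a pure planning function that decides what a single loop iteration should do. This is a hypothetical sketch: `ActiveWorker`, `LoopAction`, and `planTick` are illustrative names, not taken from agentLoop.ts, and only the constant values come from the list above.

```typescript
const MAX_CONCURRENT_WORKERS = 3;
const WORKER_TIMEOUT_MS = 30 * 60 * 1000; // 30 min
const WORKER_POLL_MS = 10 * 1000;         // 10s
const LOOP_SLEEP_MS = 5 * 1000;           // 5s

// Hypothetical bookkeeping record for a running worker.
interface ActiveWorker { sessionId: string; startedAt: number; lastPolledAt: number }

type LoopAction =
  | { kind: "spawn" }
  | { kind: "stop"; sessionId: string }
  | { kind: "poll"; sessionId: string }
  | { kind: "sleep"; ms: number };

function planTick(workers: ActiveWorker[], pendingCount: number, now: number): LoopAction[] {
  const actions: LoopAction[] = [];
  let surviving = 0;
  for (const w of workers) {
    if (now - w.startedAt >= WORKER_TIMEOUT_MS) {
      actions.push({ kind: "stop", sessionId: w.sessionId }); // over the timeout
      continue;
    }
    surviving += 1;
    if (now - w.lastPolledAt >= WORKER_POLL_MS) {
      actions.push({ kind: "poll", sessionId: w.sessionId }); // status check due
    }
  }
  // Fill free slots, never exceeding the concurrency cap.
  const freeSlots = Math.max(0, MAX_CONCURRENT_WORKERS - surviving);
  const toSpawn = Math.min(freeSlots, pendingCount);
  for (let i = 0; i < toSpawn; i++) actions.push({ kind: "spawn" });
  // Nothing to do: idle until the next iteration.
  if (actions.length === 0) actions.push({ kind: "sleep", ms: LOOP_SLEEP_MS });
  return actions;
}

console.log(planTick([], 5, 0).length);
```

Keeping the decision separate from the side effects makes the cap and timeout rules easy to check without spawning real processes.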
Supervisor Integration
Workers are spawned through a Supervisor process (child_process) for crash isolation:
- Crash isolation — worker crash doesn't crash the loop
- Output capture — stdout/stderr saved to `~/.claude/daemon/jobs/{sessionId}/output.log`
- Health via IPC — loop checks status through supervisor, not raw PID (avoids PID reuse bugs)
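The three properties above can be illustrated with Node's `child_process.spawn`. This is a simplified sketch that collapses the supervisor into an in-process session map. The real supervisor is a separate process reached over IPC, and the names `spawnSupervised`, `statusFromExit`, and `sessions`, as well as the `jobsDir` parameter, are assumptions made for the example.

```typescript
import { spawn } from "child_process";
import * as fs from "fs";
import * as path from "path";

type WorkerStatus = "running" | "completed" | "failed";

// Exit code 0 without a signal counts as success; anything else is a failure.
function statusFromExit(code: number | null, signal: string | null): WorkerStatus {
  return code === 0 && signal === null ? "completed" : "failed";
}

// Status is keyed by session id, never by PID, so a reused PID cannot be mistaken.
const sessions = new Map<string, WorkerStatus>();

function spawnSupervised(
  sessionId: string,
  command: string,
  args: string[],
  jobsDir: string, // e.g. the jobs directory named above
): void {
  const jobDir = path.join(jobsDir, sessionId);
  fs.mkdirSync(jobDir, { recursive: true });
  const log = fs.createWriteStream(path.join(jobDir, "output.log"), { flags: "a" });

  const child = spawn(command, args);
  child.stdin.end(); // the worker gets no interactive input
  sessions.set(sessionId, "running");

  // Two sources share one log stream, so neither may close it on end.
  child.stdout.pipe(log, { end: false });
  child.stderr.pipe(log, { end: false });

  let settled = false;
  const finish = (status: WorkerStatus): void => {
    if (settled) return;
    settled = true;
    sessions.set(sessionId, status);
    log.end();
  };
  // A worker crash or spawn failure only updates the map; the caller keeps running.
  child.on("error", () => finish("failed"));
  child.on("close", (code, signal) => finish(statusFromExit(code, signal)));
}

function checkWorker(sessionId: string): WorkerStatus | undefined {
  return sessions.get(sessionId);
}

console.log(statusFromExit(0, null), statusFromExit(null, "SIGKILL"));
```

A call such as `spawnSupervised("s1", process.execPath, ["-e", "console.log('hi')"], jobsDir)` would start a worker whose state `checkWorker("s1")` then reports.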
Integration Points
- Peer todo listener — receives tasks from remote peers via `/peer-todo` HTTP endpoint → adds to queue
- Cron scheduler — fires scheduled tasks → adds to queue
- File watcher — `fs.watch` on queue file detects new tasks from other processes (e.g. CLI `/task add`)
Crash Recovery
Previous Run Crash
│
▼
startLoop() called
│
├── loadQueue() — restore queue from disk
├── sleep(2000ms) — ensure old process is dead
├── expireLeases() — clear all stale leases
├── start heartbeat (every 60s)
├── start cron scheduler
├── start peer sharing
├── start file watcher
└── MAIN LOOP ──► getNextTask() → processTask() → loop
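The `expireLeases()` step is what makes tasks orphaned by a crash runnable again. Below is a rough sketch under assumptions: the `RecoverableTask` shape, returning tasks to `pending`, the `forceAll` switch (treating every lease as stale on startup), and the returned id list are all hypothetical. The source only states that stale leases are cleared.

```typescript
type RecoveryStatus =
  | "pending" | "in_progress" | "completed"
  | "failed" | "cancelled" | "dead_letter";

// Hypothetical task shape with the lease fields from the lease diagram.
interface RecoverableTask {
  id: string;
  status: RecoveryStatus;
  leaseOwner?: string;
  leaseExpiresAt?: number; // epoch milliseconds
}

// Clears stale leases and returns the ids that became claimable again.
function expireLeases(
  tasks: RecoverableTask[],
  now: number,
  forceAll: boolean = false, // assumption: a freshly started loop may distrust every lease
): string[] {
  const recovered: string[] = [];
  for (const task of tasks) {
    if (task.status !== "in_progress") continue;
    const expired = task.leaseExpiresAt === undefined || task.leaseExpiresAt <= now;
    if (!forceAll && !expired) continue;
    delete task.leaseOwner;
    delete task.leaseExpiresAt;
    task.status = "pending"; // assumed: the task goes back to the queue
    recovered.push(task.id);
  }
  return recovered;
}

const recoveryDemo: RecoverableTask[] = [
  { id: "stale", status: "in_progress", leaseOwner: "old", leaseExpiresAt: 500 },
  { id: "live", status: "in_progress", leaseOwner: "old", leaseExpiresAt: 5000 },
  { id: "done", status: "completed" },
];
const recoveredIds = expireLeases(recoveryDemo, 1000);
console.log(recoveredIds);
```

With `forceAll` set, the `live` task above would also be returned to the queue, which matches a single-consumer setup where no other process can still hold a valid lease.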
Task Lifecycle (State Machine)
             ┌──────────┐
             │ pending  │◄──────────────────────────────┐
             └────┬─────┘                               │
                  │ leaseTask()                         │
                  ▼                                     │
            ┌───────────┐                               │
            │in_progress│                               │
            └─────┬─────┘                               │
      ┌───────────┼───────────┐                         │
      ▼           ▼           ▼                         │
┌──────────┐ ┌──────────┐ ┌──────────┐                  │
│completed │ │  failed  │ │cancelled │                  │
└──────────┘ └────┬─────┘ └──────────┘                  │
                  │ retryTask()                         │
                  ├── retryCount < max → backoff → ─────┘
                  │
                  └── retryCount ≥ max
                             │
                             ▼
                      ┌──────────────┐
                      │ dead_letter  │
                      │ (preserved   │
                      │  for review) │
                      └──────────────┘
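The diagram can be encoded as a transition table so that illegal moves are rejected in one place. This is an illustrative sketch; the actual definitions in src/Task.ts may differ, and `TRANSITIONS`, `canTransition`, and `transition` are hypothetical names.

```typescript
type LifecycleStatus =
  | "pending" | "in_progress" | "completed"
  | "failed" | "cancelled" | "dead_letter";

// Allowed next states, read directly off the diagram above.
const TRANSITIONS: Record<LifecycleStatus, readonly LifecycleStatus[]> = {
  pending: ["in_progress"],                          // leaseTask()
  in_progress: ["completed", "failed", "cancelled"],
  failed: ["pending", "dead_letter"],                // retryTask(): backoff or give up
  completed: [],
  cancelled: [],
  dead_letter: [],                                   // terminal, preserved for review
};

function canTransition(from: LifecycleStatus, to: LifecycleStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Applies a transition, or throws if the diagram does not allow it.
function transition(task: { status: LifecycleStatus }, to: LifecycleStatus): void {
  if (!canTransition(task.status, to)) {
    throw new Error(`illegal transition: ${task.status} -> ${to}`);
  }
  task.status = to;
}

const lifecycleDemo: { status: LifecycleStatus } = { status: "pending" };
transition(lifecycleDemo, "in_progress");
transition(lifecycleDemo, "failed");
transition(lifecycleDemo, "pending"); // retry after backoff
console.log(lifecycleDemo.status);
```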
Related Files
| File | Role |
|---|---|
| `src/services/autonomous/agentLoop.ts` | Main loop — start, stop, processTask, worker lifecycle |
| `src/services/autonomous/taskQueue.ts` | Queue CRUD, lease management, retry, dead-letter, file watcher |
| `src/services/autonomous/daemonMode.ts` | Daemon entry point — calls startLoop/stopLoop |
| `src/Task.ts` | Task type definitions, state machine, task ID generation |
| `src/tasks/LocalAgentTask/` | Local worker task — UI + lifecycle |
| `src/tasks/RemoteAgentTask/` | Remote worker task — UI + lifecycle |
| `src/components/AutonomousExecutionAccordion.tsx` | UI component for task queue display in REPL |