# Smithers — full documentation

> Durable AI workflow orchestration as a JSX runtime.
> Repo: github.com/smithersai/smithers · Package: smithers-orchestrator (npm)

This is the complete agent-facing Smithers documentation in one file. It is the concatenation of every fragment listed in /llms.txt.

Audience split: humans should read the For Humans Guide on the docs site and talk to their coding agent. Agents should read this file, operate Smithers for the human, verify the run, and report evidence back.

The everyday agent surface (runtime, JSX, CLI, components, recipes, types, errors) is the first section below; the opt-in topics follow. Only /llms.txt and /llms-full.txt are served on the docs site, so read this file rather than fetching per-topic fragment URLs.

Sections included in this file:
  1. Core: runtime, JSX, CLI, components, recipes, types
  2. Memory: cross-run memory
  3. OpenAPI tools: tool generation from a spec
  4. Observability: HTTP server, gateway, MCP, OpenTelemetry
  5. Effect: low-level Effect-ts integration
  6. Integrations: agent runtimes, IDE, CI, ecosystem
  7. Events: full SmithersEvent discriminated union

Changelogs are not included; see /docs/changelogs/ on the docs site.

===============================================================================
# Smithers

This file is the agent-facing core Smithers documentation. It is for Claude, Codex, and other AI harnesses operating Smithers for a human. Read top to bottom for the runtime, agent operating playbook, JSX surface, CLI, and components.

Human-facing docs live on the website under the For Humans Guide. Humans ask their agent for outcomes; agents consume these llms files and operate Smithers.

Opt-in topics cover features most users do not need. They are also sections of the full bundle at /llms-full.txt (only /llms.txt and /llms-full.txt are served on the docs site):
  - Memory (cross-run state)
  - OpenAPI tools
  - Observability + HTTP server
  - Effect-ts authoring API
  - Integrations + CLI agents
  - Event types (full union)

---

## Smithers

> Orchestrate agents at scale with composable workflows.

Smithers is a durable runtime for long-running AI coding agents. Install a skill and your
agent writes the workflow (a composable TypeScript tree), then runs it for minutes or
days with crash recovery, retries, human approvals, replay, and full observability across
any agent, any model, and any machine.

<Frame caption="Models change weekly and agent topologies change quarterly, but the durable orchestration layer underneath does not. That layer is Smithers.">
  <img src="/images/why/three-layer-stack.svg" alt="Three stacked layers: a volatile model layer, a fluid agent-topology layer, and a stable Smithers orchestration layer underneath" />
</Frame>

<Note>
These docs are split by audience, so pick your path:

1. **Use Smithers through an agent** (you are a human): the **For Humans** guide. Ask your
   coding agent for an outcome and it operates Smithers for you. No CLI or TypeScript.
2. **Operate Smithers as an agent** (you are Claude, Codex, or another harness): the **For
   Agents** reference, the `llms.txt` and `llms-full.txt` bundles, and the
   Agent Operating Playbook.
3. **Author workflows** (you are a developer): the JSX surface and
   Components to build or fork your own.
</Note>

<CardGroup cols={2}>
  <Card title="For Humans: start here" icon="compass" href="/guide/what-is-smithers">
    The plain-language tour for people using Smithers by talking to their coding agent,
    no CLI or TypeScript required.
  </Card>
  <Card title="For Agents: read this" icon="robot" href="/llms.txt">
    The LLM-facing docs index. Agents should read it, operate Smithers for the human,
    verify the run, and report evidence back.
  </Card>
  <Card title="Human prompt examples" icon="sparkles" href="/guide/what-you-can-do">
    Concrete outcomes a human can ask for, from focused changes to long-horizon missions.
  </Card>
  <Card title="Agent operating playbook" icon="clipboard-check" href="/guides/agent-operating-playbook">
    The agent contract: translate human prompts into workflows, backpressure, observability,
    assumption tests, and reports.
  </Card>
</CardGroup>

Here is what your agent writes and runs for you. As a human you never author this by
hand; a workflow is a small TypeScript tree, and the rest of this page is the surface your
agent operates on your behalf.

```tsx
/** @jsxImportSource smithers-orchestrator */
import { createSmithers, Sequence, Task } from "smithers-orchestrator";
import { z } from "zod";

const { Workflow, smithers, outputs } = createSmithers({
  hello: z.object({ message: z.string() }),
});

export default smithers((ctx) => (
  <Workflow name="hello">
    <Sequence>
      <Task id="greet" output={outputs.hello}>
        {{ message: `Hello, ${ctx.input.name}` }}
      </Task>
    </Sequence>
  </Workflow>
));
```

```bash
bunx smithers-orchestrator init
bunx smithers-orchestrator up workflow.tsx --input '{"name":"world"}'
```

Outputs are validated by Zod, persisted to SQLite, and survive crashes. Resume a run by
passing its `--run-id` together with `--resume true`, or drive everything from your coding
agent with the `smithers` skill.

## What makes Smithers different

<CardGroup cols={3}>
  <Card title="Composable workflows" icon="puzzle-piece">
    Sequence, fan out, branch, and loop tasks into workflows shaped to your task and your
    project, not one giant one-size-fits-all agent.
  </Card>
  <Card title="Model- & harness-agnostic" icon="plug">
    Claude Code, Codex, Pi, Antigravity, and more, plus any model through the AI SDK.
    Swap the harness without rewriting the workflow.
  </Card>
  <Card title="Robust by default" icon="shield-check">
    Durable execution, retries, replay, time-travel, evals, human approvals, and
    Prometheus metrics: guarantees no single agent gives you on its own.
  </Card>
</CardGroup>

## Why Smithers

We build Smithers to put power in the hands of builders. You shouldn't have to wait and
see what the model companies decide to ship next. With composable, model- and
harness-agnostic workflows, you can build the future you want to see today, on whatever
model and harness is best this week.

Every decision in Smithers is about making builders **more** powerful, not replacing
them. Where other tools race to swap human craftsmanship for slop, Smithers is built to
get **higher-quality output** out of agents, with the review loops, approvals, evals,
and structure that real work demands.

And we don't believe in one-size-fits-all orchestration. The best results come from
**task-specific and project-specific workflows**, so Smithers ships dozens of them ready
to run, and treats having your agent author new ones as a first-class path.

## Built-in workflows

`bunx smithers-orchestrator init` scaffolds a `.smithers/` folder preloaded with
production-ready workflows. Point your agent at one and go with
`bunx smithers-orchestrator workflow run WORKFLOW_ID --prompt "..."`:

```bash
# break a request into tickets, then implement them on a board
bunx smithers-orchestrator workflow run tickets-create --prompt "add rate limiting and audit logging"
bunx smithers-orchestrator workflow run kanban
```

<CardGroup cols={2}>
  <Card title="Build" icon="hammer" href="/workflows/implement">
    `implement` · `research-plan-implement` · `mission` · `kanban` · `ralph`. Turn a
    request into validated, reviewed code, from a focused change to a long-horizon mission.
  </Card>
  <Card title="Plan" icon="map" href="/workflows/plan">
    `research` · `plan` · `grill-me` · `ticket-create` · `tickets-create`.
    Gather context and shape the work before any code is written.
  </Card>
  <Card title="Quality" icon="circle-check" href="/workflows/review">
    `review` · `debug` · `improve-test-coverage` · `feature-enum` · `audit`. Review
    changes, fix bugs, raise coverage, and audit for gaps.
  </Card>
  <Card title="Browse the catalog" icon="list" href="/workflows/overview">
    Every bundled workflow is a normal Smithers TSX file: run it as-is, edit it for your
    repo, or fork it as a starting point.
  </Card>
</CardGroup>

## Examples

The examples cover real patterns you can copy: multi-agent
panels and debates, fan-out/fan-in, review and coverage loops, migrations, PR shepherding,
canary judging, SLO breach explainers, repo janitors, and more.

<CardGroup cols={2}>
  <Card title="Multi-agent review" icon="users" href="/examples/multi-agent-review">
    N specialists review in parallel; a moderator synthesizes the verdict.
  </Card>
  <Card title="Approval gates" icon="hand" href="/examples/approval-gate">
    Pause a run for a human decision, then resume exactly where it left off.
  </Card>
  <Card title="Dynamic plans" icon="diagram-project" href="/examples/dynamic-plan">
    Let an agent plan the work, then fan the plan out across worker tasks.
  </Card>
  <Card title="Loops" icon="repeat" href="/examples/loop">
    Iterate until tests pass, coverage hits target, or a reviewer approves.
  </Card>
</CardGroup>

## Any agent, any model

Smithers doesn't bet on one lab or one harness. Point each task at whichever agent is best
for the job, and switch freely:

- **CLI agents**: Claude Code, Codex,
  Pi, Antigravity, Kimi, and more, driven through their
  own runtimes.
- **SDK agents**: any model the AI SDK supports, with tools,
  structured output, and MCP.
- **Mix them in one workflow**: a frontier model plans, a fast model fans out, a
  specialized harness does the edits. The workflow doesn't change when the model does.

## Durable and observable

Runs survive crashes, restarts, and flaky tools. Every completed step is persisted the
moment it finishes, so the runtime always knows what's done and what to run next, and you
can rewind, fork, and replay any run.

```bash
bunx smithers-orchestrator up workflow.tsx --run-id abc123 --resume true   # resume after a crash
bunx smithers-orchestrator rewind abc123 --frame 4                         # time-travel to an earlier frame
bunx smithers-orchestrator observability up                                # Grafana + Prometheus + Tempo + OTLP
bunx smithers-orchestrator up workflow.tsx --serve --metrics               # HTTP API, SSE events, and /metrics
```

Every run emits Prometheus metrics and OpenTelemetry traces, so you can watch token spend,
task latency, retries, and failures across thousands of runs. See
How It Works for the render → execute → persist loop.

## Scale across machines

Most workflows run fine on your laptop. When you need isolation, parallelism, or
horizontal scale, the same `<Sandbox>` primitive runs a child
workflow through an injectable provider (local or remote) with no change to the
workflow:

- **Local**: run agents in an isolated sandbox on your own machine.
- **Remote**: gVisor, Kubernetes, freestyle.sh,
  Daytona, and Cloudflare.

## Read next

<CardGroup cols={2}>
  <Card title="For Humans" icon="compass" href="/guide/what-is-smithers">
    The human-first introduction: what to ask your agent and what to expect back.
  </Card>
  <Card title="For Agents" icon="robot" href="/llms.txt">
    The generated LLM docs index agents should consume before operating Smithers.
  </Card>
  <Card title="Tour" icon="route" href="/tour">
    A 6-step worked example that introduces every core feature.
  </Card>
  <Card title="How It Works" icon="gears" href="/how-it-works">
    The render → execute → persist loop that makes runs durable.
  </Card>
  <Card title="Components" icon="cubes" href="/components/workflow">
    The full JSX surface: tasks, sequences, loops, approvals, and more.
  </Card>
</CardGroup>

---

## Introduction

> What Smithers is and when to use it.

Smithers orchestrates AI coding agents at scale with composable, model- and harness-agnostic workflows. Most workflow systems fail quietly. A crash mid-run means lost work and a manual restart from scratch. Smithers solves this with a render-loop execution model: every completed step is persisted immediately, so the runtime always knows exactly what has finished and what to run next.

<Note>
This page is in the **For Agents** track. It is for AI harnesses and workflow authors who
need the runtime model. Human-facing docs live in the **For Humans** Guide, starting at
What Smithers Is.
</Note>

Your agent writes the workflow as a JSX tree, and Smithers re-renders it once per "frame". Each render answers: given what has already finished, what can run now? Tasks produce outputs validated by Zod schemas; the runtime persists them to SQLite. Crashes, restarts, and approvals are first-class, and the runtime resumes from the last persisted state without re-running completed work.

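```tsx
// Excerpt: assumes ctx, outputs, analyst, fixer, and
// analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" }) are in scope,
// as in the full Tour example.
<Workflow name="review">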
  <Sequence>
    <Task id="analyze" output={outputs.analysis} agent={analyst}>
      {`Review ${ctx.input.repo}`}
    </Task>
    {analysis ? (
      <Task id="fix" output={outputs.fix} agent={fixer}>
        {`Fix these issues:\n${analysis.issues.map(i => `- [${i.severity}] ${i.file}:${i.line} - ${i.description}`).join("\n")}`}
      </Task>
    ) : null}
  </Sequence>
</Workflow>
```

Use Smithers when:

- order matters across multiple AI or compute steps
- you need crash recovery
- humans must approve or answer questions mid-run
- different tasks need different models, tools, or policies
- operators need the Gateway API to launch, stream, and approve runs programmatically

Don't use it for a single prompt → single response. Use your model provider's SDK directly; Smithers adds no value there.

## Read next

- Tour for a working code-review example.
- How It Works for the execution model.
- Why React? for the rationale behind the JSX runtime.

---

## Installation

> Install smithers-orchestrator with the workflow pack, or manually for standalone JSX workflow projects.

Most teams should start with the workflow pack. It gives you a working `.smithers/` directory with seeded workflows, prompts, and agent configuration instead of assembling the project structure by hand.

## Always Run with `bunx`

<Warning>
  Every CLI invocation in these docs is `bunx smithers-orchestrator <command>`. Do **not**
  install Smithers globally and do **not** use the bare `smithers` or `bunx smithers`
  shorthand. The bare name `smithers` is a different npm package, so `bunx smithers` runs
  something else entirely.
</Warning>

- `bunx` resolves the package locally if your project depends on it, otherwise it pulls and runs the latest published version.
- The published npm package is [`smithers-orchestrator`](https://www.npmjs.com/package/smithers-orchestrator).
- A global install creates version-drift problems across machines, CI, and contributors. With `bunx`, every project pins Smithers through its own `package.json`.

If you previously ran `npm i -g smithers-orchestrator`, uninstall it (`npm rm -g smithers-orchestrator`) and switch to `bunx`.

## Recommended: Install the Workflow Pack

```bash
bunx smithers-orchestrator init
```

That scaffolds `.smithers/` with files such as:

| Directory / File | Contents |
|---|---|
| `.smithers/workflows/` | Pre-built workflows (`implement`, `review`, `plan`, `ralph`, `debug`, ...) |
| `.smithers/prompts/` | Shared MDX prompt templates |
| `.smithers/components/` | Reusable TSX components (`Review`, `ValidationLoop`, ...) |
| `.smithers/package.json` | Local workflow project manifest with `smithers-orchestrator` dependency |
| `.smithers/tsconfig.json` | TypeScript config for JSX workflow authoring |
| `.smithers/bunfig.toml` | Bun preload config for MDX workflow prompts |
| `.smithers/preload.ts` | Registers the MDX preload plugin |
| `.smithers/agents/` | User-owned agent config (`claude-code.ts`, `codex.ts`, `opencode.ts`, `antigravity.ts`, `index.ts`); edit it to pin models, cwd, or systemPrompt. Preserved across re-inits |
| `.smithers/agents.ts` | Auto-detected agent configuration (regenerated on each `init`) |
| `.smithers/smithers.config.ts` | Repo-level config (lint, test, coverage commands) |
| `.smithers/tickets/` | Ticket workspace used by ticket-oriented workflows |
| `.smithers/executions/` | Execution artifacts directory preserved across re-inits |
| `.smithers/.gitignore` | Ignore rules for generated workflow state |

To overwrite an existing scaffold:

```bash
bunx smithers-orchestrator init --force
```

## Install the Agent Skill

Smithers is driven by an AI agent (Claude Code, Codex, and friends), not a GUI you
click. Your agent runs Smithers on your behalf: scaffolding workflows, starting
runs, watching them, and clearing approvals. The `smithers` skill makes your agent
fluent in that without making it read the whole docs site first, so you reach the
aha moment faster.

```bash
mkdir -p ~/.claude/skills/smithers
curl -fsSL https://raw.githubusercontent.com/smithersai/smithers/main/skills/smithers/SKILL.md \
  -o ~/.claude/skills/smithers/SKILL.md
curl -fsSL https://smithers.sh/llms-full.txt \
  -o ~/.claude/skills/smithers/llms-full.txt
```

The skill ships the full docs bundle (`llms-full.txt`) next to its `SKILL.md`, so
the agent can read the exact API on demand. Once installed, just ask for the
outcome, *"orchestrate an agent to add rate limiting and keep iterating until the
tests pass"*, and the agent reaches for Smithers itself.

For agents without a skills directory, point them at
`bunx smithers-orchestrator docs-full` (prints the same bundle) or
`bunx smithers-orchestrator ask "<question>"`.

To install the skill **and** register the MCP server into every coding agent on
your machine at once, see Agent Support. It covers Claude
Code, Codex, Cursor, Copilot, Pi, Hermes, OpenClaw, and ~20 more.

## When to Use Manual Installation

Use manual installation when embedding Smithers into an existing TypeScript codebase, or when authoring a standalone workflow project from scratch.

See JSX Installation for the package list, TypeScript configuration, and optional MDX prompt setup.

## Requirements

- [Bun](https://bun.sh) >= 1.3
- TypeScript >= 5
- Model or provider credentials (e.g. [Anthropic](https://docs.anthropic.com) `ANTHROPIC_API_KEY`)
- A version control system for snapshotting and isolating agent work: [jj (Jujutsu)](https://github.com/jj-vcs/jj) or [git](https://git-scm.com). jj is preferred and powers durability, time-travel, and per-task worktrees.

### Version control

Smithers bundles jj. The optional `@smithers-orchestrator/jj-<platform>` package installs a vendored jj binary for your platform, so a fresh install works with no system jj. Resolution order is:

1. `SMITHERS_JJ_PATH`: point this at a jj binary to override everything.
2. The bundled binary for your platform.
3. `jj` on your `PATH`.

If no bundled binary is installed (an unsupported platform, or `--no-optional`) and neither `jj` nor `git` is on `PATH`, runs that need a worktree fail with a message telling you to install one. Check what Smithers found with `bunx smithers-orchestrator workflow doctor` (the `vcs` section reports the resolved jj and git).
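
The lookup order above can be sketched as a small pure function. This is an illustrative sketch under stated assumptions, not Smithers's source: `resolveJj`, its return shape, and the injected `bundledPath` and `isOnPath` lookups are hypothetical stand-ins so the example runs without touching the filesystem.

```typescript
// Hypothetical sketch of the documented jj lookup order; not Smithers's actual code.
type JjSource = "env" | "bundled" | "path" | "none";

interface JjResolution {
  source: JjSource;
  binary?: string;
}

function resolveJj(
  env: Record<string, string | undefined>,
  bundledPath: string | undefined, // vendored binary, if the optional package installed
  isOnPath: (cmd: string) => boolean, // injected PATH lookup keeps the sketch pure
): JjResolution {
  // 1. An explicit override wins over everything else.
  const override = env["SMITHERS_JJ_PATH"];
  if (override) return { source: "env", binary: override };
  // 2. Otherwise use the platform binary from @smithers-orchestrator/jj-<platform>.
  if (bundledPath) return { source: "bundled", binary: bundledPath };
  // 3. Fall back to a system jj on PATH.
  if (isOnPath("jj")) return { source: "path", binary: "jj" };
  // No jj: per the docs, worktree runs then need git on PATH or they fail.
  return { source: "none" };
}
```

Injecting the lookups rather than reading `process.env` and the filesystem directly is only a choice to make the ordering easy to test in isolation.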

## After Installation

<CardGroup cols={2}>
  <Card title="Quickstart" icon="rocket" href="/quickstart">
    Run a seeded workflow immediately.
  </Card>
  <Card title="Set up in your harness" icon="plug" href="/agents/setup">
    Wire Smithers into your agent and grab a copy-paste setup prompt.
  </Card>
  <Card title="Install the agent skill" icon="robot" href="#install-the-agent-skill">
    Make your coding agent fluent in Smithers.
  </Card>
  <Card title="CLI Quickstart" icon="terminal" href="/cli/quickstart">
    The operational command cheatsheet.
  </Card>
  <Card title="JSX installation" icon="code" href="/jsx/overview">
    Manual TSX authoring setup.
  </Card>
  <Card title="Project structure" icon="folder-tree" href="/guides/project-structure">
    How a standalone workflow project fits together.
  </Card>
  <Card title="Tools integration" icon="screwdriver-wrench" href="/integrations/tools">
    The built-in tool sandbox.
  </Card>
</CardGroup>

---

## Quickstart

> Scaffold and run a Smithers workflow in two commands.

<Steps>
  <Step title="Scaffold a project">
    `init` creates `.smithers/` (workflows, prompts, components, agent config). Add
    `--template <id>` when you want the result to include a starter command and
    follow-up notes; run `bunx smithers-orchestrator starters` to browse template IDs.

    ```bash
    bunx smithers-orchestrator init
    bunx smithers-orchestrator init --template idea-to-tickets
    ```
  </Step>
  <Step title="Run a workflow">
    Point a seeded workflow at a prompt and Smithers starts a durable run.

    ```bash
    bunx smithers-orchestrator workflow run implement --prompt "Add rate limiting"
    ```
  </Step>
  <Step title="Inspect the run">
    Watch run state, tail the event log, and read structured output as it executes.

    ```bash
    bunx smithers-orchestrator ps
    bunx smithers-orchestrator inspect RUN_ID
    bunx smithers-orchestrator logs RUN_ID --tail 20
    ```
  </Step>
  <Step title="Resume after a crash">
    Every completed step is persisted, so a crashed or stopped run resumes from its
    last checkpoint.

    ```bash
    bunx smithers-orchestrator up workflow.tsx --run-id RUN_ID --resume true
    ```
  </Step>
</Steps>

<Tip>
  Prefer to let your agent drive? Install the agent skill
  and just ask for the outcome. Your coding agent runs Smithers for you.
</Tip>

For a full worked example, see the Tour. For every CLI command see the CLI catalog.

---

## Starters

> Choose a plain-English outcome and copy the Smithers command.

Smithers ships with starter workflows for when you want a result before your agent writes any workflow code. Browse the starter gallery from any repo:

```bash
bunx smithers-orchestrator starters
```

First-time setup:

```bash
bunx smithers-orchestrator init --add-agents
```

Or initialize with guided next steps for one template:

```bash
bunx smithers-orchestrator init --template idea-to-tickets
```

Template IDs: pass any ID from this table to `init --template <id>` to scaffold that starter.

| Starter | Best for | Workflow | Command |
| --- | --- | --- | --- |
| `idea-to-prd` | founders, product | `write-a-prd` | `bunx smithers-orchestrator init --template idea-to-prd` |
| `idea-to-tickets` | founders, product, operations, engineering | `tickets-create` | `bunx smithers-orchestrator init --template idea-to-tickets` |
| `launch-checklist` | launch owners and operators | `plan` | `bunx smithers-orchestrator init --template launch-checklist` |
| `customer-incident` | support escalations | `debug` | `bunx smithers-orchestrator init --template customer-incident` |
| `nontechnical-research` | before-build decisions | `research` | `bunx smithers-orchestrator init --template nontechnical-research` |
| `requirements-interview` | vague stakeholder requests | `grill-me` | `bunx smithers-orchestrator init --template requirements-interview` |
| `quality-audit` | release readiness | `audit` | `bunx smithers-orchestrator init --template quality-audit` |
| `test-coverage` | regression prevention | `improve-test-coverage` | `bunx smithers-orchestrator init --template test-coverage` |
| `ship-a-change` | focused product improvements | `research-plan-implement` | `bunx smithers-orchestrator init --template ship-a-change` |
| `mission-mode` | larger approved milestones | `mission` | `bunx smithers-orchestrator init --template mission-mode` |

Each detailed starter prints:

- What outcome to expect
- What context to gather before running
- The exact `bunx smithers-orchestrator workflow run ...` command
- Useful follow-up commands
- When not to use that starter

Filter by audience or goal:

```bash
bunx smithers-orchestrator starters --audience product
bunx smithers-orchestrator starters --goal quality
bunx smithers-orchestrator starters --workflow debug
```

Use JSON when another tool needs the catalog:

```bash
bunx smithers-orchestrator starters --format json
```

---

## Tour

> Build a code-review workflow in six steps. Every core feature shows up.

A code-review workflow built one capability at a time. Each step is a diff against the previous one. Reading time: 15 minutes.

## 1. Install and scaffold

```bash
bunx smithers-orchestrator init
bun add smithers-orchestrator ai @ai-sdk/anthropic zod
bun add -d typescript @types/bun
export ANTHROPIC_API_KEY="sk-ant-..."
```

`init` creates `.smithers/` with seeded workflows, prompts, and components. The bun deps add the AI SDK, Anthropic provider, and Zod (schemas).

A minimal `tsconfig.json`:

```json
{
  "compilerOptions": {
    "target": "ESNext",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "jsx": "react-jsx",
    "jsxImportSource": "smithers-orchestrator",
    "strict": true,
    "noEmit": true,
    "skipLibCheck": true
  }
}
```

`jsxImportSource` is the only line specific to Smithers; it routes JSX through the workflow runtime instead of React DOM.

## 2. One-task workflow

```tsx
import { createSmithers, Sequence, Task } from "smithers-orchestrator";
import { z } from "zod";

const { Workflow, smithers, outputs } = createSmithers({
  greeting: z.object({ message: z.string() }),
});

export default smithers((ctx) => (
  <Workflow name="hello">
    <Sequence>
      <Task id="greet" output={outputs.greeting}>
        {{ message: `Hello, ${ctx.input.name}` }}
      </Task>
    </Sequence>
  </Workflow>
));
```

`createSmithers` registers Zod schemas; each becomes a SQLite table. `outputs.greeting` is the typed reference for the `greeting` schema; using it as the `output` prop gives compile-time checks (typo `outputs.greting` is a type error).

This Task has no `agent`, just a literal value. Run it.

```bash
bunx smithers-orchestrator up workflow.tsx --input '{"name":"world"}'
```

Inspect:

```bash
bunx smithers-orchestrator ps                   # find the run id
bunx smithers-orchestrator inspect RUN_ID       # structured state
sqlite3 smithers.db "SELECT * FROM greeting;"   # the persisted output
```

## 3. Add an agent task

Replace the literal Task with an agent Task whose output is structured.

```tsx
import { createSmithers, Sequence, Task, AnthropicAgent } from "smithers-orchestrator";
import { z } from "zod";

const { Workflow, smithers, outputs } = createSmithers({
  analysis: z.object({
    summary: z.string(),
    issues: z.array(z.object({
      file: z.string(),
      line: z.number(),
      severity: z.enum(["low", "medium", "high"]),
      description: z.string(),
    })),
  }),
});

const analyst = new AnthropicAgent({
  model: "claude-sonnet-4-20250514",
  instructions: "You are a senior code reviewer. Return structured JSON.",
});

export default smithers((ctx) => (
  <Workflow name="review">
    <Sequence>
      <Task id="analyze" output={outputs.analysis} agent={analyst}>
        {`Review the code in ${ctx.input.repo} and return analysis as JSON.`}
      </Task>
    </Sequence>
  </Workflow>
));
```

The runtime injects a JSON-schema description of `outputs.analysis` into the prompt, parses the agent's response, validates against Zod, and persists. Validation failure triggers a retry.
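
The parse, validate, and retry step can be pictured as a loop like the one below. It is a simplified, synchronous sketch: `runStructuredTask`, `maxAttempts`, and the `Validate` callback type are hypothetical. In Smithers the validator is the Zod schema behind `outputs.analysis`, and the real retry policy belongs to the runtime, not to this code.

```typescript
// Hypothetical sketch of "parse, validate, retry on failure"; not the Smithers runtime.
type Validate<T> = (value: unknown) => { ok: true; data: T } | { ok: false; error: string };

function runStructuredTask<T>(
  callAgent: (attempt: number, lastError?: string) => string, // returns raw model text
  validate: Validate<T>,
  maxAttempts = 3,
): { data: T; attempts: number } {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = callAgent(attempt, lastError);
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw); // the agent was asked to return JSON
    } catch {
      lastError = "response was not valid JSON";
      continue; // a parse failure counts as a failed attempt
    }
    const result = validate(parsed);
    if (result.ok) return { data: result.data, attempts: attempt }; // persist here
    lastError = result.error; // feed the error into the next attempt's prompt
  }
  throw new Error(`validation failed after ${maxAttempts} attempts: ${lastError}`);
}
```

Passing `lastError` back to the agent is an illustrative choice; it shows why a retry can succeed where the first attempt did not.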

## 4. A second task that depends on the first

Tasks see each other's outputs through `ctx.outputMaybe(...)`. An incomplete upstream returns `undefined`; on the next render frame the upstream output appears and the downstream Task mounts.

```tsx
import { createSmithers, Sequence, Task, AnthropicAgent } from "smithers-orchestrator";
import { z } from "zod";

const AnalysisSchema = z.object({
  summary: z.string(),
  issues: z.array(z.object({
    file: z.string(),
    line: z.number(),
    severity: z.enum(["low", "medium", "high"]),
    description: z.string(),
  })),
});

const { Workflow, smithers, outputs } = createSmithers({
  analysis: AnalysisSchema,
  fix: z.object({
    patch: z.string(),
    filesChanged: z.array(z.string()),
  }),
});

const analyst = new AnthropicAgent({
  model: "claude-sonnet-4-20250514",
  instructions: "You are a senior code reviewer. Return structured JSON.",
});

const fixer = new AnthropicAgent({
  model: "claude-sonnet-4-20250514",
  instructions: "Write minimal, correct fixes as a unified diff.",
});

export default smithers((ctx) => {
  const analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" });

  return (
    <Workflow name="review">
      <Sequence>
        <Task id="analyze" output={outputs.analysis} agent={analyst}>
          {`Review ${ctx.input.repo}`}
        </Task>
        {analysis ? (
          <Task id="fix" output={outputs.fix} agent={fixer}>
            {`Fix these issues:\n${analysis.issues.map(i =>
              `- [${i.severity}] ${i.file}:${i.line} - ${i.description}`
            ).join("\n")}`}
          </Task>
        ) : null}
      </Sequence>
    </Workflow>
  );
});
```

Render 1: only `analyze` is mounted. Render 2 (after `analyze` finishes): `analysis` is populated, `fix` mounts and runs. That is the entire reactivity story: no hooks, no subscriptions, JSX conditionals over persisted state.

Same shape works for branching, parallel groups, and loops:

```tsx
<Parallel maxConcurrency={3}>
  <Task id="lint"  output={outputs.lint}  agent={linter}>...</Task>
  <Task id="test"  output={outputs.test}  agent={tester}>...</Task>
  <Task id="audit" output={outputs.audit} agent={auditor}>...</Task>
</Parallel>

<Loop until={!!review?.approved} maxIterations={5}>
  <Task id="implement" output={outputs.impl} agent={implementer}>...</Task>
  <Task id="review"    output={outputs.review} agent={reviewer}>...</Task>
</Loop>
```
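
`maxConcurrency` caps how many children of a `<Parallel>` run at once. The sketch below shows that idea as a plain promise pool; `runWithConcurrency` and its task shape are hypothetical illustrations of the technique, not Smithers's scheduler.

```typescript
// Hypothetical promise pool illustrating a concurrency cap; not Smithers's scheduler.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each worker pulls the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Never start more workers than the cap (or than there are tasks).
  const workerCount = Math.max(1, Math.min(maxConcurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep input order, not completion order
}
```

With `maxConcurrency={3}` and three children, as in the `<Parallel>` above, all three start immediately; a fourth child would wait for a free slot.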

## 5. An approval gate

Pause for a human. The runtime persists the pending decision and exits cleanly; an operator approves or denies through the CLI; resume picks up from the gate.

```tsx
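import { Approval } from "smithers-orchestrator";

// These go inside the <Sequence> from step 4. outputs.confirmFix assumes a matching
// confirmFix schema was registered with createSmithers.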
{analysis ? (
  <Approval
    id="confirm-fix"
    output={outputs.confirmFix}
    request={{
      title: `Apply fixes for ${analysis.issues.length} issues?`,
      summary: analysis.summary,
    }}
    onDeny="skip"
  >
    {/* children rendered after approval */}
  </Approval>
) : null}

{ctx.outputMaybe(outputs.confirmFix, { nodeId: "confirm-fix" })?.approved ? (
  <Task id="fix" output={outputs.fix} agent={fixer}>
    {`Apply patches`}
  </Task>
) : null}
```

Operator side:

```bash
bunx smithers-orchestrator ps --status waiting-approval   # find paused runs
bunx smithers-orchestrator inspect RUN_ID                 # see the request
bunx smithers-orchestrator approve RUN_ID --node confirm-fix --by alice
bunx smithers-orchestrator up workflow.tsx --run-id RUN_ID --resume true
```

`onDeny` controls behavior on rejection: `"fail"` aborts the run, `"continue"` proceeds without the approved branch, `"skip"` skips the gated tasks.
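
One way to read the three `onDeny` modes is as a mapping from the operator's decision to what happens next. This is a hypothetical summary in code (`GateOutcome`, `ON_DENY`, and `resolveGate` are illustrative names) that mirrors only the behavior described above; the runtime's actual handling is internal.

```typescript
// Hypothetical encoding of the documented onDeny behaviors; names are illustrative.
type OnDeny = "fail" | "continue" | "skip";
type Decision = "approved" | "denied";
type GateOutcome = "run-gated-tasks" | "abort-run" | "proceed-without-branch" | "skip-gated-tasks";

// A Record forces every onDeny mode to have an entry at compile time.
const ON_DENY: Record<OnDeny, GateOutcome> = {
  fail: "abort-run", // the whole run fails
  continue: "proceed-without-branch", // the run goes on without the approved branch
  skip: "skip-gated-tasks", // the gated tasks are skipped
};

function resolveGate(decision: Decision, onDeny: OnDeny): GateOutcome {
  return decision === "approved" ? "run-gated-tasks" : ON_DENY[onDeny];
}
```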

## 6. Crash, then resume

Every completed task's output sits in SQLite. A crash, kill, or restart loses no work; the next run with `--resume true` skips finished tasks.

```bash
bunx smithers-orchestrator up workflow.tsx --input '{"repo":"."}' --run-id review-1
# ...analyze finishes, fix is mid-flight, you Ctrl+C

bunx smithers-orchestrator up workflow.tsx --run-id review-1 --resume true
# analyze is skipped (already in DB), fix re-runs from scratch (was incomplete)
```

<Frame caption="The same crash-and-resume mechanic: a run is killed mid-task, then resuming skips the finished work and re-runs only the interrupted task from its last persisted frame.">
  <img src="/images/why/crash-resume.gif" alt="A Smithers run is killed partway through, then resumes: the completed task is skipped, the in-flight task re-runs as a new attempt, and the run finishes" />
</Frame>

In-flight attempts are marked stale and retried; finished tasks are skipped. Resume is deterministic: the same input and the same code produce the same task IDs.

For unattended recovery, run the supervisor:

```bash
bunx smithers-orchestrator supervise --interval 30s --stale-threshold 1m
```

It auto-resumes runs whose owner process died.

## What you skipped (and where to find it)

- **Time travel** (replay a frame, fork a run, diff two runs): `bunx smithers-orchestrator replay|fork|diff|timeline`. See How It Works → Time travel.
- **Scorers** (attach evaluators to Tasks): see Recipes → Scoring tasks.
- **Memory** (cross-run facts and message history): see How It Works → Memory.
- **RAG**, **voice**, **OpenAPI tools**: opt-in fragments. See the index in llms.txt.
- **Tool sandbox** (read/grep/bash with path containment): see Recipes → Tools.

## Read next

- How It Works: the render → execute → persist loop.
- Components: JSX surface reference.
- CLI: every command in one table.
- Recipes: patterns from production workflows.

---

## How It Works

> The render → execute → persist loop, in one page.

Smithers is a React reconciler whose host elements are tasks instead of DOM nodes. Each render produces a snapshot of the workflow plan; the runtime extracts ready tasks from that plan, executes them, persists their outputs, and re-renders. The plan evolves because each render reads the persisted state.

<Frame caption="The whole runtime in one loop: render the tree, extract the ready tasks, execute them, persist their outputs, then re-render against the new state. State is the source of truth; the plan is a pure function of state.">
  <img src="/images/why/render-loop.svg" alt="A four-stage loop: render the workflow tree, extract ready tasks, execute them, persist outputs to SQLite, then re-render against the new state" />
</Frame>

That loop is the entire model. Everything below, including branching, loops, approvals, resume, and time travel, is either a JSX construct that affects rendering or a CLI surface over the persisted state.

## The render loop in detail

1. **Render**. The runtime calls your `smithers((ctx) => ...)` builder. The returned JSX tree is reconciled by React; the reconciler emits a graph of host elements (`smithers:workflow`, `smithers:task`, `smithers:sequence`, `smithers:parallel`, `smithers:branch`, `smithers:loop`, `smithers:approval`, etc.).
2. **Extract**. The runtime walks the tree to produce a `GraphSnapshot`, a flat list of `TaskDescriptor`s. Each descriptor captures: node id, ordinal, dependencies, output schema, agent, retries, timeouts.
3. **Schedule**. The scheduler computes the ready set: tasks whose dependencies have completed, whose enclosing sequence has reached them, whose enclosing branch resolved them, and which fit within `maxConcurrency`.
4. **Execute**. Each ready task runs. Three modes: agent (call the LLM, validate output against the Zod schema, retry on failure), compute (run the function), static (write the literal value).
5. **Persist**. Validated outputs are written to per-schema SQLite tables. Internal `_smithers_*` tables capture node state, attempts, frame snapshots, events, and durable approval/signal state.
6. **Re-render**. The next frame begins with `ctx` reading the updated outputs. Tasks that depended on now-completed outputs mount on this frame and become eligible to run.

The frame is the unit of progress. Time travel, observability, hot reload, and resume all key off the frame number.
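To make the loop concrete, here is a toy model in plain TypeScript with no Smithers imports. It is a sketch under stated assumptions, not the runtime's code: `render`, `readySet`, and `runWorkflowModel` are hypothetical names, an in-memory `Map` stands in for the SQLite tables, and a task counts as ready when it has no output yet and all of its dependencies do.

```typescript
// Toy model of render -> extract -> schedule -> execute -> persist -> re-render.
// All names are hypothetical; this is not the Smithers runtime.
type TaskDescriptor = {
  id: string;
  deps: string[];
  run: (outputs: Map<string, unknown>) => unknown;
};

type Analysis = { issues: number };

// "Render": a pure function of persisted state that returns the current plan.
function render(outputs: Map<string, unknown>): TaskDescriptor[] {
  const plan: TaskDescriptor[] = [{ id: "analyze", deps: [], run: () => ({ issues: 2 }) }];
  const analysis = outputs.get("analyze") as Analysis | undefined;
  // Conditional mounting: "fix" only enters the plan once "analyze" has a persisted output.
  if (analysis && analysis.issues > 0) {
    plan.push({
      id: "fix",
      deps: ["analyze"],
      run: (o) => ({ fixed: (o.get("analyze") as Analysis).issues }),
    });
  }
  return plan;
}

// "Schedule": not yet persisted, all deps persisted, within the concurrency cap.
function readySet(plan: TaskDescriptor[], outputs: Map<string, unknown>, maxConcurrency: number): TaskDescriptor[] {
  return plan
    .filter((t) => !outputs.has(t.id) && t.deps.every((d) => outputs.has(d)))
    .slice(0, maxConcurrency);
}

function runWorkflowModel(maxConcurrency = 4): { outputs: Map<string, unknown>; frames: number } {
  const outputs = new Map<string, unknown>(); // stands in for the SQLite output tables
  let frames = 0;
  for (;;) {
    const plan = render(outputs); // 1. render
    const ready = readySet(plan, outputs, maxConcurrency); // 2-3. extract + schedule
    if (ready.length === 0) break; // nothing runnable: the run is done
    for (const task of ready) outputs.set(task.id, task.run(outputs)); // 4-5. execute + persist
    frames += 1; // 6. the next iteration re-renders against the new state
  }
  return { outputs, frames };
}

const result = runWorkflowModel();
console.log(result.frames, [...result.outputs.keys()]);
```

The `fix` task does not exist on the first frame; it mounts only after `analyze` is persisted, which is the conditional mounting described in this section. Run as-is, the model finishes in two frames.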

## The `ctx` API

`ctx` is the only way the workflow body talks to the runtime.

| Method | Returns | Use for |
|---|---|---|
| `ctx.input` | `T` | The immutable input passed to `runWorkflow`. |
| `ctx.outputMaybe(schema, { nodeId })` | `Row \| undefined` | Conditional rendering; returns `undefined` until the upstream task completes. |
| `ctx.output(schema, { nodeId })` | `Row` | Same, but throws if missing. Use inside a Task body where the dep is guaranteed. |
| `ctx.latest(schema, nodeId)` | `Row \| undefined` | Highest iteration of a node, used inside `<Loop>` to read the previous iteration's output. |
| `ctx.iterationCount(schema, nodeId)` | `number` | Number of completed iterations for a loop node. |
| `ctx.runId` / `ctx.iteration` | `string` / `number` | Identifiers for logging. |
| `ctx.auth` | `RunAuthContext \| null` | Auth context passed via `RunOptions.auth`. |

Outputs are keyed by `(runId, nodeId, iteration)`. `iteration` is `0` outside loops; inside `<Loop>` each pass writes a new row at the next iteration index.
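A minimal in-memory model of that keying, assuming one table per output schema and loop iterations written contiguously from `0`. `OutputTable` and its methods are hypothetical; the method names mirror the `ctx` table above to show the semantics (missing rows, throwing reads, highest iteration), not the real `ctx` signatures, which also take a schema argument.

```typescript
// Hypothetical stand-in for one output table keyed by (runId, nodeId, iteration).
type Row = Record<string, unknown>;

class OutputTable {
  private readonly rows = new Map<string, Row>();
  private readonly runId: string;

  constructor(runId: string) {
    this.runId = runId;
  }

  private key(nodeId: string, iteration: number): string {
    return JSON.stringify([this.runId, nodeId, iteration]); // composite key
  }

  write(nodeId: string, iteration: number, row: Row): void {
    this.rows.set(this.key(nodeId, iteration), row);
  }

  // Like ctx.outputMaybe: undefined until the node has written this iteration.
  outputMaybe(nodeId: string, iteration = 0): Row | undefined {
    return this.rows.get(this.key(nodeId, iteration));
  }

  // Like ctx.output: throws instead of returning undefined.
  output(nodeId: string, iteration = 0): Row {
    const row = this.outputMaybe(nodeId, iteration);
    if (row === undefined) throw new Error(`no output for ${nodeId} at iteration ${iteration}`);
    return row;
  }

  // Like ctx.iterationCount: counts contiguous iterations starting at 0.
  iterationCount(nodeId: string): number {
    let n = 0;
    while (this.rows.has(this.key(nodeId, n))) n += 1;
    return n;
  }

  // Like ctx.latest: the row at the highest completed iteration.
  latest(nodeId: string): Row | undefined {
    const n = this.iterationCount(nodeId);
    return n === 0 ? undefined : this.outputMaybe(nodeId, n - 1);
  }
}

const table = new OutputTable("run-1");
table.write("review", 0, { approved: false }); // first loop pass
table.write("review", 1, { approved: true }); // second loop pass
console.log(table.iterationCount("review"), table.latest("review"));
```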

## Tasks: three modes

```tsx
// Agent: call an LLM. Children become the prompt; output validated against schema.
<Task id="analyze" output={outputs.analysis} agent={analyst}>
  {`Review ${ctx.input.repo}`}
</Task>

// Compute: children is a function, called at execution time. Assumes fs comes from "node:fs".
<Task id="count" output={outputs.count}>
  {() => fs.readdirSync(ctx.input.dir).length}
</Task>

// Static: children is a plain value. Persisted directly.
<Task id="config" output={outputs.config}>
  {{ region: "us-east-1", retries: 3 }}
</Task>
```

Agent output validation: the runtime injects a JSON-schema description of the output Zod schema into the prompt, parses the response, validates, and persists. Validation failure feeds the error back into a retry attempt, so agents self-correct on schema drift.
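The validate-and-retry cycle can be sketched as follows. This is an illustrative model, not the runtime's code: a hand-written `validateAnalysis` stands in for the Zod schema, the synchronous `callAgent` callback stands in for the LLM, and the retry prompt wording is an assumption. It also omits the JSON-schema description that the runtime injects into the prompt.

```typescript
// Hypothetical model: parse, validate, and feed the error into the next attempt.
type Analysis = { summary: string; issues: number };
type Validation = { ok: true; data: Analysis } | { ok: false; error: string };

function validateAnalysis(value: unknown): Validation {
  if (typeof value !== "object" || value === null) return { ok: false, error: "expected a JSON object" };
  const v = value as Record<string, unknown>;
  const summary = v.summary;
  const issues = v.issues;
  if (typeof summary !== "string") return { ok: false, error: "summary must be a string" };
  if (typeof issues !== "number") return { ok: false, error: "issues must be a number" };
  return { ok: true, data: { summary, issues } };
}

function runAgentTask(callAgent: (prompt: string) => string, basePrompt: string, maxAttempts = 3): Analysis {
  let prompt = basePrompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = callAgent(prompt);
    let error = "";
    try {
      const result = validateAnalysis(JSON.parse(raw));
      if (result.ok) return result.data; // only validated output would be persisted
      error = result.error;
    } catch (e) {
      error = `reply was not valid JSON: ${(e as Error).message}`;
    }
    // Feed the validation error into the next attempt so the agent can self-correct.
    prompt = `${basePrompt}\n\nYour previous reply was rejected (${error}). Reply with JSON only.`;
  }
  throw new Error(`output failed validation after ${maxAttempts} attempts`);
}

// A fake agent that forgets a field once, then corrects itself.
const replies = ['{"summary":"two issues"}', '{"summary":"two issues","issues":2}'];
const prompts: string[] = [];
const fakeAgent = (prompt: string): string => {
  prompts.push(prompt);
  return replies[prompts.length - 1] ?? "";
};
const analysis = runAgentTask(fakeAgent, "Review the repo and reply as JSON.");
console.log(analysis, prompts.length);
```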

Agents can be a fallback chain: `agent={[primary, fallback]}` tries `primary` first and falls through on failure.
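A sketch of the fall-through behavior, assuming "failure" means the agent call throws. `Agent` and `runWithFallback` are hypothetical names for illustration; how Smithers combines fallback with schema-validation retries is not modeled here.

```typescript
// Hypothetical model of agent={[primary, fallback]}: try agents in order, fall through on error.
type Agent = { name: string; generate: (prompt: string) => string };

function runWithFallback(agents: Agent[], prompt: string): { agent: string; text: string } {
  const errors: string[] = [];
  for (const agent of agents) {
    try {
      return { agent: agent.name, text: agent.generate(prompt) };
    } catch (e) {
      errors.push(`${agent.name}: ${(e as Error).message}`); // record why, then try the next agent
    }
  }
  throw new Error(`all agents failed (${errors.join("; ")})`);
}

const primary: Agent = { name: "primary", generate: () => { throw new Error("rate limited"); } };
const fallback: Agent = { name: "fallback", generate: (p) => `handled: ${p}` };
console.log(runWithFallback([primary, fallback], "Review the diff"));
```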

## Control flow

Four primitives. Compose freely.

```tsx
<Sequence>{/* children execute top-to-bottom; the default for <Workflow> */}</Sequence>
<Parallel maxConcurrency={3}>{/* children execute concurrently */}</Parallel>
<Branch if={cond} then={<A />} else={<B />} />
<Loop until={done} maxIterations={5} onMaxReached="return-last">{/* body re-renders each pass */}</Loop>
```

`<Workflow>` implicitly sequences its children. An explicit `<Sequence>` is only needed when nesting sequential groups inside `<Parallel>` or another control-flow primitive.

Use `.map()` and ternaries when the *number* or *presence* of tasks depends on state. Use `<Parallel>` and `<Branch>` for fixed task sets whose execution shape depends on state.

`<Loop>` is the one primitive that re-renders the same body repeatedly: it runs the body, reads the result on the next frame, and re-mounts until `until` holds or `maxIterations` is reached. That cycle is what turns a one-shot agent into one that keeps swinging until the tests are green.

```mermaid
flowchart LR
  R[Render loop body] --> X[Execute tasks]
  X --> Q{until condition met?}
  Q -- no --> R
  Q -- yes --> D[Exit · return last output]
  Q -. maxIterations hit .-> D
  style Q fill:#fef,stroke:#a3a
  style D fill:#dfe,stroke:#3a3
```
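The same cycle as a small TypeScript model. `runLoopModel` is a hypothetical name, not the `<Loop>` implementation. As in the diagram, the body runs first and `until` then checks the row that pass produced; hitting `maxIterations` keeps the last row, in the spirit of `onMaxReached="return-last"`.

```typescript
// Hypothetical model of <Loop until={...} maxIterations={n} onMaxReached="return-last">.
type LoopResult<T> = { rows: T[]; exitedBy: "until" | "maxIterations" };

function runLoopModel<T>(
  body: (iteration: number, previous: T | undefined) => T,
  until: (latest: T) => boolean,
  maxIterations: number,
): LoopResult<T> {
  const rows: T[] = []; // one persisted row per iteration index
  for (;;) {
    const previous = rows.length > 0 ? rows[rows.length - 1] : undefined;
    const latest = body(rows.length, previous); // render + execute one pass
    rows.push(latest);
    if (until(latest)) return { rows, exitedBy: "until" };
    if (rows.length >= maxIterations) return { rows, exitedBy: "maxIterations" }; // keep the last row
  }
}

// A "fix until tests pass" loop where each pass halves the failing-test count.
const fixLoop = runLoopModel<{ failing: number }>(
  (_iteration, previous) => ({ failing: Math.floor((previous?.failing ?? 8) / 2) }),
  (latest) => latest.failing === 0,
  5,
);
console.log(fixLoop.exitedBy, fixLoop.rows.map((r) => r.failing));
```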

## Data flow is unidirectional

Workflow state lives in SQLite. The render function is a pure function of `ctx` (which reads SQLite). Tasks emit outputs; the runtime persists them; the next render reads them. No mutation, no refs, no `useState` for durable values.

This is the same shape as React rendering UI from props/state, except:
- the "DOM" is the task graph
- "events" are task completions
- "state updates" are output writes that the runtime triggers

<Frame caption="State is the only source of truth. Task completions update it, and the next plan is a pure function of that state, so you never mutate the plan by hand.">
  <img src="/images/unidirectional-dataflow.jpg" alt="Unidirectional data flow: action events update state, state maps forward into the execution plan, and the plan registers the next action handlers" />
</Frame>

Three consequences:
- The plan is a **derived value**. Re-render after any state change automatically computes the new plan; you never manually mutate the plan.
- **Time travel works** because every frame is a snapshot of (state → plan).
- **Hot reload works** because reloading the workflow code with the same persisted state produces a new plan; the runtime diffs the two and continues from where you left off.

## Reactivity & React patterns

Smithers JSX is real React. Components, props, children, composition, context, hooks, custom hooks: all work.

```tsx
function useReviewState(ticketId: string) {
  const ctx = useCtx();
  const claudeReview = ctx.latest("review", `${ticketId}:review-claude`);
  return { claudeReview, allApproved: !!claudeReview?.approved };
}
```

`useState` and `useMemo` are process-local; they reset on every render frame. Use them for ephemeral render-time state. **Anything the workflow must remember across crashes goes through `ctx` and a Task output.**

Conditional mounting matters: a Task that doesn't render is not in the plan. No "skipped" placeholder unless you use `<Branch>` or `skipIf`. That's what lets `{analysis ? <Task .../> : null}` work as a clean dependency check.

## Approvals & human-in-the-loop

Two surfaces.

`needsApproval` on a Task is a **gate**: it pauses the task before execution and records no decision data.

```tsx
<Task id="deploy" output={outputs.deployResult} agent={deployer} needsApproval>
  Deploy to production.
</Task>
```

`<Approval>` is a **decision node**: it produces a typed `ApprovalDecision` row that downstream rendering can branch on:

```tsx
<Approval
  id="ship-decision"
  output={outputs.shipDecision}
  request={{ title: "Ship release v1.4?", summary }}
  onDeny="continue"
/>

{ctx.outputMaybe(outputs.shipDecision, { nodeId: "ship-decision" })?.approved
  ? <Task id="release" .../>
  : <Task id="rollback" .../>}
```

Three denial policies: `"fail"` (abort the run), `"continue"` (proceed without the gated branch), `"skip"` (skip the gated tasks but continue siblings).

```mermaid
flowchart TD
  A[Approval node] --> Q{your decision}
  Q -- approved --> G[Gated branch runs]
  Q -- denied --> P{onDeny}
  P -- fail --> F[Abort the run]
  P -- continue --> C[Proceed without the gated branch]
  P -- skip --> S[Skip gated tasks · run siblings]
  style F fill:#fde,stroke:#c33
  style C fill:#dfe,stroke:#3a3
  style S fill:#def,stroke:#36c
```

Operator side is identical for both:

```bash
bunx smithers-orchestrator ps --status waiting-approval
bunx smithers-orchestrator approve RUN_ID --node ship-decision --by alice
bunx smithers-orchestrator up workflow.tsx --run-id RUN_ID --resume true
```

`<HumanTask>` is for richer interaction: a human submits arbitrary structured JSON. `<EscalationChain>` and `<ApprovalGate>` are higher-level patterns built from these.

## Durability & resume

The contract: **a completed task is never re-executed.** Resume loads persisted state, validates the environment (the workflow source hash and VCS revision must match the original run), cleans up stale in-progress attempts (an attempt with no heartbeat for more than 15 minutes is abandoned), re-renders, and continues.
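A sketch of the bookkeeping resume performs on attempts, using the 15-minute heartbeat rule. The `Attempt` shape, its state names, and the choice to leave attempts with a recent heartbeat alone are assumptions for illustration, not the schema of the `_smithers_*` tables.

```typescript
// Hypothetical resume planner: skip finished work, retry abandoned attempts.
type Attempt = { nodeId: string; state: "finished" | "in-progress"; lastHeartbeatMs: number };
type ResumePlan = { skip: string[]; retry: string[]; leaveRunning: string[] };

const STALE_AFTER_MS = 15 * 60 * 1000; // the 15-minute heartbeat window

function planResume(attempts: Attempt[], nowMs: number): ResumePlan {
  const plan: ResumePlan = { skip: [], retry: [], leaveRunning: [] };
  for (const a of attempts) {
    if (a.state === "finished") plan.skip.push(a.nodeId); // completed tasks are never re-executed
    else if (nowMs - a.lastHeartbeatMs > STALE_AFTER_MS) plan.retry.push(a.nodeId); // abandoned: new attempt
    else plan.leaveRunning.push(a.nodeId); // recent heartbeat: assumed to belong to a live process
  }
  return plan;
}

const now = Date.parse("2025-01-01T12:00:00Z");
const minute = 60 * 1000;
const plan = planResume(
  [
    { nodeId: "analyze", state: "finished", lastHeartbeatMs: now - 60 * minute },
    { nodeId: "fix", state: "in-progress", lastHeartbeatMs: now - 20 * minute },
    { nodeId: "lint", state: "in-progress", lastHeartbeatMs: now - 2 * minute },
  ],
  now,
);
console.log(plan);
```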

```bash
bunx smithers-orchestrator up workflow.tsx --run-id RUN_ID --resume true
```

<Frame caption="A run is killed mid-flight: the completed task is skipped on resume, the interrupted one re-runs as a fresh attempt, and the run finishes from the last persisted frame instead of starting over.">
  <img src="/images/why/crash-resume.gif" alt="A Smithers run crashes partway through, then resumes: the finished task is skipped, the in-flight task re-runs as a new attempt, and the remaining tasks execute" />
</Frame>

For resume to work, **task IDs must be stable across renders.** Derive them from data, not from indices or timestamps:

```tsx
{tickets.map((t) => <TicketPipeline key={t.id} id={`${t.id}:work`} .../>)}
// NOT id={`work-${i}`} or id={`work-${Date.now()}`}
```

Same rule as React keys. A changed task ID looks like a new task to the runtime, and an old task whose ID disappeared is dropped from the plan.
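A small model of why index-based IDs go wrong, assuming resume treats any task ID that already has a persisted row as done. `dataId`, `indexId`, `persist`, and `wronglySkipped` are hypothetical helpers. Inserting one ticket at the front shifts every index-based ID, so existing rows get attributed to the wrong tickets; data-derived IDs are unaffected.

```typescript
// Hypothetical model: which tickets would resume wrongly treat as done?
type Ticket = { id: string };
type IdOf = (t: Ticket, i: number) => string;

const dataId: IdOf = (t) => `${t.id}:work`;
const indexId: IdOf = (_t, i) => `work-${i}`;

// First run: record which ticket each task ID's persisted row belongs to.
function persist(tickets: Ticket[], idOf: IdOf): Map<string, string> {
  return new Map(tickets.map((t, i) => [idOf(t, i), t.id] as [string, string]));
}

// Resume: tickets whose task ID has a row that belongs to a different ticket.
function wronglySkipped(tickets: Ticket[], done: Map<string, string>, idOf: IdOf): string[] {
  return tickets
    .filter((t, i) => {
      const owner = done.get(idOf(t, i));
      return owner !== undefined && owner !== t.id;
    })
    .map((t) => t.id);
}

const before: Ticket[] = [{ id: "T-1" }, { id: "T-2" }];
const after: Ticket[] = [{ id: "T-0" }, ...before]; // a new ticket appears at the front

console.log(wronglySkipped(after, persist(before, indexId), indexId)); // index IDs shift
console.log(wronglySkipped(after, persist(before, dataId), dataId)); // data IDs stay put
```

With index-based IDs, `T-0` and `T-1` are misattributed: the new ticket would be skipped with `T-1`'s row, and `T-1` would inherit `T-2`'s row. The data-derived IDs report nothing.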

The supervisor auto-resumes runs whose owner process died:

```bash
bunx smithers-orchestrator supervise --interval 30s --stale-threshold 1m
```

## Session snapshots & fork

Every agent task persists its conversation as a durable session snapshot alongside its output. A later task can start from a **copy** of that context with `fork`:

```tsx
<Task id="plan" agent={claude} output={outputs.plan}>Make a plan.</Task>
<Task id="implement" agent={claude} fork="plan" output={outputs.patch}>Implement the plan.</Task>
```

`fork` is immutable: it copies the source conversation into a fresh, independent session and submits the new prompt. The source is never mutated, so many tasks can fork the same source in parallel, and a forked task can itself be forked. Because the snapshot is read from persisted state on each attempt, fork is resume-safe and the source is never re-executed. Inside a `<Loop>`, `fork` resolves to the latest completed snapshot for that task id. See the `fork` prop in the `<Task>` reference.
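A minimal model of the copy-on-fork semantics, with hypothetical `Message`, `Snapshot`, and `fork` names; the real session snapshot format is not shown in this document. Each fork builds a new message array, so the frozen source keeps its length no matter how many tasks fork it.

```typescript
// Hypothetical model of immutable fork: copy the source conversation, then append.
type Message = { role: "user" | "assistant"; content: string };
type Snapshot = { taskId: string; messages: readonly Message[] };

function fork(source: Snapshot, taskId: string, prompt: string): Snapshot {
  const next: Message = { role: "user", content: prompt };
  return { taskId, messages: [...source.messages, next] }; // new array; source untouched
}

const planMessages: Message[] = [
  { role: "user", content: "Make a plan." },
  { role: "assistant", content: "1. Add a failing test. 2. Patch the handler." },
];
// Freeze the source so an accidental mutation would throw in strict mode.
const planSnapshot: Snapshot = Object.freeze({ taskId: "plan", messages: Object.freeze(planMessages) });

const implement = fork(planSnapshot, "implement", "Implement the plan.");
const critique = fork(planSnapshot, "critique", "Critique the plan."); // a second fork of the same source
const followUp = fork(implement, "follow-up", "Summarize what changed."); // a fork of a fork
console.log(planSnapshot.messages.length, implement.messages.length, critique.messages.length, followUp.messages.length);
```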

## Caching

Per-Task caching with explicit invalidation:

```tsx
<Task
  id="expensive-analysis"
  output={outputs.analysis}
  agent={analyst}
  cache={{
    by: (ctx) => ({ repo: ctx.input.repo }), // inputs the result depends on
    version: "v3",                            // bump to invalidate existing entries
  }}
>
  Analyze {ctx.input.repo}
</Task>
```

Cache key = `cache.by(ctx)` + `cache.version` + the schema signature (SHA-256 of the table structure). A schema change invalidates the cache automatically.
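A sketch of that recipe using Node's built-in `crypto`. The stable serializer, the use of a list of column strings as the "table structure", and the final hash over all three parts are assumptions; this section only specifies the three ingredients and that the schema signature is SHA-256.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key recipe: cache.by(ctx) + cache.version + schema signature.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // sort keys so key order never changes the hash
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined serializes as null here
}

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

function cacheKey(by: unknown, version: string, schemaColumns: string[]): string {
  const schemaSignature = sha256(stableStringify(schemaColumns)); // changes when the table shape changes
  return sha256(stableStringify({ by, version, schemaSignature }));
}

const k1 = cacheKey({ repo: "acme/api" }, "v3", ["summary:text", "issues:integer"]);
const k2 = cacheKey({ repo: "acme/api" }, "v3", ["summary:text", "issues:integer"]);
const k3 = cacheKey({ repo: "acme/api" }, "v3", ["summary:text", "issues:integer", "severity:text"]);
console.log(k1 === k2, k1 === k3);
```

Identical inputs give identical keys regardless of object key order, while adding a column changes the schema signature and therefore the key, which is the automatic invalidation described above.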

Don't cache side-effect tasks (deploys, emails, mutations). Caching is for pure work that's expensive to recompute.

## Time travel

Every frame commit produces a `GraphSnapshot`.

```bash
bunx smithers-orchestrator timeline RUN_ID           # frames + forks
bunx smithers-orchestrator diff RUN_ID NODE_ID     # node DiffBundle
bunx smithers-orchestrator fork workflow.tsx --run-id RUN_ID --frame 5 --reset-node analyze
bunx smithers-orchestrator replay workflow.tsx --run-id RUN_ID --frame 5 --restore-vcs
```

Replay with `--restore-vcs` checks out the jj (Jujutsu) revision the snapshot was taken at, so re-execution sees the same source code as the original run.

## Scorers (evals)

Attach evaluators to a Task. They run **after** completion and never block.

```tsx
import { schemaAdherenceScorer, latencyScorer } from "smithers-orchestrator/scorers";

<Task
  id="analyze"
  output={outputs.analysis}
  agent={analyst}
  scorers={{
    schema: { scorer: schemaAdherenceScorer() },
    latency: { scorer: latencyScorer({ targetMs: 5000 }) },
  }}
>
  Analyze...
</Task>
```

Five built-ins: `schemaAdherenceScorer`, `latencyScorer`, `relevancyScorer`, `toxicityScorer`, `faithfulnessScorer`. Sampling modes: `all`, `ratio`, `none`. Build custom scorers with `createScorer` and LLM-judge scorers with `llmJudge`.

The public scorer surface also exports `runScorersAsync`, `runScorersBatch`, `aggregateScores`, the `smithersScorers` table, and scorer metrics (`scorersStarted`, `scorersFinished`, `scorersFailed`, `scorerDuration`).

```bash
bunx smithers-orchestrator scores RUN_ID
```

## Memory (cross-run state)

Memory is **state that survives across runs**: namespaced facts and message history. It is not the same as task outputs, which are per-run.

Three layers, four namespaces (`workflow`, `agent`, `user`, `global`), and three processors (`TtlGarbageCollector`, `TokenLimiter`, `Summarizer`). See the Memory section of this file for the full surface.

## Tools & sandboxing

Five built-in tools (`read`, `write`, `edit`, `grep`, `bash`) are sandboxed to `rootDir`. Symlinks and network access are denied by default, and timeouts are enforced; `--allow-network` opens bash to the network.
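A lexical sketch of what containment to `rootDir` means for file tools, built on `node:path`. `resolveInside` is a hypothetical helper, not the tool sandbox's code; a real sandbox also has to resolve symlinks (for example with `fs.realpathSync`) before checking, which this sketch does not do.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a requested path against rootDir and reject anything
// that lands outside it. Lexical check only; symlinks are NOT resolved here.
function resolveInside(rootDir: string, requested: string): string {
  const root = path.resolve(rootDir);
  const target = path.resolve(root, requested); // an absolute input replaces root entirely
  const rel = path.relative(root, target);
  const escapes = rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path escapes rootDir: ${requested}`);
  return target;
}

console.log(resolveInside("/repo", "src/index.ts"));
```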

Least-privilege per task:

```tsx
import { AnthropicAgent } from "smithers-orchestrator";
// `model` and the tool values (read, write, edit, grep, bash) are assumed to be in scope here.
const analyst     = new AnthropicAgent({ model, instructions: "..." }); // no tools
const reviewer    = new AnthropicAgent({ model, instructions: "...", tools: { read, grep } });
const implementer = new AnthropicAgent({ model, instructions: "...", tools: { read, write, edit, bash } });
```

`defineTool` builds custom tools. Mark side-effecting ones with `sideEffect: true` and use `ctx.idempotencyKey` so retries don't double-fire.
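Why the idempotency key matters, as a self-contained model. `FakePaymentApi`, `chargeTool`, and the key format are hypothetical; the only assumption carried over from the text is that the key stays the same across retries of one task. The first attempt charges and then fails, the retry reuses the key, and the provider returns the original charge instead of creating a second one.

```typescript
// Hypothetical provider that remembers which idempotency keys it has already applied.
class FakePaymentApi {
  private readonly applied = new Map<string, string>(); // idempotencyKey -> chargeId
  charges = 0; // how many real charges happened

  charge(amountCents: number, idempotencyKey: string): string {
    const existing = this.applied.get(idempotencyKey);
    if (existing !== undefined) return existing; // replayed request: return the original result
    this.charges += 1;
    const chargeId = `ch_${this.charges}_${amountCents}`;
    this.applied.set(idempotencyKey, chargeId);
    return chargeId;
  }
}

const api = new FakePaymentApi();
const idempotencyKey = "run-42:charge-customer"; // hypothetical: stable across retries of one task

// A side-effecting tool whose first attempt charges, then fails before reporting success.
function chargeTool(attempt: number): string {
  const chargeId = api.charge(1999, idempotencyKey);
  if (attempt === 1) throw new Error("connection reset after the charge went through");
  return chargeId;
}

let chargeId = "";
for (let attempt = 1; attempt <= 3 && chargeId === ""; attempt++) {
  try {
    chargeId = chargeTool(attempt);
  } catch {
    // retry with the same idempotency key
  }
}
console.log(chargeId, api.charges);
```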

## Common gotchas

- **Stable task IDs.** `` id={`implement-${i}`} `` or `id={Math.random()}` breaks resume. Derive IDs from data.
- **`useState` is not durable.** Resets on every render. Persist via `ctx` and a Task.
- **Input is immutable.** Resuming with different `--input` is an error; the input is persisted at first run.
- **Code changes block resume.** Resume validates the workflow source hash of the original run, so a changed source counts as a different workflow: if you stop a run and then edit the source, resume is blocked. Hot reload is the exception; it applies edits to a live run on the next render frame. Start a new run rather than resuming across edits.
- **Cached output is re-validated.** Schema drift after caching is caught (the validator rejects the stale row), so the cache misses safely.
- **Side-effect tasks should not be cached.** Pure work only.

The full list, with the fix for each, is in Common Footguns.

## Read next

- Components: JSX surface reference.
- CLI: every command.
- Recipes: patterns from production workflows.
- Types: public TypeScript surface.

---

## Agent Operating Playbook

> How an AI harness should translate human requests into Smithers workflows, verification, observability, and evidence reports.

This page is for the AI agent operating Smithers on a human's behalf.

It belongs to the **For Agents** docs set. The human-facing docs are the
**For Humans** Guide, starting at What Smithers Is.
Agents should read this page and the generated `llms.txt` /
`llms-full.txt` bundles before driving Smithers for a user.

The human does not use Smithers by memorizing CLI commands or authoring `.tsx`
workflows. The human talks to you. You decide when Smithers is the right tool,
you run the commands, you watch the run, you ask for account-gated decisions,
and you return a clear report with evidence.

If you remember one rule, remember this:

> Do not ask the human to run Smithers commands. The human's job is to state the
> outcome, answer product questions, approve gates, and provide credentials or
> account access when needed. Your job is to operate the harness.

And one more rule that is just as important:

> You are an **orchestrator, not an implementer.** Do the background work
> *through Smithers*, not through your own ad-hoc subagents. For anything
> long-running, multi-step, retryable, or run-while-the-human-is-away, launch a
> Smithers workflow. Smithers spawns the worker agents and persists every step.
> Spend your time observing the run, clearing gates, and reporting. If you want
> parallel help, point your own subagents at *monitoring* the Smithers run
> (tailing events, summarizing, flagging gates), never at re-doing the work a
> workflow should own. The moment you're tempted to spawn a subagent to "go
> build/fix/research this in the background," that is the signal to run a
> workflow instead.

## The operating loop

Use this loop for broad, ambiguous, risky, long-running, or multi-agent work:

1. Capture the word barf. Let the human describe the outcome in messy language.
2. Grill for missing context. Ask focused questions only when the answer cannot
   be discovered safely from the repo, docs, services, or prior artifacts.
3. Convert the request into a goal-based spec. Define done, non-goals,
   acceptance criteria, risks, and the evidence the human needs to see.
4. Design the Smithers run. Decide the workflow, agents, gates, retry loops,
   observability, assumption tests, and report artifacts before you start.
5. Validate the workflow shape. Render the graph or dry-run evals before
   launching expensive or destructive work.
6. Run with observability. Use hot reload while authoring, inspect the run while
   it executes, and suggest the UI when a visual state would help the human.
7. Report with evidence. Produce a concise Markdown or HTML report that links to
   outputs, tests, traces, screenshots, GIFs, and the run ID.

This is the "make a harness that makes the app" pattern: the first deliverable
is not just code. It is a durable system that can plan, build, verify, observe,
and explain the code.

## Translate human prompts into Smithers work

| Human prompt | What you should do |
| --- | --- |
| "Build this product idea start to finish. I have thoughts but not a spec." | Run an interview or `grill-me` flow first. Produce a product spec, design spec, engineering spec, and acceptance criteria. Add a gate before implementation. |
| "Add rate limiting and don't stop until it is production-ready." | Run an implementation workflow with a test and review loop. Define production-ready as passing tests, reviewer approval, docs updates, and an evidence report. |
| "Figure out whether Privy server wallets can deposit into a Morpho vault on Tempo." | Treat it as an assumption-probe workflow. Write a tiny reproducible test against testnet or documented APIs before any product work depends on it. Report exact evidence and remaining unknowns. |
| "Make the UI look like the design and show me it actually works." | Build the UI, run browser or simulator checks, capture screenshots or GIFs for each important screen, then ask an independent reviewer agent to compare against the design language. |
| "Keep working on flaky tests while I am away." | Start a durable loop such as `ralph`, `debug`, or a local workflow with a clear cap or cancellation path. Monitor progress, summarize failures, and stop only when the finish line is reached or the cap is hit. |
| "Migrate this subsystem, but show me the plan first." | Run research and planning first, then pause on an approval gate. After approval, execute milestones in worktrees and merge only validated chunks. |
| "Something went wrong in the run. What happened?" | Run `why`, inspect events and node output, summarize the blocker, propose options, and continue operating. Do not ask the human to debug from the terminal. |

When the user gives a narrow one-shot request, Smithers may be unnecessary. But
if the task has phases, risk, loops, approvals, third-party dependencies,
parallel work, or a need to run while the human is away, use a workflow.

## Context engineering

Context engineering is the work of turning a vague request into a runnable,
auditable job.

Start by writing down:

- Outcome: what should exist when the run is done.
- Finish line: how you will know the work is done.
- Evidence: what the human needs to see to trust the result.
- Constraints: files, platforms, budgets, style, deadlines, and non-goals.
- Unknowns: assumptions that must be proven before you build on them.

Then gather context before executing:

- Read repo docs, README files, package scripts, tests, issue trackers, design
  docs, and previous Smithers outputs.
- Inspect relevant source files and architecture before making a plan.
- Read third-party docs or APIs when behavior could have changed.
- Prefer small probes over confident guesses for external services.
- Store the resulting spec somewhere durable, such as `.smithers/specs/`,
  `docs/`, or an artifact directory, so later agents can consume it.

Good Smithers prompts are goal-based, not instruction soup:

```text
Implement account-level rate limiting for API routes.

Finish line:
- Existing tests pass.
- New tests prove per-account and per-IP limits.
- A reviewer agent approves the diff.
- The final report explains changed files, behavior, and rollout risks.

Verification:
- Run lint/typecheck/unit tests.
- Add an assumption test if the existing rate-limit library behavior is unclear.
- Capture failure output and feed it back into the next iteration.
```

Use explicit stop conditions. "Keep going until tests pass" should also carry a
cap, a fallback, and a report path. Infinite effort is not a finish line.

## Backpressure verification

Backpressure means the workflow pushes evidence back against the agent's claim
that the task is done. Do not accept "looks good" as verification. Encode checks
that can fail.

Use these Smithers patterns:

- `<CheckSuite>` for parallel command or agent checks with one pass/fail verdict.
- `<ScanFixVerify>` for scan -> fix -> verify -> report loops.
- `<ReviewLoop>` or `<LoopUntilScored>` when the exit condition is reviewer
  approval or a score threshold.
- Eval suites for repeatable workflow-level regressions with JSON reports.
- Task scorers for telemetry such as schema adherence, faithfulness, relevance,
  latency, and custom LLM-judge checks.

A strong run defines backpressure before execution:

```text
Before implementing:
- Identify which tests should fail before the fix.
- Add or update the smallest regression test that proves the behavior.
- Define an independent reviewer prompt that can reject the diff.
- Define a report schema: changed files, commands run, failures, fixes, evidence.
```

Backpressure should be independent where possible. The agent that wrote the code
should not be the only judge. Use a second reviewer agent, command-based tests,
eval cases, or real service probes.

## Assumption tests

Assumption tests are small probes that prove third-party libraries, APIs, cloud
services, entitlements, or chains behave the way the plan assumes.

Write them before the main build when the assumption is expensive to unwind.

Examples:

| Assumption | Probe before building on it |
| --- | --- |
| "This SDK supports the chain we need." | Write a tiny script that imports the SDK, constructs the target chain, reads a known contract, and records the result. |
| "The testnet faucet funds the account we will use." | Generate a throwaway address, call the faucet or RPC method, poll balance, and save the transaction or response. |
| "A vault exists with real liquidity." | Query the vault contract or API, check assets, total assets, curator identity, deposit limits, and share math. |
| "The mobile entitlement allows this alarm behavior." | Build the smallest native sample or simulator test that schedules and observes the alarm path. |
| "The payment provider gives us idempotent retries." | Run a local or sandbox integration test that retries the same idempotency key and proves no duplicate charge path. |
| "The media API can generate the assets we need." | Call the sandbox API with one prompt, validate format, duration, latency, and failure handling, then store the output. |

Keep assumption probes narrow. They should answer one question and produce
evidence. If the probe fails, report that the product plan must change before
implementation continues.
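One way to make a probe's evidence durable is to emit a small JSON record per question. `runProbe` and the `ProbeRecord` shape are hypothetical; the example question uses a built-in JavaScript behavior so the sketch runs offline, where a real probe would call the SDK, testnet, or sandbox API in question.

```typescript
// Hypothetical probe harness: one question, one check, one JSON evidence record.
type ProbeRecord = {
  question: string;
  passed: boolean;
  evidence: unknown;
  error?: string;
  checkedAt: string;
};

function runProbe(question: string, check: () => { passed: boolean; evidence: unknown }): ProbeRecord {
  const checkedAt = new Date().toISOString();
  try {
    const { passed, evidence } = check();
    return { question, passed, evidence, checkedAt };
  } catch (e) {
    // A crashing probe is a failed assumption, not a skipped one.
    return { question, passed: false, evidence: null, error: (e as Error).message, checkedAt };
  }
}

// Example assumption: "Array.prototype.sort keeps equal elements in their original order."
const stableSort = runProbe("Is Array.prototype.sort stable for equal keys?", () => {
  const items = [{ k: 1, i: 0 }, { k: 0, i: 1 }, { k: 1, i: 2 }];
  const order = [...items].sort((a, b) => a.k - b.k).map((x) => x.i);
  return { passed: order.join(",") === "1,0,2", evidence: { order } };
});
console.log(JSON.stringify(stableSort, null, 2));
```

A probe that throws is recorded as a failed assumption with its error message, which is the evidence the report needs when the plan must change.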

## Observability-first runs

If you cannot see the run, you cannot operate it well.

For local and development work, use the CLI surfaces yourself:

```bash
bunx smithers-orchestrator ps
bunx smithers-orchestrator inspect RUN_ID --watch
bunx smithers-orchestrator events RUN_ID --watch
bunx smithers-orchestrator node RUN_ID NODE_ID
bunx smithers-orchestrator scores RUN_ID
bunx smithers-orchestrator why RUN_ID
```

Use serve mode when you need HTTP status, SSE events, remote approvals, or
Prometheus metrics:

```bash
bunx smithers-orchestrator up workflow.tsx --serve --metrics --port 7331
```

Use the observability stack when the work needs traces, metrics, dashboards, or
a reviewer evidence bundle:

```bash
bunx smithers-orchestrator observability
```

Enable OpenTelemetry export when you need trace-level proof, then include the
Grafana, Loki, Tempo, or Prometheus links and query results in the final report.
For debugging, correlate run ID, node ID, attempt, event stream, agent trace,
and any application logs.

When the human would benefit from seeing the work, suggest the UI and operate it
for them:

- `bunx smithers-orchestrator gui <path>` opens the workspace view.
- `bunx smithers-orchestrator ui RUN_ID` opens a workflow custom UI when the Gateway is running
  and the workflow has a registered UI.
- Gateway and custom UI streams expose run state, frames, approvals, node output,
  and DevTools snapshots for richer visual monitoring.

Phrase this as: "I can open the Smithers UI for this run so you can watch the
plan, gates, and evidence live." Do not phrase it as homework for the human.

## Hot validation loop

Use hot mode while authoring or tuning a workflow:

```bash
bunx smithers-orchestrator graph workflow.tsx
bunx smithers-orchestrator up workflow.tsx --hot true --input '{"prompt":"..."}'
```

The graph command validates the rendered shape without executing the whole job.
Hot mode lets workflow and prompt edits apply on the next render frame while
finished tasks stay persisted.

Rules of thumb:

- Use `--hot true` for prompt wording, task body, and non-schema workflow edits.
- Restart fresh when output schemas or task ID shapes change.
- Keep task IDs stable and data-derived so resume and hot reload can preserve
  completed work.
- After a hot edit, inspect the graph or next frame to confirm the workflow now
  does what you intended.

Do not treat hot reload as magic. Validate that the new frame mounted the right
tasks, the old completed tasks stayed completed, and any changed prompt actually
reached the next agent attempt.

## Reports for the human

End every substantial Smithers run with a human-readable report. Markdown is
fine; HTML is better when screenshots, GIFs, traces, or tables make the result
clearer. Write it as an artifact, for example:

```text
artifacts/smithers-report.md
artifacts/smithers-report.html
artifacts/screenshots/
artifacts/gifs/
artifacts/evals/
artifacts/traces/
```

The report should include:

- Summary: what changed, what shipped, and what did not.
- Run metadata: workflow name, run ID, branch or worktree, key node IDs.
- Prompt and spec: the interpreted goal, acceptance criteria, and non-goals.
- Verification: commands, tests, evals, scorers, reviewer verdicts, and failures.
- Assumption tests: probes run, outputs captured, and open risks.
- Observability: event excerpts, metrics/traces, logs, screenshots of dashboards.
- Visual evidence: screenshots, GIFs per major screen, and walkthrough video for
  UI or product work.
- Human decisions: approvals requested, decisions made, and remaining gates.
- Next steps: exact options, tradeoffs, and what you recommend.

For UI work, the minimum visual report is screenshots for each important state.
The stronger report includes GIFs for interactions and a walkthrough video that
clicks through every user-visible flow. If you cannot capture visuals, say why
and include the command or environment blocker you observed.

## Failure protocol

When a run fails or pauses unexpectedly, stay in the operator role:

1. Inspect the run with `why`, `inspect`, `events`, `node`, and logs.
2. Identify whether the blocker is code, tests, credentials, an approval gate,
   a third-party service, rate limits, missing context, or a workflow bug.
3. If it is fixable by you, fix it or resume from the correct frame.
4. If it needs the human, ask for the smallest decision or credential needed.
5. Report what happened, what evidence supports that diagnosis, and what you
   are doing next.

Bad response:

```text
Run smithers inspect and tell me what it says.
```

Good response:

```text
The run is paused at the deployment approval gate. I inspected the node output:
tests passed, the reviewer approved, and the only remaining action is your
approval to deploy. I recommend approving because the diff is limited to the
rate-limit middleware and the rollback path is unchanged.
```

The human should feel like they are talking to a careful operator, not like they
were handed a control plane manual.

## Minimal checklist

Before launching:

- Outcome, finish line, and evidence are written down.
- Missing context has been researched or asked for.
- Third-party assumptions have probes or are explicitly marked as risks.
- Workflow graph or eval dry-run has been checked.
- Backpressure checks exist and can fail.
- Observability path is chosen.
- Report artifact path is chosen.

While running:

- Watch the run.
- Use the UI when visual state, approvals, or steering would help.
- Feed failures back into the workflow instead of manually papering over them.
- Keep the human updated in plain English.

Before closing:

- Regenerate or collect the final evidence.
- Write the report.
- Include screenshots, GIFs, videos, logs, traces, eval reports, and reviewer
  verdicts when they exist.
- Explain remaining risk honestly.
- Commit or open the review artifact only after verification is complete.

---

## JSX API

> Author workflows as JSX trees. Smithers renders the tree, dispatches ready tasks, persists outputs, and re-renders. Branching and parallelism are plain JSX conditionals.

Workflows are JSX trees. Smithers renders the tree, extracts ready tasks, executes them, persists their outputs, and re-renders. Branching, looping, and parallelism are normal JSX.

## Setup

Most projects should use `bunx smithers-orchestrator init`; it scaffolds everything below.

To embed into an existing codebase:

```bash
bun add smithers-orchestrator react react-dom zod
bun add -d typescript @types/react @types/react-dom @types/node
```

Minimal `tsconfig.json`:

```json
{
  "compilerOptions": {
    "target": "ESNext",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "jsx": "react-jsx",
    "jsxImportSource": "smithers-orchestrator",
    "strict": true,
    "noEmit": true,
    "skipLibCheck": true
  }
}
```

`jsxImportSource` is the only non-standard line; it routes JSX through `smithers-orchestrator/jsx-runtime` instead of React DOM.

Optional MDX prompts: run `bun add -d @types/mdx`, add a `preload.ts` that calls `mdxPlugin()`, and register it in `bunfig.toml` as `preload = ["./preload.ts"]`.

Verify with `bunx tsc --noEmit` and `bunx smithers-orchestrator --help`.

## A minimal workflow

```tsx
/** @jsxImportSource smithers-orchestrator */ // only needed if jsxImportSource is not set in tsconfig.json
import { createSmithers, Sequence, Task } from "smithers-orchestrator";
import { z } from "zod";

const { Workflow, smithers, outputs } = createSmithers({
  analysis: z.object({ summary: z.string() }),
});

export default smithers((ctx) => (
  <Workflow name="analyze">
    <Sequence>
      <Task id="analyze" output={outputs.analysis}>
        {{ summary: `Analyze ${ctx.input.repo}` }}
      </Task>
    </Sequence>
  </Workflow>
));
```

`outputs.analysis` is a typed reference to the Zod schema, so a misspelled output name is a compile error. The task body is a JSX expression (`{...}`) that evaluates to a plain object. There is no LLM call here, only a static value. Real tasks pass a `run` prop or an AI agent (for example `agent={writer}`). See the Task component reference.

## Reactivity

The tree re-renders on every frame, so branching is a normal JSX conditional. Inside a workflow function, `ctx` exposes `ctx.input` and `ctx.outputMaybe(ref, { nodeId })`. The latter returns the persisted output of the task with that node ID, or `undefined` if that task has not completed yet:

```tsx
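// In the workflow function: undefined until the "analyze" task has completed.
const analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" });
// In the returned JSX: `report` is planned only once `analysis` exists.
{analysis ? <Task id="report" output={outputs.report} agent={writer}>...</Task> : null}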
```

The `report` task does not exist in the plan until `analyze` has completed. There is no placeholder and no skipped node, because the conditional *is* the dependency. Static DAG tools make you declare optional nodes up front. Here the conditional is re-evaluated every frame, so `report` enters the plan on the first render in which `analysis` is defined.
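The sketch below is a dependency-free TypeScript model of the render, execute, persist loop. It shows why no placeholder is needed. It is a simplified illustration, not the Smithers runtime. `render`, `bodies`, and `runToCompletion` are hypothetical names. The model assumes every ready task runs within one frame and counts a task as done once its output is persisted.

```ts
// A dependency-free model of render -> execute -> persist, NOT the Smithers
// runtime. `render` plays the workflow function: it reads persisted outputs
// and returns the task IDs present in this frame's plan.
type Outputs = Map<string, unknown>;

function render(outputs: Outputs): string[] {
  const plan = ["analyze"];
  const analysis = outputs.get("analyze"); // like ctx.outputMaybe: undefined until done
  if (analysis !== undefined) plan.push("report"); // the conditional is the dependency
  return plan;
}

// Stand-in task bodies; real tasks would call an agent or a `run` function.
const bodies: Record<string, (o: Outputs) => unknown> = {
  analyze: () => ({ summary: "Analyze acme/api" }),
  report: (o) => ({ text: `Report on: ${(o.get("analyze") as { summary: string }).summary}` }),
};

function runToCompletion(): { frames: string[][]; outputs: Outputs } {
  const outputs: Outputs = new Map();
  const frames: string[][] = [];
  for (;;) {
    const plan = render(outputs); // re-render from persisted state every frame
    frames.push(plan);
    const ready = plan.filter((id) => !outputs.has(id)); // planned but not yet persisted
    if (ready.length === 0) return { frames, outputs }; // nothing left to run
    for (const id of ready) outputs.set(id, bodies[id](outputs)); // execute + persist
  }
}

const { frames } = runToCompletion();
console.log(JSON.stringify(frames)); // [["analyze"],["analyze","report"],["analyze","report"]]
```

`report` first appears in the second frame. The third frame plans nothing new, so the loop stops.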

## Read next

- Tour: six-step worked example with agents, schemas, approvals, resume.
- How It Works: the render → execute → persist loop.
- Components: full prop surface for every JSX element.

---

## CLI

> Every Smithers CLI command in one structured catalog (TOON format).

Always invoke as `bunx smithers-orchestrator <command>` (see Installation for why). Use `--help` on any command for the canonical option list.

## Conventions

- Persisted state lives in the nearest `smithers.db`, found by walking up from the working directory. Most read commands fail with a friendly message if no DB is found.
- Boolean flags accept either bare form (`--watch`) or explicit `--watch true|false`.
- Global options: `--format toon|json|yaml|md|jsonl`, `--filter-output <key.path>`, `--full-output`, `--token-count`, `--token-limit N`, `--token-offset N`, `--schema`, `--llms`, `--llms-full`, `--mcp`, `--help`, `--version`.
- MCP stdio mode: pass `--mcp` to start Smithers as an MCP server. Add `--surface semantic|raw|both` to choose the exposed tool surface.
- Workflow resolution: `up`, `graph`, `revert`, `replay`, `fork`, `retry-task`, and `timetravel` take a workflow file path. `eval` accepts either a workflow path or a discovered workflow ID. `workflow run <name>` resolves IDs from the nearest local `.smithers/workflows/<name>.tsx` (walking up from the working directory) and from the global `~/.smithers/workflows/` pack, which honors `SMITHERS_HOME`. On an ID collision the local file wins. Global workflows therefore run from any directory, and a repo's own pack can still override them by name.
- Rewrites: `bunx smithers-orchestrator workflow <id>` runs a discovered workflow when `<id>` resolves; `bunx smithers-orchestrator workflow.tsx` behaves like `bunx smithers-orchestrator up workflow.tsx`; `bunx smithers-orchestrator chat create` behaves like `bunx smithers-orchestrator chat-create`.
- JSON arguments are preflighted before workflow modules load. `--input` and `--annotations` accept an inline JSON value or `-` to read JSON from stdin, capped at 1 MiB.
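The lookup order for `workflow run <name>` can be sketched as a small function. `resolveWorkflowId` is hypothetical. Two details are assumptions, not documented behavior. First, every ancestor directory is checked for the named file. Second, `SMITHERS_HOME`, when set, replaces `~/.smithers` as the global pack root. The filesystem check is injected, so the sketch runs without touching disk.

```ts
import * as path from "node:path";

// Hypothetical sketch of `workflow run <name>` resolution: local packs first
// (nearest directory wins), then the global pack.
// ASSUMPTION: SMITHERS_HOME, when set, replaces ~/.smithers as the global root.
function resolveWorkflowId(
  name: string,
  cwd: string,
  home: string,
  env: Record<string, string | undefined>,
  exists: (p: string) => boolean, // injected so the sketch is testable
): { scope: "local" | "global"; file: string } | undefined {
  // 1. Walk up from the working directory; the first local hit shadows global.
  for (let dir = path.resolve(cwd); ; dir = path.dirname(dir)) {
    const candidate = path.join(dir, ".smithers", "workflows", `${name}.tsx`);
    if (exists(candidate)) return { scope: "local", file: candidate };
    if (path.dirname(dir) === dir) break; // reached the filesystem root
  }
  // 2. Global pack, which makes the same ID usable from any directory.
  const globalRoot = env.SMITHERS_HOME ?? path.join(home, ".smithers");
  const globalFile = path.join(globalRoot, "workflows", `${name}.tsx`);
  return exists(globalFile) ? { scope: "global", file: globalFile } : undefined;
}

// Usage with a fake filesystem: "review" exists in both packs, "triage" only globally.
const files = new Set([
  "/repo/.smithers/workflows/review.tsx",
  "/home/me/.smithers/workflows/review.tsx",
  "/home/me/.smithers/workflows/triage.tsx",
]);
const has = (p: string) => files.has(p);
console.log(resolveWorkflowId("review", "/repo/src", "/home/me", {}, has)); // local pack wins
console.log(resolveWorkflowId("triage", "/repo/src", "/home/me", {}, has)); // global fallback
```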

## Exit codes

```text
0   success
1   execution failure
2   run cancelled / cancel succeeded
3   `up` ended in waiting-approval, waiting-event, or waiting-timer
4   invalid arguments / user-correctable input error
130 SIGINT
143 SIGTERM
```
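An agent that drives `up` from a script usually branches on the exit code. This hypothetical helper maps the codes in the table above to a next step. The codes are the documented ones. The suggested actions are one reasonable reading of the failure protocol, not Smithers behavior.

```ts
// Hypothetical caller-side helper: interprets the exit code of
// `bunx smithers-orchestrator up`. Codes follow the documented table;
// the `next` hints are suggestions, not CLI output.
type UpOutcome =
  | { kind: "success" }
  | { kind: "failed"; next: string }
  | { kind: "cancelled" }
  | { kind: "waiting"; next: string }
  | { kind: "bad-input"; next: string }
  | { kind: "signal"; signal: "SIGINT" | "SIGTERM" }
  | { kind: "unknown"; code: number };

function interpretUpExit(code: number): UpOutcome {
  switch (code) {
    case 0: return { kind: "success" };
    case 1: return { kind: "failed", next: "run `why` and `inspect` on the run" };
    case 2: return { kind: "cancelled" };
    case 3: return { kind: "waiting", next: "check `why`: approve/deny, signal, or wait for the timer" };
    case 4: return { kind: "bad-input", next: "fix the arguments or --input JSON and rerun" };
    case 130: return { kind: "signal", signal: "SIGINT" };
    case 143: return { kind: "signal", signal: "SIGTERM" };
    default: return { kind: "unknown", code };
  }
}

console.log(interpretUpExit(3).kind); // waiting
```

A paused run (code 3) is not a failure, so a wrapper should report it as waiting on a decision rather than retrying.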

## Command catalog (TOON)

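Commands are listed by dotted name. A dotted name is invoked with a space, so `agents.add` runs as `agents add` and `workflow.run` runs as `workflow run`. `human` and `alerts` take an action positional instead of nested subcommands.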

```toon
commands[70]:
  - name: init
    purpose: Install the workflow pack into .smithers/ (local) or ~/.smithers/ (--global)
    flags[6]{name,short,type,default,desc}:
      force,,boolean,false,Overwrite existing scaffold files
      agents-only,,boolean,false,Only create .smithers/agents/ and leave workflow pack untouched
      install,,boolean,true,Run bun install inside the pack after scaffolding
      add-agents,,boolean,false,Launch the account registration wizard after scaffolding
      global,,boolean,false,Scaffold the global pack in ~/.smithers (honors SMITHERS_HOME) instead of ./.smithers
      template,,string,,Show next steps for a canonical starter template ID after init
  - name: starters
    purpose: Show plain-English starter workflows with copy-paste commands
    args[1]{name,type,required,desc}:
      id,string,false,Starter ID or alias
    flags[4]{name,short,type,default,desc}:
      audience,,string,,Filter by audience such as product, support, or founder
      goal,,string,,Filter by goal such as plan, build, debug, or quality
      workflow,,string,,Filter by seeded workflow ID
      tag,,string,,Filter by starter tag
  - name: up
    purpose: Start or resume a workflow execution
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[28]{name,short,type,default,desc}:
      detach,d,boolean,false,Background mode; print runId/pid/logFile and exit
      run-id,r,string,,Explicit run ID
      max-concurrency,c,number,4,Maximum parallel tasks
      root,,string,,Tool sandbox root directory
      log,,boolean,true,Enable NDJSON event log file output
      log-dir,,string,,NDJSON event logs directory
      allow-network,,boolean,false,Allow bash tool network requests
      max-output-bytes,,number,,Max bytes a single tool call can return
      tool-timeout-ms,,number,,Max wall-clock time per tool call in ms
      hot,,boolean,false,Hot reload for .tsx workflows
      input,i,string,,Input data as JSON string or '-' to read JSON from stdin
      annotations,,string,,Run annotations as flat JSON string/number/boolean object or '-' to read JSON from stdin
      resume,,boolean|string,false,Resume an existing run; may be true or a run ID
      force,,boolean,false,Resume even if still marked running
      resume-claim-owner,,string,,Internal durable resume claim owner
      resume-claim-heartbeat,,number,,Internal durable resume claim heartbeat
      resume-restore-owner,,string,,Internal durable resume restore owner
      resume-restore-heartbeat,,number,,Internal durable resume restore heartbeat
      serve,,boolean,false,Start an HTTP server alongside the workflow
      supervise,,boolean,false,Run stale-run supervisor loop with --serve
      supervise-dry-run,,boolean,false,With --supervise; detect without resuming
      supervise-interval,,string,10s,Supervisor poll interval
      supervise-stale-threshold,,string,30s,Heartbeat staleness threshold
      supervise-max-concurrent,,number,3,Max runs resumed per poll
      port,,number,7331,HTTP server port when --serve
      host,,string,127.0.0.1,HTTP bind address when --serve
      auth-token,,string,,Bearer token for HTTP auth or SMITHERS_API_KEY env
      metrics,,boolean,true,Expose /metrics Prometheus endpoint when --serve
  - name: eval
    purpose: Run a workflow over JSON/JSONL cases and write a regression report
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path or discovered workflow ID
    flags[17]{name,short,type,default,desc}:
      cases,c,string,,"JSON array, { cases: [...] }, or JSONL case file"
      suite,s,string,,Stable suite ID used in run IDs and report paths
      run-label,,string,current UTC timestamp + nonce,Label appended to eval run IDs
      dry-run,n,boolean,false,Plan run IDs without launching workflows
      concurrency,j,number,1,Number of eval cases to run at once
      max-cases,,number,,Run only the first N cases
      report,r,string,.smithers/evals/<suite>.json,Report path
      force,,boolean,false,Overwrite an existing report
      include-output,,boolean,true,Include workflow outputs in the report
      max-concurrency,,number,,Per-workflow task concurrency
      root,,string,,Tool sandbox root directory
      log,,boolean,true,Enable NDJSON event log file output
      log-dir,,string,,NDJSON event logs directory
      allow-network,,boolean,false,Allow bash tool network requests
      max-output-bytes,,number,,Max bytes a single tool call can return
      tool-timeout-ms,,number,,Max wall-clock time per tool call in ms
      optimization,,string,,Apply a Smithers optimization artifact while running the eval suite
  - name: optimize
    purpose: Run GEPA prompt optimization over a workflow eval suite and write an optimized prompt artifact
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path or discovered workflow ID
    flags[16]{name,short,type,default,desc}:
      cases,c,string,,"JSON array, { cases: [...] }, or JSONL case file"
      suite,s,string,,Stable suite ID used in run IDs and report paths
      provider,p,enum,cerebras,GEPA patch generator provider
      model,m,string,,Optimizer model for provider-backed GEPA
      artifact,a,string,,Write the optimized prompt artifact to this path
      report-dir,,string,,Directory for baseline and optimized eval reports
      min-improvement,,number,0.000001,Minimum required absolute score improvement
      max-cases,,number,,Run only the first N cases
      concurrency,j,number,1,Number of eval cases to run at once
      max-concurrency,,number,,Per-workflow task concurrency
      root,,string,,Tool sandbox root directory
      log,,boolean,true,Enable NDJSON event log file output
      log-dir,,string,,NDJSON event logs directory
      allow-network,,boolean,false,Allow bash tool network requests
      max-output-bytes,,number,,Max bytes a single tool call can return
      tool-timeout-ms,,number,,Max wall-clock time per tool call in ms
  - name: supervise
    purpose: Watch for stale running runs and auto-resume them
    flags[4]{name,short,type,default,desc}:
      dry-run,n,boolean,false,Detect stale runs without resuming
      interval,i,string,10s,Poll interval
      stale-threshold,t,string,30s,Heartbeat staleness threshold before resume
      max-concurrent,c,number,3,Max runs resumed per poll
  - name: ps
    purpose: List active, paused, and recently completed runs
    flags[5]{name,short,type,default,desc}:
      status,s,string,,"Filter: running|waiting-approval|waiting-event|waiting-timer|continued|finished|failed|cancelled"
      limit,l,number,20,Max rows
      all,a,boolean,false,Include all statuses
      watch,w,boolean,false,Refresh continuously
      interval,i,number,2,Watch refresh seconds
  - name: logs
    purpose: Tail lifecycle events for a run
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[4]{name,short,type,default,desc}:
      follow,f,boolean,true,Poll for new events while run is active
      since,,number,,Start after event sequence number
      tail,n,number,50,Last N events
      follow-ancestry,,boolean,false,Include ancestor run events root-to-current
  - name: events
    purpose: Query run event history with filters, grouping, and NDJSON output
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[8]{name,short,type,default,desc}:
      node,n,string,,Filter by node ID
      type,t,string,,"Category: agent|approval|frame|memory|node|openapi|output|revert|run|sandbox|scorer|snapshot|supervisor|timer|token|tool-call|workflow"
      since,s,string,,Recent duration window such as 5m or 2h
      limit,l,number,1000,Max events; capped at 100000
      json,j,boolean,false,Emit NDJSON
      group-by,,string,,"node | attempt"
      watch,w,boolean,false,Append new events as they arrive
      interval,i,number,2,Watch poll seconds
  - name: chat
    purpose: Show agent chat output for the latest run or a specific run
    args[1]{name,type,required,desc}:
      runId,string,false,Run ID; latest run if omitted
    flags[4]{name,short,type,default,desc}:
      all,a,boolean,false,Show every agent attempt
      follow,f,boolean,false,Watch for new output
      tail,n,number,,Last N chat blocks
      stderr,,boolean,true,Include agent stderr
  - name: chat-create
    purpose: Create and start a one-task auto-hijacked chat run
    flags[2]{name,short,type,default,desc}:
      agent,,enum,,claude-code|codex|antigravity|gemini
      cwd,,string,.,Working directory for the chat session
  - name: hijack
    purpose: Hand off the latest resumable agent session or Smithers-managed conversation
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[3]{name,short,type,default,desc}:
      target,,string,,"Expected engine such as claude-code or codex"
      timeout-ms,,number,30000,Wait time for live handoff
      launch,,boolean,true,Open session immediately
  - name: inspect
    purpose: "Output detailed state of a run: steps, agents, approvals, timers, loops, outputs"
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[2]{name,short,type,default,desc}:
      watch,w,boolean,false,Refresh continuously
      interval,i,number,2,Watch refresh seconds
  - name: node
    purpose: Show enriched node details for debugging retries, tool calls, and output
    args[1]{name,type,required,desc}:
      nodeId,string,true,Node ID
    flags[6]{name,short,type,default,desc}:
      run-id,r,string,,Run ID containing the node
      iteration,i,number,,Loop iteration; latest if omitted
      attempts,,boolean,false,Expand all attempts in human output
      tools,,boolean,false,Expand tool input/output payloads
      watch,w,boolean,false,Refresh continuously
      interval,,number,2,Watch refresh seconds
  - name: why
    purpose: Explain why a run is currently blocked or paused
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[1]{name,short,type,default,desc}:
      json,,boolean,false,Structured JSON diagnosis
  - name: human
    purpose: List, answer, or cancel durable human requests
    args[2]{name,type,required,desc}:
      action,string,true,inbox|answer|cancel
      requestId,string,false,Human request ID for answer/cancel
    flags[2]{name,short,type,default,desc}:
      value,,string,,JSON response for answer
      by,,string,,Human operator identifier
  - name: ask-human
    purpose: Raise a blocking human-approval request from inside a run and wait for the decision
    args[1]{name,type,required,desc}:
      prompt,string,true,The decision or question to put to a human
    flags[7]{name,short,type,default,desc}:
      context,,string,,Extra context appended to the prompt
      choices,,string,,Comma-separated choices for a fixed-choice decision
      run-id,r,string,,Run to attach to (SMITHERS_RUN_ID or single active run)
      node,n,string,,Node id to attach to (SMITHERS_NODE_ID)
      iteration,,number,0,Loop iteration (SMITHERS_ITERATION or 0)
      timeout,,number,,Seconds before the request expires
      poll,,number,3,Poll interval in seconds while blocking
  - name: alerts
    purpose: List and manage durable alert instances
    args[2]{name,type,required,desc}:
      action,string,true,list|ack|resolve|silence
      alertId,string,false,Alert ID for ack/resolve/silence
  - name: approve
    purpose: Approve a paused approval gate; auto-detects the node if only one is pending
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[4]{name,short,type,default,desc}:
      node,n,string,,Node ID required if multiple approvals are pending
      iteration,,number,0,Loop iteration
      note,,string,,Approval note
      by,,string,,Approver identifier
  - name: deny
    purpose: Deny a paused approval gate
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[4]{name,short,type,default,desc}:
      node,n,string,,Node ID required if multiple approvals are pending
      iteration,,number,0,Loop iteration
      note,,string,,Denial note
      by,,string,,Denier identifier
  - name: signal
    purpose: Deliver a durable signal to a run waiting on Signal or WaitForEvent
    args[2]{name,type,required,desc}:
      runId,string,true,Run ID
      signalName,string,true,Signal name
    flags[3]{name,short,type,default,desc}:
      data,,string,,"Signal payload as JSON; defaults to {}"
      correlation,,string,,Correlation ID to match a specific waiter
      by,,string,,Signal sender identifier
  - name: cancel
    purpose: Safely halt agents and terminate one active run
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
  - name: down
    purpose: Cancel all active runs in the nearest smithers.db
    flags[1]{name,short,type,default,desc}:
      force,,boolean,false,Cancel runs even if they still appear live; without this only stale runs are cancelled
  - name: graph
    purpose: Render the workflow graph without executing it
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[2]{name,short,type,default,desc}:
      run-id,r,string,graph,Run ID for context
      input,,string,,Input JSON; overrides persisted input
  - name: gui
    purpose: Open a directory as a workspace in Smithers GUI
    args[1]{name,type,required,desc}:
      path,string,false,Directory path (defaults to current working directory)
    flags[1]{name,short,type,default,desc}:
      bundle-id,,string,com.smithers.SmithersGUI,Smithers GUI app bundle identifier
  - name: ui
    purpose: Open the custom UI for a workflow run in your browser
    args[1]{name,type,required,desc}:
      runId,string,false,Run to open. Defaults to the most recent run.
    flags[4]{name,short,type,default,desc}:
      gateway,g,string,,"Gateway base URL (default http://127.0.0.1:<port>)"
      port,,number,7331,Gateway port when --gateway is not set
      workflow,w,string,,Open this workflow's UI directly skipping run lookup
      open,,boolean,true,Open a browser; use --no-open to just print the URL
  - name: revert
    purpose: Revert the workspace to a previous task attempt's filesystem state
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[4]{name,short,type,default,desc}:
      run-id,r,string,,Run ID
      node-id,n,string,,Node ID
      attempt,,number,1,Attempt number
      iteration,,number,0,Loop iteration
  - name: retry-task
    purpose: Retry a specific task within a run, then resume the workflow
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[5]{name,short,type,default,desc}:
      run-id,r,string,,Run ID
      node-id,n,string,,Task/node ID to retry
      iteration,,number,0,Loop iteration
      no-deps,,boolean,false,Only reset this node; skip dependents
      force,,boolean,false,Allow retry even if run is still running
  - name: timetravel
    purpose: Time-travel to a task state; revert filesystem, reset DB, optionally resume
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[8]{name,short,type,default,desc}:
      run-id,r,string,,Run ID
      node-id,n,string,,Task/node ID
      iteration,,number,0,Loop iteration
      attempt,a,number,,Attempt number; latest if omitted
      no-vcs,,boolean,false,Skip filesystem revert; DB only
      no-deps,,boolean,false,Only reset this node
      resume,,boolean,false,Resume after time travel
      force,,boolean,false,Force even if run is still running
  - name: replay
    purpose: Fork from a checkpoint and resume execution
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[6]{name,short,type,default,desc}:
      run-id,r,string,,Source run ID
      frame,f,number,,Frame number to fork from
      node,n,string,,Node ID to reset to pending
      input,i,string,,Input overrides as JSON
      label,l,string,,Branch label for the fork
      restore-vcs,,boolean,false,Restore jj filesystem state to source frame revision
  - name: fork
    purpose: Create a branched run from a snapshot checkpoint
    args[1]{name,type,required,desc}:
      workflow,string,true,Workflow file path
    flags[6]{name,short,type,default,desc}:
      run-id,r,string,,Source run ID
      frame,f,number,,Frame number to fork from
      reset-node,n,string,,Node ID to reset to pending
      input,i,string,,Input overrides as JSON
      label,l,string,,Branch label
      run,,boolean,false,Immediately start the forked run
  - name: timeline
    purpose: View execution timeline for a run and its forks
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[2]{name,short,type,default,desc}:
      tree,,boolean,false,Include all child forks recursively
      json,,boolean,false,Output as JSON
  - name: tree
    purpose: Print DevTools snapshot as an XML tree
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[6]{name,short,type,default,desc}:
      frame,,number,,Historical frame number
      watch,,boolean,false,Stream live DevTools events
      json,j,boolean,false,Emit snapshot JSON
      depth,,number,,Truncate depth
      node,,string,,Scope to subtree
      color,,enum,auto,auto|always|never
  - name: diff
    purpose: Print a node DiffBundle as unified diff
    args[2]{name,type,required,desc}:
      runId,string,true,Run ID containing the node
      nodeId,string,true,Node ID to diff
    flags[4]{name,short,type,default,desc}:
      iteration,,number,,Loop iteration; latest if omitted
      json,j,boolean,false,Emit raw DiffBundle
      stat,,boolean,false,Show stat summary only
      color,,enum,auto,auto|always|never
  - name: output
    purpose: Print a node output row
    args[2]{name,type,required,desc}:
      runId,string,true,Run ID containing the node
      nodeId,string,true,Node ID to fetch output for
    flags[3]{name,short,type,default,desc}:
      iteration,,number,,Loop iteration; latest if omitted
      json,j,boolean,true,Emit raw row as JSON
      pretty,,boolean,false,Schema-ordered render
  - name: rewind
    purpose: Rewind a run to a previous frame
    args[2]{name,type,required,desc}:
      runId,string,true,Run ID to rewind
      frameNo,number,true,Target frame number
    flags[2]{name,short,type,default,desc}:
      yes,,boolean,false,Skip confirmation prompt
      json,j,boolean,false,Emit JumpResult JSON
  - name: snapshots
    purpose: List durability snapshots (workspace checkpoints) for a run
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID to list snapshots for
    flags[1]{name,short,type,default,desc}:
      json,,boolean,false,Emit rows as JSON
  - name: restore
    purpose: Restore a worktree to a durability checkpoint (latest for the node, or --seq)
    args[2]{name,type,required,desc}:
      runId,string,true,Run ID containing the checkpoint
      nodeId,string,true,Node ID whose worktree to restore
    flags[2]{name,short,type,default,desc}:
      iteration,,number,,Loop iteration
      seq,,number,,Checkpoint seq; latest if omitted
  - name: snapshot-hook
    purpose: "Internal: PostToolUse hook that requests a Tier 1 durability snapshot"
  - name: observability
    purpose: Start or stop the local Docker Compose observability stack
    flags[2]{name,short,type,default,desc}:
      detach,d,boolean,false,Run containers in the background
      down,,boolean,false,Stop and remove the stack
  - name: ask
    purpose: Ask a question about Smithers using an installed agent and the Smithers MCP server
    args[1]{name,type,required,desc}:
      question,string,false,Question to ask
    flags[6]{name,short,type,default,desc}:
      agent,,enum,,claude|codex|antigravity|gemini|kimi|pi
      list-agents,,boolean,false,List detected agents and exit
      dump-prompt,,boolean,false,Print generated system prompt and exit
      tool-surface,,enum,semantic,semantic|raw
      no-mcp,,boolean,false,Disable MCP bootstrap and use prompt fallback
      print-bootstrap,,boolean,false,Print selected bootstrap configuration and exit
  - name: scores
    purpose: View scorer results for a specific run
    args[1]{name,type,required,desc}:
      runId,string,true,Run ID
    flags[1]{name,short,type,default,desc}:
      node,,string,,Filter scores to a specific node ID
  - name: usage
    purpose: Show how much rate limit or subscription quota each registered account has used
    flags[3]{name,short,type,default,desc}:
      account,,string,,Only report this account label
      provider,,string,,Only report accounts for this provider
      fresh,,boolean,false,Bypass the short usage cache while respecting provider rate-limit floors
  - name: docs
    purpose: Print llms.txt for this CLI version
    flags[2]{name,short,type,default,desc}:
      latest,,boolean,false,Fetch the latest docs from smithers.sh instead of docs for this CLI version
      docs-version,,string,,"Fetch docs for a specific Smithers version, e.g. 0.22.0 or v0.22.0"
  - name: docs-full
    purpose: Print llms-full.txt for this CLI version
    flags[2]{name,short,type,default,desc}:
      latest,,boolean,false,Fetch the latest docs from smithers.sh instead of docs for this CLI version
      docs-version,,string,,"Fetch docs for a specific Smithers version, e.g. 0.22.0 or v0.22.0"
  - name: agents.capabilities
    purpose: Print JSON capability registry for built-in CLI agents
  - name: agents.doctor
    purpose: Validate built-in CLI agent capability registries and command-surface contracts
    flags[1]{name,short,type,default,desc}:
      json,,boolean,false,Print doctor report as JSON
  - name: agents.add
    purpose: Register a Smithers agent account, interactively or with flags
    flags[9]{name,short,type,default,desc}:
      provider,,enum,,"claude-code|antigravity|codex|gemini|kimi|anthropic-api|openai-api|gemini-api"
      label,,string,,Unique account label
      config-dir,,string,,Per-account CLI config dir for subscription providers
      api-key,,string,,API key for API-key providers
      model,,string,,Default model for this account
      skip-login,,boolean,false,Skip credential-directory check
      force,,boolean,false,Register even if no credentials are present
      replace,,boolean,false,Overwrite an existing account with the same label
      loop,,boolean,false,Wizard mode; keep adding accounts until done
  - name: agents.list
    purpose: List registered Smithers agent accounts
  - name: agents.remove
    purpose: Remove a registered agent account by label
    args[1]{name,type,required,desc}:
      label,string,true,Account label
    flags[1]{name,short,type,default,desc}:
      silent,,boolean,false,Do not error if the label is not registered
  - name: agents.test
    purpose: Spawn an account's underlying CLI with --version
    args[1]{name,type,required,desc}:
      label,string,true,Account label
  - name: workflow.list
    purpose: List discovered workflows (local .smithers/workflows/ plus the global ~/.smithers/workflows/; each entry reports its scope, local shadows global)
  - name: workflow.run
    purpose: Run a discovered workflow by ID
    args[1]{name,type,required,desc}:
      name,string,true,Workflow ID
    flags[2]{name,short,type,default,desc}:
      prompt,p,string,,Shorthand for input.prompt when --input is omitted
      up-flags,,object,,Also accepts all smithers-orchestrator up flags
  - name: workflow.path
    purpose: Resolve a workflow ID to its entry file path
    args[1]{name,type,required,desc}:
      name,string,true,Workflow ID
  - name: workflow.inspect
    purpose: Show workflow metadata and an agent-facing skill preview
    args[1]{name,type,required,desc}:
      name,string,true,Workflow ID
  - name: workflow.create
    purpose: Create a new flat workflow scaffold in .smithers/workflows/ (or ~/.smithers with --global)
    args[1]{name,type,required,desc}:
      name,string,true,New workflow ID
    flags[1]{name,short,type,default,desc}:
      global,,boolean,false,Create in the global ~/.smithers pack (honors SMITHERS_HOME) instead of the local .smithers
  - name: workflow.skills
    purpose: Generate agent-facing skill docs for discovered workflows
    args[1]{name,type,required,desc}:
      name,string,false,Workflow ID; omit for all workflows
    flags[3]{name,short,type,default,desc}:
      output,,string,,"Output file for one workflow, or output directory for all"
      force,,boolean,false,Overwrite existing skill files
      global,,boolean,false,Write skills into the global ~/.smithers pack instead of the local .smithers
  - name: workflow.doctor
    purpose: Inspect workflow discovery, preload files, bunfig, and detected agents
    args[1]{name,type,required,desc}:
      name,string,false,Workflow ID; omit for all
  - name: cron.start
    purpose: Start the background scheduler loop in the current terminal
  - name: cron.add
    purpose: Register a new workflow cron schedule
    args[2]{name,type,required,desc}:
      pattern,string,true,Cron expression
      workflowPath,string,true,Path or ID of workflow to schedule
  - name: cron.list
    purpose: List registered background cron schedules
  - name: cron.rm
    purpose: Delete a cron schedule by ID
    args[1]{name,type,required,desc}:
      cronId,string,true,Cron ID
  - name: memory.list
    purpose: List all memory facts in a namespace
    args[1]{name,type,required,desc}:
      namespace,string,true,Namespace such as workflow:my-flow
    flags[1]{name,short,type,default,desc}:
      workflow,w,string,,Path to workflow file that locates the DB
  - name: openapi.list
    purpose: Preview tools generated from an OpenAPI spec
    args[1]{name,type,required,desc}:
      specPath,string,true,File path or URL to OpenAPI spec
  - name: token.issue
    purpose: Issue a local short-lived Gateway bearer token grant
    flags[4]{name,short,type,default,desc}:
      scopes,,string,run:read,Comma or space separated Gateway scopes
      role,,string,operator,Role recorded on the token grant
      user-id,,string,,User ID recorded on the token grant
      ttl,,string,1h,Token lifetime such as 15m or 1h
  - name: token.revoke
    purpose: Revoke a locally issued Gateway bearer token
    args[1]{name,type,required,desc}:
      token,string,true,Bearer token to revoke
  - name: completions
    purpose: Generate shell completion scripts
    args[1]{name,type,required,desc}:
      shell,string,true,bash|fish|nushell|zsh
  - name: mcp.add
    purpose: Register Smithers as an MCP server for an agent integration
    flags[3]{name,short,type,default,desc}:
      agent,,string,,Target agent such as claude-code or cursor
      command,c,string,,Override the command agents will run
      no-global,,boolean,false,Install to project instead of globally
  - name: skills.add
    purpose: Sync skill files to agent integrations
    flags[2]{name,short,type,default,desc}:
      depth,,number,1,Grouping depth for skill files
      no-global,,boolean,false,Install to project instead of globally
  - name: skills.list
    purpose: List available skills
```

## Operational notes

- **Detached mode** (`up --detach`): redirects stdout/stderr to a log file, prints `runId`/`pid`/`logFile`, and exits.
- **Serve mode** (`up --serve`): starts the HTTP app and keeps the process alive until interrupted. Add `--supervise` to run stale-run recovery in the same process.
- **Watch mode**: `ps`, `events`, `inspect`, `node`, and `tree` have watch-style behavior. They stop cleanly on SIGINT and most stop when the run becomes terminal.
- **DevTools commands**: `tree`, `diff`, `output`, and `rewind` intentionally use command-scoped `--json`/`-j` and return exit code `1` for parser/user errors.
- **Account commands**: `agents add|list|remove|test` manage `~/.smithers/accounts.json`; subscription providers use CLI config directories, while API providers use API keys.
- **Output format**: all commands honor `--format toon|json|yaml|md|jsonl`; `--filter-output <key.path>` extracts a nested field from JSON output.
- **Removed command**: `bunx smithers-orchestrator tui` was removed in 0.20.2. Use `ps`, `events`, `chat`, `inspect`, `node`, `tree`, `diff`, `output`, or `rewind` instead. See `tui.mdx` for migration details.
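
The sketch below shows roughly what `--filter-output` does to a JSON payload. It is written in TypeScript and assumes plain dot-separated object keys; array indexes and wildcards are not modeled. `filterOutput` and the sample payload are hypothetical. This is not the CLI's implementation, and the payload is not a real command's output shape.

```typescript
// Hypothetical sketch of `--filter-output <key.path>`: walk plain object keys
// separated by dots. Array indexes and wildcards are deliberately not modeled.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function filterOutput(value: Json, keyPath: string): Json | undefined {
  let current: Json | undefined = value;
  for (const key of keyPath.split(".")) {
    // Leaving an object (primitive, array, null, or missing key) yields undefined.
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Made-up payload for illustration only.
const sample: Json = { run: { id: "RUN_ID", status: "running" } };
console.log(filterOutput(sample, "run.status")); // running
console.log(filterOutput(sample, "run.status.code")); // undefined
```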

---

## CLI Quickstart

> Short operational cheatsheet. The full catalog lives in CLI.

**Prerequisites:** `bun` installed, and a project directory where you want to run Smithers.

**Set up**

```bash
bunx smithers-orchestrator init                                       # scaffold .smithers/
bunx smithers-orchestrator init --template idea-to-tickets            # scaffold with guided template next steps
bunx smithers-orchestrator starters                                   # browse template IDs
```

**Verify the setup**

```bash
bunx smithers-orchestrator --version          # prints the version, e.g. 0.23.0
bunx smithers-orchestrator workflow doctor    # vcs + workflow pack health
bunx smithers-orchestrator workflow list      # workflows discovered in this repo
```

`workflow doctor` confirms a VCS resolved and the pack scaffolded (output trimmed):

```text
vcs:
  jj:
    path: jj
    source: path
  git:
    path: git
    source: path
  ok: true
workflowRoot: /your/repo/.smithers
packs[2]:
  - scope: local
    packDir: /your/repo/.smithers
    preload:
      path: /your/repo/.smithers/preload.ts
      exists: true
    bunfig:
      path: /your/repo/.smithers/bunfig.toml
      exists: true
  - scope: global
    ...
workflows[46]:
  - id: audit
  - id: implement
  - ...
```

`ok: true` under `vcs` means jj or git resolved; if it is `false`, install one (see Installation). A non-empty `workflows[...]` count means the pack is in place. `workflow list` prints the same workflow entries with full metadata.

**Run**

```bash
bunx smithers-orchestrator workflow run implement --prompt "…"        # launch a seeded workflow
bunx smithers-orchestrator up workflow.tsx --run-id RUN_ID --resume true   # resume a paused run
```

**What success looks like**

`workflow run` and `up` stream the lifecycle event log and exit on a terminal line:

```text
[12:00:01] ▶ Run started
[12:00:01] ↺ Run status: running
[12:00:38] ✓ implement (attempt 1)
[12:00:40] ✓ Run finished
```

`✓ Run finished` means the run completed. `⏸ <node> waiting for approval` means it paused for you (clear it with `approve`, then resume). `✗ Run failed: …` reports the error. The CLI prints the run id and suggested next commands (`logs`, `inspect`) after launch; pass that id to every command below.

**Observe & control**

```bash
bunx smithers-orchestrator ps                                         # list runs
bunx smithers-orchestrator inspect RUN_ID                           # structured run state
bunx smithers-orchestrator logs RUN_ID --tail 20 --follow           # stream events
bunx smithers-orchestrator why RUN_ID                               # why is it paused?
bunx smithers-orchestrator approve RUN_ID --node NODE_ID --by NAME   # approve a paused node
bunx smithers-orchestrator cancel RUN_ID                            # cancel a run
```

See Tour for a full worked example. See CLI overview for the complete flag reference.

---

## <Workflow>

> Root container; sequences direct children, optionally caches for resume.

```ts
import { Workflow } from "smithers-orchestrator";

type WorkflowProps = {
  name: string;
  cache?: boolean; // skip already-completed nodes on resume
  children?: ReactNode;
};
```

```tsx
import { createSmithers } from "smithers-orchestrator";
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const { Workflow, Task, smithers, outputs } = createSmithers({
  research: z.object({ findings: z.string() }),
});

const researcher = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  instructions: "You are a research assistant.",
});

export default smithers((ctx) => (
  <Workflow name="research-pipeline" cache>
    <Task id="research" output={outputs.research} agent={researcher}>
      {`Research: ${ctx.input.topic}`}
    </Task>
  </Workflow>
));
```

## Notes

- Direct children run sequentially; `<Sequence>` is only needed inside other control-flow components.
- Custom Drizzle tables require `runId`, `nodeId`, and `iteration` columns with a composite primary key `(run_id, node_id, iteration)`. Tasks outside a `<Loop>` write `iteration = 0`.

---

## <Task>

> Executable node; runs an agent, compute callback, or emits a static value.

```ts
import { Task } from "smithers-orchestrator";

type TaskProps = {
  id: string;
  output: z.ZodObject | Table | string;
  outputSchema?: z.ZodObject; // inferred when output is a Zod schema
  agent?: AgentLike | AgentLike[]; // array = [primary, ...fallbacks]
  fallbackAgent?: AgentLike;
  dependsOn?: string[];
  needs?: Record<string, string>;
  deps?: Record<string, OutputTarget>; // typed render-time upstream outputs
  fork?: string; // start from another task's final agent session snapshot
  allowTools?: string[]; // CLI-agent tool allowlist
  key?: string;
  skipIf?: boolean;
  needsApproval?: boolean; // pause for human before executing
  async?: boolean; // with needsApproval: let unrelated flow continue
  timeoutMs?: number;
  retries?: number; // default Infinity with exponential backoff
  noRetry?: boolean;
  retryPolicy?: { backoff?: "fixed" | "linear" | "exponential"; initialDelayMs?: number };
  continueOnFail?: boolean;
  cache?: { by?: (ctx) => unknown; version?: string; key?: string; ttlMs?: number; scope?: "run" | "workflow" | "global" };
  label?: string;
  meta?: Record<string, unknown>;
  scorers?: ScorersMap;
  memory?: {
    recall?: { namespace?: string; query?: string; topK?: number };
    remember?: { namespace?: string; key?: string };
    threadId?: string;
  };
  heartbeatTimeoutMs?: number; // fail if no heartbeat in window
  heartbeatTimeout?: number; // alias of heartbeatTimeoutMs
  hijack?: boolean; // request an immediate hijack handoff as soon as the task starts running
  onHijackExit?: "complete" | "reopen"; // what Smithers should do after a hijacked session exits
  children?:
    | string
    | Row
    | (() => Row | Promise<Row>)
    | ReactNode
    | ((deps) => Row | ReactNode);
};
```

```tsx
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const codeAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  instructions: "You are a senior software engineer.",
});

<Task id="analyze" output={outputs.analysis} agent={codeAgent}>
  {`Analyze: ${ctx.input.repoPath}`}
</Task>

<Task id="review" output={outputs.review} agent={reviewAgent} deps={{ analyze: outputs.analysis }}>
  {(deps) => `Review: ${deps.analyze.summary}`}
</Task>
```

## Fork

Every agent task produces a reusable session snapshot. Use `fork` to start a new task from any previous task's context.

`<Task id={B} fork={A}>` means:

- `B` depends on `A` and cannot run until `A` has completed.
- `B` starts from a **copy** of `A`'s final agent session context, then submits its own prompt into that copy.
- `B` produces its own output and its own session snapshot. `A` is never mutated.

`fork` is immutable. It does not continue or mutate the source task; it copies the conversation into a fresh, independent session. Multiple tasks may fork the same source safely, and a forked task may itself be forked.

```tsx
const PLAN = "plan" as const;
const IMPLEMENT = "implement" as const;
const VERIFY = "verify" as const;

<Task id={PLAN} agent={claude} output={outputs.plan}>
  Make a plan.
</Task>

<Task id={IMPLEMENT} agent={claude} fork={PLAN} output={outputs.patch}>
  Implement the plan.
</Task>

<Task id={VERIFY} agent={claude} fork={IMPLEMENT} output={outputs.result}>
  Run tests and fix failures.
</Task>
```

`VERIFY` forks `IMPLEMENT`, which forked `PLAN`, so `VERIFY` sees the whole `plan → implement` conversation.
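
The following toy TypeScript model is for intuition only. It treats a "session snapshot" as a frozen message list and uses a placeholder string for the assistant reply. The names `runTask` and `snapshots` are hypothetical, not Smithers APIs. The model shows a fork chain accumulating the whole conversation while every source stays unchanged.

```typescript
// Toy model of fork semantics, not the Smithers runtime. A "session snapshot"
// is a frozen message list; the assistant reply is a placeholder string.
type Message = { role: "user" | "assistant"; content: string };

const snapshots = new Map<string, readonly Message[]>();

function runTask(id: string, prompt: string, fork?: string): readonly Message[] {
  // Start from a copy of the fork source's snapshot, or from an empty session.
  const source = fork === undefined ? [] : snapshots.get(fork);
  if (source === undefined) throw new Error(`no snapshot for fork source "${fork}"`);
  const session: Message[] = [...source];
  session.push({ role: "user", content: prompt });
  session.push({ role: "assistant", content: `(reply to: ${prompt})` });
  const snapshot = Object.freeze(session); // the new task's own snapshot
  snapshots.set(id, snapshot);
  return snapshot;
}

runTask("plan", "Make a plan.");
runTask("implement", "Implement the plan.", "plan");
const verify = runTask("verify", "Run tests and fix failures.", "implement");

console.log(verify.length); // 6
console.log(snapshots.get("plan")?.length); // 2
```

Here `verify` holds the plan, implement, and verify turns, and `plan` still has only its own two messages. Forking `plan` twice, as in the parallel example, would give two four-message sessions that share the plan turns and nothing else.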

**Parallel branches**: fork the same source from sibling tasks; each gets its own copy and they never affect each other:

```tsx
<Task id="investigate" agent={claude} output={outputs.investigation}>
  Understand the bug and identify possible fixes.
</Task>

<Parallel>
  <Task id="minimal-fix" agent={claude} fork="investigate" output={outputs.patch}>
    Try the minimal fix.
  </Task>
  <Task id="refactor-fix" agent={claude} fork="investigate" output={outputs.patch}>
    Try the refactor fix.
  </Task>
</Parallel>
```

`fork` composes with `dependsOn`, `needs`, `deps`, `Sequence`, `Parallel`, `Branch`, and `Loop`. Inside a loop, `fork` resolves to the **latest completed** session snapshot for that task id; there is no iteration selector and no ambiguity.

### Error cases

- **`TASK_FORK_SOURCE_NOT_FOUND`**: `fork` points to a task id not present in the graph (including a source that exists only in an unselected `<Branch>`).
- **`TASK_FORK_CYCLE`**: `fork` creates a cycle, directly or indirectly.
- **`TASK_FORK_SESSION_UNAVAILABLE`**: the forking task is not an agent task, or the source completed but produced no usable session snapshot (e.g. a compute/static source, or a source that was skipped/cancelled).
- **`TASK_FORK_SOURCE_NOT_COMPLETE`**: the source exists but has not completed; the forked task waits and does not run.
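
The first two errors depend only on the task graph, so they can be detected before anything runs. Below is a hypothetical TypeScript sketch of such a check. `checkForks` is illustrative and does not reflect Smithers internals. If you pass only the ids that are actually mounted, the check also catches a source hidden in an unselected `<Branch>`. The last two errors depend on run-time state, so the sketch does not model them.

```typescript
// Hypothetical graph check for the two static fork errors. `forkOf` maps a
// task id to the id named in its `fork` prop.
type ForkGraphError = "TASK_FORK_SOURCE_NOT_FOUND" | "TASK_FORK_CYCLE";

function checkForks(
  taskIds: readonly string[],
  forkOf: ReadonlyMap<string, string>,
): ForkGraphError | null {
  const known = new Set(taskIds);
  for (const [task, source] of forkOf) {
    if (!known.has(source)) return "TASK_FORK_SOURCE_NOT_FOUND";
    // Follow the fork chain; meeting a task twice means the chain loops.
    const seen = new Set<string>([task]);
    let current: string | undefined = source;
    while (current !== undefined) {
      if (seen.has(current)) return "TASK_FORK_CYCLE";
      seen.add(current);
      current = forkOf.get(current);
    }
  }
  return null;
}

const chain = new Map([["implement", "plan"], ["verify", "implement"]]);
console.log(checkForks(["plan", "implement", "verify"], chain)); // null
console.log(checkForks(["a", "b"], new Map([["a", "b"], ["b", "a"]]))); // TASK_FORK_CYCLE
```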

## Notes

- Three modes by `children` shape: agent (with `agent`), compute (function, no agent), static (value, no agent).
- `fork` requires an agent task; the source must be an agent task with a session snapshot. Forking copies the conversation into a new session and never reuses a native session id.
- When `outputSchema` is set, JSON is extracted from agent text; schema-validation retries don't consume `retries`.
- Auth errors short-circuit retries; non-idempotent tool reuse warns on the next attempt.
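
`retryPolicy.backoff` names three shapes, but this page does not give their formulas. The sketch below assumes the textbook versions: a constant delay, a delay proportional to the retry number, and a delay that doubles per retry. It applies no cap or jitter. `retryDelayMs` is a hypothetical helper, and the numbers are illustrative rather than the delays Smithers actually schedules.

```typescript
// Assumed textbook formulas; Smithers may add caps or jitter not shown here.
type Backoff = "fixed" | "linear" | "exponential";

function retryDelayMs(
  retry: number, // 1 for the first retry, 2 for the second, and so on
  policy: { backoff?: Backoff; initialDelayMs: number },
): number {
  const base = policy.initialDelayMs;
  // Exponential matches the default noted on `retries` above.
  const backoff: Backoff = policy.backoff ?? "exponential";
  switch (backoff) {
    case "fixed":
      return base; // same wait every time
    case "linear":
      return base * retry; // grows by `base` each retry
    case "exponential":
      return base * 2 ** (retry - 1); // doubles each retry
  }
}

const retries = [1, 2, 3, 4];
console.log(retries.map((n) => retryDelayMs(n, { backoff: "fixed", initialDelayMs: 500 })).join(", ")); // 500, 500, 500, 500
console.log(retries.map((n) => retryDelayMs(n, { backoff: "linear", initialDelayMs: 500 })).join(", ")); // 500, 1000, 1500, 2000
console.log(retries.map((n) => retryDelayMs(n, { initialDelayMs: 500 })).join(", ")); // 500, 1000, 2000, 4000
```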

---

## <Sequence>

> Run children in source order.

```ts
import { Sequence } from "smithers-orchestrator";

type SequenceProps = {
  key?: string;
  skipIf?: boolean;
  children?: ReactNode;
};
```

```tsx
<Workflow name="build-and-deploy">
  <Parallel maxConcurrency={2}>
    <Sequence>
      <Task id="build-frontend" output={outputs.buildFrontend}>
        {{ status: "built" }}
      </Task>
      <Task id="test-frontend" output={outputs.testFrontend}>
        {{ passed: true }}
      </Task>
    </Sequence>
    <Sequence>
      <Task id="build-backend" output={outputs.buildBackend}>
        {{ status: "built" }}
      </Task>
      <Task id="test-backend" output={outputs.testBackend}>
        {{ passed: true }}
      </Task>
    </Sequence>
  </Parallel>
</Workflow>
```

## Notes

- `<Workflow>` sequences direct children implicitly; `<Sequence>` is only needed inside other control-flow components.
- Empty `<Sequence>` is valid and produces no tasks.

---

## <Parallel>

> Run children concurrently with an optional concurrency cap.

```ts
import { Parallel } from "smithers-orchestrator";

type ParallelProps = {
  id?: string;
  maxConcurrency?: number; // default: unlimited (no cap)
  skipIf?: boolean;
  children?: ReactNode;
};
```

```tsx
<Workflow name="checks">
  <Parallel maxConcurrency={2}>
    <Task id="lint" output={outputs.lint}>
      {{ errors: 0 }}
    </Task>
    <Task id="typecheck" output={outputs.typecheck}>
      {{ passed: true }}
    </Task>
    <Task id="test" output={outputs.test}>
      {{ passed: true }}
    </Task>
  </Parallel>
</Workflow>
```

## Notes

- The group completes when all children finish. To let the group proceed past a failing child, set `continueOnFail` on that individual child `<Task>`; it is not a `<Parallel>` prop.
- Children receive `parallelGroupId` and `parallelMaxConcurrency` in their descriptor.
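
Here is a rough discrete simulation of `maxConcurrency`. It assumes children start in source order whenever a slot frees up, which is a guess because the scheduler's actual ordering is not documented here. `simulateParallel` is hypothetical and the durations are invented. The simulation only illustrates how a cap of 2 delays the third check in the example above.

```typescript
// Toy simulation of <Parallel maxConcurrency>. Assumes children start in
// source order as soon as a slot frees up; durations are made up.
function simulateParallel(
  durationsMs: readonly number[],
  maxConcurrency = Infinity, // omitted prop: no cap
): { starts: number[]; finishedAt: number } {
  if (maxConcurrency < 1) throw new Error("maxConcurrency must be at least 1");
  const busyUntil: number[] = []; // finish time of each occupied slot
  const starts: number[] = [];
  for (const duration of durationsMs) {
    let start = 0;
    if (busyUntil.length >= maxConcurrency) {
      // Every slot is taken: wait for the one that frees up first.
      busyUntil.sort((a, b) => a - b);
      start = busyUntil.shift() ?? 0;
    }
    starts.push(start);
    busyUntil.push(start + duration);
  }
  // The group completes only when every child has finished.
  return { starts, finishedAt: Math.max(0, ...busyUntil) };
}

// lint, typecheck, test with invented durations of 10, 20, and 30 ms
const capped = simulateParallel([10, 20, 30], 2);
console.log(`starts=${capped.starts.join(",")} finishedAt=${capped.finishedAt}`); // starts=0,0,10 finishedAt=40
console.log(simulateParallel([10, 20, 30]).finishedAt); // 30
```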

---

## <Branch>

> Conditional fork; mounts `then` when `if` is true, otherwise `else`.

```ts
import { Branch } from "smithers-orchestrator";

type BranchProps = {
  if: boolean;
  then: ReactElement;
  else?: ReactElement | null;
  skipIf?: boolean;
};
```

```tsx
<Workflow name="deploy-pipeline">
  <Task id="test" output={outputs.test}>
    {{ passed: true }}
  </Task>

  <Branch
    if={ctx.outputMaybe(outputs.test, { nodeId: "test" })?.passed === true}
    then={
      <Task id="deploy" output={outputs.deploy}>
        {{ url: "https://prod.example.com" }}
      </Task>
    }
    else={
      <Task id="notify-failure" output={outputs.notifyFailure}>
        {{ message: "Tests failed, skipping deploy." }}
      </Task>
    }
  />
</Workflow>
```

## Notes

- `if` re-evaluates every render frame; read completed-task outputs via `ctx.outputMaybe()`.
- Each branch takes one element; wrap multiples in `<Sequence>` or `<Parallel>`.
- Unselected branch tasks are absent from the task graph.

---

## <Loop>

> Re-run children until `until` is true or `maxIterations` is hit.

```ts
import { Loop } from "smithers-orchestrator";

type LoopProps = {
  id?: string; // auto-generated from tree position
  until?: boolean;
  maxIterations?: number; // default 5
  onMaxReached?: "fail" | "return-last"; // default "return-last"
  continueAsNewEvery?: number; // checkpoint every N iters to bound history
  skipIf?: boolean;
  children?: ReactNode;
};
```

```tsx
export default smithers((ctx) => {
  const latestReview = ctx.latest("review", "review");

  return (
    <Workflow name="refine-loop">
      <Loop until={latestReview?.approved === true} maxIterations={5}>
        <Sequence>
          <Task id="write" output={outputs.draft} agent={writer}>
            {latestReview
              ? `Improve the draft. Feedback: ${latestReview.feedback}`
              : `Write a draft about: ${ctx.input.topic}`}
          </Task>
          <Task id="review" output={outputs.review} agent={reviewer}>
            {`Review the latest draft.`}
          </Task>
        </Sequence>
      </Loop>
    </Workflow>
  );
});
```

## Notes

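- `ctx.latest(table, nodeId)` reads the highest-iteration output and returns `undefined` when none exists. `until` must tolerate that missing output, because nothing has been written before iteration 0 completes. Use optional chaining on `ctx.latest()`, as in the example above, or use `ctx.outputMaybe()`.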
- Direct nesting of `<Loop>` in `<Loop>` throws; wrap the inner loop in `<Sequence>`.
- Custom Drizzle tables for loop tasks require `iteration` in the primary key.
- `Ralph` is still exported as a deprecated alias of `Loop` for older workflows. New code should import and render `Loop`.
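
The notes above can be pictured with a toy TypeScript model. Rows are keyed by `(runId, nodeId, iteration)`, `latest` returns the highest iteration, and the `until` check runs before each pass, so it sees nothing before iteration 0 completes. `Row`, `upsert`, `latest`, and `runLoop` are hypothetical helpers, not Smithers internals. The reviewer callback stands in for the write and review tasks.

```typescript
// Toy model of loop bookkeeping; not Smithers internals.
type Row<T> = { runId: string; nodeId: string; iteration: number; value: T };
type Review = { approved: boolean; feedback: string };

// (runId, nodeId, iteration) acts as the composite primary key.
function upsert<T>(rows: Row<T>[], row: Row<T>): void {
  const i = rows.findIndex(
    (r) => r.runId === row.runId && r.nodeId === row.nodeId && r.iteration === row.iteration,
  );
  if (i === -1) rows.push(row);
  else rows[i] = row;
}

// Like ctx.latest: the value from the highest iteration, or undefined.
function latest<T>(rows: readonly Row<T>[], runId: string, nodeId: string): T | undefined {
  let best: Row<T> | undefined;
  for (const r of rows) {
    if (r.runId !== runId || r.nodeId !== nodeId) continue;
    if (best === undefined || r.iteration > best.iteration) best = r;
  }
  return best?.value;
}

function runLoop(
  rows: Row<Review>[],
  runId: string,
  review: (iteration: number) => Review, // stands in for the write + review tasks
  maxIterations = 5,
  onMaxReached: "fail" | "return-last" = "return-last",
): Review | undefined {
  for (let iteration = 0; iteration < maxIterations; iteration++) {
    // `until`: re-checked before every pass; undefined before iteration 0 completes.
    if (latest(rows, runId, "review")?.approved === true) break;
    upsert(rows, { runId, nodeId: "review", iteration, value: review(iteration) });
  }
  const last = latest(rows, runId, "review");
  if (last?.approved !== true && onMaxReached === "fail") {
    throw new Error(`loop hit maxIterations=${maxIterations} without approval`);
  }
  return last;
}

const rows: Row<Review>[] = [];
const approved = runLoop(rows, "run-1", (i) => ({ approved: i === 2, feedback: `pass ${i}` }));
console.log(approved?.feedback, rows.length); // pass 2 3
```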

---

## <Approval>

> Durable human approval; persists ApprovalDecision, selection, or ranking.

```ts
import { Approval, approvalDecisionSchema } from "smithers-orchestrator";

type ApprovalProps = {
  id: string;
  mode?: "approve" | "select" | "rank"; // default "approve"
  options?: ApprovalOption[]; // required for select/rank
  output: z.ZodObject | Table | string;
  outputSchema?: z.ZodObject; // default: mode-appropriate schema (approve→approvalDecisionSchema, select→approvalSelectionSchema, rank→approvalRankingSchema); or the output schema itself when output is a ZodObject
  request: { title: string; summary?: string; metadata?: Record<string, unknown> };
  onDeny?: "fail" | "continue" | "skip"; // default "fail"
  allowedScopes?: string[];
  allowedUsers?: string[];
  autoApprove?: {
    after?: number; // auto-approve after N consecutive manual approvals
    condition?: (ctx: WorkflowContext) => boolean;
    audit?: boolean;
    revertOn?: (ctx: WorkflowContext) => boolean;
  };
  async?: boolean; // unrelated downstream may continue while pending
  dependsOn?: string[];
  needs?: Record<string, string>;
  skipIf?: boolean;
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  heartbeatTimeout?: number;
  retries?: number;
  retryPolicy?: { backoff?: "fixed" | "linear" | "exponential"; initialDelayMs?: number };
  continueOnFail?: boolean;
  cache?: { by?: (ctx) => unknown; version?: string };
  label?: string;
  meta?: Record<string, unknown>;
};
```

```tsx
import { Approval, Sequence, Task, Workflow, approvalDecisionSchema, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  publishApproval: approvalDecisionSchema,
  publishResult: z.object({ status: z.enum(["published", "rejected"]) }),
});

export default smithers((ctx) => {
  const decision = ctx.outputMaybe(outputs.publishApproval, { nodeId: "approve-publish" });
  return (
    <Workflow name="publish-flow">
      <Sequence>
        <Approval
          id="approve-publish"
          output={outputs.publishApproval}
          request={{ title: "Publish the draft?", summary: "Human review required." }}
          onDeny="continue"
        />
        {decision ? (
          <Task id="record" output={outputs.publishResult}>
            {{ status: decision.approved ? "published" : "rejected" }}
          </Task>
        ) : null}
      </Sequence>
    </Workflow>
  );
});
```

## Notes

- `mode="select"` returns `{ selected, notes }`; `mode="rank"` returns `{ ranked, notes }`.
- The pending approval is a durable deferred keyed on (run, node, iteration), so it survives restarts. Resolve it with `bunx smithers-orchestrator approve` or `deny`.
- For a pre-task pause without a persisted decision, use `<Task needsApproval>`.

---

## <ApprovalGate>

> Conditional approval; pauses for human when `when` is true, else auto-approves.

```ts
import { ApprovalGate } from "smithers-orchestrator";

type ApprovalGateProps = {
  id: string;
  output: OutputTarget; // z.ZodObject<z.ZodRawShape> | { $inferSelect: Record<string, unknown> } | string
  request: { title: string; summary?: string; metadata?: Record<string, unknown> };
  when: boolean; // true => require human; false => auto-approve immediately
  onDeny?: "fail" | "continue" | "skip"; // if omitted, denial fails the task
  skipIf?: boolean;
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  heartbeatTimeout?: number;
  retries?: number;
  retryPolicy?: { backoff?: "fixed" | "linear" | "exponential"; initialDelayMs?: number };
  continueOnFail?: boolean;
};
```

```tsx
const risk = ctx.outputMaybe(outputs.riskScore, { nodeId: "risk" }); // undefined until "risk" completes

<Workflow name="deploy-pipeline">
  <Sequence>
    <Task id="risk" output={outputs.riskScore} agent={riskAgent}>
      Assess deploy risk.
    </Task>
    <ApprovalGate
      id="deploy-approval"
      output={outputs.deployDecision}
      when={risk?.level === "high"}
      request={{
        title: "Approve high-risk deploy?",
        summary: `Risk score: ${risk?.score ?? "pending"}/100`,
      }}
      onDeny="fail"
    />
    <Task id="deploy" output={outputs.deploy}>
      {{ deployed: true }}
    </Task>
  </Sequence>
</Workflow>
```

## Notes

- Auto-approve emits a valid `ApprovalDecision` (`{ approved: true, note: "auto-approved", ... }`); downstream branching stays uniform.
- `onDeny` applies only to the human path; auto-approve always succeeds.
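
This toy TypeScript model shows how the gate resolves. It assumes only the `approved` and `note` fields matter, although the real `ApprovalDecision` has more. `resolveGate` and `askHuman` are hypothetical. The point is that `when={false}` never reaches the human path, so `onDeny` cannot apply.

```typescript
// Toy model of <ApprovalGate> resolution. Only `approved` and `note` are
// modeled; the real ApprovalDecision carries more fields.
type Decision = { approved: boolean; note: string };
type OnDeny = "fail" | "continue" | "skip";

function resolveGate(
  when: boolean,
  askHuman: () => Decision, // stands in for the durable human prompt
  onDeny?: OnDeny,
): { decision: Decision; outcome: "proceed" | OnDeny } {
  // when=false: resolve immediately without asking anyone.
  if (!when) return { decision: { approved: true, note: "auto-approved" }, outcome: "proceed" };
  const decision = askHuman();
  // onDeny only matters on the human path; omitted means the task fails.
  return { decision, outcome: decision.approved ? "proceed" : (onDeny ?? "fail") };
}

const lowRisk = resolveGate(false, () => {
  throw new Error("a human should not be asked");
});
console.log(lowRisk.decision.note, lowRisk.outcome); // auto-approved proceed
```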

---

## <EscalationChain>

> Sequential agent escalation with optional human fallback.

```ts
// Props
import { EscalationChain } from "smithers-orchestrator";

type EscalationChainProps = {
  id?: string; // default "escalation"
  levels: EscalationLevel[];
  humanFallback?: boolean; // default false
  humanRequest?: ApprovalRequest;
  escalationOutput: z.ZodObject | { $inferSelect: Record<string, unknown> } | string;
  skipIf?: boolean;
  children?: ReactNode; // prompt forwarded to every level
};

type EscalationLevel = {
  agent: AgentLike;
  output: z.ZodObject | { $inferSelect: Record<string, unknown> } | string;
  label?: string;
  escalateIf?: (result: unknown) => boolean; // true -> next level
};
```

```tsx
<Workflow name="support-ticket">
  <EscalationChain
    id="support"
    escalationOutput={outputs.escalation}
    humanFallback
    humanRequest={{ title: "Ticket needs human support", summary: "Agents could not resolve." }}
    levels={[
      { agent: fastAgent, output: outputs.tier1, label: "Tier 1", escalateIf: (r) => r.confidence < 0.7 },
      { agent: powerAgent, output: outputs.tier2, label: "Tier 2", escalateIf: (r) => r.confidence < 0.9 },
    ]}
  >
    Resolve this ticket: {ctx.input.ticketBody}
  </EscalationChain>
</Workflow>
```

## Notes

- Each level runs with `continueOnFail`, so a failing level does not fail the workflow; the chain escalates to the next level instead.
- `escalateIf` is evaluated at render time. The chain re-renders reactively as each level's output becomes available and calls the predicate to decide whether the next level mounts.
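
The following toy TypeScript model shows the escalation order; it is not the component's code. Levels run one at a time and hand off when they throw (the `continueOnFail` behavior) or when `escalateIf` returns `true`. `escalate` is hypothetical. The model assumes a level without `escalateIf` resolves the chain. When no level resolves and `humanFallback` is off, it returns a `"unresolved"` placeholder, because this page does not say what happens in that case.

```typescript
// Toy model of escalation order, not the component's code. "unresolved" is a
// hypothetical placeholder for a chain that ends without a resolution.
type Level<R> = {
  label: string;
  run: () => R; // stands in for the level's agent task
  escalateIf?: (result: R) => boolean;
};

function escalate<R>(
  levels: readonly Level<R>[],
  humanFallback = false,
): { resolvedBy: string; result?: R } {
  for (const level of levels) {
    let outcome: { result: R } | undefined;
    try {
      outcome = { result: level.run() };
    } catch {
      outcome = undefined; // failed level: move on to the next one
    }
    if (outcome === undefined) continue;
    if (level.escalateIf?.(outcome.result) !== true) {
      return { resolvedBy: level.label, result: outcome.result };
    }
  }
  return { resolvedBy: humanFallback ? "human" : "unresolved" };
}

type Answer = { confidence: number };
const levels: Level<Answer>[] = [
  { label: "Tier 1", run: () => ({ confidence: 0.6 }), escalateIf: (r) => r.confidence < 0.7 },
  { label: "Tier 2", run: () => ({ confidence: 0.95 }), escalateIf: (r) => r.confidence < 0.9 },
];
console.log(escalate(levels, true).resolvedBy); // Tier 2
```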

---

## <DecisionTable>

> Flat rule table that replaces nested Branch trees with declarative routing.

```ts
// Props
import { DecisionTable } from "smithers-orchestrator";

type DecisionTableProps = {
  id?: string;
  rules: DecisionRule[];
  default?: ReactElement; // rendered when no rule matches
  strategy?: "first-match" | "all-match"; // default "first-match"
  skipIf?: boolean;
};

type DecisionRule = {
  when: boolean; // evaluated at render time
  then: ReactElement;
  label?: string;
};
```

```tsx
<DecisionTable
  rules={[
    {
      when: triage.severity === "critical",
      then: (
        <Task id="page-oncall" output={outputs.page} agent={pagerAgent}>
          Page the on-call engineer immediately.
        </Task>
      ),
    },
    {
      when: triage.severity === "high",
      then: <Task id="assign-senior" output={outputs.assign}>{{ assignee: "senior-pool" }}</Task>,
    },
  ]}
  default={<Task id="default-assign" output={outputs.assign}>{{ assignee: "general-pool" }}</Task>}
/>
```

## Notes

- `first-match` builds nested Branches; order matters.
- `all-match` wraps every matching rule in a Parallel; no ordering guarantee.
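
This toy TypeScript model of rule selection uses labels in place of the `then` elements. `selectRules` is hypothetical. It shows three behaviors: `first-match` keeps only the earliest matching rule in array order, `all-match` keeps every match, and `default` is used only when nothing matches.

```typescript
// Toy model of rule selection: labels stand in for the `then` elements.
type Rule = { when: boolean; label: string };

function selectRules(
  rules: readonly Rule[],
  fallback?: string, // the `default` element
  strategy: "first-match" | "all-match" = "first-match",
): string[] {
  const matches = rules.filter((rule) => rule.when).map((rule) => rule.label);
  if (matches.length === 0) return fallback === undefined ? [] : [fallback];
  // first-match: only the earliest matching rule in array order.
  // all-match: every matching rule (mounted together under a Parallel).
  return strategy === "first-match" ? matches.slice(0, 1) : matches;
}

const severity: string = "high"; // made-up triage result
const rules: Rule[] = [
  { when: severity === "critical", label: "page-oncall" },
  { when: severity === "high", label: "assign-senior" },
];
console.log(selectRules(rules, "default-assign").join(", ")); // assign-senior
```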

---

## <HumanTask>

> Suspend until a human submits JSON matching the output schema.

```ts
import { HumanTask } from "smithers-orchestrator";

type HumanTaskProps = {
  id: string;
  output: z.ZodObject | Table | string;
  outputSchema?: z.ZodObject; // inferred when output is a Zod schema
  prompt: string | ReactNode;
  maxAttempts?: number; // default 10
  async?: boolean; // unrelated downstream may continue while pending
  skipIf?: boolean;
  timeoutMs?: number;
  continueOnFail?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
};
```

```tsx
import { Workflow, Sequence, Task, HumanTask, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  review: z.object({
    approved: z.boolean(),
    comments: z.string(),
    severity: z.enum(["low", "medium", "high"]),
  }),
  summary: z.object({ status: z.string() }),
});

export default smithers((ctx) => {
  const review = ctx.outputMaybe(outputs.review, { nodeId: "human-review" });
  return (
    <Workflow name="review-flow">
      <Sequence>
        <HumanTask
          id="human-review"
          output={outputs.review}
          prompt="Review the PR. Provide approved (boolean), comments (string), severity (low|medium|high)."
          maxAttempts={5}
          timeoutMs={86_400_000}
        />
        {review ? (
          <Task id="record" output={outputs.summary}>
            {{ status: review.approved ? "approved" : "changes-requested" }}
          </Task>
        ) : null}
      </Sequence>
    </Workflow>
  );
});
```

## Notes

- Submit via `bunx smithers-orchestrator human answer <requestId> --value '<json>'`. Find the requestId with `bunx smithers-orchestrator human inbox`. The `bunx smithers-orchestrator approve` command is for `<Approval>` components, not `<HumanTask>`.
- A submission that is not valid JSON, or that fails the output schema, triggers a re-prompt with zero backoff, up to `maxAttempts` attempts in total.
- Same durable deferred mechanism as `<Approval>`; survives restarts.
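
This toy TypeScript model shows the re-prompt loop. Each string is one `human answer --value` submission, and `isReview` stands in for the output schema, which in practice would be a Zod schema check. `collectAnswer` and its `"waiting"`/`"exhausted"` states are hypothetical names, not Smithers internals.

```typescript
// Toy model of <HumanTask> re-prompting; not Smithers internals.
type Review = { approved: boolean; comments: string; severity: "low" | "medium" | "high" };

// Stands in for the output schema check.
function isReview(value: unknown): value is Review {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.approved === "boolean" &&
    typeof v.comments === "string" &&
    (v.severity === "low" || v.severity === "medium" || v.severity === "high")
  );
}

type Outcome =
  | { status: "accepted"; value: Review; attempts: number }
  | { status: "waiting" | "exhausted"; attempts: number };

function collectAnswer(submissions: readonly string[], maxAttempts = 10): Outcome {
  let attempts = 0;
  for (const raw of submissions) {
    if (attempts >= maxAttempts) break;
    attempts++;
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // not JSON: re-prompt right away (zero backoff)
    }
    if (isReview(parsed)) return { status: "accepted", value: parsed, attempts };
    // Valid JSON with the wrong shape also re-prompts.
  }
  return { status: attempts >= maxAttempts ? "exhausted" : "waiting", attempts };
}

const outcome = collectAnswer([
  "yes",
  '{"approved": true}',
  '{"approved": true, "comments": "LGTM", "severity": "low"}',
]);
console.log(outcome.status, outcome.attempts); // accepted 3
```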

---

## <Signal>

> Typed wrapper around `<WaitForEvent>`; the signal name equals the node id.

```ts
import { Signal } from "smithers-orchestrator";

type SignalProps = {
  id: string; // signal name and node id
  schema: z.ZodObject; // typed payload + output target
  correlationId?: string;
  timeoutMs?: number;
  onTimeout?: "fail" | "skip" | "continue"; // default "fail" (applied by the underlying <WaitForEvent>)
  async?: boolean;
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
  smithersContext?: React.Context<SmithersCtx | null>; // advanced: inject a custom Smithers context
  children?: (data) => ReactNode; // mounts only after payload arrives
};
```

```tsx
import { Signal, Task, Workflow, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  feedback: z.object({ rating: z.number(), comment: z.string() }),
  summary: z.object({ upper: z.string() }),
});

export default smithers(() => (
  <Workflow name="signal-demo">
    <Signal id="user-feedback" schema={outputs.feedback} async>
      {(feedback) => (
        <Task id="summarize" output={outputs.summary}>
          {{ upper: feedback.comment.toUpperCase() }}
        </Task>
      )}
    </Signal>
  </Workflow>
));
```

## Notes

- Renders `<WaitForEvent event={id} output={schema}>` internally.
- Async waits show in `smithers_external_wait_async_pending{kind="event"}`.

---

## <WaitForEvent>

> Durably suspend until a correlated external event arrives, or time out.

```ts
import { WaitForEvent } from "smithers-orchestrator";

type WaitForEventProps = {
  id: string;
  event: string;
  output: z.ZodObject | Table | string;
  correlationId?: string;
  outputSchema?: z.ZodObject;
  timeoutMs?: number;
  onTimeout?: "fail" | "skip" | "continue"; // default "fail"
  async?: boolean; // unrelated downstream may proceed while pending
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
};
```

```tsx
<Workflow name="deploy-watcher">
  <Sequence>
    <WaitForEvent
      id="wait-deploy"
      event="deploy.completed"
      correlationId={ctx.input.deployId}
      output={outputs.deployEvent}
      outputSchema={deployPayload}
      timeoutMs={600_000}
      onTimeout="fail"
    />
    <Task id="notify" output={outputs.summary} agent={notifier}>
      The deploy finished. Summarize the result.
    </Task>
  </Sequence>
</Workflow>
```

## Notes

- Push-based; for poll-based checks use `<Task>` with a compute function.
- With `async`, dependents via `dependsOn`/`needs` still block until payload arrives.
- Async waits are tracked by the `smithers_external_wait_async_pending{kind="event"}` gauge while pending (it rises on start, falls on completion).
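
A pending wait involves two decisions: which waiters an incoming event resumes, and what happens once `timeoutMs` has elapsed. The hypothetical TypeScript sketch below models both. Two of its rules are assumptions for illustration, not documented behavior. First, a waiter without `correlationId` accepts any event with the right name. Second, an omitted `timeoutMs` means the wait never times out.

```typescript
// Hypothetical sketch; the correlation rule and the "no timeoutMs means no
// timeout" rule are assumptions, not documented behavior.
type Waiter = { nodeId: string; event: string; correlationId?: string };
type IncomingEvent = { name: string; correlationId?: string };

function matchingWaiters(waiters: readonly Waiter[], incoming: IncomingEvent): string[] {
  return waiters
    .filter((w) => w.event === incoming.name)
    .filter((w) => w.correlationId === undefined || w.correlationId === incoming.correlationId)
    .map((w) => w.nodeId);
}

type TimeoutAction = "fail" | "skip" | "continue";

function timeoutState(
  startedAtMs: number,
  nowMs: number,
  timeoutMs?: number,
  onTimeout: TimeoutAction = "fail",
): "waiting" | TimeoutAction {
  if (timeoutMs === undefined || nowMs - startedAtMs < timeoutMs) return "waiting";
  return onTimeout;
}

const waiters: Waiter[] = [
  { nodeId: "wait-deploy", event: "deploy.completed", correlationId: "deploy-42" },
  { nodeId: "wait-other", event: "deploy.completed", correlationId: "deploy-7" },
];
console.log(matchingWaiters(waiters, { name: "deploy.completed", correlationId: "deploy-42" }).join(", ")); // wait-deploy
console.log(timeoutState(0, 600_000, 600_000)); // fail
```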

---

## <Timer>

> Durably suspend for a relative duration or until an absolute time.

```ts
import { Timer } from "smithers-orchestrator";

type TimerProps = {
  id: string;
  duration?: string; // "500ms" | "30s" | "2h" | "7d"; exactly one of duration/until required
  until?: string | Date; // ISO 8601 string or Date
  every?: string; // reserved for recurring timers, which are not supported yet; throws if set
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
};
```

```tsx
<Workflow name="delayed-report">
  <Sequence>
    <Timer id="cooldown" duration="30s" />
    <Task id="report" output={outputs.report} agent={reportAgent}>
      Generate the daily summary report.
    </Task>
  </Sequence>
</Workflow>
```

## Notes

- Exactly one of `duration` or `until` is required; both or neither throws at render time.
- Produces no output. Past `until` timestamps fire immediately.
- Worker restarts during the wait don't reset the timer.

---

## <Saga>

> Forward steps with compensations run in reverse on failure.

```ts
import { Saga } from "smithers-orchestrator";

type SagaStepDef = {
  id: string;
  action: ReactElement;
  compensation: ReactElement;
  label?: string;
};

type SagaProps = {
  id?: string;
  steps?: SagaStepDef[]; // takes priority over children
  onFailure?: "compensate" | "compensate-and-fail" | "fail"; // default "compensate"
  skipIf?: boolean;
  children?: React.ReactNode; // alternative: <Saga.Step> children
};
```

```tsx
<Workflow name="deploy-saga">
  <Saga
    id="deploy"
    steps={[
      {
        id: "create-pr",
        action: <Task id="create-pr" output={outputs.pr} agent={codeAgent}>Create a PR.</Task>,
        compensation: <Task id="close-pr" output={outputs.closePr} agent={codeAgent}>Close the PR.</Task>,
      },
      {
        id: "deploy-staging",
        action: <Task id="deploy-staging" output={outputs.staging} agent={deployAgent}>Deploy staging.</Task>,
        compensation: <Task id="rollback-staging" output={outputs.rollbackStaging} agent={deployAgent}>Rollback staging.</Task>,
      },
    ]}
  />
</Workflow>
```

## Notes

- Steps run sequentially; compensations run in reverse from the failed step.
- Compensations should be idempotent.
- Use either `steps` or `<Saga.Step>` children, not both; if both are given, `steps` takes priority.
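
This toy TypeScript model shows saga failure handling, roughly following the default `"compensate"` path. It assumes that only completed steps are compensated, newest first. The notes above do not say whether the failing step's own compensation also runs, so the sketch leaves it out. The third step, `smoke-test`, is a hypothetical addition that fails on purpose.

```typescript
// Toy model of saga failure handling, roughly the default "compensate" path.
// Assumption: only completed steps are compensated, newest first.
type SagaStep = { id: string; action: () => void; compensation: () => void };

function runSaga(steps: readonly SagaStep[], log: string[]): "completed" | "compensated" {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
    } catch {
      log.push(`failed ${step.id}`);
      // Undo in reverse order of completion.
      for (const done of [...completed].reverse()) {
        done.compensation();
        log.push(`compensated ${done.id}`);
      }
      return "compensated";
    }
    log.push(`did ${step.id}`);
    completed.push(step);
  }
  return "completed";
}

const log: string[] = [];
const noop = (): void => {};
const result = runSaga(
  [
    { id: "create-pr", action: noop, compensation: noop },
    { id: "deploy-staging", action: noop, compensation: noop },
    {
      id: "smoke-test", // hypothetical third step that fails
      action: () => {
        throw new Error("smoke test failed");
      },
      compensation: noop,
    },
  ],
  log,
);
console.log(result, log.join(" | "));
```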

---

## <TryCatchFinally>

> Workflow-scoped error boundary with catch handlers and guaranteed cleanup.

```ts
import { TryCatchFinally } from "smithers-orchestrator";

type TryCatchFinallyProps = {
  id?: string;
  try: ReactElement;
  catch?: ReactElement | ((error: SmithersError) => ReactElement);
  catchErrors?: SmithersErrorCode[]; // restrict which codes trigger catch
  finally?: ReactElement; // always runs after try or catch
  skipIf?: boolean;
};
```

```tsx
<Workflow name="safe-deploy">
  <TryCatchFinally
    try={
      <Sequence>
        <Task id="build" output={outputs.build} agent={buildAgent}>Build the project.</Task>
        <Task id="deploy" output={outputs.deploy} agent={deployAgent}>Deploy to production.</Task>
      </Sequence>
    }
    catch={(error) => (
      <Task id="recover" output={outputs.recover} agent={recoveryAgent}>
        {`Recover from ${error.code}: ${error.summary}`}
      </Task>
    )}
    finally={
      <Task id="cleanup" output={outputs.cleanup}>{{ cleanedUp: true }}</Task>
    }
  />
</Workflow>
```

## Notes

- `try` takes one `ReactElement`; wrap multiples in `<Sequence>` or `<Parallel>`.
- `finally` runs even if `catch` fails.
- Errors whose code is not listed in `catchErrors` skip `catch` and propagate to outer boundaries.
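
This toy TypeScript model shows the boundary's control flow using plain exceptions. `runBoundary`, `CodedError`, and the `CODE_A` code are hypothetical placeholders, not Smithers APIs or real error codes. In this model `finally` also runs when an unmatched error propagates, because JavaScript `finally` behaves that way. Treat that behavior as an assumption about Smithers.

```typescript
// Toy model of try/catch/finally with code filtering; not the component's code.
class CodedError extends Error {
  readonly code: string;
  constructor(code: string) {
    super(code);
    this.code = code;
  }
}

function runBoundary(opts: {
  try: () => string;
  catch?: (error: CodedError) => string;
  catchErrors?: readonly string[];
  finally: () => void;
}): string {
  try {
    return opts.try();
  } catch (error) {
    const handler = opts.catch;
    const codeListed =
      error instanceof CodedError &&
      (opts.catchErrors === undefined || opts.catchErrors.includes(error.code));
    // No handler, or a code not listed in catchErrors: propagate outward.
    if (handler === undefined || !codeListed) throw error;
    return handler(error as CodedError);
  } finally {
    opts.finally(); // runs after try or catch, even if catch itself throws
  }
}

const outcome = runBoundary({
  try: () => {
    throw new CodedError("CODE_A"); // placeholder code
  },
  catch: (error) => `recovered from ${error.code}`,
  catchErrors: ["CODE_A"],
  finally: () => console.log("cleanup"),
});
console.log(outcome); // recovered from CODE_A (printed after "cleanup")
```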

---

## <Sandbox>

> Run a child workflow through an injectable sandbox provider and collect outputs, artifacts, and reviewed file changes.

`<Sandbox>` is a task boundary for work that should execute outside the parent task process. The public component is provider-first: pass an injectable provider object or a registered provider id. `runtime` remains only for the built-in legacy local transports.

```ts
import { Sandbox } from "smithers-orchestrator";
import type { SandboxProvider } from "smithers-orchestrator/sandbox";

type SandboxProps = {
  id: string;
  output: ZodObject | DrizzleTable | string;
  workflow?: SmithersWorkflow<unknown>;
  input?: unknown;

  provider?: SandboxProvider | string;
  runtime?: "bubblewrap" | "docker" | "codeplane"; // legacy local transports

  allowNetwork?: boolean;
  reviewDiffs?: boolean; // default true
  autoAcceptDiffs?: boolean; // default false
  allowNested?: boolean; // default false

  image?: string;
  env?: Record<string, string>;
  ports?: Array<{ host: number; container: number }>;
  volumes?: Array<{ host: string; container: string; readonly?: boolean }>;
  memoryLimit?: string;
  cpuLimit?: string;
  command?: string;
  workspace?: {
    name: string;
    snapshotId?: string;
    idleTimeoutSecs?: number;
    persistence?: "ephemeral" | "sticky";
  };
};
```

## Basic usage

```tsx
// `createRemoteVm` is a placeholder for your own VM client, not a Smithers API.
const provider = {
  id: "remote-vm",
  async run(request) {
    const remote = await createRemoteVm({
      input: request.input,
      requestBundlePath: request.requestBundlePath,
    });

    return {
      status: "finished",
      output: await remote.readJson("/workspace/smithers-result.json"),
      remoteRunId: remote.id,
      workspaceId: remote.workspaceId,
      diffBundle: await remote.diffBundle(),
    };
  },
};

<Workflow name="remote-codegen">
  <Sandbox
    id="generate"
    provider={provider}
    workflow={generateCodeWorkflow}
    input={{ prompt: ctx.input.prompt }}
    output={outputs.result}
    allowNetwork={false}
    reviewDiffs
  />
</Workflow>
```

## Execution model

1. Smithers renders `<Sandbox>` as one scheduler task. Children do not become parent-run tasks.
2. At execution time Smithers writes a request bundle under `.smithers/sandboxes/<run>/<sandbox>/request-bundle`.
3. A provider receives the request, runs work remotely, and returns either a local bundle path or a structured result.
4. Smithers validates the result bundle, records sandbox lifecycle events, enforces diff review policy, applies accepted `diffBundle`s, and returns `outputs` to the parent task output table.

Both `run` and the optional `cleanup` receive a request containing the child workflow, input, root directory, request/result bundle paths, limits, an abort signal, and a heartbeat callback:

```ts
type SandboxProvider = {
  id: string;
  run(request: SandboxProviderRequest): Promise<SandboxProviderResult> | SandboxProviderResult;
  cleanup?(request: SandboxProviderRequest): Promise<void> | void;
};
```

Register reusable providers when a workflow should reference them by id:

```ts
import { registerSandboxProvider } from "smithers-orchestrator/sandbox";

const unregister = registerSandboxProvider(provider);

<Sandbox id="generate" provider="remote-vm" workflow={child} output={outputs.result} />;
```

## Result bundles

A provider can return a path to a bundle it created:

```ts
return {
  bundlePath: "/tmp/smithers-result-bundle",
  remoteRunId: vmId,
  workspaceId: vmId,
};
```

Or it can return a structured result and let Smithers materialize the bundle locally:

```ts
return {
  status: "finished",
  output: { summary: "done" },
  remoteRunId: vmId,
  diffBundle: {
    seq: 1,
    baseRef: "HEAD",
    patches: [
      {
        path: "src/app.ts",
        operation: "modify",
        diff: "diff --git a/src/app.ts b/src/app.ts\n...",
      },
    ],
  },
};
```

Bundle limits are enforced before the result is accepted: 100 MB total, 5 MB manifest file (README.md), 1,000 patch files, bounded JSON output, and no path traversal or symlinks in bundle paths.
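
These limits can be expressed as a pre-acceptance check. The sketch below assumes a hypothetical `BundleDescriptor` shape and treats "MB" as 1024 * 1024 bytes. It omits the JSON output bound because this section does not state its size. `validateBundle` is an illustrative name, not a Smithers export.

```ts
// Hypothetical descriptor of a result bundle; Smithers' real bundle format may differ.
type BundleEntry = { path: string; sizeBytes: number; isSymlink: boolean };
type BundleDescriptor = { entries: BundleEntry[]; patchCount: number };

const MB = 1024 * 1024; // assumption: MB read as MiB
const LIMITS = { totalBytes: 100 * MB, manifestBytes: 5 * MB, patchFiles: 1_000 };

function hasTraversal(p: string): boolean {
  // Reject absolute paths (POSIX or Windows drive) and any ".." segment.
  if (p.startsWith("/") || /^[A-Za-z]:/.test(p)) return true;
  return p.split(/[\\/]/).some((segment) => segment === "..");
}

function validateBundle(bundle: BundleDescriptor): string[] {
  const problems: string[] = [];
  const total = bundle.entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  if (total > LIMITS.totalBytes) problems.push("bundle exceeds 100 MB");
  const manifest = bundle.entries.find((e) => e.path === "README.md");
  if (manifest && manifest.sizeBytes > LIMITS.manifestBytes) problems.push("manifest exceeds 5 MB");
  if (bundle.patchCount > LIMITS.patchFiles) problems.push("more than 1,000 patch files");
  for (const e of bundle.entries) {
    if (hasTraversal(e.path)) problems.push(`path traversal: ${e.path}`);
    if (e.isSymlink) problems.push(`symlink not allowed: ${e.path}`);
  }
  return problems; // empty array means the bundle passes these checks
}
```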

## Diff review

`reviewDiffs` defaults to `true`. If the sandbox returns patch files or a `diffBundle`, Smithers records `SandboxDiffReviewRequested`.

When `reviewDiffs` is true and `autoAcceptDiffs` is false, changed bundles fail closed until a review path accepts them. When `autoAcceptDiffs` is true or `reviewDiffs` is false, Smithers applies the `diffBundle` through the engine's diff-bundle applier. Legacy patch files are still collected and review-gated, but only the `diffBundle` is applied.
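
The sketch below models this policy as a pure decision function. It assumes whether the sandbox produced changes is known up front, and that an accepted review leads to the same apply path. `decideDiffs` and its outcome names are illustrative, not Smithers APIs.

```ts
// Hypothetical model of the sandbox diff policy; defaults match the documented props.
type DiffPolicy = { reviewDiffs?: boolean; autoAcceptDiffs?: boolean };
type DiffOutcome = { reviewRequested: boolean; action: "none" | "apply" | "hold-for-review" };

function decideDiffs(policy: DiffPolicy, hasChanges: boolean, reviewAccepted = false): DiffOutcome {
  const reviewDiffs = policy.reviewDiffs ?? true; // default true
  const autoAccept = policy.autoAcceptDiffs ?? false; // default false
  if (!hasChanges) return { reviewRequested: false, action: "none" };
  // With review enabled, a SandboxDiffReviewRequested event would be recorded.
  const reviewRequested = reviewDiffs;
  if (!reviewDiffs || autoAccept || reviewAccepted) return { reviewRequested, action: "apply" };
  return { reviewRequested, action: "hold-for-review" }; // fail closed
}
```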

## Nested sandboxes

Nested sandbox execution is disabled by default. A sandbox running inside another sandbox must set `allowNested`.

Use nesting only when the provider and diff policy are designed for it. The hard cases are:

- Diff base conflicts: an inner sandbox can generate a `diffBundle` against a different base than the outer sandbox.
- Cleanup ordering: an outer provider cleanup can delete the workspace before the inner provider finishes.
- Quotas and concurrency: nested remote VMs can multiply resource usage quickly.
- Network and secrets: inherited remote credentials may be broader than intended.
- Event lineage: parent run, outer sandbox run, and inner sandbox run need clear ids for debugging.

For most workflows, use sibling sandboxes under a `Parallel` or `MergeQueue` instead of nesting.

## Built-in local transports

When `provider` is omitted, Smithers uses the legacy local transport path. `runtime` may be `"bubblewrap"`, `"docker"`, or `"codeplane"`. If `runtime` is omitted, the local path defaults to `"bubblewrap"`.

Unknown runtimes fail closed. If Docker is requested but unavailable, Smithers does not silently fall back to bubblewrap.
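
The transport rules above can be modeled as a small resolver. `resolveTransport` and its `dockerAvailable` flag are hypothetical. The sketch only encodes the provider-first choice, the `"bubblewrap"` default, and the fail-closed behavior described here.

```ts
// Hypothetical resolver mirroring the documented transport selection rules.
type LocalRuntime = "bubblewrap" | "docker" | "codeplane";
const LOCAL_RUNTIMES: readonly string[] = ["bubblewrap", "docker", "codeplane"];

type Transport = { kind: "provider" } | { kind: "local"; runtime: LocalRuntime };

function resolveTransport(props: { provider?: unknown; runtime?: string }, dockerAvailable = true): Transport {
  // A provider (object or registered id) takes the provider path.
  if (props.provider !== undefined) return { kind: "provider" };
  const runtime = props.runtime ?? "bubblewrap"; // local default
  if (!LOCAL_RUNTIMES.includes(runtime)) throw new Error(`Unknown sandbox runtime: ${runtime}`);
  // No silent fallback: a missing Docker is an error, not a switch to bubblewrap.
  if (runtime === "docker" && !dockerAvailable) throw new Error("Docker is unavailable");
  return { kind: "local", runtime: runtime as LocalRuntime };
}
```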

## Freestyle provider example

[Freestyle VMs](https://docs.freestyle.sh/v2/vms) are full sandboxes with nested virtualization, full networking, and the ability to scale to more resources than alternatives. Use Freestyle VMs when you want to give your agents a real computer rather than a code runner.

See `examples/freestyle/` for a provider adapter that creates a Freestyle VM, ships a request with `additionalFiles`, executes a command with `vm.exec()`, reads `smithers-result.json`, and returns a Smithers sandbox result.

Freestyle's current VM docs show the stable package as `freestyle`, VM creation through `freestyle.vms.create()`, support for `additionalFiles`, `gitRepos`, `workdir`, `idleTimeoutSeconds`, and command execution with `vm.exec()`. Relevant Freestyle docs:

- https://docs.freestyle.sh/v2/vms
- https://docs.freestyle.sh/v2/vms/configuration
- https://docs.freestyle.sh/v2/vms/configuration/files-and-repos
- https://docs.freestyle.sh/v2/vms/lifecycle

---

## <Subflow>

> Invoke a child workflow with its own retry, cache, and resume boundary.

```ts
import { Subflow } from "smithers-orchestrator";

type SubflowProps = {
  id: string;
  workflow: SmithersWorkflow<unknown>;
  output: OutputTarget;
  input?: unknown;
  mode?: "childRun" | "inline"; // default "childRun"
  skipIf?: boolean;
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  heartbeatTimeout?: number;
  retries?: number;
  retryPolicy?: RetryPolicy;
  continueOnFail?: boolean;
  cache?: CachePolicy;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
  children?: React.ReactNode;
};
```

```tsx
<Workflow name="parent-flow">
  <Sequence>
    <Subflow
      id="run-child"
      workflow={childWorkflow}
      input={{ repo: "acme/app" }}
      output={outputs.childResult}
      retries={2}
      timeoutMs={300_000}
    />
    <Task id="summarize" output={outputs.finalResult} agent={summarizer}>
      Summarize the child workflow result.
    </Task>
  </Sequence>
</Workflow>
```

## Notes

- `childRun` (default) gives the child its own run record in the database; retries, caching, and resume treat it as a single unit.
- `inline` renders the child tree as siblings in the parent plan, sharing the parent's scope.
- Subflows compose; children may contain `<Subflow>` themselves.

---

## <ContinueAsNew>

> Close the current run; start a fresh one with optional carried state.

```ts
import { ContinueAsNew, continueAsNew } from "smithers-orchestrator";

type ContinueAsNewProps = {
  state?: unknown; // JSON-serializable; arrives as ctx.input.__smithersContinuation.payload
};

// continueAsNew(state?) is a helper equivalent to <ContinueAsNew state={state} />
```

```tsx
export default smithers((ctx) => {
  const continuation = (ctx.input as any)?.__smithersContinuation as
    | { payload?: { cursor?: string; count?: number } }
    | undefined;

  const cursor = continuation?.payload?.cursor ?? null;
  const count = continuation?.payload?.count ?? 0;

  return (
    <Workflow name="paginated-processor">
      <Sequence>
        <Task id="process-batch" output={outputs.processed} agent={processorAgent}>
          {`Process next page. Cursor: ${cursor ?? "start"}. Total so far: ${count}.`}
        </Task>
        <ContinueAsNew
          state={{
            cursor: ctx.outputMaybe(outputs.processed, { nodeId: "process-batch" })?.lastCursor,
            count: count + 1,
          }}
        />
      </Sequence>
    </Workflow>
  );
});
```

## Notes

- `state` must be JSON-serializable, and the total continuation envelope must be under 10 MB.
- The workflow id is preserved across continuations; only the run id changes.
- Nodes rendered after `<ContinueAsNew>` in the same sequence don't execute.
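
A typed helper can replace the `as any` cast in the example above when reading carried state. This sketch assumes the envelope is `{ payload }` under `ctx.input.__smithersContinuation`, as the example shows. It also treats the 10 MB limit as 10 * 1024 * 1024 bytes of UTF-8 JSON. `readContinuation` and `checkContinuationState` are hypothetical names.

```ts
// Hypothetical helpers for continuation state; envelope shape taken from the example above.
type Continuation<T> = { payload?: T };

function readContinuation<T>(input: unknown): T | undefined {
  if (typeof input !== "object" || input === null) return undefined;
  const envelope = (input as { __smithersContinuation?: Continuation<T> }).__smithersContinuation;
  return envelope?.payload;
}

const MAX_ENVELOPE_BYTES = 10 * 1024 * 1024; // assumption: "10 MB" read as MiB

function checkContinuationState(state: unknown): number {
  // JSON.stringify throws on cycles and BigInt, surfacing non-serializable state early.
  const json = JSON.stringify({ payload: state });
  const bytes = new TextEncoder().encode(json).length;
  if (bytes >= MAX_ENVELOPE_BYTES) throw new Error(`continuation envelope is ${bytes} bytes`);
  return bytes;
}
```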

---

## <SuperSmithers>

> Reads and modifies source code at runtime via a markdown strategy doc.

```ts
import { SuperSmithers } from "smithers-orchestrator";

type SuperSmithersProps = {
  strategy: string | ReactElement; // markdown or MDX strategy document
  agent: AgentLike;
  id?: string; // default "super-smithers"; prefixes internal task ids
  targetFiles?: string[]; // glob patterns that scope the agent prompt to specific files
  reportOutput?: OutputTarget;
  dryRun?: boolean; // default false; skips the apply step
  skipIf?: boolean;
};
```

```tsx
<Workflow name="intervention">
  <SuperSmithers
    id="refactor"
    strategy={`
      ## Refactoring Strategy
      1. Find all deprecated API calls in the target files
      2. Replace them with the new API equivalents
      3. Ensure all imports are updated
    `}
    agent={codeAgent}
    targetFiles={["src/**/*.ts"]}
    reportOutput={outputs.report}
  />
</Workflow>
```

## Notes

- Expands to read → propose → apply → report (`dryRun` skips apply).
- Changes take effect in the current run only in hot-reload mode; otherwise they apply on the next run.
- All internal tasks share `agent`; to use a different agent per stage, compose `<Task>` elements manually.

---

## <Aspects>

> Propagate token, latency, and cost budgets to descendant tasks.

```ts
import { Aspects } from "smithers-orchestrator";

type TokenBudgetConfig = {
  max: number;
  perTask?: number;
  onExceeded?: "fail" | "warn" | "skip-remaining"; // default "fail"
};

type LatencySloConfig = {
  maxMs: number;
  perTask?: number;
  onExceeded?: "fail" | "warn"; // default "fail"
};

type CostBudgetConfig = {
  maxUsd: number;
  onExceeded?: "fail" | "warn" | "skip-remaining"; // default "fail"
};

type TrackingConfig = { tokens?: boolean; latency?: boolean; cost?: boolean };

type AspectsProps = {
  tokenBudget?: TokenBudgetConfig;
  latencySlo?: LatencySloConfig;
  costBudget?: CostBudgetConfig;
  tracking?: TrackingConfig; // default all true
  children?: ReactNode;
};
```

```tsx
<Workflow name="budgeted-workflow">
  <Aspects
    tokenBudget={{ max: 100_000, perTask: 25_000, onExceeded: "warn" }}
    latencySlo={{ maxMs: 30_000, onExceeded: "fail" }}
    costBudget={{ maxUsd: 5.0, onExceeded: "skip-remaining" }}
  >
    <Task id="analyze" output={outputs.analysis} agent={codeAgent}>
      Analyze the repository.
    </Task>
    <Task id="review" output={outputs.review} agent={reviewAgent}>
      Review the analysis.
    </Task>
  </Aspects>
</Workflow>
```

## Notes

- Nested `<Aspects>` inherit outer budget configs wholesale. If an inner `<Aspects>` sets `tokenBudget`, `latencySlo`, or `costBudget`, that field's entire parent config object is replaced (no per-field merge). The `tracking` config is the exception: `tokens`, `latency`, and `cost` each fall back to the parent value independently.
- Budgets enforce regardless of `tracking`; `tracking` controls metric emission only.
- The budget accumulator is per run and resets on resume.
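
The inheritance rule in the first note can be written as a merge function. `mergeAspects` is a hypothetical illustration of the documented behavior, not the library's code. Budgets are replaced wholesale, while each `tracking` flag falls back independently and defaults to `true`.

```ts
// Hypothetical merge of outer and inner <Aspects> configs, following the notes above.
type TokenBudgetConfig = { max: number; perTask?: number; onExceeded?: "fail" | "warn" | "skip-remaining" };
type LatencySloConfig = { maxMs: number; perTask?: number; onExceeded?: "fail" | "warn" };
type CostBudgetConfig = { maxUsd: number; onExceeded?: "fail" | "warn" | "skip-remaining" };
type TrackingConfig = { tokens?: boolean; latency?: boolean; cost?: boolean };
type AspectsConfig = {
  tokenBudget?: TokenBudgetConfig;
  latencySlo?: LatencySloConfig;
  costBudget?: CostBudgetConfig;
  tracking?: TrackingConfig;
};

function mergeAspects(outer: AspectsConfig, inner: AspectsConfig): AspectsConfig {
  return {
    // Budget configs: an inner object replaces the outer one entirely (no per-field merge).
    tokenBudget: inner.tokenBudget ?? outer.tokenBudget,
    latencySlo: inner.latencySlo ?? outer.latencySlo,
    costBudget: inner.costBudget ?? outer.costBudget,
    // Tracking: each flag falls back to the parent independently, defaulting to true.
    tracking: {
      tokens: inner.tracking?.tokens ?? outer.tracking?.tokens ?? true,
      latency: inner.tracking?.latency ?? outer.tracking?.latency ?? true,
      cost: inner.tracking?.cost ?? outer.tracking?.cost ?? true,
    },
  };
}
```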

---

## <Worktree>

> Run a subtree in an isolated worktree (git or jj) rooted at `path`.

```ts
import { Worktree } from "smithers-orchestrator";

type WorktreeProps = {
  key?: string;
  path: string; // required, non-empty; resolved against baseRootDir or cwd if relative
  id?: string;
  branch?: string; // omit to use current branch
  baseBranch?: string; // default "main"
  skipIf?: boolean;
  children?: ReactNode;
};
```

```tsx
<Worktree path="/tmp/smithers/wt-a" baseBranch="main">
  <Task id="build" output={outputs.outputC}>{{ value: 1 }}</Task>
  <Task id="test" output={outputs.outputC}>{{ value: 2 }}</Task>
  <MergeQueue>
    <Task id="apply" output={outputs.outputC}>{{ value: 3 }}</Task>
  </MergeQueue>
  <Parallel maxConcurrency={2}>
    <Task id="lint" output={outputs.outputC}>{{ value: 4 }}</Task>
  </Parallel>
</Worktree>
```

## Notes

- Descendant tasks inherit the `worktreeId` and use the absolute `worktreePath` as their `cwd`.
- Innermost `<Worktree>` wins when nested; duplicate ids are rejected.
- Empty/whitespace `path` is rejected at render time.
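
Path handling for `<Worktree>` can be sketched with Node's `path` module. `resolveWorktreePath` and `effectiveWorktree` are hypothetical helpers. The sketch assumes POSIX paths and an ancestor list ordered outermost first.

```ts
import * as path from "node:path";

// Hypothetical: resolve a <Worktree> path the way the props comment describes.
function resolveWorktreePath(rawPath: string, baseRootDir: string | undefined, cwd: string): string {
  if (rawPath.trim() === "") throw new Error("<Worktree> path must be a non-empty string");
  // path.resolve keeps absolute paths (normalized) and anchors relative ones.
  return path.resolve(baseRootDir ?? cwd, rawPath);
}

// Hypothetical: with nested worktrees (outermost first), the innermost one wins.
function effectiveWorktree<T>(ancestors: T[]): T | undefined {
  return ancestors.length > 0 ? ancestors[ancestors.length - 1] : undefined;
}
```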

---

## <ReviewLoop>

> Produce, review, and fix in a loop until the reviewer approves.

```ts
// Props
import { ReviewLoop } from "smithers-orchestrator";

type ReviewLoopProps = {
  id?: string; // default "review-loop"; task ids derived as {id}-produce, {id}-review
  producer: AgentLike;
  reviewer: AgentLike | AgentLike[];
  produceOutput: OutputTarget;
  reviewOutput: OutputTarget; // must include `approved: boolean`
  maxIterations?: number; // default 5
  onMaxReached?: "return-last" | "fail"; // default "return-last"
  skipIf?: boolean;
  children: string | ReactNode; // initial producer prompt
};
```

```tsx
export default smithers(() => (
  <Workflow name="code-review">
    <ReviewLoop
      producer={coder}
      reviewer={reviewer}
      produceOutput={outputs.code}
      reviewOutput={outputs.review}
      maxIterations={3}
    >
      Implement a REST API for user authentication with JWT tokens.
    </ReviewLoop>
  </Workflow>
));
```

## Notes

- The runtime reads `approved` each frame to decide whether to loop.
- On subsequent iterations the producer is re-run, but the loop does not wire reviewer output into the producer via a `needs` dependency. Any feedback the producer sees comes from the runtime/agent conversation history carrying prior outputs, not from the component's declared data flow.

---

## <Optimizer>

> Generate, evaluate, and improve in a loop until a target score is reached.

```ts
// Props
import { Optimizer } from "smithers-orchestrator";

type OptimizerProps = {
  id?: string; // default "optimizer"; task ids {id}-generate, {id}-evaluate
  generator: AgentLike;
  evaluator: AgentLike | ((candidate: unknown) => unknown | Promise<unknown>); // function = compute task
  generateOutput: OutputTarget;
  evaluateOutput: OutputTarget; // must include `score: number`
  targetScore?: number; // omit to run all iterations
  maxIterations?: number; // default 10
  onMaxReached?: "return-last" | "fail"; // default "return-last"
  skipIf?: boolean;
  children: string | ReactNode; // initial generation prompt
};
```

```tsx
export default smithers(() => (
  <Workflow name="prompt-optimizer">
    <Optimizer
      generator={promptEngineer}
      evaluator={evaluator}
      generateOutput={outputs.prompt}
      evaluateOutput={outputs.evaluation}
      targetScore={90}
      maxIterations={5}
    >
      Generate a prompt for summarizing legal documents.
    </Optimizer>
  </Workflow>
));
```

## Notes

- `score` drives convergence against `targetScore`.
- A function `evaluator` renders as a compute task rather than an agent task.

---

## <ContentPipeline>

> Typed waterfall of refinement stages, each depending on the previous.

```ts
// Props
import { ContentPipeline } from "smithers-orchestrator";

type ContentPipelineProps = {
  id?: string;
  stages: ContentPipelineStage[];
  skipIf?: boolean;
  children: string | ReactNode; // initial prompt for stage[0]
};

type ContentPipelineStage = {
  id: string;
  agent: AgentLike;
  output: OutputTarget;
  label?: string;
};
```

```tsx
export default smithers(() => (
  <Workflow name="blog-pipeline">
    <ContentPipeline
      stages={[
        { id: "outline", agent: outliner, output: outputs.outline, label: "Create outline" },
        { id: "draft", agent: writer, output: outputs.draft, label: "Write draft" },
        { id: "edit", agent: editor, output: outputs.edited, label: "Edit and polish" },
      ]}
    >
      Write a blog post about building AI workflows with React components.
    </ContentPipeline>
  </Workflow>
));
```

## Notes

- Each stage after the first depends on the previous via `needs`.
- Stage `id` values must be unique within the workflow.
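
The dependency chain can be sketched as a map from each stage id to the stage it needs. `pipelineDependencies` is a hypothetical helper. It checks uniqueness only within one pipeline, whereas the note requires stage ids to be unique across the whole workflow.

```ts
// Hypothetical: compute which stage each ContentPipeline stage depends on via `needs`.
type Stage = { id: string };

function pipelineDependencies(stages: Stage[]): Map<string, string | undefined> {
  const deps = new Map<string, string | undefined>();
  stages.forEach((stage, i) => {
    if (deps.has(stage.id)) throw new Error(`duplicate stage id: ${stage.id}`);
    // The first stage has no dependency; every later stage needs its predecessor.
    deps.set(stage.id, i === 0 ? undefined : stages[i - 1].id);
  });
  return deps;
}
```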

---

## <DriftDetector>

> Capture state, compare against a baseline, and alert on meaningful drift.

```ts
// Props
import { DriftDetector } from "smithers-orchestrator";

type DriftDetectorProps = {
  id?: string; // default "drift"; ids {id}-capture, {id}-compare
  captureAgent: AgentLike;
  compareAgent: AgentLike;
  captureOutput: OutputTarget;
  compareOutput: OutputTarget; // include `drifted: boolean`
  baseline: unknown;
  alertIf?: (comparison: any) => boolean; // reserved: not yet evaluated at runtime
  alert?: ReactElement;
  poll?: { intervalMs: number; maxPolls?: number }; // default maxPolls = 100; intervalMs is reserved, not yet passed to the Loop
  skipIf?: boolean;
};
```

```tsx
<Workflow name="api-drift-check">
  <DriftDetector
    captureAgent={snapshotAgent}
    compareAgent={diffAgent}
    captureOutput={outputs.capture}
    compareOutput={outputs.compare}
    baseline={{ endpoints: ["/users", "/orders"], schemaHash: "abc123" }}
    alert={
      <Task id="notify" output={outputs.notify} agent={slackAgent}>
        API drift detected. Notify the team.
      </Task>
    }
  />
</Workflow>
```

## Notes

- Without `poll`, the component runs once; with `poll`, it is wrapped in a `<Loop>` capped at `maxPolls` iterations (default 100).
- Without `alert`, the component compares but takes no action on drift.

---

## <ScanFixVerify>

> Scan for problems, fix in parallel, verify, then report in a retry loop.

```ts
// Props
import { ScanFixVerify } from "smithers-orchestrator";

type ScanFixVerifyProps = {
  id?: string; // default "sfv"
  scanner: AgentLike;
  fixer: AgentLike | AgentLike[]; // array cycles across issues
  verifier: AgentLike;
  scanOutput: OutputTarget; // include `issues: Array`
  fixOutput: OutputTarget;
  verifyOutput: OutputTarget;
  reportOutput: OutputTarget;
  maxConcurrency?: number; // omit for no per-group cap (bounded only by the run-level concurrency limit, default 4)
  maxRetries?: number; // default 3
  skipIf?: boolean;
  children?: ReactNode; // scan prompt
};
```

```tsx
<Workflow name="lint-fix">
  <ScanFixVerify
    scanner={lintAgent}
    fixer={fixerAgent}
    verifier={verifyAgent}
    scanOutput={outputs.scan}
    fixOutput={outputs.fix}
    verifyOutput={outputs.verify}
    reportOutput={outputs.report}
    maxRetries={5}
    maxConcurrency={3}
  >
    Scan the codebase for linting errors and type issues.
  </ScanFixVerify>
</Workflow>
```

## Notes

- The loop runs exactly `maxRetries` scan-fix-verify cycles. Early exit based on verifier output is not currently wired, so the loop always completes every iteration and then ends via `onMaxReached: "return-last"`.
- The report task always runs, even when retries are exhausted.
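
When `fixer` is an array, the props comment says it cycles across issues. The sketch below assumes a simple round-robin by issue index, because the exact assignment order is not documented here. `assignFixers` is a hypothetical helper.

```ts
// Hypothetical round-robin assignment of fixer agents to scanned issues.
function assignFixers<A>(issues: string[], fixer: A | A[]): Array<{ issue: string; fixer: A }> {
  const pool = (Array.isArray(fixer) ? fixer : [fixer]) as A[];
  if (pool.length === 0) throw new Error("at least one fixer is required");
  // Issue i goes to fixer i mod pool size.
  return issues.map((issue, i) => ({ issue, fixer: pool[i % pool.length] }));
}
```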

---

## <Poller>

> Poll an external condition with configurable backoff until satisfied or timed out.

```ts
// Props
import { Poller } from "smithers-orchestrator";

type PollerProps = {
  id?: string; // default "poll"
  check: AgentLike | (() => Promise<unknown> | unknown);
  checkOutput: OutputTarget; // must include `satisfied: boolean`
  maxAttempts?: number; // default 30
  backoff?: "fixed" | "linear" | "exponential"; // default "fixed"
  intervalMs?: number; // default 5000
  onTimeout?: "fail" | "return-last"; // default "fail"
  skipIf?: boolean;
  children?: ReactNode; // condition description
};
```

```tsx
<Workflow name="wait-for-deploy">
  <Poller
    check={statusChecker}
    checkOutput={outputs.check}
    maxAttempts={20}
    intervalMs={10_000}
    backoff="exponential"
    onTimeout="fail"
  >
    Check whether the deployment to production has completed successfully.
  </Poller>
</Workflow>
```

## Notes

- `satisfied` drives the loop's `until`.
- Backoff: fixed = `intervalMs`; linear = `intervalMs * (N+1)`; exponential = `intervalMs * 2^N`.
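
The backoff formulas above can be written out as a function. This assumes `N` is the zero-based attempt index, so the first wait is always `intervalMs`. `pollDelayMs` is a hypothetical name.

```ts
// Hypothetical transcription of the documented Poller backoff formulas.
type Backoff = "fixed" | "linear" | "exponential";

function pollDelayMs(attempt: number, intervalMs = 5000, backoff: Backoff = "fixed"): number {
  switch (backoff) {
    case "fixed":
      return intervalMs;
    case "linear":
      return intervalMs * (attempt + 1);
    case "exponential":
      return intervalMs * 2 ** attempt;
  }
}
```

With `backoff="exponential"` the delay doubles on every attempt, and none of the props listed here sets a cap. Pair exponential backoff with a modest `maxAttempts`.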

---

## <Runbook>

> Sequential steps with risk classification; safe auto-runs, risky/critical gate on approval.

```ts
// Props
import { Runbook } from "smithers-orchestrator";

type RunbookProps = {
  id?: string; // used as step-id prefix; defaults to "runbook" when omitted
  steps: RunbookStep[];
  defaultAgent?: AgentLike;
  stepOutput: OutputTarget;
  approvalRequest?: Partial<ApprovalRequest>;
  onDeny?: "fail" | "skip"; // default "fail"
  skipIf?: boolean;
};

type RunbookStep = {
  id: string;
  agent?: AgentLike;
  command?: string;
  risk: "safe" | "risky" | "critical"; // critical adds `elevated: true` to approval meta
  label?: string;
  output?: OutputTarget;
};
```

```tsx
export default smithers(() => (
  <Workflow name="deploy-runbook">
    <Runbook
      defaultAgent={ops}
      stepOutput={outputs.stepResult}
      steps={[
        { id: "health-check", command: "curl -f https://api.example.com/health", risk: "safe" },
        { id: "backup-db", command: "pg_dump prod > backup.sql", risk: "risky" },
        { id: "run-migration", command: "npx prisma migrate deploy", risk: "critical" },
        { id: "smoke-test", command: "npm run test:smoke", risk: "safe" },
      ]}
    />
  </Workflow>
));
```

## Notes

- Each step depends on the previous via `needs`; execution order is guaranteed.
- Critical steps set `elevated: true` in approval metadata for stronger auth UIs.
- Approval output is stored at `{prefix}-{step.id}-approval-decision`.
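
The gating rules and the approval node id format can be sketched as a planning function. `planRunbook` is hypothetical. The step node id format `{prefix}-{step.id}` is an assumption inferred from the documented approval id.

```ts
// Hypothetical planner mirroring the documented Runbook gating rules.
type Risk = "safe" | "risky" | "critical";
type RunbookStep = { id: string; risk: Risk };
type PlannedStep = {
  stepId: string;
  needsApproval: boolean;
  elevated: boolean;
  approvalNodeId?: string;
  dependsOn?: string;
};

function planRunbook(steps: RunbookStep[], prefix = "runbook"): PlannedStep[] {
  return steps.map((step, i) => {
    const needsApproval = step.risk !== "safe"; // safe steps auto-run
    return {
      stepId: `${prefix}-${step.id}`, // assumed format
      needsApproval,
      elevated: step.risk === "critical", // critical adds elevated: true
      approvalNodeId: needsApproval ? `${prefix}-${step.id}-approval-decision` : undefined,
      dependsOn: i > 0 ? `${prefix}-${steps[i - 1].id}` : undefined, // strict ordering
    };
  });
}
```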

---

## <Supervisor>

> Boss plans, workers run in parallel, boss reviews and re-delegates failures.

```ts
// Props
import { Supervisor } from "smithers-orchestrator";

type SupervisorProps = {
  id?: string;                                  // default: "supervisor"
  boss: AgentLike;
  workers: Record<string, AgentLike>;           // { coder, tester, ... }
  planOutput: OutputTarget;                     // { tasks: [{ id, workerType, instructions }] }
  workerOutput: OutputTarget;
  reviewOutput: OutputTarget;                   // { allDone: boolean, retriable: string[] }
  finalOutput: OutputTarget;
  maxIterations?: number;                       // default: 3
  maxConcurrency?: number;                      // default: 5
  useWorktrees?: boolean;                       // default: false
  skipIf?: boolean;
  children: string | ReactNode;                 // goal/prompt for the boss
};
```

```tsx
export default smithers(() => (
  <Workflow name="build-feature">
    <Supervisor
      boss={boss}
      workers={{ coder, tester }}
      planOutput={outputs.plan}
      workerOutput={outputs.workerResult}
      reviewOutput={outputs.review}
      finalOutput={outputs.final}
      maxIterations={3}
      maxConcurrency={4}
    >
      Build the user authentication module with tests.
    </Supervisor>
  </Workflow>
));
```

## Notes

- Generated node ids: `{id}-plan`, `{id}-loop`, `{id}-worker-{type}`, `{id}-review`, `{id}-final`.
- Workers run with `continueOnFail`; a single failure does not abort the cycle.
- With `useWorktrees`, each worker runs in `.worktrees/{id}-worker-{type}` on branch `worker/{id}-worker-{type}`.
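
Knowing the generated ids helps when reading outputs with `ctx.outputMaybe`. These hypothetical helpers apply the naming patterns from the notes above, using the default `id` of `"supervisor"`.

```ts
// Hypothetical helpers applying the documented Supervisor naming patterns.
function supervisorNodeIds(workerTypes: string[], id = "supervisor") {
  return {
    plan: `${id}-plan`,
    loop: `${id}-loop`,
    workers: workerTypes.map((type) => `${id}-worker-${type}`),
    review: `${id}-review`,
    final: `${id}-final`,
  };
}

// Worktree location and branch used when `useWorktrees` is set.
function workerWorktree(type: string, id = "supervisor") {
  const name = `${id}-worker-${type}`;
  return { path: `.worktrees/${name}`, branch: `worker/${name}` };
}
```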

---

## <MergeQueue>

> Queue child tasks so that at most `maxConcurrency` run at once; defaults to a single lane.

```ts
// Props
import { MergeQueue } from "smithers-orchestrator";

type MergeQueueProps = {
  id?: string;
  maxConcurrency?: number;  // default: 1
  skipIf?: boolean;
  children?: ReactNode;
};
```

```tsx
<MergeQueue maxConcurrency={2}>
  {items.map((it, i) => (
    <Task key={i} id={`t${i}`} output={outputs.outputC}>{{ value: i }}</Task>
  ))}
</MergeQueue>
```

## Notes

- When concurrency groups are nested, the innermost group determines the effective cap for its descendants.
- Tasks outside the queue are unaffected by its limit.

---

## <CheckSuite>

> Parallel checks with auto-aggregated pass/fail verdict.

```ts
// Props
import { CheckSuite } from "smithers-orchestrator";

type CheckConfig = { id: string; agent?: AgentLike; command?: string; label?: string };

type CheckSuiteProps = {
  id?: string;                                              // default: "checksuite"
  checks: CheckConfig[] | Record<string, Omit<CheckConfig, "id">>;
  verdictOutput: OutputTarget;
  strategy?: "all-pass" | "majority" | "any-pass";          // default: "all-pass"
  maxConcurrency?: number;                                  // default: Infinity
  continueOnFail?: boolean;                                 // default: true
  skipIf?: boolean;
};
```

```tsx
<Workflow name="ci-checks">
  <CheckSuite
    checks={[
      { id: "lint", agent: lintAgent, label: "ESLint" },
      { id: "typecheck", agent: typecheckAgent, label: "TypeScript" },
      { id: "test", agent: testAgent, label: "Unit Tests" },
    ]}
    verdictOutput={outputs.verdict}
    strategy="all-pass"
  />
</Workflow>
```

## Notes

- Check task ids are `{prefix}-{checkId}`; verdict is `{prefix}-verdict`.
- `strategy` is evaluated in pure code: `all-pass` requires every check to pass, `majority` requires more than half (`passCount*2 > total`), and `any-pass` requires at least one to pass.
- Use `command` instead of `agent` for shell-based checks.
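
The `strategy` rules from the second note can be transcribed into a pure function over per-check pass/fail results. `verdict` is an illustrative name. This section does not document how Smithers handles an empty `checks` list; this sketch returns `true` for `all-pass` and `false` for the other strategies in that case.

```ts
// Hypothetical transcription of the documented CheckSuite verdict strategies.
type Strategy = "all-pass" | "majority" | "any-pass";

function verdict(results: boolean[], strategy: Strategy = "all-pass"): boolean {
  const total = results.length;
  const passCount = results.filter(Boolean).length;
  switch (strategy) {
    case "all-pass":
      return passCount === total; // every check passed
    case "majority":
      return passCount * 2 > total; // strictly more than half
    case "any-pass":
      return passCount >= 1; // at least one passed
  }
}
```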

---

## <ClassifyAndRoute>

> Classify items into categories, then route each to a category-specific agent in parallel.

```ts
// Props
import { ClassifyAndRoute } from "smithers-orchestrator";

type CategoryConfig = {
  agent: AgentLike;
  output?: OutputTarget;
  prompt?: (item: unknown) => string;
};

type ClassifyAndRouteProps = {
  id?: string;                                                       // prefix for auto-generated child task IDs; defaults to "classify-and-route"
  items: unknown | unknown[];
  categories: Record<string, AgentLike | CategoryConfig>;
  classifierAgent: AgentLike;
  classifierOutput: OutputTarget;
  routeOutput: OutputTarget;
  classificationResult?: { classifications: Array<{ category: string; itemId?: string }> } | null;
  maxConcurrency?: number;                                           // optional; unbounded when omitted
  skipIf?: boolean;
  children?: ReactNode;                                              // custom classifier prompt
};
```

```tsx
const classification = ctx.outputMaybe(outputs.classification, {
  nodeId: "classify-and-route-classify",
});

<Workflow name="support-router">
  <ClassifyAndRoute
    items={ctx.input.tickets}
    categories={{ billing: billingAgent, support: supportAgent, sales: salesAgent }}
    classifierAgent={classifierAgent}
    classifierOutput={outputs.classification}
    routeOutput={outputs.handled}
    classificationResult={classification}
  />
</Workflow>;
```

## Notes

- Two-phase: first render classifies; pass the result back via `classificationResult` to mount route handlers.
- Each entry's `category` must match a key in `categories`; unknown categories are silently skipped.
- Route tasks default to `continueOnFail`.
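
The two-phase behavior can be modeled as a filter over the classifier result. `routableClassifications` is a hypothetical helper that mirrors the notes. Nothing is routed until a result is passed back, and unknown categories are dropped silently.

```ts
// Hypothetical model of which classifications mount route tasks.
type Classification = { category: string; itemId?: string };
type ClassificationResult = { classifications: Classification[] } | null | undefined;

function routableClassifications(
  result: ClassificationResult,
  categories: Record<string, unknown>,
): Classification[] {
  // Phase 1: no classification yet, so no route handlers are mounted.
  if (!result) return [];
  // Phase 2: entries whose category is not a key of `categories` are skipped silently.
  return result.classifications.filter((c) => Object.prototype.hasOwnProperty.call(categories, c.category));
}
```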

---

## <GatherAndSynthesize>

> Parallel gather from multiple sources, then synthesize into a unified result.

```ts
// Props
import { GatherAndSynthesize } from "smithers-orchestrator";

type SourceDef = {
  agent: AgentLike;
  prompt?: string;        // optional; defaults to a generated gather prompt
  output?: OutputTarget;
  children?: ReactNode;        // overrides prompt
};

type GatherAndSynthesizeProps = {
  id?: string;                                       // default: "gather-and-synthesize"
  sources: Record<string, SourceDef>;
  synthesizer: AgentLike;
  gatherOutput: OutputTarget;
  synthesisOutput: OutputTarget;
  gatheredResults?: Record<string, unknown> | null;  // typically from ctx.outputMaybe()
  maxConcurrency?: number;                           // default: Infinity
  synthesisPrompt?: string;
  skipIf?: boolean;
  children?: ReactNode;                              // overrides synthesisPrompt
};
```

```tsx
<Workflow name="research">
  <GatherAndSynthesize
    sources={{
      docs: { agent: docsAgent, prompt: "Search the documentation." },
      code: { agent: codeAgent, prompt: "Analyze the codebase." },
      issues: { agent: issueAgent, prompt: "Review open issues." },
    }}
    synthesizer={synthesisAgent}
    gatherOutput={outputs.gathered}
    synthesisOutput={outputs.synthesis}
    gatheredResults={gathered}
  />
</Workflow>
```

## Notes

- The synthesis task automatically gets a `needs` entry for every source, so it waits for all gather tasks.
- Source `children` takes priority over `prompt`.
- When `gatheredResults` is provided, it is folded into the default synthesis prompt.

---

## <Panel>

> Parallel specialist agents review the same input; a moderator synthesizes results.

```ts
// Props
import { Panel } from "smithers-orchestrator";

type PanelistConfig = { agent: AgentLike; role?: string; label?: string };

type PanelProps = {
  id?: string;                                           // default: "panel"
  panelists: PanelistConfig[] | AgentLike[];
  moderator: AgentLike;
  panelistOutput: OutputTarget;
  moderatorOutput: OutputTarget;
  strategy?: "synthesize" | "vote" | "consensus";        // default: "synthesize"
  minAgree?: number;                                     // for "vote" / "consensus"
  maxConcurrency?: number;                               // default: Infinity
  skipIf?: boolean;
  children: string | ReactNode;                          // prompt sent to every panelist
};
```

```tsx
<Workflow name="code-review-panel">
  <Panel
    panelists={[
      { agent: securityAgent, role: "Security Reviewer" },
      { agent: qualityAgent, role: "Code Quality Reviewer" },
      { agent: architectureAgent, role: "Architecture Reviewer" },
    ]}
    moderator={moderatorAgent}
    panelistOutput={outputs.review}
    moderatorOutput={outputs.synthesis}
  >
    Review the changes in src/auth/ for security, quality, and architecture concerns.
  </Panel>
</Workflow>
```

## Notes

- Panelist task ids: `{prefix}-{label|role|panelist-N}`; moderator is `{prefix}-moderator`.
- `strategy` and `minAgree` are passed as prompt context to the moderator, which interprets them.
- All panelists write to the same `panelistOutput` schema, differentiated by task id.

---

## <Debate>

> Adversarial multi-round debate between proposer and opponent, then judge verdict.

```ts
// Props
import { Debate } from "smithers-orchestrator";

type DebateProps = {
  id?: string;                       // default: "debate"
  proposer: AgentLike;               // arguing FOR
  opponent: AgentLike;               // arguing AGAINST
  judge: AgentLike;                  // renders final verdict
  rounds?: number;                   // default: 2
  argumentOutput: OutputTarget;
  verdictOutput: OutputTarget;
  topic: string | ReactNode;
  skipIf?: boolean;
};
```

```tsx
<Workflow name="architecture-debate">
  <Debate
    proposer={monolithAdvocate}
    opponent={microservicesAdvocate}
    judge={architectureJudge}
    rounds={3}
    argumentOutput={outputs.argument}
    verdictOutput={outputs.verdict}
    topic="Should we migrate from a monolith to microservices for the payments system?"
  />
</Workflow>
```

## Notes

- Task ids: `{prefix}-proposer`, `{prefix}-opponent`, `{prefix}-judge`, loop `{prefix}-loop`.
- Loop runs exactly `rounds` iterations with `onMaxReached="return-last"`.
- Proposer and opponent share `argumentOutput`, differentiated by task id.

---

## <Kanban>

> Process items through ordered columns with a pluggable ticket source.

```ts
// Props
import { Kanban } from "smithers-orchestrator";

type ColumnDef = {
  name: string;
  agent: AgentLike;
  output: OutputTarget;
  prompt?: (ctx: { item: unknown; column: string }) => string;
  task?: Omit<Partial<TaskProps>, "agent" | "children" | "id" | "key" | "output" | "smithersContext">;  // retries, timeoutMs, etc.
};

type KanbanProps = {
  id?: string;                                       // default: "kanban"
  columns: ColumnDef[];
  useTickets: () => Array<{ id: string; [key: string]: unknown }>;
  agents?: Record<string, AgentLike>;                // overrides column-level agents
  maxConcurrency?: number;                           // default: unlimited; the cap applies per column
  onComplete?: OutputTarget;
  until?: boolean;                                   // default: false
  maxIterations?: number;                            // default: 5
  skipIf?: boolean;
  children?: ReactNode | Record<string, unknown>;    // content for onComplete task
};
```

```tsx
const columns = [
  { name: "triage", agent: triageAgent, output: outputs.triage },
  { name: "work", agent: workerAgent, output: outputs.work },
  { name: "review", agent: reviewAgent, output: outputs.review },
];

<Workflow name="ticket-board">
  <Kanban
    columns={columns}
    useTickets={() => tickets}
    until={allDone}
    maxIterations={3}
  />
</Workflow>;
```

## Notes

- Item tasks default to `continueOnFail={true}`; use `column.task` to add retries or override.
- `useTickets` is called at render time; return different items per iteration for dynamic sources.
- Use `until` with `ctx.outputMaybe()` to exit when all items reach the final column.
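A minimal sketch of the `allDone` value passed to `until` in the example above. It assumes a lookup that reports whether an item already has output in the final column; in a real workflow that lookup would go through `ctx.outputMaybe`. `allItemsDone` and `hasFinalOutput` are hypothetical names.

```typescript
type Ticket = { id: string; [key: string]: unknown };

// Hypothetical predicate: true once every ticket has output in the last column.
// `hasFinalOutput` stands in for a ctx.outputMaybe(...) lookup.
function allItemsDone(tickets: Ticket[], hasFinalOutput: (ticketId: string) => boolean): boolean {
  return tickets.length > 0 && tickets.every((t) => hasFinalOutput(t.id));
}

const tickets: Ticket[] = [{ id: "T-1" }, { id: "T-2" }];
const reviewed = new Set(["T-1"]);
console.log(allItemsDone(tickets, (id) => reviewed.has(id))); // false
reviewed.add("T-2");
console.log(allItemsDone(tickets, (id) => reviewed.has(id))); // true
```

The sketch treats an empty board as "not done" so the loop does not exit before any tickets are discovered. With a source that stays empty, the board then runs until `maxIterations`.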

---

## Recipes

> Tight, reusable patterns. Copy, paste, adapt.

Each recipe is a working snippet plus one line of context. They compose freely.

If you want a ready-to-run outcome before writing workflow code, start with Starters or run `bunx smithers-orchestrator starters`.

## Implement → review loop

Iterate until a reviewer signs off, with a hard cap.

```tsx
<Loop
  until={ctx.outputMaybe(outputs.review, { nodeId: "review" })?.approved === true}
  maxIterations={5}
  onMaxReached="return-last"
>
  <Sequence>
    <Task id="implement" output={outputs.impl} agent={implementer}>
      {`${ctx.input.task}\nPrior review: ${ctx.latest(outputs.review, "review")?.feedback ?? "none"}`}
    </Task>
    <Task id="review" output={outputs.review} agent={reviewer}>
      {`Review the latest implementation. Return { approved, feedback }.`}
    </Task>
  </Sequence>
</Loop>
```

Stop conditions must be measurable (boolean, count, array length). Avoid "looks good" prompts; agents are literal.
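The interaction between `until`, `maxIterations`, and `onMaxReached` can be modeled as a plain loop. This is a simplified sketch, not the runtime. It assumes `until` is checked after each iteration completes, and `runLoop` and `step` are hypothetical.

```typescript
type Review = { approved: boolean; feedback: string };

// Hypothetical model of <Loop until maxIterations onMaxReached>.
function runLoop(
  step: (iteration: number) => Review,
  maxIterations: number,
  onMaxReached: "fail" | "return-last",
): { last: Review | undefined; iterations: number } {
  let last: Review | undefined;
  for (let i = 0; i < maxIterations; i++) {
    last = step(i);
    if (last.approved) return { last, iterations: i + 1 }; // `until` became true
  }
  if (onMaxReached === "fail") throw new Error(`not approved after ${maxIterations} iterations`);
  return { last, iterations: maxIterations }; // "return-last": keep the final attempt
}

// The reviewer approves on the third pass.
const result = runLoop((i) => ({ approved: i === 2, feedback: `pass ${i}` }), 5, "return-last");
console.log(result.iterations, result.last?.feedback); // 3 pass 2
```

The measurable condition (`approved === true`) is what makes the exit deterministic. A vague condition would leave only the cap to end the loop.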

## Parallel multi-agent review

Two models catch different bugs. Because the reviews run in parallel, the wall-clock cost is the slower model's latency.

```tsx
<Parallel>
  <Task id={`${ticket.id}:review-claude`} output={outputs.review} agent={claude} continueOnFail>
    <ReviewPrompt reviewer="claude" />
  </Task>
  <Task id={`${ticket.id}:review-codex`} output={outputs.review} agent={codex} continueOnFail>
    <ReviewPrompt reviewer="codex" />
  </Task>
</Parallel>
```

`continueOnFail` keeps one model's timeout from blocking the other.
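As an analogy only, using plain promises rather than the Smithers scheduler, `continueOnFail` on parallel branches behaves like collecting results with `Promise.allSettled` instead of `Promise.all`. A rejected or timed-out branch is recorded, and the other branch's result still comes through. `reviewWith`, `withTimeout`, and `reviewBoth` are hypothetical helpers.

```typescript
// Reject if `p` does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Hypothetical reviewer call that resolves after `delayMs`.
function reviewWith(model: string, delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(`${model}: approved`), delayMs));
}

async function reviewBoth(): Promise<string[]> {
  const settled = await Promise.allSettled([
    withTimeout(reviewWith("claude", 10), 100),
    withTimeout(reviewWith("codex", 500), 100), // slower than the timeout: rejects
  ]);
  return settled.map((r) =>
    r.status === "fulfilled" ? r.value : `failed: ${(r.reason as Error).message}`,
  );
}

reviewBoth().then((lines) => console.log(lines.join("\n")));
// claude: approved
// failed: timed out after 100ms
```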

## Approval gate with branching

Decision data drives the next branch.

```tsx
<Approval
  id="ship-decision"
  output={outputs.shipDecision}
  request={{ title: `Ship v${ctx.input.version}?`, summary: testReport }}
  onDeny="continue"
/>

{ctx.outputMaybe(outputs.shipDecision, { nodeId: "ship-decision" })?.approved
  ? <Task id="release" .../>
  : <Task id="rollback" .../>}
```

`onDeny`: `"fail"` aborts, `"continue"` proceeds without the gated branch, `"skip"` skips the gated tasks.

## Retry policy & timeouts

```tsx
<Task
  id="api-call"
  output={outputs.apiResult}
  agent={agent}
  retries={3}
  retryPolicy={{ backoff: "exponential", initialDelayMs: 1000 }}
  timeoutMs={30_000}
>
  Call external API.
</Task>
```

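Suggested starting points, not runtime defaults: simple tasks 30–60s + 1–2 retries, tool-heavy 2–5m + 1–2, large generations 5–10m + 0–1. Set `retries` explicitly, because otherwise it defaults to unlimited (see Types). Use exponential backoff for rate-limited APIs.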
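This section does not say exactly how the `RetryPolicy` fields combine. The following sketch uses a common convention and is not the runtime's formula. In it, `fixed` repeats `initialDelayMs`, `linear` grows by `initialDelayMs` per retry, `exponential` multiplies by `multiplier` (assumed default 2), and `maxDelayMs` caps the result. Jitter is omitted to keep the output deterministic. `retryDelayMs` is hypothetical.

```typescript
type RetryPolicy = {
  backoff?: "fixed" | "linear" | "exponential";
  initialDelayMs?: number;
  maxDelayMs?: number;
  multiplier?: number;
};

// Delay before retry number `retry` (1 = first retry). Assumed formula, not Smithers'.
function retryDelayMs(policy: RetryPolicy, retry: number): number {
  const base = policy.initialDelayMs ?? 0;
  let delay: number;
  switch (policy.backoff ?? "fixed") {
    case "linear":
      delay = base * retry;
      break;
    case "exponential":
      delay = base * (policy.multiplier ?? 2) ** (retry - 1);
      break;
    default: // "fixed"
      delay = base;
  }
  return Math.min(delay, policy.maxDelayMs ?? Number.POSITIVE_INFINITY);
}

const policy: RetryPolicy = { backoff: "exponential", initialDelayMs: 1000, maxDelayMs: 5000 };
console.log([1, 2, 3, 4].map((n) => retryDelayMs(policy, n)).join(", ")); // 1000, 2000, 4000, 5000
```

A cap such as `maxDelayMs` matters for exponential backoff. Without one, a task with many retries would wait for longer and longer intervals.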

## Optional, non-blocking step

```tsx
<Task id="lint" output={outputs.lint} agent={linter} continueOnFail>
  Run lint checks. Pipeline continues if this fails.
</Task>
```

Use it for nice-to-have steps such as telemetry, lint, and optional analysis.

## Conditional branch on output

```tsx
const analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" });

{analysis?.risk === "high" ? (
  <Task id="escalate" output={outputs.escalation} agent={escalator}>
    {`Critical: ${analysis.summary}`}
  </Task>
) : null}
```

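Use `ctx.outputMaybe` for control flow. It returns `undefined` until `analyze` has produced output, so the escalation branch renders only after that.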

## Dynamic ticket discovery

Discover work, run each ticket, re-render to catch the next batch. Scales to large projects.

```tsx
export default smithers((ctx) => {
  const discover = ctx.latest(outputs.discover, "discover");
  const unfinished = (discover?.tickets ?? []).filter(
    (t) => !ctx.latest(outputs.report, `${t.id}:report`)
  );

  return (
    <Workflow name="big-project">
      <Sequence>
        <Branch if={unfinished.length === 0} then={<Discover />} />
        {unfinished.map((t) => (
          <TicketPipeline key={t.id} ticket={t} />
        ))}
      </Sequence>
    </Workflow>
  );
});
```

Use stable IDs (`t.id`, not array index) so resume matches.
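The filter at the top of that workflow is plain data logic, so it can be checked on its own. In this sketch, a `Map` stands in for `ctx.latest(outputs.report, ...)`, and `unfinishedTickets` is a hypothetical name. Lookups use `${t.id}:report`, so reordering or inserting tickets does not change which ones count as done. With an index-based id, it would.

```typescript
type Ticket = { id: string; title: string };
type Report = { summary: string };

// `reports` stands in for ctx.latest(outputs.report, nodeId), keyed by node id.
function unfinishedTickets(tickets: Ticket[], reports: Map<string, Report>): Ticket[] {
  return tickets.filter((t) => !reports.get(`${t.id}:report`));
}

const discovered: Ticket[] = [
  { id: "auth-42", title: "Fix token refresh" },
  { id: "db-7", title: "Add index" },
];
const reports = new Map<string, Report>([["auth-42:report", { summary: "done" }]]);

// Reordering the discovered list does not change the answer: ids, not positions, are the keys.
console.log(unfinishedTickets(discovered, reports).map((t) => t.id).join(",")); // db-7
console.log(unfinishedTickets([...discovered].reverse(), reports).map((t) => t.id).join(",")); // db-7
```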

## Coherent task with tools

One context boundary per logical operation, not per step. Splitting too finely loses cross-step reasoning.

```tsx
<Task id="fix-config-bugs" output={outputs.result} agent={agentWithTools}>
  {`Analyze config files in ${ctx.input.dir}, find bugs, fix them, write results.
   Use read, edit, bash. Return { summary, filesChanged }.`}
</Task>
```

## Per-agent least-privilege tools

```tsx
import { AnthropicAgent } from "smithers-orchestrator";

const analyst     = new AnthropicAgent({ model, system: "Return JSON" });               // no tools
const reviewer    = new AnthropicAgent({ model, system: "...", tools: { read, grep } }); // read-only
const implementer = new AnthropicAgent({ model, system: "...", tools: { read, write, edit, bash } });
```

Match the tool surface to the role.

## Side-effect tools with idempotency

Tools that mutate external systems must declare `sideEffect: true` and pass the runtime idempotency key to the downstream call.

```tsx
import { defineTool } from "smithers-orchestrator/tools";
import { z } from "zod";

const createTicket = defineTool({
  name: "jira.create",
  schema: z.object({ title: z.string() }),
  sideEffect: true,
  idempotent: false,
  async execute(args, ctx) {
    return jira.createIssue({ ...args, idempotencyKey: ctx.idempotencyKey });
  },
});
```

Retries reuse the same idempotency key, so a successful side effect from attempt 1 isn't doubled by attempt 2.
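The guarantee only holds if the downstream system honors the key. A toy in-memory service shows the effect: a second call with the same key returns the original result instead of creating a duplicate. `FakeIssueTracker` and the key string are hypothetical. A real API, Jira or otherwise, has its own idempotency mechanism, if it has one at all.

```typescript
type Issue = { key: string; title: string };

// Toy service that dedupes create requests by idempotency key.
class FakeIssueTracker {
  private byKey = new Map<string, Issue>();
  private nextId = 1;

  createIssue(args: { title: string; idempotencyKey: string }): Issue {
    const existing = this.byKey.get(args.idempotencyKey);
    if (existing) return existing; // replayed request: no second issue
    const issue: Issue = { key: `PROJ-${this.nextId++}`, title: args.title };
    this.byKey.set(args.idempotencyKey, issue);
    return issue;
  }

  get count(): number {
    return this.byKey.size;
  }
}

const tracker = new FakeIssueTracker();
// Made-up stand-in for ctx.idempotencyKey; the real key format is not documented here.
const key = "run-123:create-ticket:0";
const attempt1 = tracker.createIssue({ title: "Flaky test", idempotencyKey: key });
const attempt2 = tracker.createIssue({ title: "Flaky test", idempotencyKey: key }); // retry
console.log(attempt1.key, attempt2.key, tracker.count); // PROJ-1 PROJ-1 1
```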

## Caching for iterative authoring

```tsx
<Workflow name="report" cache>
  <Task
    id="analyze"
    output={outputs.analysis}
    agent={analyst}
    cache={{ by: (ctx) => ({ repo: ctx.input.repo }), version: "v2" }}
  >
    {`Analyze ${ctx.input.repo}`}
  </Task>
  <Task id="report" output={outputs.report} agent={reporter} deps={{ analyze: outputs.analysis }}>
    {(deps) => `Report on ${deps.analyze.summary}`}
  </Task>
</Workflow>
```

Tweak the downstream Task without re-running the expensive upstream one. Don't cache side effects.
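Common Footguns describes the cache key as `cache.by(ctx)` plus `cache.version` plus the output schema signature. The sketch below composes those three ingredients. The serialization and the `schemaSignature` strings are assumptions for illustration, not how Smithers actually encodes them, and `cacheKey` is hypothetical.

```typescript
// Hypothetical composition of a cache key from its three documented ingredients.
// Real Smithers may serialize (and hash) these differently.
function cacheKey(byValue: unknown, version: string, schemaSignature: string): string {
  return JSON.stringify([byValue, version, schemaSignature]);
}

const schemaV1 = "summary:string";
const schemaV2 = "summary:string,issues:string[]";
const repo = { repo: "acme/api" };

const k1 = cacheKey(repo, "v2", schemaV1);
const k2 = cacheKey({ repo: "acme/api" }, "v2", schemaV1); // same inputs: hit
const k3 = cacheKey(repo, "v3", schemaV1); // version bump: miss
const k4 = cacheKey(repo, "v2", schemaV2); // schema change: miss
console.log(k1 === k2, k1 === k3, k1 === k4); // true false false
```

A production key would also need canonical property ordering inside the `by` value, because `JSON.stringify` output depends on insertion order.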

## Schemas in their own file

```ts
// schemas.ts
import { z } from "zod";

export const schemas = {
  analysis: z.object({ summary: z.string(), issues: z.array(z.string()) }),
  review:   z.object({ approved: z.boolean(), feedback: z.string() }),
  report:   z.object({ title: z.string(), body: z.string() }),
};

// workflow.tsx
import { createSmithers } from "smithers-orchestrator";
import { schemas } from "./schemas";
const { Workflow, smithers, outputs } = createSmithers(schemas);
```

All data shapes in one place; new contributors read schemas.ts first.

## MDX prompt with auto-injected schema

```mdx
{/* Review.mdx */}
Review this code:

**Files**: {props.files.join(", ")}
**Tests**: {props.testsPassed}/{props.testsRun} passing

Return JSON matching schema:
{props.schema}
```

`props.schema` is the JSON-schema description of the Task's `outputSchema`, auto-injected. Keeps the prompt and the validator in sync.

## Custom hooks over `ctx`

```tsx
function useReviewState(ticketId: string) {
  const ctx = useCtx();
  const claude = ctx.latest("review", `${ticketId}:review-claude`);
  const codex  = ctx.latest("review", `${ticketId}:review-codex`);
  return { claude, codex, allApproved: !!(claude?.approved && codex?.approved) };
}
```

Workflow logic factors out into hooks the same way React UI logic does.

## VCS revert & per-attempt snapshots

Smithers records a jj change ID (or git SHA) per attempt. Revert any attempt to its exact workspace state:

```bash
bunx smithers-orchestrator revert workflow.tsx --run-id RUN_ID --node-id implement --attempt 1
```

Useful when an experiment leaves the worktree in a bad state.

## Time travel: fork, replay, diff

```bash
bunx smithers-orchestrator timeline RUN_ID --tree
bunx smithers-orchestrator diff RUN_ID NODE_ID
bunx smithers-orchestrator fork workflow.tsx --run-id RUN_ID --frame 5 --reset-node analyze --label exp1
bunx smithers-orchestrator replay workflow.tsx --run-id RUN_ID --frame 5 --restore-vcs
```

Fork makes a child run without starting it (add `--run` to start immediately); replay also makes a child run but immediately resumes it. `--restore-vcs` checks out the original revision so re-execution sees the same source.

## Scoring tasks

```tsx
import { schemaAdherenceScorer, latencyScorer, llmJudge } from "smithers-orchestrator/scorers";

<Task
  id="analyze"
  output={outputs.analysis}
  agent={analyst}
  scorers={{
    schema:  { scorer: schemaAdherenceScorer() },
    latency: { scorer: latencyScorer({ targetMs: 5000 }) },
    quality: {
      scorer: llmJudge({ model: claude, prompt: "Rate the analysis quality 0-1" }),
      sampling: { kind: "ratio", ratio: 0.1 },
    },
  }}
>
  Analyze...
</Task>
```

Scorers run after the task and never block. Sample expensive scorers with `ratio`.
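With `sampling: { kind: "ratio", ratio: 0.1 }`, about one task in ten gets the expensive judge. One way to make that choice reproducible is to derive it from a hash of a stable identifier instead of `Math.random()`, which is what this sketch does with FNV-1a. This section does not say whether Smithers samples randomly or deterministically, so treat `shouldSample`, `unitHash`, and the id format as assumptions.

```typescript
// FNV-1a 32-bit hash of a string, mapped to [0, 1).
function unitHash(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 2 ** 32;
}

// Hypothetical ratio sampler: the same id always gets the same decision.
function shouldSample(id: string, ratio: number): boolean {
  return unitHash(id) < ratio;
}

const ids = Array.from({ length: 10_000 }, (_, i) => `run-${i}:analyze`);
const sampled = ids.filter((id) => shouldSample(id, 0.1)).length;
console.log(`${sampled} of ${ids.length} sampled`);
console.log(shouldSample("run-1:analyze", 0.1) === shouldSample("run-1:analyze", 0.1)); // true
```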

## Eval suites for regressions

```jsonl
{"id":"happy-path","input":{"prompt":"Draft release notes"},"expected":{"status":"finished"}}
{"id":"quality-gate","input":{"prompt":"Find risky changes"},"expected":{"status":"finished","outputContains":{"analysis":[{"riskLevel":"low"}]}}}
```

```bash
bunx smithers-orchestrator eval workflow.tsx --cases evals/smoke.jsonl --suite smoke --force
```

Use eval suites when you need repeatable workflow-level checks. Assertions support `status`, `output` (exact match), and `outputContains` (partial match). Reports land in `.smithers/evals/<suite>.json`; the command exits non-zero on failures.
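The difference between `output` (exact match) and `outputContains` (partial match) can be sketched as a recursive subset check. The array rule used here is an assumption about how partial matching treats arrays: every expected element must partially match some actual element. `containsPartial` is hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// True if `expected` is a partial match of `actual` (assumed semantics).
function containsPartial(actual: Json, expected: Json): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual)) return false;
    const items: Json[] = actual;
    return expected.every((e) => items.some((a) => containsPartial(a, e)));
  }
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object" || Array.isArray(actual)) return false;
    const record: { [key: string]: Json } = actual;
    return Object.entries(expected).every(([k, v]) => k in record && containsPartial(record[k], v));
  }
  return actual === expected; // primitives compare exactly
}

const actualOutputs: Json = {
  analysis: [
    { riskLevel: "high", file: "auth.ts" },
    { riskLevel: "low", file: "README.md" },
  ],
};
console.log(containsPartial(actualOutputs, { analysis: [{ riskLevel: "low" }] })); // true
console.log(containsPartial(actualOutputs, { analysis: [{ riskLevel: "medium" }] })); // false
```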

## Continue-as-new for very long runs

A run with too much accumulated state hands off to a fresh run with carried state.

```tsx
<ContinueAsNew when={iterationCount > 100} carry={{ summary: rolledUpState }} />
```

Avoids unbounded SQLite growth in long-lived loops.

## Hot reload while authoring

```bash
bunx smithers-orchestrator up workflow.tsx --hot
```

Edits to the workflow source apply on the next render frame without losing in-flight task state. Schema changes still require a fresh run.

## Fork agent session context

Every agent task produces a reusable session snapshot. `fork` starts a new task from a copy of another task's final context, without mutating the source.

```tsx
<Task id="investigate" agent={claude} output={outputs.investigation}>
  Understand the bug and identify possible fixes.
</Task>

<Parallel>
  <Task id="minimal-fix" agent={claude} fork="investigate" output={outputs.patch}>
    Try the minimal fix.
  </Task>
  <Task id="refactor-fix" agent={claude} fork="investigate" output={outputs.patch}>
    Try the refactor fix.
  </Task>
</Parallel>
```

`fork` adds the source as a dependency (the forked task waits for it), copies its conversation into a fresh session, then submits the new prompt. Both branches above start from the same investigation and never affect each other. Chain it for follow-ups (`plan → implement → verify`); inside a `<Loop>` it forks the latest completed snapshot for that id. See `<Task>` fork.

## Read next

- How It Works: the model these recipes plug into.
- Components: full prop surface.
- CLI: every command.

---

## Common Footguns

> The handful of mistakes that bite people first, and the pattern that avoids each.

Smithers is durable by design, and most of its sharp edges trace back to one fact: a run is replayed from persisted state, not from memory. The mistakes below are the ones people hit first. Each links to the canonical reference for the full story.

## Resume and state

### Unstable task IDs break resume

The runtime keys completed work by task `id`. A changed id looks like a brand new task, and an id that disappears is dropped from the plan. Derive ids from data, never from a loop index or a timestamp.

```tsx
{tickets.map((t) => <TicketPipeline key={t.id} id={`${t.id}:work`} />)}
// NOT id={`work-${i}`} or id={`work-${Date.now()}`}
```

It is the same rule as React keys. See How It Works.

### Input is immutable after the first run

A run's `--input` is persisted when it starts. Resuming with different input is an error, not a silent override. If you need different input, start a new run.

### Code changes block resume instead of merging

A workflow source change is a different workflow. Resume validates the source hash of the original run, so editing the file and then resuming is blocked. Start a new run instead. To change a workflow that is still running, use hot reload (`up --hot`): edits apply to newly scheduled tasks while in-flight tasks finish on their original code. See Recipes.

### `useState` is not durable

React state resets on every render, which here means every frame. Anything that must survive a crash belongs in a Task output read back through `ctx`, not in component state.

## Caching

### Do not cache side-effecting tasks

`cache` is for pure work that is expensive to recompute. Caching a deploy, an email, or a mutation means it silently does not run on a cache hit. The cache key is `cache.by(ctx)` plus `cache.version` plus the output schema signature, so a schema change invalidates the cache automatically and a stale cached row fails validation and misses safely. See How It Works.

## Side effects and retries

### Mark side-effecting tools and key them

Tasks retry, and a retried agent loop can call a tool again. A custom tool that writes to the world should declare `sideEffect: true` and pass `ctx.idempotencyKey` through to the downstream system so a retry is a no-op rather than a second charge. `ctx.idempotencyKey` is stable across retries and resumes for the same task iteration.

```ts
import { defineTool } from "smithers-orchestrator";
import { z } from "zod";

const placeOrder = defineTool({
  name: "shop.place_order",
  description: "Place an order",
  schema: z.object({ sku: z.string() }),
  sideEffect: true,
  idempotent: false,
  async execute(args, ctx) {
    return await shop.placeOrder({ sku: args.sku, idempotencyKey: ctx.idempotencyKey });
  },
});
```

## Tools and sandbox

### Agents get only the tools you grant

The five built-in tools (`read`, `write`, `edit`, `grep`, `bash`) are sandboxed to `rootDir`. Symlinks, network, and long-running calls are denied by default; `--allow-network` opens bash to the network. Grant least privilege per task: a reviewer gets `read` and `grep`, an implementer gets `write`, `edit`, and `bash`, and an agent with no `tools` cannot touch the filesystem at all. See How It Works.

## Time travel and VCS

### Revert and VCS-restoring replay change your working tree

These rewrite filesystem state, so treat them the way you would treat `git checkout` over uncommitted work.

- `revert` restores the workspace to a previous attempt's filesystem state and discards graph snapshots recorded after that attempt. It restores files only and lands them as a new change on top of the current working copy. See Revert to Attempt.
- `replay --restore-vcs` checks out the jj revision the snapshot was taken at, so re-execution sees the same source as the original run.

### `revert` requires jj

Smithers prefers `.jj` over `.git`. Pure Git repos run fine but cannot use `revert`, because there is no per-attempt change to restore. Install jj if you want attempt-level revert. See VCS.

### Worktree runs auto-rebase on resume

On resume of a worktree run, Smithers rebases onto the base branch (default `main`) and continues even if the rebase fails. Expect the branch to move.

## Outputs

### `ctx.outputMaybe` is undefined until the task runs

Reading a task's output before that task has completed returns `undefined`, not a default. Guard the read so that a task that has not run yet does not crash the render.

```tsx
const analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" });
return analysis ? <Task id="report" output={outputs.report} agent={writer}>...</Task> : null;
```

## Read next

- How It Works: the execution model these rules come from.
- Recipes: caching, hot reload, and VCS revert in context.

---

## Types

> Public TypeScript surface for smithers-orchestrator.

This listing is the single source of truth for the public types; running `tsc --emitDeclarationOnly` would produce something close to it. Import these types from `smithers-orchestrator` unless noted otherwise.

Major sections at a glance:
- **Workflow / Context**: `SmithersWorkflow`, `SmithersCtx`, `RunOptions`, `RunResult`, the entry points for defining and running workflows.
- **Task / Graph**: `TaskDescriptor`, `TaskProps`, `GraphSnapshot`, the shape of nodes at runtime and in the JSX layer.
- **Component props**: `WorkflowProps`, `ApprovalProps`, `SignalProps`, `LoopProps`, etc., all JSX component interfaces.
- **Errors**: `SmithersError`, `KnownSmithersErrorCode`, typed error codes; see Errors for descriptions.
- **Server / Gateway**: `ServerOptions`, `GatewayOptions`, `GatewayAuthConfig`, self-hosting configuration.
- **Scorers / Memory / OpenAPI / Observability**: sub-path imports (`smithers-orchestrator/scorers`, `/memory`, `/openapi`, `/observability`).

```ts
// =============================================================================
// Workflow
// =============================================================================

interface SmithersWorkflow<Schema = unknown> {
  readonly readableName?: string;
  readonly description?: string;
  readonly db?: unknown;
  readonly build: (ctx: SmithersCtx<Schema>) => JSX.Element;
  readonly opts: SmithersWorkflowOptions;
  readonly schemaRegistry?: Map<string, SchemaRegistryEntry>;
}

type SmithersWorkflowOptions = {
  alertPolicy?: SmithersAlertPolicy;
  cache?: boolean;
  workflowHash?: string;
};

type SchemaRegistryEntry = {
  table: any;
  zodSchema: import("zod").ZodObject<any>;
};

type SmithersAlertPolicy = {
  defaults?: SmithersAlertPolicyDefaults;
  rules?: Record<string, SmithersAlertPolicyRule>;
  reactions?: Record<string, SmithersAlertReaction>;
};

type SmithersAlertSeverity = "info" | "warning" | "critical";
type SmithersAlertLabels = Record<string, string>;
type SmithersAlertPolicyDefaults = {
  owner?: string;
  severity?: SmithersAlertSeverity;
  runbook?: string;
  labels?: SmithersAlertLabels;
};
type SmithersAlertPolicyRule = SmithersAlertPolicyDefaults & {
  afterMs?: number;
  reaction?: string | SmithersAlertReaction;
};
type SmithersAlertReaction =
  | { kind: "emit-only" }
  | { kind: "pause" }
  | { kind: "cancel" }
  | { kind: "open-approval" }
  | { kind: "deliver"; destination: string };

// =============================================================================
// Context
// =============================================================================

declare class SmithersCtx<Schema = unknown> {
  readonly runId: string;
  readonly iteration: number;
  readonly iterations?: Record<string, number>;
  readonly input: Schema extends { input: infer T } ? T : any;
  readonly auth: RunAuthContext | null;
  readonly outputs: OutputAccessor<Schema>;

  output(table: any, key: OutputKey): any;
  outputMaybe(table: any, key: OutputKey): any | undefined;
  latest(table: any, nodeId: string): any | undefined;
  latestArray(value: unknown, schema: any): unknown[];
  iterationCount(table: any, nodeId: string): number;
}

type OutputKey = { nodeId: string; iteration?: number };
type OutputAccessor<Schema> = ((table: any) => any[]) & Record<string, any[]>;
type InferRow<TTable> = TTable extends { $inferSelect: infer R } ? R : never;
type InferOutputEntry<T> =
  T extends import("zod").ZodTypeAny ? import("zod").infer<T>
  : T extends { $inferSelect: any } ? InferRow<T>
  : never;

type RunAuthContext = {
  triggeredBy: string;
  scopes: string[];
  role: string;
  createdAt: string;
};

// =============================================================================
// Run
// =============================================================================

type RunOptions = {
  runId?: string;
  parentRunId?: string | null;
  input: Record<string, unknown>;
  maxConcurrency?: number;          // default 4
  onProgress?: (e: SmithersEvent) => void;
  signal?: AbortSignal;
  resume?: boolean;
  force?: boolean;                  // resume even if marked running
  workflowPath?: string;
  rootDir?: string;
  logDir?: string | null;
  allowNetwork?: boolean;           // default false; bash tool network access
  maxOutputBytes?: number;          // default 200000
  toolTimeoutMs?: number;           // default 60000
  hot?: boolean | HotReloadOptions;
  annotations?: Record<string, string | number | boolean>;
  auth?: RunAuthContext | null;
  config?: Record<string, unknown>;
  cliAgentToolsDefault?: "all" | "explicit-only";  // default "all"
  resumeClaim?: {                   // internal supervisor coordination
    claimOwnerId: string;
    claimHeartbeatAtMs: number;
    restoreRuntimeOwnerId?: string | null;
    restoreHeartbeatAtMs?: number | null;
  };
};

type HotReloadOptions = {
  rootDir?: string;
  outDir?: string;                  // default .smithers/hmr/<runId>
  maxGenerations?: number;          // default 3
  cancelUnmounted?: boolean;        // default false
  debounceMs?: number;              // default 100
};

type RunResult = {
  readonly runId: string;
  readonly status: RunStatus;
  readonly output?: unknown;
  readonly error?: unknown;
  readonly nextRunId?: string;      // set when the run continued-as-new
};

type RunStatus =
  | "running"
  | "waiting-approval"
  | "waiting-event"
  | "waiting-timer"
  | "finished"
  | "continued"
  | "failed"
  | "cancelled";

type RetryTaskOptions = {
  runId: string;
  nodeId: string;
  iteration?: number;
  resetDependents?: boolean;        // default true
  force?: boolean;                  // default false
  onProgress?: (e: SmithersEvent) => void;
};

type RetryTaskResult = {
  success: boolean;
  resetNodes: string[];
  error?: string;
};

// =============================================================================
// Task
// =============================================================================

type TaskDescriptor = {
  nodeId: string;
  ordinal: number;
  iteration: number;
  ralphId?: string;
  dependsOn?: string[];
  needs?: Record<string, string>;
  forkSource?: string;              // logical id of the task whose session this task forks
  worktreeId?: string;
  worktreePath?: string;
  worktreeBranch?: string;
  worktreeBaseBranch?: string;
  outputTable: unknown | null;
  outputTableName: string;
  outputRef?: import("zod").ZodObject<any>;
  outputSchema?: import("zod").ZodObject<any>;
  parallelGroupId?: string;
  parallelMaxConcurrency?: number;
  needsApproval: boolean;
  waitAsync?: boolean;
  approvalMode?: "gate" | "decision" | "select" | "rank";
  approvalOnDeny?: "fail" | "continue" | "skip";
  approvalOptions?: ApprovalOption[];
  approvalAllowedScopes?: string[];
  approvalAllowedUsers?: string[];
  approvalAutoApprove?: {
    after?: number;
    audit?: boolean;
    conditionMet?: boolean;
    revertOnMet?: boolean;
  };
  skipIf: boolean;
  retries: number;
  retryPolicy?: RetryPolicy;
  timeoutMs: number | null;
  heartbeatTimeoutMs: number | null;
  continueOnFail: boolean;
  cachePolicy?: CachePolicy;
  hijack?: boolean;
  onHijackExit?: "complete" | "reopen";
  agent?: AgentLike | AgentLike[];
  prompt?: string;
  staticPayload?: unknown;
  computeFn?: () => unknown | Promise<unknown>;
  label?: string;
  meta?: Record<string, unknown>;
  scorers?: ScorersMap;
  memoryConfig?: TaskMemoryConfig;
};

type RetryPolicy = {
  backoff?: "fixed" | "linear" | "exponential";   // default "fixed"
  initialDelayMs?: number;                         // default 0
  maxDelayMs?: number;
  multiplier?: number;
  jitter?: boolean;
};

type CachePolicy<Ctx = any> = {
  by?: (ctx: Ctx) => unknown;
  version?: string;
  key?: string;
  ttlMs?: number;
  scope?: "run" | "workflow" | "global";
};

type AgentLike = {
  id?: string;
  tools?: Record<string, any>;
  supportsNativeStructuredOutput?: boolean;
  capabilities?: any;
  generate: (args: any) => Promise<any>;
};

type TaskMemoryConfig = {
  recall?: { namespace?: MemoryNamespace; query?: string; topK?: number };
  remember?: { namespace?: MemoryNamespace; key?: string };
  threadId?: string;
};

type MemoryNamespace = { kind: MemoryNamespaceKind; id: string };
type MemoryNamespaceKind = "workflow" | "agent" | "user" | "global";

// =============================================================================
// Graph
// =============================================================================

type GraphSnapshot = {
  readonly runId: string;
  readonly frameNo: number;
  readonly xml: XmlNode | null;
  readonly tasks: readonly TaskDescriptor[];
};

type XmlNode = XmlElement | XmlText;

type XmlElement = {
  readonly kind: "element";
  readonly tag: string;             // "Workflow" | "Task" | "Sequence" | ...
  readonly props: Record<string, string>;
  readonly children: readonly XmlNode[];
};

type XmlText = { readonly kind: "text"; readonly text: string };

// =============================================================================
// Events
// =============================================================================
//
// `SmithersEvent` is a discriminated union of every lifecycle event the runtime
// emits. The full union is documented separately to keep this
// file usable as the everyday type reference.
//
// See: /reference/event-types (rendered) or /llms-full.txt (LLM bundle).

type SmithersEvent = { type: string; runId: string; timestampMs: number } & Record<string, unknown>;
// (Each variant has additional fields per its `type`. See event-types.)

// =============================================================================
// Component props
// =============================================================================

type WorkflowProps = {
  name: string;
  cache?: boolean;
  children?: React.ReactNode;
};

type OutputTarget = import("zod").ZodObject<any> | { $inferSelect: any } | string;
type DepsSpec = Record<string, OutputTarget>;
type InferDeps<D extends DepsSpec> = {
  [K in keyof D]: D[K] extends string ? unknown : InferOutputEntry<D[K]>;
};

type TaskProps<Row, Output extends OutputTarget = OutputTarget, D extends DepsSpec = {}> = {
  key?: string;
  id: string;
  output: Output;
  outputSchema?: import("zod").ZodObject<any>;
  agent?: AgentLike | AgentLike[];
  fallbackAgent?: AgentLike;
  dependsOn?: string[];
  needs?: Record<string, string>;
  deps?: D;
  fork?: string;                    // start from another task's final agent session snapshot
  skipIf?: boolean;
  needsApproval?: boolean;
  async?: boolean;                  // only with needsApproval
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  noRetry?: boolean;
  retries?: number;                 // default Infinity (set 0 to disable)
  retryPolicy?: RetryPolicy;        // default exponential, 1000ms, capped 5min
  continueOnFail?: boolean;
  cache?: CachePolicy;
  scorers?: ScorersMap;
  memory?: TaskMemoryConfig;
  hijack?: boolean;
  onHijackExit?: "complete" | "reopen";
  allowTools?: string[];            // CLI-agent tool allowlist
  label?: string;
  meta?: Record<string, unknown>;
  // string = prompt literal; Row = static result; () => Row = compute fn;
  // (deps) => result = deps-aware fn; React.ReactNode = JSX subtree
  children: string | Row | (() => Row | Promise<Row>) | React.ReactNode | ((deps: InferDeps<D>) => Row | React.ReactNode);
};

type SequenceProps  = { skipIf?: boolean; children?: React.ReactNode };
type ParallelProps  = { id?: string; maxConcurrency?: number; skipIf?: boolean; children?: React.ReactNode };
type BranchProps    = { if: boolean; then: React.ReactElement; else?: React.ReactElement | null; skipIf?: boolean };
type LoopProps      = {
  id?: string;
  until?: boolean;
  maxIterations?: number;
  onMaxReached?: "fail" | "return-last";   // default "return-last"
  continueAsNewEvery?: number;
  skipIf?: boolean;
  children?: React.ReactNode;
};
type RalphProps     = LoopProps;            // deprecated alias

type ApprovalDecision  = { approved: boolean; note: string | null; decidedBy: string | null; decidedAt: string | null };
type ApprovalSelection = { selected: string; notes: string | null };
type ApprovalRanking   = { ranked: string[]; notes: string | null };
type ApprovalRequest   = { title: string; summary?: string; metadata?: Record<string, unknown> };
type ApprovalMode      = "approve" | "select" | "rank";
type ApprovalOption    = { key: string; label: string; summary?: string; metadata?: Record<string, unknown> };
type ApprovalAutoApprove = {
  after?: number;
  condition?: ((ctx: any) => boolean) | (() => boolean);
  audit?: boolean;
  revertOn?: ((ctx: any) => boolean) | (() => boolean);
};

type ApprovalProps<Row = ApprovalDecision, Output extends OutputTarget = OutputTarget> = {
  id: string;
  mode?: ApprovalMode;
  options?: ApprovalOption[];
  output: Output;
  outputSchema?: import("zod").ZodObject<any>;
  request: ApprovalRequest;
  onDeny?: "fail" | "continue" | "skip";
  allowedScopes?: string[];
  allowedUsers?: string[];
  autoApprove?: ApprovalAutoApprove;
  async?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  skipIf?: boolean;
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  retries?: number;
  retryPolicy?: RetryPolicy;
  continueOnFail?: boolean;
  cache?: CachePolicy;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
  children?: React.ReactNode;
};

type SignalProps<S extends import("zod").ZodObject<any> = import("zod").ZodObject<any>> = {
  id: string;
  schema: S;
  correlationId?: string;
  timeoutMs?: number;
  onTimeout?: "fail" | "skip" | "continue";
  async?: boolean;
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
  children?: (data: import("zod").infer<S>) => React.ReactNode;
};

type WaitForEventProps = {
  id: string;
  event: string;
  correlationId?: string;
  output: OutputTarget;
  outputSchema?: import("zod").ZodObject<any>;
  timeoutMs?: number;
  onTimeout?: "fail" | "skip" | "continue";
  async?: boolean;
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
};

type TimerProps = {
  id: string;
  duration?: string;                // e.g. "30s", "5m"
  until?: string | Date;            // absolute timestamp
  every?: string;                   // periodic
  skipIf?: boolean;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
};

type SagaStepDef    = { id: string; action: React.ReactElement; compensation: React.ReactElement; label?: string };
type SagaProps      = { id?: string; steps?: SagaStepDef[]; onFailure?: "compensate" | "compensate-and-fail" | "fail"; skipIf?: boolean; children?: React.ReactNode };
type SagaStepProps  = { id: string; compensation: React.ReactElement; children: React.ReactElement };

type TryCatchFinallyProps = {
  id?: string;
  try: React.ReactElement;
  catch?: React.ReactElement | ((error: SmithersError) => React.ReactElement);
  catchErrors?: SmithersErrorCode[];
  finally?: React.ReactElement;
  skipIf?: boolean;
};

// Higher-level composites
type PollerProps = {
  id?: string;
  check: AgentLike | ((...args: any[]) => any);
  checkOutput: OutputTarget;
  maxAttempts?: number;
  backoff?: "fixed" | "linear" | "exponential";
  intervalMs?: number;
  onTimeout?: "fail" | "return-last";
  skipIf?: boolean;
  children?: React.ReactNode;
};

type ColumnDef = {
  name: string;
  agent: AgentLike;
  output: OutputTarget;
  prompt?: (ctx: { item: unknown; column: string }) => string;
  task?: Partial<TaskProps<unknown>>;
};

type KanbanProps = {
  id?: string;
  columns: ColumnDef[];
  useTickets: () => Array<{ id: string; [key: string]: unknown }>;
  agents?: Record<string, AgentLike>;
  maxConcurrency?: number;
  onComplete?: OutputTarget;
  until?: boolean;
  maxIterations?: number;
  skipIf?: boolean;
  children?: React.ReactNode | Record<string, unknown>;
};

// Sandbox
type SandboxRuntime = "bubblewrap" | "docker" | "codeplane";
type SandboxVolumeMount = { host: string; container: string; readonly?: boolean };
type SandboxWorkspaceSpec = {
  name: string;
  snapshotId?: string;
  idleTimeoutSecs?: number;
  persistence?: "ephemeral" | "sticky";
};

type SandboxProps = {
  id: string;
  workflow?: (...args: any[]) => any;
  input?: unknown;
  output: OutputTarget;
  runtime?: SandboxRuntime;
  allowNetwork?: boolean;
  reviewDiffs?: boolean;
  autoAcceptDiffs?: boolean;
  image?: string;
  env?: Record<string, string>;
  ports?: Array<{ host: number; container: number }>;
  volumes?: SandboxVolumeMount[];
  memoryLimit?: string;
  cpuLimit?: string;
  command?: string;
  workspace?: SandboxWorkspaceSpec;
  skipIf?: boolean;
  timeoutMs?: number;
  heartbeatTimeoutMs?: number;
  retries?: number;
  retryPolicy?: RetryPolicy;
  continueOnFail?: boolean;
  cache?: CachePolicy;
  dependsOn?: string[];
  needs?: Record<string, string>;
  label?: string;
  meta?: Record<string, unknown>;
  key?: string;
  children?: React.ReactNode;
};

// =============================================================================
// Errors
// =============================================================================
//
// Every Smithers error is a SmithersError with a typed code. See the Errors page
// for the full list of built-in codes.

declare class SmithersError extends Error {
  readonly code: SmithersErrorCode;
  readonly summary: string;
  readonly docsUrl: string;
  readonly details?: Record<string, unknown>;
  readonly cause?: unknown;
}

type SmithersErrorCode = KnownSmithersErrorCode | (string & {});
type KnownSmithersErrorCode =
  | "INVALID_INPUT" | "MISSING_INPUT" | "MISSING_INPUT_TABLE" | "RESUME_METADATA_MISMATCH"
  | "UNKNOWN_OUTPUT_SCHEMA" | "INVALID_OUTPUT" | "WORKTREE_CREATE_FAILED" | "VCS_NOT_FOUND"
  | "SNAPSHOT_NOT_FOUND" | "VCS_WORKSPACE_CREATE_FAILED" | "TASK_TIMEOUT" | "TASK_ABORTED"
  | "TASK_HIJACK_UNSUPPORTED" | "RUN_NOT_FOUND" | "NODE_NOT_FOUND" | "INVALID_EVENTS_OPTIONS"
  | "SANDBOX_BUNDLE_INVALID" | "SANDBOX_BUNDLE_TOO_LARGE" | "WORKFLOW_EXECUTION_FAILED"
  | "SANDBOX_EXECUTION_FAILED" | "TASK_HEARTBEAT_TIMEOUT" | "HEARTBEAT_PAYLOAD_TOO_LARGE"
  | "HEARTBEAT_PAYLOAD_NOT_JSON_SERIALIZABLE" | "RUN_CANCELLED" | "RUN_NOT_RESUMABLE"
  | "RUN_OWNER_ALIVE" | "RUN_STILL_RUNNING" | "RUN_RESUME_CLAIM_LOST" | "RUN_RESUME_CLAIM_FAILED"
  | "RUN_RESUME_ACTIVATION_FAILED" | "RUN_HIJACKED" | "CONTINUATION_STATE_TOO_LARGE"
  | "INVALID_CONTINUATION_STATE" | "RALPH_MAX_REACHED" | "SCHEDULER_ERROR" | "SESSION_ERROR"
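  | "TASK_ID_REQUIRED" | "TASK_MISSING_OUTPUT" | "DUPLICATE_ID" | "NESTED_LOOP"
  | "TASK_FORK_SOURCE_NOT_FOUND" | "TASK_FORK_SOURCE_NOT_COMPLETE" | "TASK_FORK_SESSION_UNAVAILABLE" | "TASK_FORK_CYCLE"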
  | "WORKTREE_EMPTY_PATH" | "MDX_PRELOAD_INACTIVE" | "CONTEXT_OUTSIDE_WORKFLOW"
  | "MISSING_OUTPUT" | "DEP_NOT_SATISFIED" | "ASPECT_BUDGET_EXCEEDED" | "APPROVAL_OUTSIDE_TASK"
  | "APPROVAL_OPTIONS_REQUIRED" | "WORKFLOW_MISSING_DEFAULT"
  | "TOOL_PATH_INVALID" | "TOOL_PATH_ESCAPE" | "TOOL_FILE_TOO_LARGE" | "TOOL_CONTENT_TOO_LARGE"
  | "TOOL_PATCH_TOO_LARGE" | "TOOL_PATCH_FAILED" | "TOOL_NETWORK_DISABLED"
  | "TOOL_GIT_REMOTE_DISABLED" | "TOOL_COMMAND_FAILED" | "TOOL_GREP_FAILED"
  | "AGENT_CLI_ERROR" | "AGENT_CONFIG_INVALID" | "AGENT_RPC_FILE_ARGS" | "AGENT_BUILD_COMMAND" | "AGENT_DIAGNOSTIC_TIMEOUT"
  | "DB_MISSING_COLUMNS" | "DB_REQUIRES_BUN_SQLITE" | "DB_QUERY_FAILED" | "DB_WRITE_FAILED"
  | "STORAGE_ERROR" | "INTERNAL_ERROR" | "PROCESS_ABORTED" | "PROCESS_TIMEOUT"
  | "PROCESS_IDLE_TIMEOUT" | "PROCESS_SPAWN_FAILED" | "TASK_RUNTIME_UNAVAILABLE"
  | "SCHEMA_CHANGE_HOT" | "HOT_OVERLAY_FAILED" | "HOT_RELOAD_INVALID_MODULE"
  | "SCORER_FAILED" | "WORKFLOW_EXISTS" | "CLI_DB_NOT_FOUND" | "CLI_AGENT_UNSUPPORTED"
  | "PI_HTTP_ERROR" | "EXTERNAL_BUILD_FAILED" | "SCHEMA_DISCOVERY_FAILED"
  | "OPENAPI_SPEC_LOAD_FAILED" | "OPENAPI_OPERATION_NOT_FOUND" | "OPENAPI_TOOL_EXECUTION_FAILED";

// =============================================================================
// Server
// =============================================================================

type ServerOptions = {
  port?: number;
  db?: unknown;
  authToken?: string;
  maxBodyBytes?: number;
  rootDir?: string;
  allowNetwork?: boolean;
  headersTimeout?: number;
  requestTimeout?: number;
};

type ServeOptions = {
  workflow: SmithersWorkflow<any>;
  adapter: any;
  runId: string;
  abort: AbortController;
  authToken?: string;
  metrics?: boolean;
};

type GatewayTokenGrant = { role: string; scopes: string[]; userId?: string };
type GatewayAuthConfig =
  | { mode: "token"; tokens: Record<string, GatewayTokenGrant> }
  | { mode: "jwt"; issuer: string; audience: string | string[]; secret: string;
      scopesClaim?: string; roleClaim?: string; userClaim?: string;
      defaultRole?: string; defaultScopes?: string[]; clockSkewSeconds?: number }
  | { mode: "trusted-proxy"; trustedHeaders?: string[]; allowedOrigins?: string[];
      defaultRole?: string; defaultScopes?: string[] };

type GatewayOptions = {
  protocol?: number;
  features?: string[];
  heartbeatMs?: number;
  auth?: GatewayAuthConfig;
  ui?: GatewayUiConfig;
  defaults?: { cliAgentTools?: "all" | "explicit-only" };
  maxBodyBytes?: number;
  maxPayload?: number;
  maxConnections?: number;
  eventWindowSize?: number;
  headersTimeout?: number;
  requestTimeout?: number;
};

// =============================================================================
// Scorers (smithers-orchestrator/scorers)
// =============================================================================

type ScoreResult  = { score: number; reason?: string; meta?: Record<string, unknown> };
type ScorerInput  = { input: unknown; output: unknown; groundTruth?: unknown; context?: unknown; latencyMs?: number; outputSchema?: import("zod").ZodObject<any> };
type ScorerFn     = (input: ScorerInput) => Promise<ScoreResult>;
type Scorer       = { id: string; name: string; description: string; score: ScorerFn };
type SamplingConfig =
  | { type: "all" }
  | { type: "ratio"; rate: number }
  | { type: "none" };
type ScorerBinding = { scorer: Scorer; sampling?: SamplingConfig };
type ScorersMap    = Record<string, ScorerBinding>;

type LlmJudgeConfig    = { model: string; systemPrompt?: string; temperature?: number; maxTokens?: number };
type CreateScorerConfig = {
  id: string;
  name: string;
  description: string;
  model: string;
  criteria: string;
  examples?: Array<{ input: unknown; output: unknown; score: number; explanation: string }>;
};

// =============================================================================
// Memory (smithers-orchestrator/memory)
// =============================================================================

type MemoryFact    = { namespace: string; key: string; valueJson: string; schemaSig?: string | null; createdAtMs: number; updatedAtMs: number; ttlMs?: number | null };
type MemoryMessage = { id: string; threadId: string; role: string; contentJson: string; runId?: string | null; nodeId?: string | null; createdAtMs: number };
type MemoryThread  = { threadId: string; namespace: string; title?: string | null; metadataJson?: string | null; createdAtMs: number; updatedAtMs: number };

type MemoryStore = {
  getFact(ns: MemoryNamespace, key: string): Promise<MemoryFact | undefined>;
  setFact(ns: MemoryNamespace, key: string, value: unknown, ttlMs?: number): Promise<void>;
  deleteFact(ns: MemoryNamespace, key: string): Promise<void>;
  listFacts(ns: MemoryNamespace): Promise<MemoryFact[]>;
  createThread(ns: MemoryNamespace, title?: string): Promise<MemoryThread>;
  getThread(threadId: string): Promise<MemoryThread | undefined>;
  deleteThread(threadId: string): Promise<void>;
  saveMessage(msg: Omit<MemoryMessage, "createdAtMs"> & { createdAtMs?: number }): Promise<void>;
  listMessages(threadId: string, limit?: number): Promise<MemoryMessage[]>;
  countMessages(threadId: string): Promise<number>;
  deleteExpiredFacts(): Promise<number>;
  getFactEffect(ns: MemoryNamespace, key: string): Effect.Effect<MemoryFact | undefined, SmithersError>;
  setFactEffect(ns: MemoryNamespace, key: string, value: unknown, ttlMs?: number): Effect.Effect<void, SmithersError>;
  deleteFactEffect(ns: MemoryNamespace, key: string): Effect.Effect<void, SmithersError>;
  listFactsEffect(ns: MemoryNamespace): Effect.Effect<MemoryFact[], SmithersError>;
  createThreadEffect(ns: MemoryNamespace, title?: string): Effect.Effect<MemoryThread, SmithersError>;
  getThreadEffect(threadId: string): Effect.Effect<MemoryThread | undefined, SmithersError>;
  deleteThreadEffect(threadId: string): Effect.Effect<void, SmithersError>;
  saveMessageEffect(msg: Omit<MemoryMessage, "createdAtMs"> & { createdAtMs?: number }): Effect.Effect<void, SmithersError>;
  listMessagesEffect(threadId: string, limit?: number): Effect.Effect<MemoryMessage[], SmithersError>;
  countMessagesEffect(threadId: string): Effect.Effect<number, SmithersError>;
  deleteExpiredFactsEffect(): Effect.Effect<number, SmithersError>;
};

// =============================================================================
// OpenAPI tools (smithers-orchestrator/openapi)
// =============================================================================

type OpenApiAuth =
  | { type: "apiKey"; name: string; in: "header" | "query"; value: string }
  | { type: "bearer"; token: string }
  | { type: "basic"; username: string; password: string };

type OpenApiToolsOptions = {
  baseUrl?: string;
  headers?: Record<string, string>;
  auth?: OpenApiAuth;
  include?: string[];
  exclude?: string[];
  namePrefix?: string;
};

// =============================================================================
// CreateSmithers (createSmithers(...) return)
// =============================================================================

type CreateSmithersApi<Schema = any> = {
  Workflow: (props: WorkflowProps) => React.ReactElement;
  Approval: <Row>(props: ApprovalProps<Row>) => React.ReactElement;
  Task: <Row, D extends DepsSpec = {}>(props: TaskProps<Row, any, D>) => React.ReactElement;
  Sequence: (props: SequenceProps) => React.ReactElement;
  Parallel: (props: ParallelProps) => React.ReactElement;
  MergeQueue: (props: MergeQueueProps) => React.ReactElement;
  Branch: (props: BranchProps) => React.ReactElement;
  Loop: (props: LoopProps) => React.ReactElement;
  Ralph: (props: LoopProps) => React.ReactElement;
  Worktree: any;                     // schema-generic; see <Worktree> component docs
  Sandbox: (props: SandboxProps) => React.ReactElement;
  Signal: <S extends import("zod").ZodObject<any>>(props: SignalProps<S>) => React.ReactElement;
  Timer: (props: TimerProps) => React.ReactElement;
  ContinueAsNew: any;               // schema-generic; wraps continueAsNew() for JSX use
  continueAsNew: any;               // schema-generic; typed at call-site via Schema
  useCtx: () => SmithersCtx<Schema>;
  smithers: (build: (ctx: SmithersCtx<Schema>) => React.ReactElement, opts?: SmithersWorkflowOptions) => SmithersWorkflow<Schema>;
  db: any;                          // schema-generic Drizzle db instance; typed at call-site
  tables: Record<string, any>;      // schema-generic table map; typed at call-site
  outputs: Record<string, any>;     // schema-generic output accessor map; typed at call-site
};

// =============================================================================
// Observability (smithers-orchestrator/observability)
// =============================================================================

type SmithersLogFormat = "json" | "pretty";
type SmithersObservabilityService = { emit(event: SmithersEvent): void | Promise<void> };
type SmithersObservabilityOptions = { service?: SmithersObservabilityService; logFormat?: SmithersLogFormat };
type ResolvedSmithersObservabilityOptions = SmithersObservabilityOptions & { metricsPort?: number; metricsPath?: string };
```

For canonical, machine-checked types, install `smithers-orchestrator` and use editor go-to-definition. For runtime errors, see Errors.
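
`TimerProps.duration` in the types above takes strings such as `"30s"` or `"5m"`. The sketch below converts such a string to milliseconds, for example to sanity-check a value before passing it to `<Timer>`. The `parseDurationMs` helper is hypothetical and its unit set (`ms`, `s`, `m`, `h`, `d`) is an assumption; consult the `<Timer>` component docs for the exact grammar Smithers accepts.

```ts
// Hypothetical helper: the accepted units are an assumption of this sketch,
// not the documented Smithers duration grammar.
const UNIT_MS = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
} as const;
type Unit = keyof typeof UNIT_MS;

// Converts "250ms", "30s", "5m", "1.5h", ... into milliseconds.
export function parseDurationMs(duration: string): number {
  // Anchored: the whole string must be <number><unit> with no spaces.
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(duration.trim());
  if (!match) {
    throw new Error(`Unrecognized duration: ${JSON.stringify(duration)}`);
  }
  const [, amount, unit] = match;
  return Number(amount) * UNIT_MS[unit as Unit];
}
```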

---

## Error Reference

> Exhaustive Smithers error codes, typed error helpers, and HTTP API error responses.

`SmithersErrorInstance` is a typed, code-bearing `Error` subclass used throughout Smithers internals. It surfaces when `runWorkflow` throws, in `NodeFailed` events emitted during execution, and as JSON in HTTP API error responses. The imports below are the full error utility surface.

```ts
import {
  ERROR_REFERENCE_URL,
  SmithersErrorInstance,
  errorToJson,
  getSmithersErrorDefinition,
  getSmithersErrorDocsUrl,
  isKnownSmithersErrorCode,
  isSmithersError,
  knownSmithersErrorCodes,
} from "smithers-orchestrator";
import type {
  KnownSmithersErrorCode,
  SmithersError,
  SmithersErrorCode,
} from "smithers-orchestrator";
```

Every built-in `SmithersErrorInstance` carries three pieces of documentation metadata:

| Field | Meaning |
|---|---|
| `message` | Human-readable description followed by a docs URL, e.g. `"Input failed validation. See https://…"` |
| `summary` | Raw message without the docs suffix. |
| `docsUrl` | Docs URL for this error's code (the URL appended to `message`). |

Use `KnownSmithersErrorCode` for an exhaustive switch over built-in Smithers codes. `SmithersErrorCode` includes the `(string & {})` escape hatch for user-defined custom codes.
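
Because `SmithersErrorCode` admits any string, a `switch` is only exhaustive after narrowing to `KnownSmithersErrorCode`. The self-contained sketch below shows that pattern with a hypothetical three-code subset and a local `isKnownCode` standing in for `isKnownSmithersErrorCode`; it does not import Smithers.

```ts
// Hypothetical subset of the known codes; the real union has many more.
type KnownCode = "INVALID_INPUT" | "TASK_TIMEOUT" | "RUN_CANCELLED";
// Same shape as SmithersErrorCode: known codes plus a custom-string escape hatch.
type ErrorCode = KnownCode | (string & {});

const KNOWN_CODES: readonly KnownCode[] = ["INVALID_INPUT", "TASK_TIMEOUT", "RUN_CANCELLED"];

// Local stand-in for isKnownSmithersErrorCode.
function isKnownCode(code: ErrorCode): code is KnownCode {
  return (KNOWN_CODES as readonly string[]).includes(code);
}

function assertNever(value: never): never {
  throw new Error(`Unhandled code: ${String(value)}`);
}

export function describeCode(code: ErrorCode): string {
  if (!isKnownCode(code)) {
    // Custom codes arrive through the (string & {}) escape hatch.
    return `custom:${code}`;
  }
  switch (code) {
    case "INVALID_INPUT":
      return "fix the workflow input";
    case "TASK_TIMEOUT":
      return "raise timeoutMs or shrink the task";
    case "RUN_CANCELLED":
      return "run was cancelled";
    default:
      // Adding a code to KnownCode without a case turns this into a compile error.
      return assertNever(code);
  }
}
```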

| Export | Kind | Description |
|---|---|---|
| `SmithersErrorInstance` | class | Runtime error class used throughout Smithers internals. |
| `isSmithersError(err)` | function | Type guard for values carrying a Smithers-style `code`. |
| `isKnownSmithersErrorCode(code)` | function | Narrows a string to the built-in exhaustive error-code union. |
| `knownSmithersErrorCodes` | value | Array of every built-in Smithers error code documented on this page. |
| `getSmithersErrorDocsUrl(code)` | function | Returns the docs URL appended to built-in error messages. |
| `getSmithersErrorDefinition(code)` | function | Returns category, description, and details metadata for known codes. |
| `errorToJson(err)` | function | Serializes `name`, `message`, `summary`, `docsUrl`, `code`, `details`, `cause`, and `stack`. |
| `ERROR_REFERENCE_URL` | value | Base docs URL for Smithers runtime errors. |
| `KnownSmithersErrorCode` | type | Exact built-in Smithers code union. |
| `SmithersErrorCode` | type | Built-in codes plus the custom string escape hatch. |
| `SmithersError` | type | Public typed shape for serialized Smithers errors. |

```ts
import { Effect } from "effect";
import {
  isKnownSmithersErrorCode,
  isSmithersError,
  runWorkflow,
} from "smithers-orchestrator";
import workflow from "./workflow"; // illustrative path: your workflow module

try {
  await Effect.runPromise(runWorkflow(workflow, { input: {} }));
} catch (err) {
  if (isSmithersError(err) && isKnownSmithersErrorCode(err.code)) {
    switch (err.code) {
      case "INVALID_INPUT":
        console.error("Bad input:", err.summary);
        break;
      case "AGENT_CLI_ERROR":
        console.error("Agent failed:", err.summary);
        break;
      default:
        console.error(`[${err.code}] ${err.summary}`);
    }

    console.error("Docs:", err.docsUrl);
  } else {
    // Not a Smithers error, or a custom code: let the caller handle it.
    throw err;
  }
}
```

## Engine

| Code | When | Details |
|---|---|---|
| `INVALID_INPUT` | Workflow input fails validation or the runtime receives a non-object input payload. | -- |
| `MISSING_INPUT` | A resume run references an input row that is missing from the database. | -- |
| `MISSING_INPUT_TABLE` | The workflow schema does not expose the expected input table during resume or hydration. | -- |
| `RESUME_METADATA_MISMATCH` | Stored run metadata no longer matches the workflow being resumed. | -- |
| `UNKNOWN_OUTPUT_SCHEMA` | A task references an output table that is not present in the schema registry. | -- |
| `INVALID_OUTPUT` | Agent output cannot be parsed or validated against the declared output schema. | -- |
| `WORKTREE_CREATE_FAILED` | Smithers fails to create or hydrate a git or jj worktree for a task. | `{ worktreePath, vcsType, branch? }` |
| `VCS_NOT_FOUND` | No supported git or jj repository root can be found for the workflow. | `{ rootDir }` |
| `SNAPSHOT_NOT_FOUND` | A requested time-travel snapshot or frame does not exist. | `{ runId, frameNo }` |
| `VCS_WORKSPACE_CREATE_FAILED` | Smithers fails to materialize a jj workspace for time-travel or replay. | `{ runId, frameNo, vcsPointer, workspacePath }` |
| `TASK_TIMEOUT` | A task compute callback exceeds its configured timeout. | `{ nodeId, attempt, timeoutMs }` |
| `TASK_HIJACK_UNSUPPORTED` | A task requests auto-hijack but its agent cannot provide a resumable session or conversation. | `{ nodeId, agentId? }` |
| `TASK_ABORTED` | A running task is aborted through an AbortSignal or shutdown path. | -- |
| `TASK_FORK_SOURCE_NOT_FOUND` | A `<Task fork>` references a source task id that is not present in the workflow graph (including a source that exists only in an unselected branch). | `{ nodeId, forkSource }` |
| `TASK_FORK_SOURCE_NOT_COMPLETE` | A forked task began executing but its fork source has not completed, so no session snapshot exists yet. | `{ nodeId, forkSource }` |
| `TASK_FORK_SESSION_UNAVAILABLE` | A `<Task fork>` cannot obtain a usable agent session snapshot, either because the forking task is not an agent task or because the source completed without producing a forkable conversation (e.g. a compute/static, skipped, or cancelled source). | `{ nodeId, forkSource }` |
| `TASK_FORK_CYCLE` | A `<Task fork>` introduces a dependency cycle, directly or indirectly. | `{ nodeId, forkSource }` |
| `RUN_NOT_FOUND` | A CLI or engine command references a run ID that does not exist in the database. | `{ runId }` |
| `NODE_NOT_FOUND` | A CLI command references a node ID that does not exist for the given run. | `{ runId, nodeId }` |
| `SANDBOX_BUNDLE_INVALID` | A sandbox bundle fails validation (missing README, invalid manifest, etc.). | `{ bundlePath }` |
| `SANDBOX_BUNDLE_TOO_LARGE` | A sandbox bundle exceeds the maximum allowed size. | `{ bundlePath, maxBytes }` |
| `WORKFLOW_EXECUTION_FAILED` | A child or builder workflow exits unsuccessfully without surfacing a typed error payload. | `{ status }` |
| `SANDBOX_EXECUTION_FAILED` | Sandbox setup or execution fails before a more specific sandbox error can be emitted. | `{ sandboxId, runId?, maxConcurrent?, activeSandboxCount? }` |
| `TASK_HEARTBEAT_TIMEOUT` | A task heartbeat timeout is exceeded while the task is still in progress. | `{ nodeId, iteration, attempt, timeoutMs, staleForMs, lastHeartbeatAtMs }` |
| `HEARTBEAT_PAYLOAD_TOO_LARGE` | A task heartbeat payload exceeds the maximum persisted checkpoint size. | `{ dataSizeBytes, maxBytes }` |
| `HEARTBEAT_PAYLOAD_NOT_JSON_SERIALIZABLE` | A task heartbeat payload contains values that cannot be serialized to JSON. | `{ path, valueType? }` |
| `RUN_CANCELLED` | A run is cancelled while runtime work is still active. | `{ runId }` |
| `RUN_NOT_RESUMABLE` | A resume request targets a run state that cannot be resumed. | `{ runId, status }` |
| `RUN_OWNER_ALIVE` | A resume attempt is skipped because the process that started the run is still alive (heartbeating). This is expected: it prevents two processes from executing the same run at once. | `{ runId, runtimeOwnerId }` |
| `RUN_STILL_RUNNING` | A recovery or resume operation finds a run that is still active. | `{ runId }` |
| `RUN_RESUME_CLAIM_LOST` | A runtime loses the resume claim before it can update the run. | `{ runId, runtimeOwnerId }` |
| `RUN_RESUME_CLAIM_FAILED` | A runtime cannot claim a stale run for resume. | `{ runId, runtimeOwnerId }` |
| `RUN_RESUME_ACTIVATION_FAILED` | A claimed run cannot be moved back into active execution. | `{ runId, runtimeOwnerId }` |
| `RUN_HIJACKED` | A run is interrupted because another runtime hijacked execution. | `{ runId, hijackTarget }` |
| `CONTINUATION_STATE_TOO_LARGE` | Continue-as-new state exceeds the configured serialized size limit. | `{ runId, sizeBytes, maxBytes }` |
| `INVALID_CONTINUATION_STATE` | Continue-as-new state cannot be parsed or applied. | -- |
| `RALPH_MAX_REACHED` | A Ralph loop reaches `maxIterations` with fail-on-max behavior. | `{ ralphId, maxIterations }` |
| `SCHEDULER_ERROR` | The scheduler cannot produce a valid execution decision. | -- |
| `SESSION_ERROR` | The workflow session state machine reaches an invalid or failed state. | -- |
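
The heartbeat payload codes above can often be avoided by checking a payload before sending it. This is a hypothetical pre-flight check, not a Smithers API: the byte limit is supplied by the caller, and the rule that class instances (for example `Date` or `Map`) count as non-serializable is an assumption. The returned `details` mirror the documented shapes.

```ts
type NonJsonHit = { path: string; valueType?: string };
type HeartbeatIssue =
  | { code: "HEARTBEAT_PAYLOAD_NOT_JSON_SERIALIZABLE"; details: NonJsonHit }
  | { code: "HEARTBEAT_PAYLOAD_TOO_LARGE"; details: { dataSizeBytes: number; maxBytes: number } };

// Depth-first walk that reports the first value JSON cannot represent faithfully.
function findNonJsonValue(value: unknown, path: string, ancestors: Set<object> = new Set()): NonJsonHit | undefined {
  if (typeof value === "string" || typeof value === "boolean") return undefined;
  if (typeof value === "number") {
    // JSON.stringify silently turns NaN and Infinity into null.
    return Number.isFinite(value) ? undefined : { path, valueType: "non-finite number" };
  }
  if (value === null) return undefined;
  if (typeof value !== "object") {
    // undefined, function, symbol, bigint
    return { path, valueType: typeof value };
  }
  const obj = value as object;
  if (ancestors.has(obj)) return { path, valueType: "circular reference" };
  const proto = Object.getPrototypeOf(obj);
  if (!Array.isArray(obj) && proto !== Object.prototype && proto !== null) {
    // Assumption: Date, Map, class instances, ... do not round-trip unchanged.
    return { path, valueType: obj.constructor?.name ?? "object" };
  }
  ancestors.add(obj);
  try {
    const entries: Array<[string, unknown]> = Array.isArray(obj)
      ? obj.map((v, i): [string, unknown] => [`[${i}]`, v])
      : Object.entries(obj).map(([k, v]): [string, unknown] => [`.${k}`, v]);
    for (const [suffix, child] of entries) {
      const hit = findNonJsonValue(child, path + suffix, ancestors);
      if (hit) return hit;
    }
    return undefined;
  } finally {
    ancestors.delete(obj);
  }
}

export function checkHeartbeatPayload(data: unknown, maxBytes: number): HeartbeatIssue | undefined {
  const bad = findNonJsonValue(data, "$");
  if (bad) return { code: "HEARTBEAT_PAYLOAD_NOT_JSON_SERIALIZABLE", details: bad };
  // Size is measured as UTF-8 bytes of the JSON encoding.
  const dataSizeBytes = new TextEncoder().encode(JSON.stringify(data)).length;
  if (dataSizeBytes > maxBytes) {
    return { code: "HEARTBEAT_PAYLOAD_TOO_LARGE", details: { dataSizeBytes, maxBytes } };
  }
  return undefined;
}
```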

## Components

| Code | When | Details |
|---|---|---|
| `TASK_ID_REQUIRED` | `<Task>` is missing a valid string id. | -- |
| `TASK_MISSING_OUTPUT` | `<Task>` is missing its output prop. | `{ nodeId }` |
| `DUPLICATE_ID` | Two nodes with the same runtime id are mounted in one workflow graph. | `{ kind, id }` |
| `NESTED_LOOP` | `<Loop>` or `<Ralph>` is nested inside another loop construct that Smithers does not support. | -- |
| `WORKTREE_EMPTY_PATH` | `<Worktree>` is mounted with an empty path. | -- |
| `MDX_PRELOAD_INACTIVE` | A prompt object is rendered without the MDX preload layer being active. | -- |
| `CONTEXT_OUTSIDE_WORKFLOW` | Workflow context access happens outside an active Smithers workflow render. | -- |
| `MISSING_OUTPUT` | Code calls `ctx.output()` for a node result that does not exist. | `{ nodeId, iteration }` |
| `DEP_NOT_SATISFIED` | A typed dep on `<Task>` references an upstream output that has not been produced yet. | `{ taskId, depKey, resolvedNodeId }` |
| `ASPECT_BUDGET_EXCEEDED` | An Aspects budget (tokens, latency, or cost) has been exceeded. | `{ kind, limit, current }` |
| `APPROVAL_OUTSIDE_TASK` | `<Approval>` is resolved outside the active task runtime. | -- |
| `APPROVAL_OPTIONS_REQUIRED` | An approval mode that requires explicit options is missing them. | -- |
| `WORKFLOW_MISSING_DEFAULT` | A workflow module does not export a default Smithers workflow. | -- |
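
`DUPLICATE_ID` and the fork-cycle check (`TASK_FORK_CYCLE`) are ordinary graph validations. A minimal sketch of both over a hypothetical flattened node list follows; it builds edges from `dependsOn` only, whereas Smithers' internal validator may also derive edges from `fork` sources or `needs`.

```ts
// Hypothetical node shape; loosely mirrors the id/dependsOn props in the types above.
type GraphNode = { kind: string; id: string; dependsOn?: string[] };

// Returns details shaped like DUPLICATE_ID's `{ kind, id }` for the first repeat.
export function findDuplicateId(nodes: GraphNode[]): { kind: string; id: string } | undefined {
  const seen = new Set<string>();
  for (const node of nodes) {
    if (seen.has(node.id)) return { kind: node.kind, id: node.id };
    seen.add(node.id);
  }
  return undefined;
}

// Depth-first search with "visiting"/"done" marks; returns one cycle as a list of ids.
export function findDependencyCycle(nodes: GraphNode[]): string[] | undefined {
  const edges = new Map(nodes.map((n) => [n.id, n.dependsOn ?? []] as const));
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (id: string): string[] | undefined => {
    const mark = state.get(id);
    if (mark === "done") return undefined;
    // Reaching a node still on the stack closes a cycle.
    if (mark === "visiting") return [...stack.slice(stack.indexOf(id)), id];
    state.set(id, "visiting");
    stack.push(id);
    for (const dep of edges.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(id, "done");
    return undefined;
  };

  for (const node of nodes) {
    const cycle = visit(node.id);
    if (cycle) return cycle;
  }
  return undefined;
}
```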

## Tools

| Code | When | Details |
|---|---|---|
| `TOOL_PATH_INVALID` | A filesystem tool receives a non-string path. | -- |
| `TOOL_PATH_ESCAPE` | A filesystem tool resolves a path outside the sandbox root, including through symlinks. | -- |
| `TOOL_FILE_TOO_LARGE` | A read or edit operation exceeds the configured file size limit. | -- |
| `TOOL_CONTENT_TOO_LARGE` | A write operation exceeds the configured content size limit. | -- |
| `TOOL_PATCH_TOO_LARGE` | An edit patch exceeds the configured patch size limit. | -- |
| `TOOL_PATCH_FAILED` | A unified diff patch cannot be applied to the target file. | -- |
| `TOOL_NETWORK_DISABLED` | The bash tool tries to access the network while network access is disabled. | -- |
| `TOOL_GIT_REMOTE_DISABLED` | The bash tool attempts a remote git operation while network access is disabled. | -- |
| `TOOL_COMMAND_FAILED` | A bash tool command exits with a non-zero status. | -- |
| `TOOL_GREP_FAILED` | The grep tool fails with an rg execution error. | -- |
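
`TOOL_PATH_INVALID` and `TOOL_PATH_ESCAPE` describe a containment check that must see through symlinks. Below is a sketch of one way to do that with Node's `fs` and `path` modules; `resolveInsideRoot` is hypothetical, it is not Smithers' implementation, and the attached `code` property only mimics the documented codes.

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// lstat-based existence: a dangling symlink still "exists", so realpath on it
// fails later and the check fails closed instead of being skipped.
function exists(p: string): boolean {
  try {
    fs.lstatSync(p);
    return true;
  } catch {
    return false;
  }
}

export function resolveInsideRoot(root: string, requested: unknown): string {
  if (typeof requested !== "string") {
    throw Object.assign(new Error("path must be a string"), { code: "TOOL_PATH_INVALID" });
  }
  const realRoot = fs.realpathSync(root);
  const target = path.resolve(realRoot, requested);

  // Resolve symlinks on the nearest existing ancestor so that paths for files
  // that do not exist yet (writes) are still checked.
  let existing = target;
  const pending: string[] = [];
  while (!exists(existing)) {
    pending.unshift(path.basename(existing));
    existing = path.dirname(existing);
  }
  const realTarget = path.join(fs.realpathSync(existing), ...pending);

  const rel = path.relative(realRoot, realTarget);
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw Object.assign(new Error(`path escapes root: ${requested}`), { code: "TOOL_PATH_ESCAPE" });
  }
  return realTarget;
}
```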

## Agents

| Code | When | Details |
|---|---|---|
| `AGENT_CLI_ERROR` | A CLI-backed agent exits unsuccessfully, streams an explicit error, or its RPC transport fails. | -- |
| `AGENT_CONFIG_INVALID` | A CLI-backed agent fails with a non-retryable configuration error such as an unknown model, missing LLM, or unsupported model. | -- |
| `AGENT_RPC_FILE_ARGS` | Pi RPC mode is used with file arguments that the transport does not support. | -- |
| `AGENT_BUILD_COMMAND` | An agent implementation forbids `buildCommand()` because it uses a custom `generate()` transport. | -- |
| `AGENT_DIAGNOSTIC_TIMEOUT` | An internal agent diagnostic check exceeds the per-check timeout budget. | -- |
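
Because `AGENT_CONFIG_INVALID` is non-retryable, a caller-side retry loop should stop on it immediately. The `withAgentRetry` wrapper below is a hypothetical sketch: the backoff schedule and the set of non-retryable codes are your policy choices, not Smithers defaults.

```ts
type CodedError = Error & { code?: string };

// Assumption: extend this set according to your own retry policy.
const NON_RETRYABLE_CODES = new Set<string>(["AGENT_CONFIG_INVALID"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withAgentRetry<T>(
  call: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 500 }: { maxAttempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const code = (err as CodedError | undefined)?.code;
      const retryable = code === undefined || !NON_RETRYABLE_CODES.has(code);
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```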

## Database

| Code | When | Details |
|---|---|---|
| `DB_MISSING_COLUMNS` | A table used by Smithers does not expose required columns such as `runId` or `nodeId`. | -- |
| `DB_REQUIRES_BUN_SQLITE` | The database adapter is not backed by a Bun SQLite client with `exec()`. | -- |
| `DB_QUERY_FAILED` | A database read query throws or rejects while running inside an Effect. | -- |
| `DB_WRITE_FAILED` | A database write or migration fails, including after SQLite retry exhaustion. | -- |
| `STORAGE_ERROR` | A storage service operation fails before surfacing a more specific database code. | -- |

## Effect / Runtime

| Code | When | Details |
|---|---|---|
| `INTERNAL_ERROR` | An unexpected internal exception crossed an Effect boundary without a more specific Smithers code. | -- |
| `PROCESS_ABORTED` | A spawned child process is aborted by signal or shutdown. | `{ command, args, cwd }` |
| `PROCESS_TIMEOUT` | A spawned child process exceeds its total timeout. | `{ command, args, cwd, timeoutMs }` |
| `PROCESS_IDLE_TIMEOUT` | A spawned child process stops producing output longer than its idle timeout. | `{ command, args, cwd, idleTimeoutMs }` |
| `PROCESS_SPAWN_FAILED` | The runtime cannot spawn the requested child process. | `{ command, args, cwd }` |
| `TASK_RUNTIME_UNAVAILABLE` | Builder task runtime APIs are accessed outside an executing step. | -- |
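
`PROCESS_TIMEOUT` and `PROCESS_IDLE_TIMEOUT` differ in which clock expires: the total timeout runs from spawn, while the idle timeout is reset by every chunk of output. A sketch of that distinction over `node:child_process` follows; `runWithTimeouts` is hypothetical, it is not Smithers' process runner, and a non-zero exit is reported here without a Smithers code.

```ts
import { spawn } from "node:child_process";

type RunOpts = { cwd?: string; timeoutMs: number; idleTimeoutMs: number };

export function runWithTimeouts(command: string, args: string[], opts: RunOpts): Promise<string> {
  const cwd = opts.cwd ?? process.cwd();
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { cwd });
    let output = "";
    let settled = false;

    // Rejects with a code and details shaped like the documented entries.
    const fail = (code: string, extra: Record<string, number>) => {
      if (settled) return;
      settled = true;
      clearTimeout(total);
      clearTimeout(idle);
      child.kill("SIGKILL");
      reject(Object.assign(new Error(code), { code, details: { command, args, cwd, ...extra } }));
    };
    const armIdle = () =>
      setTimeout(() => fail("PROCESS_IDLE_TIMEOUT", { idleTimeoutMs: opts.idleTimeoutMs }), opts.idleTimeoutMs);

    const total = setTimeout(() => fail("PROCESS_TIMEOUT", { timeoutMs: opts.timeoutMs }), opts.timeoutMs);
    let idle = armIdle();

    const onData = (chunk: Buffer) => {
      output += chunk.toString();
      // Any output resets the idle clock; the total clock keeps running.
      clearTimeout(idle);
      idle = armIdle();
    };
    child.stdout.on("data", onData);
    child.stderr.on("data", onData);

    child.on("error", () => fail("PROCESS_SPAWN_FAILED", {}));
    child.on("close", (exitCode) => {
      if (settled) return;
      settled = true;
      clearTimeout(total);
      clearTimeout(idle);
      if (exitCode === 0) resolve(output);
      else reject(Object.assign(new Error(`exit ${exitCode}`), { details: { command, args, cwd, exitCode } }));
    });
  });
}
```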

## Hot Reload

| Code | When | Details |
|---|---|---|
| `SCHEMA_CHANGE_HOT` | Hot reload detects a schema change that requires a full restart. | -- |
| `HOT_OVERLAY_FAILED` | Building or cleaning the generated hot-reload overlay fails. | -- |
| `HOT_RELOAD_INVALID_MODULE` | A hot-reloaded workflow module does not export a valid default workflow build. | -- |

## Scorers

| Code | When | Details |
|---|---|---|
| `SCORER_FAILED` | A scorer throws or rejects while Smithers is evaluating a result. | -- |
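
`SCORER_FAILED` is raised when a scorer throws or rejects. The sketch below uses trimmed local copies of the `Scorer`, `ScorerInput`, and `SamplingConfig` types from the types block so it runs without importing `smithers-orchestrator/scorers`. The `exactMatch` scorer, the `safeScore` wrapper, and treating `"ratio"` sampling as a per-result coin flip are all illustrative assumptions.

```ts
// Trimmed local copies of the documented types.
type ScoreResult = { score: number; reason?: string; meta?: Record<string, unknown> };
type ScorerInput = { input: unknown; output: unknown; groundTruth?: unknown };
type Scorer = { id: string; name: string; description: string; score: (i: ScorerInput) => Promise<ScoreResult> };
type SamplingConfig = { type: "all" } | { type: "ratio"; rate: number } | { type: "none" };

// Hypothetical deterministic scorer: 1 when output equals groundTruth by JSON encoding.
export const exactMatch: Scorer = {
  id: "exact-match",
  name: "Exact match",
  description: "1 when output JSON-equals groundTruth, else 0",
  async score({ output, groundTruth }) {
    const hit = JSON.stringify(output) === JSON.stringify(groundTruth);
    return { score: hit ? 1 : 0, reason: hit ? "matches ground truth" : "differs from ground truth" };
  },
};

// Assumption: "ratio" samples each result independently with probability `rate`.
export function shouldSample(config: SamplingConfig = { type: "all" }, random: () => number = Math.random): boolean {
  switch (config.type) {
    case "all":
      return true;
    case "none":
      return false;
    case "ratio":
      return random() < config.rate;
  }
}

// Converts a throwing scorer into a tagged failure instead of an exception.
export async function safeScore(
  scorer: Scorer,
  input: ScorerInput,
): Promise<ScoreResult | { code: "SCORER_FAILED"; scorerId: string; cause: unknown }> {
  try {
    return await scorer.score(input);
  } catch (cause) {
    return { code: "SCORER_FAILED", scorerId: scorer.id, cause };
  }
}
```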

## CLI

| Code | When | Details |
|---|---|---|
| `INVALID_EVENTS_OPTIONS` | The `smithers events` command receives invalid filter options. | -- |
| `WORKFLOW_EXISTS` | The workflow creation CLI refuses to overwrite an existing workflow file. | -- |
| `CLI_DB_NOT_FOUND` | A CLI command cannot find a nearby `smithers.db` file. | -- |
| `CLI_AGENT_UNSUPPORTED` | The `ask` command selects an agent integration that Smithers does not support in that mode. | -- |

## Integrations

| Code | When | Details |
|---|---|---|
| `PI_HTTP_ERROR` | The Pi or server integration receives a non-success HTTP response from Smithers. | -- |
| `EXTERNAL_BUILD_FAILED` | An external workflow host fails to build a Smithers HostNode payload. | `{ scriptPath, error?, exitCode?, stderr?, stdout? }` |
| `SCHEMA_DISCOVERY_FAILED` | External workflow schema discovery fails or returns invalid output. | `{ scriptPath, error?, exitCode?, stderr? }` |
| `OPENAPI_SPEC_LOAD_FAILED` | An OpenAPI spec cannot be loaded or parsed. | -- |
| `OPENAPI_OPERATION_NOT_FOUND` | The requested `operationId` does not exist in the OpenAPI spec. | -- |
| `OPENAPI_TOOL_EXECUTION_FAILED` | An OpenAPI tool call fails during HTTP execution. | -- |
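
OpenAPI tools authenticate with the `OpenApiAuth` union from the types block. The hypothetical `applyAuth` helper below shows how each variant maps onto a standard HTTP request (header, query parameter, `Bearer`, or `Basic`); it is a sketch of the HTTP conventions, not a copy of how `smithers-orchestrator/openapi` applies auth internally.

```ts
// Local copy of the documented OpenApiAuth union.
type OpenApiAuth =
  | { type: "apiKey"; name: string; in: "header" | "query"; value: string }
  | { type: "bearer"; token: string }
  | { type: "basic"; username: string; password: string };

// Mutates `url` and `headers` in place for the chosen auth variant.
export function applyAuth(url: URL, headers: Record<string, string>, auth?: OpenApiAuth): void {
  if (!auth) return;
  switch (auth.type) {
    case "apiKey":
      if (auth.in === "header") headers[auth.name] = auth.value;
      else url.searchParams.set(auth.name, auth.value);
      return;
    case "bearer":
      headers["Authorization"] = `Bearer ${auth.token}`;
      return;
    case "basic": {
      // RFC 7617: base64 of "username:password".
      const encoded = Buffer.from(`${auth.username}:${auth.password}`, "utf8").toString("base64");
      headers["Authorization"] = `Basic ${encoded}`;
      return;
    }
  }
}
```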

## HTTP API Errors

These codes appear in JSON error responses from the HTTP server. They are plain response codes, not `SmithersErrorInstance` objects.

| Code | Status | When |
|---|---|---|
| `INVALID_REQUEST` | 400 | Invalid request body or query params |
| `PAYLOAD_TOO_LARGE` | 413 | Body exceeds `maxBodyBytes` |
| `INVALID_JSON` | 400 | Body not valid JSON |
| `SERVER_ERROR` | 500 | Unexpected server error |
| `UNAUTHORIZED` | 401 | Missing or invalid auth token |
| `WORKFLOW_PATH_OUTSIDE_ROOT` | 400 | Workflow path outside server root |
| `RUN_ID_REQUIRED` | 400 | `runId` required when `resume: true` |
| `RUN_ALREADY_EXISTS` | 409 | Run ID already exists |
| `RUN_NOT_FOUND` | 404 | No run with given ID |
| `RUN_NOT_ACTIVE` | 409 | Run not active (cannot cancel) |
| `NOT_FOUND` | 404 | Route or resource not found |
| `DB_NOT_CONFIGURED` | 400 | Server database not configured |
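
A client can encode this table directly. The sketch below does that; extracting `code` from the response body is left to your client because the body shape is not specified here, and both the `isRetryable` rule (only `SERVER_ERROR` is worth retrying unchanged) and the `fitsBodyLimit` pre-check are hypothetical helpers built on assumptions.

```ts
// The codes and statuses from the table above.
const HTTP_ERROR_STATUS = {
  INVALID_REQUEST: 400,
  PAYLOAD_TOO_LARGE: 413,
  INVALID_JSON: 400,
  SERVER_ERROR: 500,
  UNAUTHORIZED: 401,
  WORKFLOW_PATH_OUTSIDE_ROOT: 400,
  RUN_ID_REQUIRED: 400,
  RUN_ALREADY_EXISTS: 409,
  RUN_NOT_FOUND: 404,
  RUN_NOT_ACTIVE: 409,
  NOT_FOUND: 404,
  DB_NOT_CONFIGURED: 400,
} as const;

type HttpErrorCode = keyof typeof HTTP_ERROR_STATUS;

function isHttpErrorCode(code: string): code is HttpErrorCode {
  return Object.prototype.hasOwnProperty.call(HTTP_ERROR_STATUS, code);
}

// Assumption: only 5xx responses are worth retrying with the same request.
export function isRetryable(code: string): boolean {
  return isHttpErrorCode(code) && HTTP_ERROR_STATUS[code] >= 500;
}

// Checks the UTF-8 size of a JSON body against the server's maxBodyBytes
// before sending, to avoid a 413 PAYLOAD_TOO_LARGE round trip.
export function fitsBodyLimit(body: unknown, maxBodyBytes: number): boolean {
  return new TextEncoder().encode(JSON.stringify(body)).length <= maxBodyBytes;
}
```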


---

## Package Configuration

> Reference for the smithers-orchestrator package exports, TypeScript configuration, and Bun preload setup.

This page covers: CLI binary usage, subpath export map, TypeScript compiler options, Bun preload and test config, and npm scripts.

## Binary

Use `bunx smithers-orchestrator <command>` for CLI commands. The repository root keeps a private development bin that points at `apps/cli/src/index.js`; application code should import from the package exports below.

## Subpath Exports

Use the subpath form to import only the surface you need.

| Import path | Entry file | Purpose |
|---|---|---|
| `smithers-orchestrator` | `./src/index.js` | Core API: `createSmithers`, components, `runWorkflow`, `renderMdx`, errors |
| `smithers-orchestrator/gateway` | `./src/gateway.js` | Gateway server primitives from `@smithers-orchestrator/server/gateway` |
| `smithers-orchestrator/gateway-client` | `./src/gateway-client.js` | Typed client helpers from `@smithers-orchestrator/gateway-client` |
| `smithers-orchestrator/gateway-react` | `./src/gateway-react.js` | React hooks and providers for gateway-backed UIs |
| `smithers-orchestrator/sandbox` | `./src/sandbox.js` | Sandbox provider contracts, bundles, and execution helpers |
| `smithers-orchestrator/jsx-runtime` | `./src/jsx-runtime.js` | JSX runtime (auto-resolved by `jsxImportSource`) |
| `smithers-orchestrator/jsx-dev-runtime` | `./src/jsx-runtime.js` | JSX dev runtime (auto-resolved in dev mode) |
| `smithers-orchestrator/tools` | `./src/tools.js` | Tool sandbox: `defineTool`, `read`, `grep`, `bash`, `edit`, `write` |
| `smithers-orchestrator/server` | `./src/server.js` | HTTP server for run management and event streaming |
| `smithers-orchestrator/observability` | `./src/observability.js` | OpenTelemetry traces, metrics, and Prometheus integration |
| `smithers-orchestrator/mdx-plugin` | `./src/mdx-plugin.js` | Bun preload plugin for `.mdx` imports |
| `smithers-orchestrator/dom/renderer` | `./src/dom/renderer.js` | Internal renderer (advanced use) |
| `smithers-orchestrator/serve` | `./src/serve.js` | Single-workflow HTTP server via `createServeApp` |
| `smithers-orchestrator/scorers` | `./src/scorers.js` | Eval scorers: `createScorer`, `llmJudge`, `aggregateScores` |
| `smithers-orchestrator/memory` | `./src/memory.js` | Cross-run facts, message history, processors, and metrics |
| `smithers-orchestrator/openapi` | `./src/openapi.js` | Generate AI SDK tools from OpenAPI specs |
| `smithers-orchestrator/control-plane` | `./src/control-plane.js` | Organization, project, billing, usage, secret-reference, and audit primitives |

The Pi plugin is published as the separate `@smithers-orchestrator/pi-plugin` package. The old `smithers-orchestrator/pi-plugin` and `smithers-orchestrator/pi-extension` subpaths are no longer exported.

## Workspace Packages

Most applications should import from `smithers-orchestrator`. The scoped workspace packages below are published for advanced integrations, custom clients, and framework development.

| Package | Primary surface | Related docs |
|---|---|---|
| `smithers-orchestrator` | Public facade for workflow authoring, components, agents, tools, server helpers, memory, OpenAPI tools, scorers, and JSX runtime setup | This page, Types |
| `@smithers-orchestrator/cli` | CLI entrypoint, MCP server, local workflow pack, account registry commands, DevTools commands, cron, alerts, and server commands | CLI Overview, MCP Server |
| `@smithers-orchestrator/accounts` | Subscription and API-key account registry helpers: `listAccounts`, `addAccount`, `removeAccount`, `getAccount`, `accountToProviderEnv` | CLI Agents |
| `@smithers-orchestrator/agents` | AI SDK and CLI agent adapters, CLI capability reports, agent contracts, and tool capability registry | CLI Agents, SDK Agents |
| `@smithers-orchestrator/components` | JSX workflow components such as `Task`, `Workflow`, `Approval`, `Sandbox`, `Timer`, `Signal`, and control-flow components | Components |
| `@smithers-orchestrator/control-plane` | Durable organization, project, team, billing, usage, secret-reference, and audit primitives for hosted deployments | Control Plane |
| `@smithers-orchestrator/db` | SQLite/Drizzle adapter, table setup, run-state derivation, schema helpers, and output tables | Data Model, Run State |
| `@smithers-orchestrator/devtools` | Snapshot, tree, diff, node lookup, task collection, and DevTools run-store helpers behind the CLI inspect commands | Debugging, CLI Overview |
| `@smithers-orchestrator/driver` | Runtime driver contracts: `RunOptions`, `RunResult`, `RunStatus`, `SmithersCtx`, outputs, task runtime, interop, and child process helpers | Run Workflow, Execution Model |
| `@smithers-orchestrator/engine` | Workflow rendering/execution API: `runWorkflow`, `renderFrame`, `workflow`, `Smithers`, `fragment`, signals, and Effect versioning | Render Frame, Run Workflow |
| `@smithers-orchestrator/errors` | Error definitions, known codes, JSON serialization, docs URLs, and type guards | Errors |
| `@smithers-orchestrator/gateway` | Stable Gateway RPC contracts, auth scopes, deployment metadata, and generated OpenAPI schema | Gateway, RPC |
| `@smithers-orchestrator/gateway-client` | Browser/client SDK for Gateway RPC requests and event streams | Gateway |
| `@smithers-orchestrator/gateway-react` | React hooks and root helpers for Gateway-backed UIs | Gateway |
| `@smithers-orchestrator/graph` | Framework-neutral workflow graph model, XML nodes, task descriptors, and graph snapshots | Planner Internals, Types |
| `@smithers-orchestrator/jj-darwin-arm64` | Vendored jj (Jujutsu) binary for darwin-arm64; auto-installed as an optional dependency of `@smithers-orchestrator/vcs`, not depended on directly | VCS Guide |
| `@smithers-orchestrator/jj-darwin-x64` | Vendored jj (Jujutsu) binary for darwin-x64; auto-installed as an optional dependency of `@smithers-orchestrator/vcs`, not depended on directly | VCS Guide |
| `@smithers-orchestrator/jj-linux-arm64` | Vendored jj (Jujutsu) binary for linux-arm64; auto-installed as an optional dependency of `@smithers-orchestrator/vcs`, not depended on directly | VCS Guide |
| `@smithers-orchestrator/jj-linux-x64` | Vendored jj (Jujutsu) binary for linux-x64; auto-installed as an optional dependency of `@smithers-orchestrator/vcs`, not depended on directly | VCS Guide |
| `@smithers-orchestrator/jj-win32-x64` | Vendored jj (Jujutsu) binary for win32-x64; auto-installed as an optional dependency of `@smithers-orchestrator/vcs`, not depended on directly | VCS Guide |
| `@smithers-orchestrator/memory` | Cross-run facts, message history, processors, namespaces, service layer, and metrics | Memory, Memory Quickstart |
| `@smithers-orchestrator/observability` | Event types, logging, tracing, metrics, Prometheus rendering, and runtime observability layers | Events, Event Types |
| `@smithers-orchestrator/openapi` | OpenAPI parsing, operation extraction, AI SDK tool generation, schema conversion, and metrics | OpenAPI Tools, OpenAPI Quickstart |
| `@smithers-orchestrator/pi-plugin` | PI extension runtime, views, API wrappers, and workflow inspection integration | PI Integration |
| `@smithers-orchestrator/protocol` | Shared contracts, small value types, and protocol-level errors for cross-package use | Types, Errors |
| `@smithers-orchestrator/react-reconciler` | Custom React reconciler, host context, DOM renderer, DevTools preload, driver, and JSX runtime internals | Why React, Render Frame |
| `@smithers-orchestrator/sandbox` | Sandbox bundle, execute, and transport primitives used by the `Sandbox` component | Sandbox |
| `@smithers-orchestrator/scheduler` | Pure workflow state machine: task state, scheduler decisions, retry/cache policies, wait reasons, and workflow session services | Workflow State, Suspend and Resume |
| `@smithers-orchestrator/scorers` | Scorer definitions, LLM judges, batch execution, aggregation, persistence schema, and metrics | Evals, Evals Quickstart |
| `@smithers-orchestrator/server` | HTTP, WebSocket, Gateway, cron, webhook, metrics, and single-workflow serving APIs | Server, Serve |
| `@smithers-orchestrator/smithers` | Cerebras chat PWA and workflow graph demo application | CLI Overview |
| `@smithers-orchestrator/smithers-demo` | Interactive Smithers workflow generator demo with React Flow visualization | CLI Overview |
| `@smithers-orchestrator/smithers-studio-2` | Next Smithers Studio UI shell for browsing and driving workflow runs | CLI Overview |
| `@smithers-orchestrator/smithers-tui-demo` | Private React TUI demo app for terminal workflow interaction experiments | CLI Overview |
| `@smithers-orchestrator/time-travel` | Snapshots, diffs, forks, replay, timelines, VCS tags, rewind locks/audits, and time-travel metrics | Time Travel, Time Travel Quickstart |
| `@smithers-orchestrator/tool-context` | AsyncLocalStorage tool-execution context (run/node/idempotency keys, durability snapshot hook) shared by the engine and `smithers-orchestrator` without a dependency cycle | Execution Model |
| `@smithers-orchestrator/usage` | Account quota and rate-limit usage reporting for Smithers providers | CLI Overview |
| `@smithers-orchestrator/vcs` | VCS discovery and jj workspace operations such as `runJj`, `workspaceAdd`, `workspaceList`, and pointer reverts | VCS Helpers, VCS Guide |

### Usage

```ts
// Core API
import { createSmithers, runWorkflow } from "smithers-orchestrator";

// Tools
import { defineTool, bash, read, write } from "smithers-orchestrator/tools";

// Scorers
import { createScorer, llmJudge } from "smithers-orchestrator/scorers";

// MDX plugin (in preload.ts)
import { mdxPlugin } from "smithers-orchestrator/mdx-plugin";

// Control-plane primitives
import { ControlPlaneStore } from "smithers-orchestrator/control-plane";
```

## TypeScript Configuration

### JSX Import Source

```json
{
  "compilerOptions": {
    "jsx": "react-jsx",
    "jsxImportSource": "smithers-orchestrator"
  }
}
```

This tells TypeScript to resolve JSX transforms from `smithers-orchestrator/jsx-runtime` instead of `react/jsx-runtime`. The Smithers JSX runtime re-exports React's runtime, so component behavior is identical. This setting enables proper type resolution for Smithers workflow components.

See JSX Installation for the complete TypeScript setup.

### Path Aliases

When developing inside the `smithers-orchestrator` monorepo, the root `tsconfig.json` defines path aliases so source imports resolve without a build step:

```jsonc
"paths": {
  "smithers-orchestrator": ["./packages/smithers/src/index.js"],
  "smithers-orchestrator/jsx-runtime": ["./packages/smithers/src/jsx-runtime.js"],
  "smithers-orchestrator/jsx-dev-runtime": ["./packages/smithers/src/jsx-runtime.js"],
  "smithers-orchestrator/tools": ["./packages/smithers/src/tools.js"],
  "smithers-orchestrator/*": [
    "./packages/smithers/src/*.js",
    "./packages/smithers/src/*/index.js"
  ],
  "smithers-orchestrator/scorers": ["./packages/scorers/src/index.js"],
  "@smithers-orchestrator/agents": ["./packages/agents/src/index.js"],
  "@smithers-orchestrator/gateway-client": ["./packages/gateway-client/src/index.ts"],
  "@smithers-orchestrator/gateway-react": ["./packages/gateway-react/src/index.ts"],
  "@smithers-orchestrator/pi-plugin": ["./packages/pi-plugin/src/index.ts"],
  "@smithers-orchestrator/server": ["./packages/server/src/index.js"]
  // The root tsconfig includes the same style of alias for each workspace package.
}
```

The root package is a private `smithers-monorepo`; `smithers-orchestrator` resolves to `packages/smithers`.

**End users do not need path aliases**, only framework developers do. Installing `smithers-orchestrator` as a dependency lets Node/Bun module resolution handle import paths automatically.

### Local Type Root Shims

```json
"typeRoots": ["./packages/smithers/src/types", "./node_modules/@types"]
```

The `./packages/smithers/src/types` directory contains ambient type declarations that fill gaps in third-party packages. One shim ships today:

- `react-dom-server.d.ts`: declares the `react-dom/server` module so TypeScript doesn't error when server-side rendering types are referenced.

End users should add `@types/react-dom` to `devDependencies` instead of relying on this shim.

## Bun Configuration

### Runtime Preload

```toml
# bunfig.toml
preload = ["./preload.ts"]
```

The preload script registers the MDX esbuild plugin with Bun's bundler so `.mdx` files can be imported as JSX components at runtime. See MDX Prompts for details.

### Test Configuration

```toml
[test]
root = "./tests"
preload = ["./preload.ts"]
```

| Key | Value | Purpose |
|---|---|---|
| `root` | `./tests` | Bun discovers test files from this directory instead of scanning the entire project |
| `preload` | `["./preload.ts"]` | Registers the MDX plugin for test files so `.mdx` imports work in tests |

The test preload is separate from the runtime preload. Both point to the same file, but Bun's `[test]` section only applies when running `bun test`. Without it, tests that import `.mdx` files fail with a module resolution error.

## npm Scripts

Defined in the root `package.json` for development:

| Script | Command | Purpose |
|---|---|---|
| `typecheck` | `tsc --noEmit` | Type-check the `src/` and `tests/` trees against `tsconfig.json` |
| `typecheck:examples` | `tsc -p examples/tsconfig.json --noEmit` | Type-check example files against a separate config that maps `smithers-orchestrator` to `examples-entry.js` |
| `lint` | `oxlint ... packages/*/src packages/*/tests` | Lint package source and tests with oxlint |
| `lint:fix` | `oxlint ... --fix --fix-suggestions packages/*/src packages/*/tests` | Apply supported oxlint fixes |
| `cli` | `bun run apps/cli/src/index.js` | Run the local development CLI entrypoint |
| `test` | `node scripts/check-single-effect-version.mjs && node scripts/check-dependency-boundaries.mjs && pnpm -r test` | Run the dependency guard checks and each package test script |
| `check:effect` | `node scripts/check-single-effect-version.mjs` | Verify the workspace resolves a single Effect version |
| `check:deps` | `node scripts/check-dependency-boundaries.mjs` | Verify package dependency boundaries |
| `docs` | `cd docs && bunx mintlify dev` | Start the Mintlify docs dev server for local preview |
| `version` | `node scripts/bump.mjs` | Bump package versions using the release helper |
| `release` | `node scripts/publish.mjs` | Publish packages using the release helper |

### For end-user projects

When scaffolding your own project (with `bunx smithers-orchestrator init` or manually), add a typecheck script:

```json
{
  "scripts": {
    "typecheck": "tsc --noEmit"
  }
}
```

See Production Project Structure for a complete user-project `package.json` example.

---

## VCS Helper Reference

> Public JJ helper APIs exported by smithers-orchestrator for repo detection, snapshot inspection, and workspace management.

Smithers exports a small JJ helper surface for applications that inspect or manage Jujutsu state directly.

The helpers are lightweight by design:

- every helper accepts an optional `cwd` to target a specific repository
- spawn failures are normalized instead of throwing, so they are safe to call even when `jj` is not installed
- workspace helpers try a few command shapes to tolerate JJ version drift

## Import

```ts
import {
  runJj,
  getJjPointer,
  revertToJjPointer,
  isJjRepo,
  workspaceAdd,
  workspaceList,
  workspaceClose,
} from "smithers-orchestrator";
```

## `runJj(args, opts?)`

Run an arbitrary `jj` command and capture its output.

```ts
const result = await runJj(["status"], { cwd: "/path/to/repo" });
```

```ts
type RunJjOptions = {
  cwd?: string;
};

type RunJjResult = {
  code: number;
  stdout: string;
  stderr: string;
};
```

Notes:

- returns `{ code: 127, stdout: "", stderr: "..." }` when `jj` cannot be started
- does not throw for ordinary process failures
- serves as a raw escape hatch for commands the higher-level helpers below don't cover
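The normalization rule in these notes can be sketched with Node's `child_process.spawnSync`. This is a minimal, synchronous illustration of the documented contract only (exit code 127 and empty stdout when the binary cannot start); `runCommandSketch` and `runJjSketch` are hypothetical names, and the real `runJj` is asynchronous.

```typescript
import { spawnSync } from "node:child_process";

// Mirrors the documented RunJjResult shape.
type RunJjResult = { code: number; stdout: string; stderr: string };

// Hypothetical sketch: run a binary and never throw when it fails to start.
function runCommandSketch(bin: string, args: string[], opts: { cwd?: string } = {}): RunJjResult {
  const res = spawnSync(bin, args, { cwd: opts.cwd, encoding: "utf8" });
  if (res.error) {
    // The process never started (for example ENOENT when the binary is missing):
    // report the shell's "command not found" code instead of throwing.
    return { code: 127, stdout: "", stderr: res.error.message };
  }
  return {
    // status is null when the child was killed by a signal; treat that as a failure.
    code: res.status ?? 1,
    stdout: res.stdout ?? "",
    stderr: res.stderr ?? "",
  };
}

// Thin jj-specific wrapper, analogous in spirit to runJj(args, opts).
const runJjSketch = (args: string[], opts?: { cwd?: string }) => runCommandSketch("jj", args, opts);
```

With this shape a missing `jj` shows up as `code === 127` rather than an exception, which is what makes it safe to probe JJ support unconditionally.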

## `getJjPointer(cwd?)`

Return the current workspace `change_id` for `@`, or `null` when JJ is unavailable or the current directory is not a JJ repo.

```ts
const pointer = await getJjPointer("/path/to/repo");
```

```ts
function getJjPointer(cwd?: string): Promise<string | null>;
```

Smithers uses the same pointer model internally for revert support and cache invalidation.
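One plausible way to read a pointer is to ask JJ for the `change_id` of `@` through a template and treat any non-zero exit as "no pointer". The command shape in `POINTER_ARGS` is an assumption for illustration, not the helper's documented invocation; the parsing step is kept pure so it can be checked without JJ installed.

```typescript
type RunJjResult = { code: number; stdout: string; stderr: string };

// Hypothetical command shape: print only the change id of the working-copy commit.
const POINTER_ARGS = ["log", "-r", "@", "--no-graph", "-T", "change_id"];

// Pure step: map a captured jj result onto the getJjPointer contract (string | null).
function pointerFromResult(result: RunJjResult): string | null {
  if (result.code !== 0) return null; // jj missing (127) or not inside a JJ repo
  const id = result.stdout.trim();
  return id.length > 0 ? id : null;
}
```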

## `revertToJjPointer(pointer, cwd?)`

Restore the working copy from a previously recorded JJ pointer. A pointer is a JJ `change_id` string, as returned by `getJjPointer`.

```ts
const result = await revertToJjPointer("zqkopwvn", "/path/to/repo");
```

```ts
type JjRevertResult = {
  success: boolean;
  error?: string;
};
```

This helper wraps `jj restore --from <pointer>`.
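Given that the helper wraps `jj restore --from <pointer>`, its result mapping can be sketched as below. Surfacing trimmed `stderr` as `error` and the fallback message are assumptions; only the `JjRevertResult` shape comes from the reference above.

```typescript
type RunJjResult = { code: number; stdout: string; stderr: string };
type JjRevertResult = { success: boolean; error?: string };

// Arguments for restoring the working copy from a recorded change id.
function restoreArgs(pointer: string): string[] {
  return ["restore", "--from", pointer];
}

// Map a captured jj result onto the documented JjRevertResult shape.
function toRevertResult(result: RunJjResult): JjRevertResult {
  if (result.code === 0) return { success: true };
  const message = result.stderr.trim() || `jj exited with code ${result.code}`;
  return { success: false, error: message };
}
```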

## `isJjRepo(cwd?)`

Detect whether a directory is a readable JJ repository.

```ts
const enabled = await isJjRepo("/path/to/repo");
```

```ts
function isJjRepo(cwd?: string): Promise<boolean>;
```

Use this before showing JJ-specific UI or attempting a revert flow.

## `workspaceAdd(name, path, opts?)`

Create a JJ workspace with a friendly name at a target filesystem path.

```ts
const result = await workspaceAdd("feature-auth", "/tmp/wt-feature-auth", {
  cwd: "/path/to/repo",
  atRev: "@",
});
```

```ts
type WorkspaceAddOptions = {
  cwd?: string;
  atRev?: string;
};

type WorkspaceResult = {
  success: boolean;
  error?: string;
};
```

Behavior notes:

- removes an existing workspace with the same name before retrying
- removes the target directory if it exists and creates its parent directory if needed
- tries multiple `jj workspace add` syntaxes to work across JJ versions
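Trying several command shapes amounts to running candidates in order and stopping at the first success. The sketch below shows that pattern with an injected runner so it can be tested without JJ. The two candidate syntaxes are hypothetical examples of version drift, not the exact shapes Smithers tries, and the directory cleanup steps are omitted.

```typescript
type RunJjResult = { code: number; stdout: string; stderr: string };
type WorkspaceResult = { success: boolean; error?: string };
type Runner = (args: string[]) => RunJjResult;

// Hypothetical candidate syntaxes for `jj workspace add`, tried in this order.
function workspaceAddCandidates(name: string, path: string, atRev?: string): string[][] {
  const rev = atRev ? ["-r", atRev] : [];
  return [
    ["workspace", "add", "--name", name, ...rev, path],
    ["workspace", "add", path, "--name", name, ...rev],
  ];
}

// Run each candidate until one exits 0; otherwise report the last error seen.
function tryInOrder(candidates: string[][], run: Runner): WorkspaceResult {
  let lastError = "no candidates";
  for (const args of candidates) {
    const res = run(args);
    if (res.code === 0) return { success: true };
    lastError = res.stderr.trim() || `exit ${res.code}`;
  }
  return { success: false, error: lastError };
}
```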

## `workspaceList(cwd?)`

List known workspaces for the current JJ repo.

```ts
const workspaces = await workspaceList("/path/to/repo");
```

```ts
type WorkspaceInfo = {
  name: string;
  path: string | null;
  selected: boolean;
};
```

`workspaceList` prefers template output when the installed JJ supports it and otherwise falls back to parsing the human-readable `jj workspace list` output.
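The human-readable fallback can be pictured as a line parser. This sketch assumes each line starts with `<name>: ` followed by a revision summary; treat that format, the sample text in the test, and the caller-supplied `currentName` used to mark `selected` as assumptions. Plain output carries no path here, so `path` is `null`.

```typescript
type WorkspaceInfo = { name: string; path: string | null; selected: boolean };

// Parse lines shaped like "<name>: <change summary>" into WorkspaceInfo records.
function parseWorkspaceList(output: string, currentName?: string): WorkspaceInfo[] {
  const result: WorkspaceInfo[] = [];
  for (const line of output.split("\n")) {
    const sep = line.indexOf(": ");
    if (sep <= 0) continue; // skip blank or unrecognized lines
    const name = line.slice(0, sep).trim();
    result.push({ name, path: null, selected: name === currentName });
  }
  return result;
}
```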

## `workspaceClose(name, opts?)`

Forget a JJ workspace by name.

```ts
const result = await workspaceClose("feature-auth", {
  cwd: "/path/to/repo",
});
```

```ts
function workspaceClose(
  name: string,
  opts?: { cwd?: string },
): Promise<WorkspaceResult>;
```

This wraps `jj workspace forget <name>`.

## When To Use These Helpers

Use these helpers when your application needs to:

- show whether JJ-backed revert is available
- record or inspect a pointer outside the Smithers engine
- manage JJ workspaces directly from an app or integration layer
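For the first use case, availability can be decided from the two documented probes, `isJjRepo` and `getJjPointer`. A minimal sketch, assuming you only need a status value for a UI; `RevertSupport`, `revertSupport`, and `probeRevertSupport` are hypothetical names, and the probes are injected so the logic runs without JJ.

```typescript
type RevertSupport =
  | { available: true; pointer: string }
  | { available: false; reason: "not-a-jj-repo" | "no-pointer" };

// Pure decision from the two probe results.
function revertSupport(isRepo: boolean, pointer: string | null): RevertSupport {
  if (!isRepo) return { available: false, reason: "not-a-jj-repo" };
  if (pointer === null) return { available: false, reason: "no-pointer" };
  return { available: true, pointer };
}

// Async wrapper taking functions with the documented helper signatures.
async function probeRevertSupport(
  probes: {
    isJjRepo: (cwd?: string) => Promise<boolean>;
    getJjPointer: (cwd?: string) => Promise<string | null>;
  },
  cwd?: string,
): Promise<RevertSupport> {
  const isRepo = await probes.isJjRepo(cwd);
  // Skip the pointer lookup when the directory isn't a JJ repo.
  const pointer = isRepo ? await probes.getJjPointer(cwd) : null;
  return revertSupport(isRepo, pointer);
}
```

In an app you would pass the real imports, for example `probeRevertSupport({ isJjRepo, getJjPointer }, "/path/to/repo")`.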

For workflow-level revert behavior, prefer the runtime and CLI docs:

- VCS Integration
- CLI Reference
- Revert

---

## runWorkflow

> Programmatic entry point. Equivalent to `bunx smithers-orchestrator up`.

```ts
import { runWorkflow } from "smithers-orchestrator";
import { Effect } from "effect";

const result = await Effect.runPromise(runWorkflow(workflow, {
  input: { task: "fix bug" },
}));

result.runId;     // string
result.status;    // "finished" | "failed" | "cancelled" | "continued" | "waiting-approval" | "waiting-event" | "waiting-timer"
result.output;    // populated only if your schema has a key literally named `output`
result.error;     // serialized SmithersError on failure
```

Signature:

```ts
function runWorkflow<Schema>(
  workflow: SmithersWorkflow<Schema>,
  opts: RunOptions,                 // see Types
): Effect.Effect<RunResult, SmithersError>;
```

Both `RunOptions` and `RunResult` are defined in Types.
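Because `result.status` is a closed union, callers can branch on it exhaustively. A sketch with the union copied from the example above; the `Next` labels are hypothetical, and what you do for each waiting status is your own policy rather than anything `runWorkflow` prescribes.

```typescript
type RunStatus =
  | "finished" | "failed" | "cancelled" | "continued"
  | "waiting-approval" | "waiting-event" | "waiting-timer";

type Next = "done" | "report-error" | "stopped" | "resume-later";

function nextStep(status: RunStatus): Next {
  switch (status) {
    case "finished":
    case "continued": // the Run State mapping treats both as succeeded
      return "done";
    case "failed":
      return "report-error"; // result.error holds the serialized SmithersError
    case "cancelled":
      return "stopped";
    case "waiting-approval":
    case "waiting-event":
    case "waiting-timer":
      return "resume-later"; // hypothetical policy: unblock the wait, then resume the run
    default: {
      // Compile-time exhaustiveness check: stops type-checking if a status is added.
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```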

## Resume

Pass the original `runId` plus `resume: true`. State is loaded from SQLite: completed tasks are skipped, and in-progress attempts older than 15 minutes are abandoned and retried.

```ts
await Effect.runPromise(runWorkflow(workflow, { input: {}, runId: "my-run-123", resume: true }));
```

The original input row is loaded from the DB; pass `{}` for `input`. The workflow file hash and VCS root must match the original run.
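The 15-minute rule for in-progress attempts reduces to a timestamp comparison. A minimal sketch, assuming age is measured from an attempt's start time in epoch milliseconds; `AttemptRow`, `startedAtMs`, and `classifyOnResume` are hypothetical names, not the Smithers schema. The docs don't say what happens to younger in-progress attempts, so the sketch only reports them separately.

```typescript
// 15 minutes, matching the resume rule described above.
const ABANDON_AFTER_MS = 15 * 60 * 1000;

// Hypothetical attempt row; real table and column names may differ.
type AttemptRow = { nodeId: string; state: "in-progress" | "finished"; startedAtMs: number };

function classifyOnResume(attempts: AttemptRow[], nowMs: number) {
  const skip: string[] = [];   // finished: not re-run
  const retry: string[] = [];  // in progress for more than 15 minutes: abandoned and retried
  const recent: string[] = []; // in progress for less: not covered by the rule
  for (const a of attempts) {
    if (a.state === "finished") skip.push(a.nodeId);
    else if (nowMs - a.startedAtMs > ABANDON_AFTER_MS) retry.push(a.nodeId);
    else recent.push(a.nodeId);
  }
  return { skip, retry, recent };
}
```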

## Cancel via AbortSignal

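```ts
import { runWorkflow } from "smithers-orchestrator";
import { Effect } from "effect";

const controller = new AbortController();
// Abort if the run is still going after five minutes.
const timer = setTimeout(() => controller.abort(), 5 * 60 * 1000);

const result = await Effect.runPromise(
  runWorkflow(workflow, { input: { task: "fix bug" }, signal: controller.signal }),
);
clearTimeout(timer); // don't keep the process alive after the run ends
// result.status === "cancelled" if the timeout fired
```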

All in-flight attempts are marked cancelled and `NodeCancelled` events are emitted.

## Hijack handoff

If a CLI hijack happens mid-run (`bunx smithers-orchestrator hijack RUN_ID`), the run ends `"cancelled"` and the latest attempt metadata stores `hijackHandoff`. On `resume: true`, Smithers waits for a safe handoff point and continues with the persisted CLI session id (see CLI Agents) or the persisted message history (SDK agents).

## `result.output`

Populated only when the schema passed to `createSmithers()` has a key literally named `output`. Rows for other schema keys live in their own SQLite tables, so query them directly:

```ts
import { Database } from "bun:sqlite";
const db = new Database("smithers.db", { readonly: true });
const rows = db.query("SELECT * FROM page WHERE run_id = ? ORDER BY iteration DESC").all(result.runId);
```

## Notes

- On macOS, `runWorkflow` acquires a `caffeinate` lock to prevent idle sleep and releases it on completion. On other platforms this is a no-op.
- Set `SMITHERS_LOG_LEVEL=debug` to enable verbose engine logging.
- For lifecycle events, pass `onProgress` (see Events).

---

## renderFrame

> Render a workflow tree to a GraphSnapshot without executing.

Use `renderFrame` to preview the task graph (for CI validation, graph-inspection UIs, or dry-run checks) without executing or persisting anything. `runId` (a string) and `iteration` (a number) are arbitrary values used only for snapshot identity.

```ts
import { renderFrame } from "smithers-orchestrator";
import { SmithersCtx } from "@smithers-orchestrator/driver";
import { Effect } from "effect";

const ctx = new SmithersCtx({ runId: "preview", iteration: 0, input: { task: "preview" }, outputs: {} });
const snap = await Effect.runPromise(renderFrame(workflow, ctx));

snap.frameNo;       // 0
snap.tasks;         // TaskDescriptor[]
snap.xml;           // XmlNode | null (see Types)
```

`TaskDescriptor` and `GraphSnapshot` are defined in Types. The snapshot has the same shape the runtime extracts on every render frame.

`outputs` lets you simulate completed upstream tasks:

```ts
const ctx = new SmithersCtx({
  runId: "sim", iteration: 0, input: { x: 1 },
  outputs: {
    analyze: [{ runId: "sim", nodeId: "analyze", iteration: 0, summary: "..." }],
  },
});
const snap = await Effect.runPromise(renderFrame(workflow, ctx));
```

CLI equivalent:

```bash
bunx smithers-orchestrator graph workflow.tsx --input '{"task":"preview"}'
```
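For the CI-validation use case, a check over the rendered task list can fail a build before anything runs. A sketch assuming each `TaskDescriptor` exposes a `nodeId` string (confirm the field name in Types); the descriptors are stubbed and `missingNodeIds`/`assertGraphHas` are hypothetical helpers.

```typescript
// Assumed minimal slice of TaskDescriptor; the real type has more fields.
type TaskLike = { nodeId: string };

// Return the expected node ids that are missing from a rendered task list.
function missingNodeIds(tasks: TaskLike[], expected: string[]): string[] {
  const present = new Set(tasks.map((t) => t.nodeId));
  return expected.filter((id) => !present.has(id));
}

// Throw with a readable message so a CI job fails fast.
function assertGraphHas(tasks: TaskLike[], expected: string[]): void {
  const missing = missingNodeIds(tasks, expected);
  if (missing.length > 0) throw new Error(`missing tasks: ${missing.join(", ")}`);
}
```

In a CI script this would run as `assertGraphHas(snap.tasks, ["analyze", "report"])` after `renderFrame`.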

---

## Revert to Attempt

> Restore the workspace to a previous task attempt's filesystem state via JJ.

Each successful task attempt captures the current [JJ](https://jj-vcs.github.io/jj/) change ID into `_smithers_attempts.jj_pointer`. `revert` restores the workspace to that state and discards any graph snapshots recorded after the attempt began, so the run's timeline rolls back to the point-in-time of that attempt.

```bash
bunx smithers-orchestrator revert workflow.tsx \
  --run-id RUN_ID --node-id NODE_ID [--attempt N=1] [--iteration N=0]
```

On the JJ side, revert only restores file contents; it doesn't alter existing JJ history. The restoration creates a new change on top of the current working copy.

## Requirements

- JJ in `PATH` (`brew install jj` or `cargo install jj-cli`)
- Workspace is a JJ repository (`jj git init`; older JJ releases also accepted `jj init`)
- The target attempt was completed when JJ was available (otherwise no pointer was captured)

## Programmatic

```ts
import { revertToAttempt } from "smithers-orchestrator";

const result = await revertToAttempt(adapter, {
  runId: "abc123",
  nodeId: "implement",
  attempt: 2,
  iteration: 0,
});
```

where `adapter` is the `SmithersDb` instance for the run's database.

`revertToAttempt` returns `{ success: false, error: string }` if the attempt is not found or has no recorded JJ pointer (it does not throw for these cases). Inspect the returned `result.success` boolean to detect failures. On success it returns `{ success: true, jjPointer: string }`.
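Since failures come back as values, a caller narrows on `success`. A sketch with the two documented result shapes written as a local union; `RevertAttemptResult` and `describeRevert` are hypothetical names.

```typescript
type RevertAttemptResult =
  | { success: true; jjPointer: string }
  | { success: false; error: string };

function describeRevert(result: RevertAttemptResult): string {
  if (result.success) {
    return `restored workspace to ${result.jjPointer}`;
  }
  // For example: attempt not found, or no JJ pointer was recorded for it.
  return `revert skipped: ${result.error}`;
}
```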

---

## Run State

> The single typed model for "what is this run doing right now?"

Every Smithers surface (CLI, Gateway, and programmatic clients) answers
"what is this run doing right now?" by reading the same `RunStateView`,
computed server-side from persisted state plus liveness signals.

Surfaces never infer status from `ps`, event absence, or partial table
reads. They call `computeRunState`.

## RunState

```ts
type RunState =
  | "running"           // owner alive, work progressing
  | "waiting-approval"  // blocked on a human decision
  | "waiting-event"     // blocked on an external signal
  | "waiting-timer"     // blocked on a scheduled wakeup
  | "recovering"        // supervisor replaying / resuming
  | "stale"             // owner heartbeat expired, not yet recovered
  | "orphaned"          // owner gone, no supervisor candidate
  | "failed"
  | "cancelled"
  | "succeeded"
  | "unknown"           // telemetry gap; never invent a state
```

When state cannot be determined (e.g. a missing heartbeat with ambiguous DB status), the runtime returns `unknown`. Never invent a more specific state.

The legacy run-row `status` column maps to `RunState` like this:

| `_smithers_runs.status` | `RunState`                     |
| ----------------------- | ------------------------------ |
| `running` (fresh)       | `running`                      |
| `running` (stale)       | `stale` or `orphaned`          |
| `waiting-approval`      | `waiting-approval`             |
| `waiting-event`         | `waiting-event`                |
| `waiting-timer`         | `waiting-timer`                |
| `finished`              | `succeeded`                    |
| `continued`             | `succeeded`                    |
| `failed`                | `failed`                       |
| `cancelled`             | `cancelled`                    |
| missing / unknown text  | `unknown`                      |

`recovering` is reserved for the supervisor takeover window. As of this writing, it is defined but not yet emitted by the runtime.
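The mapping table can be written as a function that follows it row by row. In this sketch the two boolean inputs, `heartbeatFresh` and `hasSupervisorCandidate`, are hypothetical stand-ins for the liveness signals `computeRunState` reads, and `recovering` is never returned, matching the note above.

```typescript
type RunState =
  | "running" | "waiting-approval" | "waiting-event" | "waiting-timer"
  | "recovering" | "stale" | "orphaned"
  | "failed" | "cancelled" | "succeeded" | "unknown";

function mapLegacyStatus(
  status: string | null | undefined,
  heartbeatFresh: boolean,
  hasSupervisorCandidate: boolean,
): RunState {
  switch (status) {
    case "running":
      if (heartbeatFresh) return "running";
      // Expired owner: "stale" while a supervisor could still pick it up, else "orphaned".
      return hasSupervisorCandidate ? "stale" : "orphaned";
    case "waiting-approval": return "waiting-approval";
    case "waiting-event": return "waiting-event";
    case "waiting-timer": return "waiting-timer";
    case "finished":
    case "continued":
      return "succeeded";
    case "failed": return "failed";
    case "cancelled": return "cancelled";
    default:
      return "unknown"; // missing or unrecognized text: never guess
  }
}
```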

## ReasonBlocked / ReasonUnhealthy

Every non-terminal state other than `running` and `unknown` carries a typed reason.

```ts
type ReasonBlocked =
  | { kind: "approval"; nodeId: string; requestedAt: string }
  | { kind: "event";    nodeId: string; correlationKey: string }
  | { kind: "timer";    nodeId: string; wakeAt: string }
  | { kind: "provider"; nodeId: string; code: "rate-limit" | "auth" | "timeout" }
  | { kind: "tool";     nodeId: string; toolName: string; code: string }

type ReasonUnhealthy =
  | { kind: "engine-heartbeat-stale"; lastHeartbeatAt: string }
  | { kind: "ui-heartbeat-stale";     lastSeenAt: string }
  | { kind: "db-lock" }
  | { kind: "sandbox-unreachable" }
  | { kind: "supervisor-backoff"; attempt: number; nextAt: string }
```

Timestamps are ISO-8601 strings.

## RunStateView

```ts
type RunStateView = {
  runId: string;
  state: RunState;
  blocked?: ReasonBlocked;
  unhealthy?: ReasonUnhealthy;
  computedAt: string;        // ISO-8601
};
```

`blocked` is set when `state` is one of the `waiting-*` values.
`unhealthy` is set when `state` is `stale`, `orphaned`, or `recovering`.
Terminal states (`succeeded`, `failed`, `cancelled`) carry neither.
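These three rules can be checked mechanically, for example in a test over views returned by a client. A self-contained sketch; the reason payloads are loosened to `{ kind: string }` for brevity, and `viewViolations` is a hypothetical helper.

```typescript
type RunStateView = {
  runId: string;
  state: string;
  blocked?: { kind: string };
  unhealthy?: { kind: string };
  computedAt: string;
};

const UNHEALTHY_STATES = new Set(["stale", "orphaned", "recovering"]);

// List every rule the view breaks; an empty array means the view is consistent.
function viewViolations(view: RunStateView): string[] {
  const problems: string[] = [];
  const waiting = view.state.startsWith("waiting-");
  const unhealthy = UNHEALTHY_STATES.has(view.state);
  if (waiting && !view.blocked) problems.push("waiting state without blocked reason");
  if (!waiting && view.blocked) problems.push("blocked reason on non-waiting state");
  if (unhealthy && !view.unhealthy) problems.push("unhealthy state without reason");
  if (!unhealthy && view.unhealthy) problems.push("unhealthy reason on healthy state");
  return problems;
}
```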

## computeRunState

```ts
import { computeRunState } from "@smithers-orchestrator/db/runState";

const view = await computeRunState(adapter, runId);
view.state;       // "running" | ...
view.blocked;     // present iff state is "waiting-*"
view.unhealthy;   // present iff state is "stale" | "orphaned" | "recovering"
```

`computeRunState` is pure over the DB plus the heartbeat / lease signals
on the run row. It does not call `ps`, does not probe sockets, and does
not run heuristics.

`deriveRunState` is the underlying pure function, useful in tests or
when you already have the rows in memory:

```ts
import { deriveRunState } from "@smithers-orchestrator/db/runState";

const view = deriveRunState({
  run,
  pendingApproval,
  pendingTimer,
  pendingEvent,
  now: 1_700_000_000_000,
  staleThresholdMs: 30_000,
});
```

The default `staleThresholdMs` is `30_000`, the same threshold the
engine uses for `isRunHeartbeatFresh`.
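The freshness test behind the `stale` decision is a comparison against that threshold. A sketch assuming heartbeats are stored as epoch milliseconds and the boundary is inclusive; `heartbeatFreshness` is a hypothetical name, not the engine's `isRunHeartbeatFresh`. Returning `"unknown"` for a missing heartbeat follows the "never invent a state" rule above.

```typescript
const DEFAULT_STALE_THRESHOLD_MS = 30_000;

type Freshness = "fresh" | "stale" | "unknown";

function heartbeatFreshness(
  lastHeartbeatMs: number | null | undefined,
  nowMs: number,
  staleThresholdMs: number = DEFAULT_STALE_THRESHOLD_MS,
): Freshness {
  // No heartbeat recorded: report ambiguity rather than guessing.
  if (lastHeartbeatMs == null) return "unknown";
  return nowMs - lastHeartbeatMs <= staleThresholdMs ? "fresh" : "stale";
}
```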

## Where it shows up

`RunStateView` is the wire format on every read surface:

- `bunx smithers-orchestrator inspect RUN_ID`: top-level `runState` field on the JSON
  output (and rendered in the human view).
- Gateway RPC `getRun`: `runState` field on the response.
- DevTools snapshot header: `runState?: RunStateView` field.
- Event stream: `RunStateChanged` event with `before` and `after`
  (emitted by the recovery state machine).

A run id that does not exist is not a `RunState`. It's an error
(`RUN_NOT_FOUND`). `unknown` is for ambiguity, not for "doesn't exist."

---

## TUI Removed

> The old smithers tui command is no longer shipped; use the CLI commands instead.

`bunx smithers-orchestrator tui` was removed in `0.20.2`. Use the commands in the table below instead.

The removal dropped the OpenTUI dependency and the `apps/cli/src/tui/` tree from the package.

## Replacements

| Need | Use |
| --- | --- |
| Active run list | `ps --watch` |
| Lifecycle event stream | `events --watch` or `logs --follow` |
| Agent transcript | `chat --follow` |
| Run detail | `inspect --watch` |
| Node attempts, tools, and output | `node --watch`, `output`, `diff` |
| Rendered DevTools tree | `tree --watch` |
| Desktop control plane | `gui` and the CLI overview |
| Remote or browser control plane | `up --serve`, Gateway, and MCP Server |

For a browser or remote control plane, `up --serve` exposes an HTTP API; the Gateway and MCP Server docs explain multi-user and agent-driven access.

Historical TUI product/design material is kept for reference, but it does not describe a shipped command.

---

## Workflow optimization

> Optimize Smithers workflow prompts against eval suites with GEPA-style prompt artifacts.

Run `bunx smithers-orchestrator optimize` to generate improved prompts for agent tasks via GEPA, verify the improvement against your eval suite, and save the result as a reusable artifact.

```bash
bunx smithers-orchestrator optimize workflow.tsx \
  --cases evals/smoke.jsonl \
  --suite smoke-gepa \
  --provider cerebras \
  --model gpt-oss-120b \
  --artifact .smithers/optimizations/smoke-gepa.json
```

`bunx smithers-orchestrator optimize` runs the eval suite twice:

1. baseline run with the workflow's current prompts
2. optimized run with GEPA-generated prompt patches applied

The command writes the artifact only when the optimized score improves by at least `--min-improvement`. Reports for both runs are written under `.smithers/optimizations/reports` unless `--report-dir` is set.
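The write decision can be sketched as follows, assuming "improvement" means the absolute score difference (optimized minus baseline) and that meeting `--min-improvement` exactly counts, per "at least". The flag's default isn't documented here, so the sketch takes the threshold explicitly; `shouldWriteArtifact` is a hypothetical name.

```typescript
type SuiteScore = { score: number; passed: number; total: number };

// Decide whether an optimization run should produce an artifact.
function shouldWriteArtifact(
  baseline: SuiteScore,
  optimized: SuiteScore,
  minImprovement: number,
): { improvement: number; write: boolean } {
  const improvement = optimized.score - baseline.score;
  return { improvement, write: improvement >= minImprovement };
}
```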

## Reuse an artifact

Apply the optimized prompts to future evals with `--optimization`:

```bash
bunx smithers-orchestrator eval workflow.tsx \
  --cases evals/smoke.jsonl \
  --suite smoke-optimized \
  --optimization .smithers/optimizations/smoke-gepa.json
```

The artifact patches only agent-backed `<Task>` prompts by `nodeId`. Workflow structure, output schemas, retries, approvals, and persistence behavior stay unchanged.
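That scoping rule can be pictured as a lookup keyed by `nodeId` that skips tasks without an agent. A sketch only: the artifact's JSON layout isn't shown here, so `TaskPrompt`, `PromptPatch`, and the replace-the-whole-prompt semantics are assumptions.

```typescript
type TaskPrompt = { nodeId: string; hasAgent: boolean; prompt: string };
type PromptPatch = { nodeId: string; prompt: string };

// Return new task prompts with patches applied to agent-backed tasks only.
function applyPromptPatches(tasks: TaskPrompt[], patches: PromptPatch[]): TaskPrompt[] {
  const byNode = new Map(patches.map((p) => [p.nodeId, p.prompt] as const));
  return tasks.map((task) => {
    const patched = byNode.get(task.nodeId);
    // Non-agent tasks and tasks without a patch pass through untouched.
    if (!task.hasAgent || patched === undefined) return task;
    return { ...task, prompt: patched };
  });
}
```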

## Cerebras improvement demo

The following run shows a baseline failure corrected by a GEPA-generated patch. The baseline prompt omitted the required optimization token, so the eval failed; Cerebras GEPA generated a prompt patch that added the missing requirement, and the optimized eval passed.

```bash
CEREBRAS_API_KEY=... bunx smithers-orchestrator optimize workflow.tsx \
  --cases evals/opt.jsonl \
  --suite cerebras-proof \
  --provider cerebras \
  --model gpt-oss-120b \
  --artifact artifacts/optimized.json \
  --report-dir artifacts/reports \
  --format json
```

Observed result:

```json
{
  "baseline": { "score": 0.1, "passed": 0, "total": 1 },
  "optimized": { "score": 1, "passed": 1, "total": 1 },
  "improved": true,
  "provider": "cerebras",
  "model": "gpt-oss-120b"
}
```

## Providers

`bunx smithers-orchestrator optimize` accepts the same provider vocabulary Smithers uses for agents and accounts:

| Provider | Optimizer API | Required env | Default model |
| --- | --- | --- | --- |
| `cerebras` | OpenAI-compatible | `CEREBRAS_API_KEY` | `gpt-oss-120b` |
| `openai-api`, `openai`, `openai-sdk`, `codex` | OpenAI-compatible | `OPENAI_API_KEY` | `gpt-5.3-codex` |
| `anthropic-api`, `anthropic`, `anthropic-sdk`, `claude-code`, `claude` | Anthropic Messages API | `ANTHROPIC_API_KEY` | `claude-opus-4-7` |
| `gemini-api`, `gemini`, `antigravity` | Gemini generateContent API | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `gemini-3.1-pro-preview` |
| `kimi`, `moonshot` | OpenAI-compatible Moonshot API | `MOONSHOT_API_KEY` | `kimi-latest` |
| `opencode` | OpenAI-compatible endpoint | `SMITHERS_OPTIMIZER_API_KEY` and `SMITHERS_OPTIMIZER_BASE_URL` | `anthropic/claude-sonnet-4-5` |
| `pi` | OpenAI-compatible endpoint | `SMITHERS_OPTIMIZER_API_KEY` and `SMITHERS_OPTIMIZER_BASE_URL` | `gpt-5.3-codex` |
| `amp`, `forge`, `openai-compatible` | OpenAI-compatible endpoint | `SMITHERS_OPTIMIZER_API_KEY` and `SMITHERS_OPTIMIZER_BASE_URL` | pass `--model` when needed |

The CLI provider names (`claude-code`, `codex`, `antigravity`, `gemini`, `kimi`) map to their hosted API equivalents for optimization because GEPA needs a direct model call to propose prompt patches. Providers with no single hosted backend (`opencode`, `pi`, `amp`, `forge`) are still accepted through a generic OpenAI-compatible endpoint.
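The table reads naturally as a lookup from provider name to optimizer backend. This sketch transcribes the table (it is not the CLI's code) so you can check up front that the required environment variables are set; `resolveOptimizer` and `missingEnv` are hypothetical names. Each inner array in `env` lists alternatives, and every group must be satisfied.

```typescript
type OptimizerSpec = { api: string; env: string[][]; defaultModel?: string };

const OPENAI_COMPAT = "OpenAI-compatible endpoint";
const GENERIC_ENV = [["SMITHERS_OPTIMIZER_API_KEY"], ["SMITHERS_OPTIMIZER_BASE_URL"]];

const SPECS: Array<[string[], OptimizerSpec]> = [
  [["cerebras"], { api: "OpenAI-compatible", env: [["CEREBRAS_API_KEY"]], defaultModel: "gpt-oss-120b" }],
  [["openai-api", "openai", "openai-sdk", "codex"], { api: "OpenAI-compatible", env: [["OPENAI_API_KEY"]], defaultModel: "gpt-5.3-codex" }],
  [["anthropic-api", "anthropic", "anthropic-sdk", "claude-code", "claude"], { api: "Anthropic Messages API", env: [["ANTHROPIC_API_KEY"]], defaultModel: "claude-opus-4-7" }],
  [["gemini-api", "gemini", "antigravity"], { api: "Gemini generateContent API", env: [["GEMINI_API_KEY", "GOOGLE_API_KEY"]], defaultModel: "gemini-3.1-pro-preview" }],
  [["kimi", "moonshot"], { api: "OpenAI-compatible Moonshot API", env: [["MOONSHOT_API_KEY"]], defaultModel: "kimi-latest" }],
  [["opencode"], { api: OPENAI_COMPAT, env: GENERIC_ENV, defaultModel: "anthropic/claude-sonnet-4-5" }],
  [["pi"], { api: OPENAI_COMPAT, env: GENERIC_ENV, defaultModel: "gpt-5.3-codex" }],
  [["amp", "forge", "openai-compatible"], { api: OPENAI_COMPAT, env: GENERIC_ENV }], // pass --model
];

// Undefined for names outside the table (including the API-free `heuristic`).
function resolveOptimizer(provider: string): OptimizerSpec | undefined {
  return SPECS.find(([names]) => names.includes(provider))?.[1];
}

// Unsatisfied env groups: each group needs at least one non-empty variable.
function missingEnv(spec: OptimizerSpec, env: Record<string, string | undefined>): string[] {
  return spec.env
    .filter((group) => !group.some((name) => Boolean(env[name])))
    .map((group) => group.join(" or "));
}
```

Calling `missingEnv(spec, process.env)` before `optimize` turns a late provider error into an immediate, readable one.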

`--provider heuristic` makes no API call and is deterministic, which suits local tests and fixtures. It builds the patch from `optimizationHints` in each eval case's `metadata`, for example:

```json
{
  "metadata": {
    "optimizationHints": {
      "answer": "Include the exact rubric requirement in the task prompt."
    }
  }
}
```

Artifacts are Smithers JSON records with baseline score, optimized score, improvement, prompt patches, and linked eval reports.


===============================================================================

# Smithers Memory

> Smithers cross-run memory: working memory, message history, semantic recall, processors.

---

## Memory

> Cross-run facts, message history, namespaces, and task memory metadata.

Memory persists state **across runs**. Task outputs are per-run; memory is per-namespace and survives across workflow executions.

```mermaid
flowchart LR
  W[workflow] --> ST
  AG[agent] --> ST
  U[user] --> ST
  GL[global] --> ST
  ST[(Memory store<br/>one SQLite db)] --> F[Facts · namespaced JSON, optional TTL]
  ST --> H[Message history · ordered threads]
  style ST fill:#def,stroke:#36c
```

## Three layers

```ts
import { createMemoryStore } from "smithers-orchestrator/memory";
import { Database } from "bun:sqlite";

const store = createMemoryStore(new Database("smithers.db"));
const ns = { kind: "workflow" as const, id: "code-review" };
```

| Layer | API | Use for |
|---|---|---|
| Facts | `store.setFact(ns, key, value, ttlMs?)` / `store.getFact(ns, key)` / `store.listFacts(ns)` | Namespaced JSON facts. Optional TTL. Last-write-wins. |
| Message history | `store.createThread(ns, title?)`, `store.saveMessage(msg)`, `store.listMessages(threadId, limit)` | Ordered chat threads per agent or user. |
| Maintenance | `store.deleteExpiredFacts()` plus processors | TTL cleanup and history compaction. |
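The fact semantics in the table (namespaced keys, optional TTL, last-write-wins, explicit expiry cleanup) can be modeled in memory to reason about them. This is a toy model, not `createMemoryStore`: the real store is SQLite-backed and its methods may be async, and hiding expired facts before cleanup runs is an assumption here. The clock is injected so TTL behavior is testable.

```typescript
type MemoryNamespace = { kind: "workflow" | "agent" | "user" | "global"; id: string };
type Fact = { value: unknown; expiresAt: number | null };

class ToyFactStore {
  private facts = new Map<string, Fact>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // NUL separator: unlikely to appear in namespace ids or fact keys.
  private key(ns: MemoryNamespace, key: string): string {
    return `${ns.kind}:${ns.id}\u0000${key}`;
  }

  // Last write wins: a second setFact simply overwrites the entry.
  setFact(ns: MemoryNamespace, key: string, value: unknown, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? null : this.now() + ttlMs;
    this.facts.set(this.key(ns, key), { value, expiresAt });
  }

  // Assumption: expired facts read as missing even before cleanup runs.
  getFact(ns: MemoryNamespace, key: string): unknown {
    const fact = this.facts.get(this.key(ns, key));
    if (!fact || (fact.expiresAt !== null && fact.expiresAt <= this.now())) return undefined;
    return fact.value;
  }

  // Physically remove expired entries; returns how many were dropped.
  deleteExpiredFacts(): number {
    let removed = 0;
    for (const [k, fact] of this.facts) {
      if (fact.expiresAt !== null && fact.expiresAt <= this.now()) {
        this.facts.delete(k);
        removed++;
      }
    }
    return removed;
  }
}
```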

## Namespaces

```ts
type MemoryNamespace = { kind: "workflow" | "agent" | "user" | "global"; id: string };
```

Pick the kind to match the lifetime: `workflow` is scoped to a workflow definition; `agent` to an agent identity; `user` to an end user; `global` is shared across everything.
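The CLI's `memory list` command names a namespace as `kind:id`, for example `workflow:code-review`. A helper pair that converts between `MemoryNamespace` and that string form could look like this. `formatNamespace` and `parseNamespace` are hypothetical names, and the sketch assumes the id is everything after the first colon, so ids may themselves contain colons.

```typescript
type MemoryNamespace = { kind: "workflow" | "agent" | "user" | "global"; id: string };

const KINDS = ["workflow", "agent", "user", "global"] as const;

// Hypothetical helper: render a namespace in the CLI's `kind:id` form.
function formatNamespace(ns: MemoryNamespace): string {
  return `${ns.kind}:${ns.id}`;
}

// Hypothetical helper: parse `kind:id`, splitting on the first colon only.
function parseNamespace(text: string): MemoryNamespace {
  const i = text.indexOf(":");
  if (i <= 0 || i === text.length - 1) throw new Error(`expected kind:id, got "${text}"`);
  const kind = text.slice(0, i);
  const id = text.slice(i + 1);
  if (!(KINDS as readonly string[]).includes(kind)) {
    throw new Error(`unknown namespace kind "${kind}"`);
  }
  return { kind: kind as MemoryNamespace["kind"], id };
}
```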

## Task Memory Metadata

```tsx
<Task
  id="review"
  output={outputs.review}
  agent={reviewer}
  memory={{
    recall:   { namespace: ns, topK: 3 },               // inject relevant past facts into prompt
    remember: { namespace: ns, key: "last-review" },    // persist this output as a fact
    threadId: `${ctx.input.repo}:reviews`,              // append messages to this thread
  }}
>
  Review the latest PR.
</Task>
```

`memory` is preserved on the task descriptor as metadata for runtimes and integrations that layer memory behavior onto task execution. The direct store APIs above are the current public memory read/write surface.

## Processors

Maintenance jobs you run periodically:

```ts
import { TtlGarbageCollector, TokenLimiter, Summarizer } from "smithers-orchestrator/memory";

const gc         = TtlGarbageCollector();   // expire facts past their TTL
const limiter    = TokenLimiter(4000);      // keep history under token budget
const summarizer = Summarizer(myAgent);     // compress old messages with an LLM

await gc.process(store);
await limiter.process(store);
await summarizer.process(store);
```

```mermaid
flowchart LR
  M[Facts + messages accumulate] --> GC[TtlGarbageCollector<br/>drop expired facts]
  GC --> TL[TokenLimiter<br/>trim history to budget]
  TL --> SM[Summarizer<br/>compress old messages with an LLM]
  SM --> K[Compact, in-budget memory]
  style K fill:#dfe,stroke:#3a3
```
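`TokenLimiter(4000)` keeps history under a token budget. One common policy is to keep the newest messages that fit and drop older ones. The sketch below shows that policy only; the roughly-4-characters-per-token estimate, the `Message` shape, and newest-first retention are assumptions, not the documented behavior of `TokenLimiter`.

```typescript
type Message = { role: "user" | "assistant" | "system"; content: string };

// Rough heuristic: about 4 characters per token (an assumption, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages whose combined estimate fits within `budget`,
// returning them in their original chronological order.
function trimToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budget) break;
    kept.push(history[i]);
    used += cost;
  }
  return kept.reverse();
}
```

Stopping at the first message that does not fit (rather than skipping it) keeps the retained history contiguous, which matters when later messages refer to earlier ones.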

## Inspect from the CLI

```bash
bunx smithers-orchestrator memory list workflow:code-review -w workflow.tsx
```

The CLI currently exposes fact listing. Use the store API for writes, deletes, threads, messages, and TTL cleanup.

## Notes

- Memory and task outputs are **distinct stores**. Don't use memory for run-scoped state; it's not transactional with the workflow's frame commits.
- Working-memory (fact) writes are unordered: each key keeps only its last written value. Use message history when sequence matters.

---

## Memory Quickstart

> Folded into the opt-in memory fragment.

This material is now part of the Memory section above.


===============================================================================

# Smithers OpenAPI Tools

> Smithers OpenAPI tools: turn an OpenAPI spec into AI SDK tools, with auth, filters, and observability.

---

## OpenAPI Tools

> Generate AI SDK tools from an OpenAPI spec. Auth, filtering, observability built in.

`createOpenApiTools` parses an OpenAPI 3.x spec and returns AI SDK tools, one per operation, with Zod schemas converted from the spec's JSON schemas.

```ts
import { ToolLoopAgent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { createOpenApiTools } from "smithers-orchestrator/openapi";

// One AI SDK tool per operation in the spec.
const tools = await createOpenApiTools("./petstore.json", {
  baseUrl: "https://api.petstore.example.com",
  auth: { type: "bearer", token: process.env.PETSTORE_TOKEN! },
});

const agent = new ToolLoopAgent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools,
});
```

## Options

```ts
type OpenApiToolsOptions = {
  baseUrl?: string;                  // overrides spec.servers[0].url
  headers?: Record<string, string>;  // merged into every request
  auth?:
    | { type: "apiKey"; name: string; in: "header" | "query"; value: string }
    | { type: "bearer"; token: string }
    | { type: "basic"; username: string; password: string };
  include?: string[];                // operationId allowlist
  exclude?: string[];                // operationId blocklist
  namePrefix?: string;               // prefix generated tool names
};
```

Pass the spec itself as the first argument to `createOpenApiTools(input, options)`. `input` may be a parsed spec object, file path, URL, or raw OpenAPI text.

## Public API

| Export | Purpose |
|---|---|
| `createOpenApiTools(input, options?)` / `createOpenApiToolsSync(input, options?)` | Build a record of AI SDK tools keyed by operationId. |
| `createOpenApiTool(input, operationId, options?)` / `createOpenApiToolSync(...)` | Build one operation as a tool. |
| `listOperations(input)` | Return parsed operation metadata without creating tools. |
| `extractOperations(spec)` | Extract operations from an already parsed spec. |
| `loadSpecEffect(input)` / `loadSpecSync(input)` | Load and parse a spec from object, path, URL, or raw text. |
| `jsonSchemaToZod(schema)` / `buildOperationSchema(...)` | Lower-level schema conversion helpers. |

## CLI preview

```bash
bunx smithers-orchestrator openapi list ./api/openapi.yaml
```

Lists every operationId, method, path, and summary. Useful for auditing what an agent will be able to call before wiring it up.

## What gets generated

For each operation:

- A Zod schema covering the request body plus path, query, and header parameters.
- An `execute(args)` function that performs the HTTP call.
- The operation's `summary` / `description` becomes the tool description.

## Response handling

- JSON responses are parsed and returned as objects.
- Non-JSON responses are returned as strings.
- HTTP status codes are not special-cased by the tool factory; if the server returns a JSON or text error body, that body is returned to the agent.
- Request failures, fetch failures, and schema/tool execution exceptions are returned as `{ error: true, message, status: "failed" }` so the agent can react in its loop.
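The response rules above can be sketched as a wrapper around `fetch`. This is an illustrative re-implementation, not the generated `execute` function. `callOperation` is a hypothetical name, the fetch function is injected so the sketch can be tested offline, and detecting JSON through the `content-type` header is an assumption.

```typescript
type ToolFailure = { error: true; message: string; status: "failed" };

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical sketch: perform the HTTP call and shape the result for an agent loop.
async function callOperation(fetchImpl: FetchLike, url: string, init?: RequestInit): Promise<unknown> {
  try {
    const res = await fetchImpl(url, init);
    const body = await res.text();
    // HTTP status is not special-cased: error bodies are returned like any other body.
    const isJson = (res.headers.get("content-type") ?? "").includes("application/json");
    return isJson ? JSON.parse(body) : body;
  } catch (err) {
    // Fetch failures (and, in this sketch, malformed JSON) become a structured failure.
    const failure: ToolFailure = {
      error: true,
      message: err instanceof Error ? err.message : String(err),
      status: "failed",
    };
    return failure;
  }
}
```

Returning the failure object instead of throwing keeps the agent's tool loop alive, so the model can retry or choose another operation.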

## Filtering

`include` / `exclude` accept arrays of `operationId`. If an operationId appears in both lists, `exclude` wins. Use them to limit an agent's surface area to a specific feature ("just the inventory endpoints"). `namePrefix` prefixes generated tool names without changing the operationIds used for filtering.
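The filtering rules can be expressed as a small pure function over operationIds. `selectToolNames` is hypothetical, and the sketch assumes `namePrefix` is prepended verbatim; any separator the real factory inserts is not specified here.

```typescript
type FilterOptions = { include?: string[]; exclude?: string[]; namePrefix?: string };

// Hypothetical sketch: map operationIds to tool names after include/exclude filtering.
// Filtering always uses the raw operationId; the prefix only affects the resulting name.
function selectToolNames(operationIds: string[], opts: FilterOptions = {}): Map<string, string> {
  const include = opts.include ? new Set(opts.include) : null;
  const exclude = new Set(opts.exclude ?? []);
  const selected = new Map<string, string>();
  for (const id of operationIds) {
    if (include && !include.has(id)) continue; // allowlist, when given
    if (exclude.has(id)) continue; // blocklist wins over the allowlist
    selected.set(id, `${opts.namePrefix ?? ""}${id}`);
  }
  return selected;
}
```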

## Observability

Each tool call emits an `OpenApiToolCalled` event with `operationId`, `method`, `path`, `durationMs`, `status`. Visible via `bunx smithers-orchestrator events RUN_ID --type openapi`.

## Notes

- Schema composition is supported: `allOf` converts to a Zod intersection, and `anyOf` / `oneOf` convert to Zod unions.
- Nullable fields and defaults from the spec are preserved.

---

## OpenAPI Tools Quickstart

> Folded into the opt-in OpenAPI fragment.

This material is now part of the OpenAPI Tools section above.


===============================================================================

# Smithers Observability

> Smithers observability surface: HTTP server, gateway, MCP, OpenTelemetry, metrics.

---

## HTTP Server

> Run Smithers as an HTTP server: REST routes for runs, approvals, tools.

`startServer` boots a multi-workflow HTTP server with REST endpoints for run lifecycle, SSE event streams, and human-in-the-loop approvals. For a single-workflow variant alongside `bunx smithers-orchestrator up`, see Serve Mode.

## Quick start

```ts
import { startServer } from "smithers-orchestrator";
import { drizzle } from "drizzle-orm/bun-sqlite";

const server = startServer({
  port: 7331,
  db: drizzle("./smithers.db"),
  authToken: process.env.SMITHERS_API_KEY,
  rootDir: process.cwd(),
});
```

```bash
curl -X POST http://localhost:7331/v1/runs \
  -H "Authorization: Bearer $SMITHERS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"workflowPath": "./bugfix.tsx", "input": {"description": "fix auth"}}'
```

## ServerOptions

```ts
type ServerOptions = {
  port?: number;              // default 7331
  db?: unknown;               // enables GET /v1/runs and approvals listing
  authToken?: string;         // falls back to process.env.SMITHERS_API_KEY
  maxBodyBytes?: number;      // default 1_048_576
  rootDir?: string;           // sandbox root for workflow paths and tools
  allowNetwork?: boolean;     // default false; allows network in `bash`
  headersTimeout?: number;    // default 30_000
  requestTimeout?: number;    // default 60_000
};
```

`startServer` returns a listening `http.Server`. `headersTimeout` and `requestTimeout` are applied to that server to bound slow header/body uploads.

## API Routes (TOON)

```toon
routes[15]{method,path,purpose,auth}:
  GET,/metrics,Prometheus exposition,bearer
  POST,/v1/runs,Start or resume a run,bearer
  GET,/v1/runs,List runs (requires db),bearer
  GET,/v1/runs/:runId,Run status and node summary,bearer
  POST,/v1/runs/:runId/resume,Resume paused or failed run,bearer
  POST,/v1/runs/:runId/cancel,Abort an active run,bearer
  GET,/v1/runs/:runId/events,SSE event stream (?afterSeq=N),bearer
  GET,/v1/runs/:runId/frames,List render frames,bearer
  POST,/v1/runs/:runId/nodes/:nodeId/approve,Approve a paused node,bearer
  POST,/v1/runs/:runId/nodes/:nodeId/deny,Deny a paused node,bearer
  POST,/v1/runs/:runId/signals/:signalName,Deliver a named signal,bearer
  GET,/v1/approvals,List pending approvals (requires db),bearer
  GET,/v1/approval/list,Legacy alias for /v1/approvals,bearer
  GET,/approval/list and /approvals,Legacy aliases for /v1/approvals,bearer
  POST,/signal/:runId/:signalName,Legacy alias for signals,bearer
```

JSON requests/responses use `Content-Type: application/json`, `Cache-Control: no-store`, and `X-Content-Type-Options: nosniff`. SSE events are named `smithers` and carry `SmithersEvent` JSON; the stream sends a keep-alive comment every 10 s and closes on terminal state.

Errors use the envelope `{ "error": { "code", "message", "details" } }`. Common codes: `INVALID_REQUEST`, `RUN_NOT_FOUND`, `RUN_ALREADY_EXISTS`, `RUN_NOT_ACTIVE`, `NOT_FOUND`, `UNAUTHORIZED`, `WORKFLOW_PATH_OUTSIDE_ROOT`, `DB_NOT_CONFIGURED`, `SERVER_ERROR`.
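A client can turn the error envelope into a typed exception. `SmithersHttpError`, `isErrorEnvelope`, and `readJson` are hypothetical client-side helpers; they rely only on the envelope shape described above and treat `details` as optional.

```typescript
type ErrorEnvelope = { error: { code: string; message: string; details?: unknown } };

class SmithersHttpError extends Error {
  readonly status: number;
  readonly code: string;
  readonly details: unknown;
  constructor(status: number, code: string, message: string, details?: unknown) {
    super(`${code}: ${message}`);
    this.name = "SmithersHttpError";
    this.status = status;
    this.code = code;
    this.details = details;
  }
}

function isErrorEnvelope(value: unknown): value is ErrorEnvelope {
  if (typeof value !== "object" || value === null || !("error" in value)) return false;
  const e = (value as { error: unknown }).error;
  return typeof e === "object" && e !== null && typeof (e as { code?: unknown }).code === "string";
}

// Parse a JSON response; throw a typed error for non-2xx responses.
async function readJson<T>(res: Response): Promise<T> {
  const body: unknown = await res.json();
  if (!res.ok && isErrorEnvelope(body)) {
    throw new SmithersHttpError(res.status, body.error.code, body.error.message, body.error.details);
  }
  if (!res.ok) throw new SmithersHttpError(res.status, "UNKNOWN", `HTTP ${res.status}`);
  return body as T;
}
```

Branching on `err.code` (for example `RUN_NOT_FOUND` versus `RUN_NOT_ACTIVE`) is more robust than matching on `message`, which is meant for humans.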

## Tool surface

Tools resolve relative to `rootDir`. The example below exposes a workflow that uses the built-in `bash` tool through the server; clients call it via `POST /v1/runs`.

```tsx
import { Task, Workflow, createSmithers, bash } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  result: z.object({ stdout: z.string() }),
});

export default smithers((ctx) => (
  <Workflow name="echo">
    <Task id="run" output={outputs.result}>
      {async () => ({ stdout: await bash(`echo ${ctx.input.msg}`) })}
    </Task>
  </Workflow>
));
```

`allowNetwork: false` (the default) keeps `bash` offline. Set `rootDir` to constrain the filesystem the workflow can touch.

## Authentication

When `authToken` is configured (directly or via `SMITHERS_API_KEY`), every request must include either:

- `Authorization: Bearer <token>`, or
- `x-smithers-key: <token>`.

Missing or invalid tokens return `401`. Tokens carry no scopes; for finer-grained access control, use the Gateway.
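The two accepted header forms can be checked with a small helper. This is a sketch of the documented rule, not Smithers' server code. `extractToken` and `isAuthorized` are hypothetical names, the case-sensitive `Bearer ` match is a simplification, and the constant-time comparison is a defensive choice added here.

```typescript
import { timingSafeEqual } from "node:crypto";

// Return the presented token from either accepted header, or null if absent.
function extractToken(headers: Headers): string | null {
  const auth = headers.get("authorization");
  if (auth && auth.startsWith("Bearer ")) return auth.slice("Bearer ".length).trim();
  return headers.get("x-smithers-key");
}

// Compare in constant time; a length mismatch is rejected up front.
function isAuthorized(headers: Headers, expected: string | undefined): boolean {
  if (!expected) return true; // no token configured: auth disabled
  const presented = extractToken(headers);
  if (presented === null) return false;
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```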

## Notes

- Each `POST /v1/runs` and `/resume` reloads the workflow source via a content-addressed shadow file with the same extension as the source, so edits take effect on the next call without a restart.
- Active runs heartbeat to `_smithers_runs.heartbeat_at_ms` every 1 s; stale rows (no heartbeat in 30 s) are treated as crashed and may be resumed.
- When a server-level `db` differs from a workflow's database, runs and events are mirrored asynchronously to the server db so they show up in `GET /v1/runs`.
- Metrics are exported via `/metrics`; set `SMITHERS_OTEL_ENABLED=1` plus `OTEL_EXPORTER_OTLP_ENDPOINT` for OTLP. See Observability.
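The heartbeat rule reduces to a timestamp comparison. A minimal sketch, assuming the caller has already read `heartbeat_at_ms` from `_smithers_runs`; `isHeartbeatStale` is a hypothetical name, the 30 s threshold is the one stated above, and treating a missing heartbeat as stale is an assumption.

```typescript
const STALE_AFTER_MS = 30_000; // no heartbeat for this long: treat the run as crashed

// True when a row marked active has not heartbeated within the stale window.
function isHeartbeatStale(heartbeatAtMs: number | null, nowMs: number = Date.now()): boolean {
  if (heartbeatAtMs === null) return true; // never heartbeated (assumption for this sketch)
  return nowMs - heartbeatAtMs > STALE_AFTER_MS;
}
```

With heartbeats every 1 s, a 30 s window tolerates many missed beats (GC pauses, slow I/O) before a live run is mistaken for a crashed one.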

---

## Serve Mode

> Run a single workflow as an HTTP server with Hono. Interact with it over REST, stream events via SSE, and manage approvals remotely.

Serve mode starts a Hono-based HTTP server alongside a running workflow. Every route operates on the single active run, with no workflow path or run ID in requests.

## CLI

```bash
bunx smithers-orchestrator up workflow.tsx --serve --port 3000 --host 0.0.0.0
```

| Flag | Default | Description |
|---|---|---|
| `--serve` | `false` | Enable HTTP server mode |
| `--port` | `7331` | TCP port |
| `--host` | `127.0.0.1` | Bind address |
| `--auth-token` | `SMITHERS_API_KEY` env | Bearer token for auth |
| `--metrics` | `true` | Expose `/metrics` Prometheus endpoint |

The process stays alive after the workflow completes so final state remains queryable. Ctrl+C stops both the server and the workflow.

Detached mode:

```bash
bunx smithers-orchestrator up workflow.tsx --serve --port 8080 -d
```

## Programmatic

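```ts
import { createServeApp } from "smithers-orchestrator/serve";

// `workflow`, `adapter`, and `runId` are assumed to come from your own setup:
// a loaded workflow, a SmithersDb adapter, and the ID of the run to serve.
const app = createServeApp({
  workflow,
  adapter,
  runId,
  abort: new AbortController(),
  authToken: "sk-secret",
});

Bun.serve({ port: 3000, fetch: app.fetch });
```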

`createServeApp` returns a standard Hono app. Mount it with `Bun.serve`, pass it to another Hono app via `app.route()`, or use `app.fetch` in tests.

## ServeOptions

```ts
type ServeOptions = {
  workflow: SmithersWorkflow<unknown>;   // loaded workflow instance
  adapter: SmithersDb;               // database adapter; typically drizzle("./smithers.db"); see HTTP Server for the equivalent db config
  runId: string;                     // active run ID
  abort: AbortController;            // shared cancellation controller
  authToken?: string;                // bearer token; falls back to SMITHERS_API_KEY; disabled if unset
  metrics?: boolean;                 // expose /metrics; default true
};
```

---

## Authentication

When `authToken` is configured, every route except `/health` requires:

- `Authorization: Bearer <token>`, or
- `x-smithers-key: <token>`

Missing or invalid tokens receive `401`.

---

## Routes

### GET /health

Returns `200` regardless of auth.

```json
{ "ok": true }
```

### GET /

Run status and node summary.

```json
{
  "runId": "run-1234",
  "workflowName": "bugfix",
  "status": "running",
  "startedAtMs": 1707500000000,
  "finishedAtMs": null,
  "summary": { "finished": 3, "in-progress": 1, "pending": 2 }
}
```

### GET /events

SSE stream of lifecycle events. Same format as the multi-workflow server.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `afterSeq` | `number` | `-1` | Only events after this sequence |

```
event: smithers
data: {"type":"NodeStarted","runId":"run-1234","nodeId":"analyze","iteration":0,"attempt":1}
id: 1

event: smithers
data: {"type":"NodeFinished","runId":"run-1234","nodeId":"analyze","iteration":0,"attempt":1}
id: 2
```

- Polls every 500ms.
- Auto-closes when the run reaches a terminal state.
- Reconnect with `?afterSeq=N` to resume from a known position.
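Resuming with `?afterSeq=N` only works if the client remembers the last sequence it processed. The sketch below parses `event: smithers` frames from raw SSE text and tracks the last `id:`. It assumes each SSE `id` equals the event's sequence number, as the sample above suggests; `parseSmithersSse` and `resumeUrl` are hypothetical helpers, not part of Smithers.

```typescript
type SseFrame = { event: string; id: number | null; data: unknown };

// Parse complete SSE frames (separated by a blank line); comment lines are skipped.
function parseSmithersSse(text: string): { frames: SseFrame[]; lastSeq: number | null } {
  const frames: SseFrame[] = [];
  let lastSeq: number | null = null;
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    let id: number | null = null;
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or keep-alive comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      const value = colon === -1 ? "" : line.slice(colon + 1).replace(/^ /, "");
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
      else if (field === "id") id = Number(value);
    }
    if (dataLines.length === 0) continue;
    frames.push({ event, id, data: JSON.parse(dataLines.join("\n")) });
    if (id !== null && !Number.isNaN(id)) lastSeq = id;
  }
  return { frames, lastSeq };
}

// Build the URL for the next connection, resuming after the last seen sequence.
function resumeUrl(base: string, lastSeq: number | null): string {
  const url = new URL("/events", base);
  if (lastSeq !== null) url.searchParams.set("afterSeq", String(lastSeq));
  return url.toString();
}
```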

### GET /frames

Rendered workflow frames.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | `number` | `50` | Max frames |
| `afterFrameNo` | `number` | - | Frames after this number |

### POST /approve/:nodeId

Approve a pending approval gate. All fields optional. Returns `{ "runId": "run-1234" }`.

```json
{
  "iteration": 0,
  "note": "Looks good",
  "decidedBy": "alice"
}
```

### POST /deny/:nodeId

Deny a pending approval gate. Same body as `/approve/:nodeId`.

### POST /cancel

Cancel the running workflow.

| Status | Code | Condition |
|---|---|---|
| 200 | - | Cancelled successfully |
| 409 | `RUN_NOT_ACTIVE` | Run is not actively running (e.g. already finished, failed, cancelled, continued, or its heartbeat has gone stale) |

> **Note:** Runs in `waiting-approval` or `waiting-timer` state are cancelled immediately and return `200`.

### GET /metrics

Prometheus text exposition. Same metrics as the multi-workflow server.

---

## Error Format

```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}
```

Unknown routes return `404` with code `NOT_FOUND`.

---

## Serve Mode vs Multi-Workflow Server

| | Serve mode | Multi-workflow server |
|---|---|---|
| Scope | Single workflow, single run | Any workflow, multiple concurrent runs |
| Start | `bunx smithers-orchestrator up --serve` or `createServeApp()` | `startServer()` |
| Routes | `/`, `/events`, `/approve/:nodeId`, ... | `/v1/runs`, `/v1/runs/:runId`, ... |
| Framework | Hono | Node.js `http` |
| Use case | Development, single-purpose services | Production API gateway |

---

## Example

```bash
# Start a workflow with serve mode
bunx smithers-orchestrator up workflow.tsx --serve --port 8080 --auth-token sk-secret

# Status
curl http://localhost:8080/ -H "Authorization: Bearer sk-secret"

# Stream events
curl -N http://localhost:8080/events -H "Authorization: Bearer sk-secret"

# Approve
curl -X POST http://localhost:8080/approve/deploy \
  -H "Authorization: Bearer sk-secret" \
  -H "Content-Type: application/json" \
  -d '{"note": "Ship it", "decidedBy": "alice"}'

# Health (no auth)
curl http://localhost:8080/health
```

---

## Gateway

> WebSocket / RPC gateway for connecting clients and custom UIs to Smithers runs, providing pushed updates, subscriptions, metrics, and resilient reconnection.

`Gateway` is Smithers' headless control plane. Reach for it instead of `startServer()` when long-lived clients (bots, dashboards, schedulers, and custom UIs) need to authenticate once, stream events over WebSocket with resilient reconnection, decide approvals, inject signals, read metrics, and manage cron schedules across many registered workflows. Custom UIs, whether built on the vanilla SDK or the React hooks, rely on the Gateway for pushed updates and stale-data guards. For the single-workflow, Hono-based HTTP surface, see Serve Mode (`createServeApp()` / `bunx smithers-orchestrator up --serve`).

## Quick start

```tsx
import { Gateway, Task, Workflow, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  result: z.object({ ok: z.boolean() }),
});

const deploy = smithers((ctx) => (
  <Workflow name="deploy">
    <Task id="ship" output={outputs.result}>{{ ok: true }}</Task>
  </Workflow>
));

const gateway = new Gateway({
  heartbeatMs: 15_000,
  auth: {
    mode: "token",
    tokens: { "operator-token": { role: "operator", scopes: ["*"] } },
  },
});

gateway.register("deploy", deploy, { schedule: "0 8 * * 1-5" });
await gateway.listen({ port: 7331 });
```

```ts
const ws = new WebSocket("ws://localhost:7331");
ws.onmessage = (m) => console.log(JSON.parse(m.data));
ws.onopen = () => ws.send(JSON.stringify({
  type: "req",
  id: "c1",
  method: "connect",
  params: {
    minProtocol: 1,
    maxProtocol: 1,
    client: { id: "docs-example", version: "1.0.0", platform: "browser" },
    auth: { token: "operator-token" },
  },
}));
```

## Gateway client SDK

Programmatic clients (bots, schedulers, dashboards, third-party UIs) talk to the Gateway through the typed client SDK over the same RPC and WebSocket API. For the full custom-UI guide (declarative queries, pushed updates, stale guards, reconnect/resume, backpressure, optimistic mutations, auth, vanilla JS + React hooks) see Custom UIs.

```ts
import { SmithersGatewayClient } from "smithers-orchestrator/gateway-client";

const gateway = new SmithersGatewayClient();
const workflows = await gateway.listWorkflows();

// Resilient pushed updates: backoff + jitter on reconnect, resume from the last
// observed seq via run.gap_resync, stop on run.completed or abort.
for await (const frame of gateway.streamRunEventsResilient({ runId: "run-1" })) {
  if (frame.event === "run.completed") break;
}
```
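The resilient stream reconnects with backoff plus jitter. The exact formula used by the exported `gatewayBackoffDelay` is not documented here, so the sketch below uses a common alternative, capped exponential backoff with "full jitter". The function name, the 250 ms base, and the 30 s cap are assumptions.

```typescript
// Hypothetical: capped exponential backoff with full jitter.
// The delay is drawn uniformly from [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Full jitter spreads reconnect attempts across the whole window, so many dashboards that dropped at the same moment do not reconnect in lockstep.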

Gateway client exports:

| Package | Exports |
|---|---|
| `smithers-orchestrator/gateway-client` | `SmithersGatewayClient`, `SmithersGatewayConnection`, `GatewayRpcError`, `gatewayBackoffDelay`, RPC frame/type-map types, `GatewayUiBootConfig`, `SmithersGatewayClientOptions` |
| `smithers-orchestrator/gateway-react` | `SmithersGatewayProvider`, `createGatewayReactRoot`, `useGatewayRun`, `useGatewayRuns`, `useGatewayWorkflows`, `useGatewayApprovals`, `useGatewayNodeOutput`, `useGatewayRunEvents`, `useGatewayActions`, `useGatewayRpc`, `useSmithersGateway` |

## RPC methods (TOON)

```toon
rpc[19]{method,params,returns,scope,transport}:
  launchRun,workflow/input?/options.runId?/options.idempotencyKey?,{runId/workflow},run:write,http+websocket
  resumeRun,runId/options.force?,{runId/status},run:write,http+websocket
  cancelRun,runId,{runId/status:cancelling},run:write,http+websocket
  hijackRun,runId/options?,{runId/status:hijack-ready/sessionId},run:admin,http+websocket
  rewindRun,runId/frameNo/confirm:true,JumpResult,run:admin,http+websocket
  submitApproval,runId/nodeId/iteration?/decision,{runId/nodeId/iteration/approved},approval:submit,http+websocket
  submitSignal,runId/correlationKey/payload?/signalName?,Delivery metadata,signal:submit,http+websocket
  getRun,runId,RunStateView,run:read,http+websocket
  listRuns,filter.status?/filter.limit?,Run summaries,run:read,http+websocket
  listWorkflows,filter.hasUi?,Workflow summaries,run:read,http+websocket
  listApprovals,filter.runId?/filter.workflow?/filter.limit?,Pending approvals,run:read,http+websocket
  streamRunEvents,runId/afterSeq?,{streamId/runId/afterSeq/currentSeq},run:read,websocket
  streamDevTools,runId/afterSeq?,{streamId/runId/afterSeq} + devtools.event frames,observability:read,websocket
  getNodeOutput,runId/nodeId/iteration?,NodeOutputResponse,run:read,http+websocket
  getNodeDiff,runId/nodeId/iteration?,Node diff response,run:read,http+websocket
  cronList,filter.workflow?,Cron rows,cron:read,http+websocket
  cronCreate,workflow/pattern/cronId?/enabled?,Created cron row,cron:write,http+websocket
  cronDelete,cronId,{cronId/removed},cron:write,http+websocket
  cronRun,cronId? or workflow/input?,{runId/workflow},cron:write,http+websocket
```

`health` remains available as a utility RPC and `GET /health` is available without auth. The legacy method names are still accepted for compatibility (`runs.create`, `runs.get`, `runs.list`, `runs.cancel`, `runs.rerun`, `runs.diff`, `frames.list`, `frames.get`, `attempts.list`, `attempts.get`, `workflows.list`, `approvals.list`, `approvals.decide`, `signals.send`, `cron.list`, `cron.add`, `cron.remove`, `cron.trigger`, `getDevToolsSnapshot`, `jumpToFrame`, `devtools.jumpToFrame`, `devtools.getNodeOutput`, `devtools.getNodeDiff`), but new clients should use the v1 names above.

### Scopes

```toon
scopes[8]{scope,allows}:
  run:read,Read run state/lists/event streams/node output/node diffs
  run:write,Launch/resume/cancel runs; implies run:read
  run:admin,Hijack or rewind runs; implies run:write and run:read
  approval:submit,Submit approval decisions
  signal:submit,Submit workflow signals
  cron:read,List cron schedules
  cron:write,Create/delete/trigger cron schedules; implies cron:read
  observability:read,Read DevTools and other observability streams
```

`*` grants every scope. Pass a method name string in the `scopes` array (e.g. `"launchRun"`) to grant access to exactly that RPC call. Legacy wildcard method grants such as `cron.*` continue to match legacy method names; typed scopes are the contract to use for new integrations. Legacy ranked grants (`read`, `execute`, `approve`, `admin`) are accepted so older tokens keep working.
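The implication rules in the scopes table can be checked with a small resolver. A sketch, assuming a connection's grants are a flat string array as in the `tokens` config; `hasScope` is a hypothetical name, and legacy grants such as `read` or `cron.*` are deliberately omitted. For `rewindRun`, scope alone is not enough: the owner-or-admin check still applies.

```typescript
// Scopes implied by each grant (from the scopes table); a grant always implies itself.
const IMPLIES: Record<string, string[]> = {
  "run:admin": ["run:write", "run:read"],
  "run:write": ["run:read"],
  "cron:write": ["cron:read"],
};

// Does a connection holding `granted` satisfy `required` for the RPC `method`?
function hasScope(granted: readonly string[], required: string, method?: string): boolean {
  for (const g of granted) {
    if (g === "*") return true; // wildcard grants everything
    if (method !== undefined && g === method) return true; // exact-method grant, e.g. "launchRun"
    if (g === required || (IMPLIES[g] ?? []).includes(required)) return true;
  }
  return false;
}
```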

### `rewindRun` (destructive rewind)

Rewinds a run to a prior frame and makes it resumable from that point.
This is destructive: it truncates frames, attempts, output rows, and
diff-cache entries beyond the target; reverts JJ sandboxes; marks the
run `running` again; and emits a `TimeTravelJumped` event so
`streamDevTools` subscribers rebaseline.

Caller identity is authorized per-request: the connection must have
`run:admin` scope and must also be the run owner (`userId` matches
`ownerId`) or have `role: "admin"`. Scope alone never grants access.
The legacy aliases `jumpToFrame` and `devtools.jumpToFrame` route to
`rewindRun`.

Request:

```ts
type RewindRunRequest = {
  runId: string;     // /^[a-z0-9_-]{1,64}$/
  frameNo: number;   // 0 <= frameNo <= latestFrameNo
  confirm: true;     // must be literal true
};
```

Response (`JumpResult`):

```ts
type JumpResult = {
  ok: true;
  newFrameNo: number;
  revertedSandboxes: number;
  deletedFrames: number;
  deletedAttempts: number;
  invalidatedDiffs: number;
  durationMs: number;
};
```

Also broadcast after the DB commit as `run.time_travel_jumped` with
`{ runId, fromFrameNo, toFrameNo, timestampMs, caller }`.

Quota: 10 rewinds per run per caller per hour (default window). Exceeding the quota returns `RateLimited`.
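One way to enforce a per-run, per-caller quota of 10 rewinds per hour is a sliding-window log. The sketch is illustrative only: `RewindQuota` is hypothetical, and whether Smithers uses a sliding or a fixed window is not specified here.

```typescript
// Hypothetical sliding-window limiter keyed by (runId, caller).
class RewindQuota {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the rewind if under quota; false means respond with `RateLimited`.
  tryAcquire(runId: string, caller: string, nowMs: number = Date.now()): boolean {
    const key = `${runId}\u0000${caller}`;
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return true;
  }
}
```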

Failure modes and HTTP status:

| Code                   | Meaning                                                                   | HTTP  |
| ---------------------- | ------------------------------------------------------------------------- | ----- |
| `InvalidRunId`         | `runId` fails `/^[a-z0-9_-]{1,64}$/`.                                     | `400` |
| `InvalidFrameNo`       | `frameNo` is not a non-negative i32 integer.                              | `400` |
| `ConfirmationRequired` | Caller omitted `confirm: true`.                                           | `400` |
| `FrameOutOfRange`      | `frameNo` > latest frame, or run has no frames.                           | `400` |
| `Unauthorized`         | Caller is neither the run owner nor an admin (audit row still written).   | `401` |
| `RunNotFound`          | `runId` does not exist.                                                   | `404` |
| `Busy`                 | Another rewind is in flight for this run.                                 | `409` |
| `RateLimited`          | Caller exceeded rewind quota (default 10/hour).                           | `429` |
| `UnsupportedSandbox`   | A sandbox cannot be reverted (missing / untrackable `jjPointer`).         | `501` |
| `VcsError`             | A JJ revert call failed; DB/reconciler rolled back.                       | `500` |
| `RewindFailed`         | Rewind failed and rollback was partial; run marked `needs_attention`.     | `500` |
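The four `400` validation failures can be checked before touching the database. A sketch, assuming the checks run in the order listed in the table; `validateRewind` is hypothetical, and `latestFrameNo` (null when the run has no frames) is assumed to be looked up by the caller.

```typescript
type RewindValidationError = "InvalidRunId" | "InvalidFrameNo" | "ConfirmationRequired" | "FrameOutOfRange";

const RUN_ID = /^[a-z0-9_-]{1,64}$/;
const I32_MAX = 2 ** 31 - 1;

// Hypothetical pre-flight check; returns null when the request may proceed.
function validateRewind(
  req: { runId?: unknown; frameNo?: unknown; confirm?: unknown },
  latestFrameNo: number | null,
): RewindValidationError | null {
  if (typeof req.runId !== "string" || !RUN_ID.test(req.runId)) return "InvalidRunId";
  const f = req.frameNo;
  if (typeof f !== "number" || !Number.isInteger(f) || f < 0 || f > I32_MAX) return "InvalidFrameNo";
  if (req.confirm !== true) return "ConfirmationRequired";
  if (latestFrameNo === null || f > latestFrameNo) return "FrameOutOfRange";
  return null;
}
```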

Every call, whether success, failure, or unauthorized, writes one row to
`_smithers_time_travel_audit` with `result ∈ { success, failed, partial, in_progress }`.
An in-progress row is inserted before any mutation and updated in place
on completion; startup recovery flips any leftover `in_progress` rows to
`partial`.

### Node output

`getNodeOutput` returns the DevTools Output-tab payload for a single task iteration:

```ts
type NodeOutputResponse = {
  status: "produced" | "pending" | "failed";
  row: Record<string, unknown> | null;
  schema: OutputSchemaDescriptor | null;
  partial?: Record<string, unknown> | null; // only when status === "failed"
};

type OutputSchemaDescriptor = {
  fields: Array<{
    name: string;
    type: "string" | "number" | "boolean" | "object" | "array" | "null" | "unknown";
    optional: boolean;
    nullable: boolean;
    description?: string;
    enum?: readonly unknown[];
  }>;
};
```

### Error codes

Gateway v1 RPC errors use stable code strings and HTTP status mappings:

```toon
errors[19]{code,http}:
  InvalidRequest,400
  InvalidInput,400
  Unauthorized,401
  Forbidden,403
  RunNotFound,404
  CronNotFound,404
  NodeNotFound,404
  IterationNotFound,404
  NodeHasNoOutput,404
  FrameOutOfRange,400
  SeqOutOfRange,400
  Busy,409
  RateLimited,429
  PayloadTooLarge,413
  BackpressureDisconnect,429
  UnsupportedSandbox,501
  VcsError,500
  RewindFailed,500
  Internal,500
```

Some legacy DevTools aliases still surface older validation names such as
`InvalidRunId`, `InvalidFrameNo`, or `ConfirmationRequired`. Treat those as
legacy aliases for the matching v1 validation failure.

### Versioned wire shapes

All DevTools wire types carry `version: 1`.

`DevToolsSnapshot` (v1):

```ts
type DevToolsSnapshot = {
  version: 1;
  runId: string;
  frameNo: number;   // latest frame reflected in this tree
  seq: number;       // monotonic sequence id (equals frameNo today)
  root: DevToolsNode;
};

type DevToolsNode = {
  id: number;        // stable across frames for the same logical node
  type: "workflow" | "task" | "sequence" | "parallel" | /* …see protocol */;
  name: string;
  props: Record<string, unknown>;
  task?: { nodeId: string; kind: "agent" | "compute" | "static"; /* … */ };
  children: DevToolsNode[];
  depth: number;
};
```

`DevToolsDelta` (v1):

```ts
type DevToolsDelta = {
  version: 1;
  baseSeq: number;   // must match the subscriber's current seq
  seq: number;       // new seq after applying ops, in order
  ops: Array<
    | { op: "addNode"; parentId: number; index: number; node: DevToolsNode }
    | { op: "removeNode"; id: number }
    | { op: "updateProps"; id: number; props: Record<string, unknown> }
    | { op: "updateTask"; id: number; task: DevToolsNode["task"] }
    | { op: "replaceRoot"; node: DevToolsNode } // emitted when the root's
                                                // identity or shape changes;
                                                // `removeNode` of the root is
                                                // never emitted.
  >;
};
```

`DevToolsEvent` (v1), frames pushed over `devtools.event`:

```ts
type DevToolsEvent =
  | { version: 1; kind: "snapshot"; snapshot: DevToolsSnapshot }
  | { version: 1; kind: "delta"; delta: DevToolsDelta };
```

A subscription always starts with a `snapshot` event, then emits `delta` events
per frame. The server re-baselines (emits a full `snapshot` instead of a
`delta`) after 50 delta events, when a delta is larger than a fresh snapshot,
or when the gateway observes `TimeTravelJumped` for the run.
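A subscriber applies a `delta` only when its `baseSeq` matches the subscriber's current `seq`. The sketch below applies the five op kinds to an in-memory tree and rejects a mismatched delta, leaving resynchronization to the caller. The simplified `TreeNode` type (omitting `type`, `name`, and `depth`), the `applyDelta` name, and the choice that `updateProps` / `updateTask` replace rather than merge are assumptions.

```typescript
type TreeNode = { id: number; props: Record<string, unknown>; task?: Record<string, unknown>; children: TreeNode[] };

type Op =
  | { op: "addNode"; parentId: number; index: number; node: TreeNode }
  | { op: "removeNode"; id: number }
  | { op: "updateProps"; id: number; props: Record<string, unknown> }
  | { op: "updateTask"; id: number; task: TreeNode["task"] }
  | { op: "replaceRoot"; node: TreeNode };

type Delta = { version: 1; baseSeq: number; seq: number; ops: Op[] };
type ViewState = { seq: number; root: TreeNode };

// Depth-first search for a node and its parent.
function find(root: TreeNode, id: number, parent: TreeNode | null = null): { node: TreeNode; parent: TreeNode | null } | null {
  if (root.id === id) return { node: root, parent };
  for (const child of root.children) {
    const hit = find(child, id, root);
    if (hit) return hit;
  }
  return null;
}

// Apply ops in order (mutating the tree); throws if the delta does not follow this subscriber's seq.
function applyDelta(state: ViewState, delta: Delta): ViewState {
  if (delta.baseSeq !== state.seq) throw new Error(`seq mismatch: have ${state.seq}, delta expects ${delta.baseSeq}`);
  let root = state.root;
  for (const op of delta.ops) {
    if (op.op === "replaceRoot") { root = op.node; continue; }
    if (op.op === "addNode") {
      const parent = find(root, op.parentId)?.node;
      if (!parent) throw new Error(`unknown parent ${op.parentId}`);
      parent.children.splice(op.index, 0, op.node);
      continue;
    }
    const hit = find(root, op.id);
    if (!hit) throw new Error(`unknown node ${op.id}`);
    if (op.op === "removeNode") {
      // The root is never removed via removeNode, so a parent always exists here.
      if (hit.parent) hit.parent.children.splice(hit.parent.children.indexOf(hit.node), 1);
    } else if (op.op === "updateProps") {
      hit.node.props = op.props;
    } else {
      hit.node.task = op.task;
    }
  }
  return { seq: delta.seq, root };
}
```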

## WebSocket protocol

Three frame types share the same socket:

- `req`: `{ type: "req", id, method, params? }` from client.
- `res`: `{ type: "res", id, ok, payload?, error? }` from server, correlated by `id`.
- `event`: `{ type: "event", event, payload?, seq, stateVersion }` server-pushed; `seq` is per connection, `stateVersion` is global.

Handshake: on connect the server immediately pushes `connect.challenge` (`{ nonce, ts }`). The client replies with a `connect` request carrying `minProtocol`, `maxProtocol`, `client` metadata, `auth`, and an optional `subscribe: string[]` to filter events by `runId`. The server returns a `hello` payload (`protocol`, `features`, `policy.heartbeatMs`, `auth` with `sessionToken`/`role`/`scopes`/`userId`, `snapshot`).
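The handshake can be driven from a small frame handler: wait for `connect.challenge`, send `connect`, then correlate `res` frames by `id`. This is a transport-agnostic sketch in which the `send` callback stands in for `WebSocket.send`. The `GatewayConnection` class and its `c1`, `c2`, … id scheme are hypothetical (prefer the SDK's `SmithersGatewayClient` in real code), and since it is not stated whether `connect` must echo the challenge nonce, this sketch does not send it.

```typescript
type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
type EventFrame = { type: "event"; event: string; payload?: unknown; seq: number; stateVersion: number };
type WireFrame = ReqFrame | ResFrame | EventFrame;

// Hypothetical client-side handler for the gateway's frame protocol.
class GatewayConnection {
  private nextId = 0;
  private readonly pending = new Map<string, (res: ResFrame) => void>();
  readonly events: EventFrame[] = [];
  private readonly send: (text: string) => void;
  private readonly token: string;

  constructor(send: (text: string) => void, token: string) {
    this.send = send;
    this.token = token;
  }

  // Send a request and resolve when the matching `res` frame arrives.
  request(method: string, params?: unknown): Promise<ResFrame> {
    const id = `c${++this.nextId}`;
    const frame: ReqFrame = { type: "req", id, method, params };
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.send(JSON.stringify(frame));
    });
  }

  // Feed every incoming message here (for example from ws.onmessage).
  onMessage(text: string): void {
    const frame = JSON.parse(text) as WireFrame;
    if (frame.type === "res") {
      this.pending.get(frame.id)?.(frame);
      this.pending.delete(frame.id);
    } else if (frame.type === "event") {
      if (frame.event === "connect.challenge") {
        // Reply to the challenge with the connect request (protocol 1).
        void this.request("connect", {
          minProtocol: 1,
          maxProtocol: 1,
          client: { id: "docs-sketch", version: "0.0.0", platform: "node" },
          auth: { token: this.token },
        });
      } else {
        this.events.push(frame);
      }
    }
  }
}
```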

After `connect`, the gateway emits `tick` events every `heartbeatMs`. `launchRun`, `submitApproval`, `submitSignal`, and `cronRun` automatically subscribe the connection to the affected `runId`. Server-pushed event names:

| Event | Category |
|---|---|
| `connect.challenge` | Connection |
| `tick` | Connection |
| `run.event` | Run lifecycle |
| `run.heartbeat` | Run lifecycle |
| `run.gap_resync` | Run lifecycle |
| `run.error` | Run lifecycle |
| `run.completed` | Run lifecycle |
| `run.time_travel_jumped` | Run lifecycle |
| `node.started` | Run lifecycle |
| `node.finished` | Run lifecycle |
| `node.failed` | Run lifecycle |
| `task.output` | Run lifecycle |
| `task.heartbeat` | Run lifecycle |
| `approval.requested` | Approval |
| `approval.decided` | Approval |
| `approval.auto_approved` | Approval |
| `cron.triggered` | Cron |
| `devtools.event` | DevTools |

For stateless callers, `POST /rpc` accepts the same body shape (`{ id, method, params }`) and returns the same `ResponseFrame`. Auth headers: `Authorization: Bearer <token>` or `x-smithers-key: <token>` (or trusted-proxy headers in trusted-proxy mode).

## GatewayOptions

```ts
type GatewayOptions = {
  protocol?: number;                 // default 1
  features?: string[];               // default ["streaming", "runs"]
  heartbeatMs?: number;              // default 15_000
  auth?: GatewayAuthConfig;
  defaults?: { cliAgentTools?: "all" | "explicit-only" };
  maxBodyBytes?: number;             // default 1_048_576 for POST /rpc
  maxPayload?: number;               // default 1_048_576 for WebSocket frames
  maxConnections?: number;           // default 1_000
  eventWindowSize?: number;          // default 10_000 per-run replay frames
  headersTimeout?: number;           // default 30_000
  requestTimeout?: number;           // default 60_000
};

type GatewayAuthConfig =
  | {
      mode: "token";
      tokens: Record<string, { role: string; scopes: string[]; userId?: string }>;
    }
  | {
      mode: "jwt";
      issuer: string;
      audience: string | string[];
      secret: string;                // HS256
      scopesClaim?: string;
      roleClaim?: string;
      userClaim?: string;
      defaultRole?: string;
      defaultScopes?: string[];
      clockSkewSeconds?: number;
    }
  | {
      mode: "trusted-proxy";
      allowedOrigins?: string[];
      trustedHeaders?: string[];     // default ["x-user-id","x-user-scopes","x-user-role"]
      defaultRole?: string;
      defaultScopes?: string[];
    };
```

Runs started through the gateway expose `ctx.auth = { triggeredBy, role, scopes, createdAt }`. `<Approval>` may further restrict decisions with `allowedScopes` and `allowedUsers`, which the gateway enforces before accepting `submitApproval`.

`headersTimeout` and `requestTimeout` are applied to the underlying Node HTTP server when `gateway.listen()` starts. Keep both below the corresponding reverse-proxy idle/read timeouts so slow clients are closed by Smithers first.
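The sketch below builds a JWT-mode options object and checks the two HTTP timeouts against a reverse proxy's idle timeout. It redeclares only a subset of the `GatewayOptions` type shown above so that it compiles on its own. The issuer, audience, and claim names are hypothetical, and this page does not show how the object is passed to the gateway constructor.

```ts
// Partial local copy of the documented GatewayOptions / GatewayAuthConfig shapes.
type JwtAuth = {
  mode: "jwt";
  issuer: string;
  audience: string | string[];
  secret: string; // HS256 shared secret
  scopesClaim?: string;
  defaultRole?: string;
  clockSkewSeconds?: number;
};

type GatewayOptionsSubset = {
  heartbeatMs?: number;
  auth?: JwtAuth;
  maxConnections?: number;
  headersTimeout?: number;
  requestTimeout?: number;
};

// Documented defaults.
const DEFAULT_HEADERS_TIMEOUT = 30_000;
const DEFAULT_REQUEST_TIMEOUT = 60_000;

function makeGatewayOptions(jwtSecret: string): GatewayOptionsSubset {
  return {
    heartbeatMs: 10_000,
    maxConnections: 200,
    headersTimeout: 20_000,
    requestTimeout: 45_000,
    auth: {
      mode: "jwt",
      issuer: "https://auth.example.com", // hypothetical issuer
      audience: "smithers-gateway", // hypothetical audience
      secret: jwtSecret,
      scopesClaim: "scope", // hypothetical claim name
      defaultRole: "viewer",
      clockSkewSeconds: 30,
    },
  };
}

// Warn about any timeout that is not strictly below the proxy timeout,
// so that Smithers, not the proxy, closes slow clients first.
function checkAgainstProxy(opts: GatewayOptionsSubset, proxyTimeoutMs: number): string[] {
  const warnings: string[] = [];
  const headers = opts.headersTimeout ?? DEFAULT_HEADERS_TIMEOUT;
  const request = opts.requestTimeout ?? DEFAULT_REQUEST_TIMEOUT;
  if (headers >= proxyTimeoutMs) warnings.push(`headersTimeout ${headers}ms >= proxy ${proxyTimeoutMs}ms`);
  if (request >= proxyTimeoutMs) warnings.push(`requestTimeout ${request}ms >= proxy ${proxyTimeoutMs}ms`);
  return warnings;
}
```

Behind a proxy with a 60 s idle timeout, the defaults trip the check: `requestTimeout` defaults to exactly 60 s, so it is not strictly below the proxy value.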

## Notes

- Cron: `gateway.register(name, wf, { schedule })` writes a cron row keyed `gateway:<name>`; the gateway polls between 1 s and 15 s (clamped from `heartbeatMs`). Cron-fired runs get `ctx.auth.role = "system"`, `triggeredBy = "cron:gateway"`, `scopes = ["*"]`.
- JWT mode currently checks `alg=HS256`, the HMAC signature, and the `iss`, `aud`, `exp`, and `nbf` claims. Scope claims may be arrays or space/comma-separated strings.
- Trusted-proxy mode is only safe behind something you control (Cloudflare Access, internal API gateway) that strips and rewrites identity headers.
- DevTools streams: see Versioned wire shapes for re-baseline triggers; over-capacity subscribers receive `BackpressureDisconnect`.
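For local testing of JWT mode, the sketch below mints an HS256 token with Node's `crypto`. It also normalizes a scope claim into the two accepted shapes described in the notes above. The `scope` claim name and the use of `sub` for the user are assumptions. Set `scopesClaim` and `userClaim` so they match the claim names you actually emit. This is a test helper, not Smithers's own validator.

```ts
import { createHmac } from "node:crypto";

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Mint an HS256 JWT carrying iss, aud, nbf, exp, and a space-separated scope claim.
function mintTestToken(
  opts: {
    secret: string;
    issuer: string;
    audience: string;
    subject: string;
    scopes: string[];
    scopesClaim?: string; // should match GatewayAuthConfig.scopesClaim
    ttlSeconds?: number;
  },
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(
    JSON.stringify({
      iss: opts.issuer,
      aud: opts.audience,
      sub: opts.subject, // assumption: user id claim
      nbf: nowSeconds,
      exp: nowSeconds + (opts.ttlSeconds ?? 300),
      [opts.scopesClaim ?? "scope"]: opts.scopes.join(" "),
    }),
  );
  const signature = createHmac("sha256", opts.secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Accept either an array of strings or a string separated by spaces and/or commas.
function normalizeScopes(claim: unknown): string[] {
  if (Array.isArray(claim)) return claim.filter((s): s is string => typeof s === "string");
  if (typeof claim === "string") return claim.split(/[\s,]+/).filter((s) => s.length > 0);
  return [];
}
```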

---

## MCP Server

> Expose Smithers as a Model Context Protocol stdio server so any MCP client (Claude Code, Cursor, Codex, or your own agent) can list, run, inspect, and control workflows without shell scripting.

Smithers ships a built-in MCP stdio server. With `--mcp`, the CLI speaks the Model Context Protocol over stdin/stdout instead of running as an interactive CLI. Any MCP-aware client can connect, discover workflows, start runs, watch progress, resolve approvals, and revert bad attempts through structured tool calls.

Use the MCP server when an AI agent should drive Smithers autonomously. Use the HTTP Server's REST endpoints when human-written code or webhooks drive Smithers.

---

## Setup

### Start the server

```bash
smithers --mcp
```

This starts the semantic surface: a stable, structured tool set for AI agent consumption, documented on this page.

The `--surface` flag selects which tool set the server registers:

```bash
# Semantic tools only (default)
smithers --mcp --surface semantic

# Raw CLI-mirroring tools only
smithers --mcp --surface raw

# Both surfaces registered on the same server
smithers --mcp --surface both
```

Use `--surface raw` only for direct CLI parity. Prefer the semantic surface for new integrations: every tool returns a `{ ok, data, error }` envelope with Zod-validated input and output schemas.
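In practice an MCP client library handles the wire protocol for you. The sketch below only shows what a client writes to the stdin of `smithers --mcp` to call `list_workflows`. Each message is newline-delimited JSON-RPC 2.0, following the MCP stdio transport. The `protocolVersion` string and the client name are assumptions, so use values your client actually supports.

```ts
// One JSON-RPC 2.0 message per line. JSON.stringify escapes newlines inside
// strings, so each frame contains exactly one trailing "\n".
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
};

const encodeFrame = (msg: JsonRpcMessage): string => JSON.stringify(msg) + "\n";

// Messages written to the server's stdin, in order. A real client must wait
// for the initialize response before sending the remaining messages.
function handshakeAndCall(toolName: string, args: Record<string, unknown>): string[] {
  return [
    encodeFrame({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-06-18", // assumption: pick a version your client supports
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.1.0" }, // hypothetical
      },
    }),
    encodeFrame({ jsonrpc: "2.0", method: "notifications/initialized" }),
    encodeFrame({
      jsonrpc: "2.0",
      id: 2,
      method: "tools/call",
      params: { name: toolName, arguments: args },
    }),
  ];
}
```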

### Register manually

For clients that read JSON config directly:

```json
{
  "mcpServers": {
    "smithers": {
      "command": "bunx",
      "args": ["smithers-orchestrator", "--mcp"]
    }
  }
}
```

For a project-scoped install (for example, a monorepo where Smithers is a dev dependency), put the same entry in the client's project-level MCP config. Make sure `smithers-orchestrator` is listed in the local `package.json` so that `bunx` resolves the locally installed version instead of fetching one.

---

## Tool Registration

On start, each tool is registered with its input schema, output schema, and MCP annotations. Every tool carries:

- **`inputSchema`**: Zod object describing accepted parameters.
- **`outputSchema`**: Zod schema for the structured response envelope.
- **`annotations`**: MCP annotation metadata (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`).

### Structured tool envelope

Every tool returns the same shape:

```ts
{
  ok: boolean;
  data?: { ... };     // present on success
  error?: {           // present on failure
    code: string;
    message: string;
    details?: Record<string, unknown> | null;
    docsUrl?: string | null;
  };
}
```

The response is also echoed as a `text` content block, so clients that do not parse `structuredContent` still receive the JSON payload.
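A small client-side helper makes the envelope convenient to consume. It returns `data` on success and throws a typed error otherwise, and it also works with the echoed `text` block. The types mirror the envelope documented above. The `SmithersToolError` class and the `UNKNOWN` fallback code are local to this sketch and are not part of Smithers.

```ts
interface ToolError {
  code: string;
  message: string;
  details?: Record<string, unknown> | null;
  docsUrl?: string | null;
}

interface ToolEnvelope<T> {
  ok: boolean;
  data?: T;
  error?: ToolError;
}

class SmithersToolError extends Error {
  readonly code: string;
  readonly details: Record<string, unknown> | null;
  constructor(error: ToolError) {
    super(error.docsUrl ? `${error.message} (${error.docsUrl})` : error.message);
    this.name = "SmithersToolError";
    this.code = error.code;
    this.details = error.details ?? null;
  }
}

// Return `data` on success; otherwise throw with the envelope's error code.
function unwrap<T>(envelope: ToolEnvelope<T>): T {
  if (envelope.ok && envelope.data !== undefined) return envelope.data;
  throw new SmithersToolError(
    envelope.error ?? {
      code: "UNKNOWN", // local placeholder, not a Smithers code
      message: envelope.ok ? "Envelope has no data" : "Envelope has no error",
    },
  );
}

// For clients that only read the text content block: parse the echoed JSON.
function unwrapText<T>(text: string): T {
  return unwrap(JSON.parse(text) as ToolEnvelope<T>);
}
```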

### Tool annotations

| Annotation | Tools | Meaning |
|---|---|---|
| `readOnlyHint: true` | Most query tools | Tool does not modify state |
| `readOnlyHint: false, openWorldHint: true` | `run_workflow` | Launches external processes |
| `readOnlyHint: false, destructiveHint: true, idempotentHint: false` | `resolve_approval`, `revert_attempt` | Mutates persisted state irreversibly |
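An agent harness can derive a confirmation policy from these annotations. The sketch below is one conservative policy, not behavior that Smithers enforces. It treats a missing `destructiveHint` on a writing tool as `true`, which matches the default in the MCP specification.

```ts
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

type CallPolicy = "auto" | "confirm";

// Read-only tools run without prompting. Writing tools need confirmation
// unless they explicitly declare destructiveHint: false.
function callPolicy(a: ToolAnnotations): CallPolicy {
  if (a.readOnlyHint === true) return "auto";
  return a.destructiveHint === false ? "auto" : "confirm";
}
```

Under this policy `run_workflow` also requires confirmation, because its annotations leave `destructiveHint` unset. Relax the rule if launching runs unattended is acceptable.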

---

## Tool Reference

### list_workflows

List all Smithers workflows discovered in the working directory.

**Input:** none

**Output:**

```ts
{
  workflows: Array<{
    id: string;
    metadataVersion: number;
    displayName: string;
    entryFile: string;
    sourceType: string;
    description: string;
    tags: string[];
    aliases: string[];
  }>;
}
```

Use the returned `id` values as the `workflowId` parameter for `run_workflow`.

---

### run_workflow

Start or resume a discovered workflow.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `workflowId` | `string` | required | Workflow ID from `list_workflows` |
| `input` | `Record<string, unknown>` | `{}` | Workflow input object |
| `prompt` | `string` | - | Shorthand: sets `input.prompt` when `input` is not provided |
| `runId` | `string` | auto | Custom run ID |
| `resume` | `boolean` | `false` | Resume an existing run; requires `runId` |
| `force` | `boolean` | `false` | Force-start even if a run with this ID already exists |
| `waitForTerminal` | `boolean` | `false` | Block until the run reaches a terminal state |
| `waitForStartMs` | `number` | `1000` | For background launches, how long to wait for the run row to appear in the database |
| `maxConcurrency` | `number` | - | Max concurrent nodes |
| `rootDir` | `string` | - | Root directory for tool sandboxing and path resolution |
| `logDir` | `string` | - | Directory for log files |
| `allowNetwork` | `boolean` | `false` | Allow network access in the `bash` tool |
| `maxOutputBytes` | `number` | - | Cap on node output size |
| `toolTimeoutMs` | `number` | - | Per-tool call timeout |
| `hot` | `boolean` | `false` | Enable hot-reloading of the workflow file |

**Output:**

```ts
{
  workflow: {
    id: string;
    metadataVersion: number;
    displayName: string;
    entryFile: string;
    sourceType: string;
    description: string;
    tags: string[];
    aliases: string[];
  };
  runId: string;
  launchMode: "background" | "waited";
  requestedResume: boolean;
  status: string;
  observedRun: RunSummary | null;
  result: { runId, status, output?, error? } | null;
}
```

**Background vs. waited launch**

By default (`waitForTerminal: false`) the tool fires the workflow and returns immediately with `launchMode: "background"`. `observedRun` reflects the run state polled during `waitForStartMs`. Use `watch_run` to track progress.

Set `waitForTerminal: true` to block until the workflow finishes. `result` is populated and `launchMode` is `"waited"`.

**Run option forwarding**

`rootDir`, `logDir`, `allowNetwork`, `maxOutputBytes`, `toolTimeoutMs`, and `hot` are forwarded verbatim to `runWorkflow`. They override values baked into the workflow file.
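The sketch below mirrors two documented `run_workflow` rules on the client side, so obvious mistakes are caught before a call. The rules are that `resume` requires `runId`, and that `prompt` only applies when `input` is omitted. The helpers are hypothetical and the server remains the authority.

```ts
interface RunWorkflowArgs {
  workflowId: string;
  input?: Record<string, unknown>;
  prompt?: string;
  runId?: string;
  resume?: boolean;
  force?: boolean;
  waitForTerminal?: boolean;
  waitForStartMs?: number;
}

// Return a list of problems; an empty list means the args look consistent.
function checkRunWorkflowArgs(args: RunWorkflowArgs): string[] {
  const problems: string[] = [];
  if (args.resume && !args.runId) problems.push("resume=true requires runId");
  if (args.input !== undefined && args.prompt !== undefined) {
    problems.push("prompt is only applied when input is omitted; set input.prompt instead");
  }
  return problems;
}

// The input object the workflow will effectively receive.
function effectiveInput(args: RunWorkflowArgs): Record<string, unknown> {
  if (args.input !== undefined) return args.input;
  if (args.prompt !== undefined) return { prompt: args.prompt };
  return {};
}
```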

---

### list_runs

List recent runs with summary data.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | `number` (1–200) | `20` | Max runs to return |
| `status` | `string` | - | Filter by status (`running`, `finished`, `failed`, etc.) |

**Output:**

```ts
{
  runs: RunSummary[];
}
```

`RunSummary` fields: `runId`, `workflowName`, `workflowPath`, `parentRunId`, `status`, `createdAtMs`, `startedAtMs`, `finishedAtMs`, `heartbeatAtMs`, `activeNodeId`, `activeNodeLabel`, `pendingApprovalCount`, `waitingTimers`, `countsByState`.

---

### get_run

Get the full detail record for a specific run, including steps, approvals, timers, loop state, lineage, config, and error.

**Input:**

| Parameter | Type | Description |
|---|---|---|
| `runId` | `string` | Run ID |

**Output:**

```ts
{
  run: RunSummary & {
    steps: Array<{ nodeId, iteration, state, lastAttempt, updatedAtMs, outputTable, label }>;
    approvals: PendingApproval[];
    loops: Array<{ loopId, iteration, maxIterations }>;
    continuedFromRunIds: string[];
    activeDescendantRunId: string | null;
    config: unknown | null;
    error: unknown | null;
  };
}
```

---

### watch_run

Poll a run at a fixed interval until it reaches a terminal state or a timeout expires.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `runId` | `string` | required | Run to watch |
| `intervalMs` | `number` | `1000` | Poll interval (minimum enforced by runtime) |
| `timeoutMs` | `number` | `30000` | Wall-clock budget before giving up |

**Output:**

```ts
{
  runId: string;
  intervalMs: number;
  pollCount: number;
  reachedTerminal: boolean;
  timedOut: boolean;
  finalRun: RunSummary;
  snapshots: Array<{ observedAtMs: number; run: RunSummary }>;
}
```

When `timedOut` is `true` the run is still active, so call `watch_run` again or raise `timeoutMs`. Terminal statuses: any status other than `running`, `waiting-approval`, `waiting-event`, or `waiting-timer`, including `finished`, `failed`, `cancelled`, and `continued`.
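The terminal-status rule and the "call again on timeout" advice can be combined into a small loop. In this sketch, `callWatchRun` stands in for your MCP client's `tools/call` wrapper and is hypothetical. The round limit is an arbitrary safety cap.

```ts
// Statuses documented as non-terminal; any other status is terminal.
const NON_TERMINAL = new Set(["running", "waiting-approval", "waiting-event", "waiting-timer"]);

const isTerminalStatus = (status: string): boolean => !NON_TERMINAL.has(status);

interface WatchRunData {
  reachedTerminal: boolean;
  timedOut: boolean;
  finalRun: { status: string };
}

async function watchUntilTerminal(
  callWatchRun: (args: { runId: string; timeoutMs: number }) => Promise<WatchRunData>,
  runId: string,
  maxRounds = 10,
  timeoutMs = 30_000,
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    const data = await callWatchRun({ runId, timeoutMs });
    if (data.reachedTerminal || isTerminalStatus(data.finalRun.status)) {
      return data.finalRun.status;
    }
    // timedOut === true: the run is still active, so watch again.
  }
  throw new Error(`Run ${runId} still active after ${maxRounds} watch_run calls`);
}
```

A run in `waiting-approval`, `waiting-event`, or `waiting-timer` will not become terminal on its own. If the loop keeps timing out, call `explain_run` instead of watching indefinitely.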

---

### explain_run

Return a structured diagnosis explaining why a run is blocked, waiting, or stale.

**Input:**

| Parameter | Type | Description |
|---|---|---|
| `runId` | `string` | Run ID |

**Output:**

```ts
{
  diagnosis: {
    runId: string;
    status: string;
    summary: string;
    generatedAtMs: number;
    blockers: Array<{
      kind: string;
      nodeId: string;
      iteration: number | null;
      reason: string;
      waitingSince: number;
      unblocker: string;
      context?: string;
      signalName?: string | null;
      dependencyNodeId?: string | null;
      firesAtMs?: number | null;
      remainingMs?: number | null;
      attempt?: number | null;
      maxAttempts?: number | null;
    }>;
    currentNodeId: string | null;
  };
}
```

`summary` is a human-readable sentence. `blockers` lists every node preventing progress; `unblocker` describes what action or event would unblock it.

---

### list_pending_approvals

List approvals that are waiting for a human decision, optionally filtered by run, workflow, or node.

**Input:** All parameters optional. Omit all to list every pending approval across all runs.

| Parameter | Type | Description |
|---|---|---|
| `runId` | `string` | Filter by run ID |
| `workflowName` | `string` | Filter by workflow name |
| `nodeId` | `string` | Filter by node ID |

**Output:**

```ts
{
  approvals: Array<{
    runId: string;
    nodeId: string;
    iteration: number;
    status: string;
    requestedAtMs: number | null;
    decidedAtMs: number | null;
    note: string | null;
    decidedBy: string | null;
    request: unknown;
    decision: unknown;
    autoApproved?: boolean;
    workflowName: string | null;
    runStatus: string | null;
    nodeLabel: string | null;
  }>;
}
```

---

### resolve_approval

Approve or deny a pending approval. This tool is destructive and non-idempotent.

**Input:**

| Parameter | Type | Description |
|---|---|---|
| `action` | `"approve" \| "deny"` | required, decision to record |
| `runId` | `string` | Filter to a specific run |
| `workflowName` | `string` | Filter by workflow name |
| `nodeId` | `string` | Filter by node ID |
| `iteration` | `number` | Filter by loop iteration |
| `note` | `string` | Optional note to record with the decision |
| `decidedBy` | `string` | Identity of the decision-maker |
| `decision` | `unknown` | Structured decision payload passed back to the workflow |

**Ambiguity guard**

If no pending approval matches the filters, the tool fails with `INVALID_INPUT`. If more than one matches, it also fails with `INVALID_INPUT` and returns the candidates in `details.matches`; add `runId`, `nodeId`, or `iteration` to narrow the selection. The tool never guesses when multiple approvals match.
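When the ambiguity guard fires, a client can turn `details.matches` into fully narrowed argument sets and present them to the human. It should never pick one automatically. This sketch assumes each match carries `runId`, `nodeId`, and `iteration`, as `list_pending_approvals` entries do. The exact shape of `details.matches` is not specified on this page.

```ts
interface ApprovalMatch {
  runId: string;
  nodeId: string;
  iteration: number;
  nodeLabel?: string | null;
}

interface ResolveApprovalArgs {
  action: "approve" | "deny";
  runId?: string;
  nodeId?: string;
  iteration?: number;
  note?: string;
  decidedBy?: string;
}

// One unambiguous resolve_approval argument set per candidate approval.
function narrowedCandidates(
  base: ResolveApprovalArgs,
  details: Record<string, unknown> | null | undefined,
): ResolveApprovalArgs[] {
  const raw = details?.matches;
  const matches: ApprovalMatch[] = Array.isArray(raw) ? (raw as ApprovalMatch[]) : [];
  return matches.map((m) => ({ ...base, runId: m.runId, nodeId: m.nodeId, iteration: m.iteration }));
}
```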

**Output:**

```ts
{
  action: "approve" | "deny";
  approval: PendingApproval;   // with updated status, decidedAtMs, note, decidedBy
  run: RunSummary | null;
}
```

---

### ask_human

Block the current run and ask a human to make a decision, then wait for their answer. Use this whenever the agent is blocked, uncertain, missing information, or about to take an irreversible or destructive action, instead of guessing. The tool creates a durable, pending human request and returns only once it is resolved.

When run inside a Smithers task, the run/node context is taken from the `SMITHERS_RUN_ID` / `SMITHERS_NODE_ID` / `SMITHERS_ITERATION` environment variables Smithers injects into the agent; pass `runId`/`nodeId`/`iteration` explicitly to override, or rely on single-active-run autodetection.

An operator resolves the request with `bunx smithers-orchestrator human answer <requestId> --value '<json>'` (or `bunx smithers-orchestrator human cancel <requestId>`); `bunx smithers-orchestrator human inbox` lists everything waiting.

**Input:**

| Parameter | Type | Description |
|---|---|---|
| `prompt` | `string` | required, the decision or question to put to a human |
| `context` | `string` | Extra context appended to the prompt |
| `choices` | `string[]` | Fixed choices; restricts the human's answer to one of these |
| `runId` | `string` | Run to attach to (default: `SMITHERS_RUN_ID` or the single active run) |
| `nodeId` | `string` | Node to attach to (default: `SMITHERS_NODE_ID`) |
| `iteration` | `number` | Loop iteration (default: `SMITHERS_ITERATION` or 0) |
| `timeoutSeconds` | `number` | Seconds before the request expires (0/unset = no timeout) |
| `pollSeconds` | `number` | Poll interval while blocking (default 3s) |

**Output:**

```ts
{
  requestId: string;
  runId: string;
  nodeId: string;
  iteration: number;
  status: "answered" | "cancelled" | "expired" | "missing" | "aborted";
  decision: "approved" | "blocked";   // "blocked" => do not proceed
  response: unknown | null;            // the human's answer when status is "answered"
  answeredBy: string | null;
}
```
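An agent should treat only an explicit approval as permission to continue. The gate below is a conservative sketch. It requires both `status: "answered"` and `decision: "approved"`, and treats every other combination as a stop. The interface repeats only the fields the gate reads.

```ts
interface AskHumanResult {
  requestId: string;
  status: "answered" | "cancelled" | "expired" | "missing" | "aborted";
  decision: "approved" | "blocked";
  response: unknown | null;
  answeredBy: string | null;
}

type GateResult = { proceed: true; answer: unknown } | { proceed: false; reason: string };

// Continue only on an answered, approved request; anything else stops the agent.
function gate(result: AskHumanResult): GateResult {
  if (result.status === "answered" && result.decision === "approved") {
    return { proceed: true, answer: result.response };
  }
  return {
    proceed: false,
    reason: `ask_human ${result.requestId} ended with status=${result.status}, decision=${result.decision}`,
  };
}
```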

---

### get_node_detail

Get enriched detail for a single node, including all attempts, tool calls, token usage, scorer results, and validated output.

**Input:**

| Parameter | Type | Description |
|---|---|---|
| `runId` | `string` | required |
| `nodeId` | `string` | required |
| `iteration` | `number` | Loop iteration (default: latest) |

**Output:**

```ts
{
  detail: {
    node: { runId, nodeId, iteration, state, lastAttempt, updatedAtMs, outputTable, label };
    status: string;
    durationMs: number | null;
    attemptsSummary: { total, failed, cancelled, succeeded, waiting };
    attempts: unknown[];
    toolCalls: unknown[];
    tokenUsage: unknown;
    scorers: unknown[];
    output: {
      validated: unknown | null;
      raw: unknown | null;
      source: "cache" | "output-table" | "none";
      cacheKey: string | null;
    };
    limits: {
      toolPayloadBytesHuman: number;
      validatedOutputBytesHuman: number;
    };
  };
}
```

---

### revert_attempt

Revert the workspace and frame history back to the state captured at a specific attempt. This is destructive and non-idempotent.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `runId` | `string` | required | Run containing the node |
| `nodeId` | `string` | required | Node to revert |
| `iteration` | `number` | `0` | Loop iteration |
| `attempt` | `number` | required | Attempt number to revert to (must be ≥ 1) |

**Output:**

```ts
{
  runId: string;
  nodeId: string;
  iteration: number;
  attempt: number;
  success: boolean;
  error?: string;
  jjPointer?: string;
  run: RunSummary | null;
}
```

---

### list_artifacts

List structured output artifacts produced by nodes in a run.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `runId` | `string` | required | Run ID |
| `nodeId` | `string` | - | Limit to a specific node |
| `includeRaw` | `boolean` | `false` | Include raw (pre-validation) output values |

**Output:**

```ts
{
  artifacts: Array<{
    artifactId: string;   // "<runId>:<nodeId>:<iteration>"
    kind: "node-output";
    runId: string;
    nodeId: string;
    iteration: number;
    label: string | null;
    state: string;
    outputTable: string | null;
    source: "cache" | "output-table" | "none";
    cacheKey: string | null;
    value: unknown | null;
    rawValue?: unknown | null;   // only when includeRaw=true
  }>;
}
```

Only nodes with an `outputTable` and a non-`none` output source are included.
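To go from an artifact back to `get_node_detail`, split the `artifactId`. This sketch assumes `runId` contains no `:`. It takes the iteration from the last segment, so a `:` inside a node ID survives.

```ts
interface ArtifactRef {
  runId: string;
  nodeId: string;
  iteration: number;
}

// Parse "<runId>:<nodeId>:<iteration>".
function parseArtifactId(artifactId: string): ArtifactRef {
  const first = artifactId.indexOf(":");
  const last = artifactId.lastIndexOf(":");
  if (first <= 0 || last === first) throw new Error(`Malformed artifactId: ${artifactId}`);
  const iterText = artifactId.slice(last + 1);
  if (!/^\d+$/.test(iterText)) throw new Error(`Bad iteration in artifactId: ${artifactId}`);
  return {
    runId: artifactId.slice(0, first),
    nodeId: artifactId.slice(first + 1, last),
    iteration: Number(iterText),
  };
}
```

The result can be passed directly as the `{ runId, nodeId, iteration }` input of `get_node_detail`.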

---

### get_chat_transcript

Return the structured agent chat transcript for a run, grouped by attempts.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `runId` | `string` | required | Run ID |
| `all` | `boolean` | `false` | Include all attempts, not just those with known output events |
| `includeStderr` | `boolean` | `true` | Include stderr messages |
| `tail` | `number` | - | Return only the last N messages |

**Output:**

```ts
{
  runId: string;
  attempts: Array<{
    attemptKey: string;
    nodeId: string;
    iteration: number;
    attempt: number;
    state: string;
    startedAtMs: number;
    finishedAtMs: number | null;
    cached: boolean;
    meta: unknown | null;
  }>;
  messages: Array<{
    id: string;
    attemptKey: string;
    nodeId: string;
    iteration: number;
    attempt: number;
    role: "user" | "assistant" | "stderr";
    stream: "stdout" | "stderr" | null;
    timestampMs: number;
    text: string;
    source: "prompt" | "event" | "responseText";
  }>;
}
```

Messages are sorted by `timestampMs`. Use `tail` to limit context window usage on long transcripts.
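When reporting back to a human, it often helps to condense the `messages` array per attempt. The sketch below groups messages by `attemptKey` and truncates each message to a character budget. The budget and the `[role]` prefix format are arbitrary choices, not Smithers output.

```ts
interface TranscriptMessage {
  attemptKey: string;
  nodeId: string;
  attempt: number;
  role: "user" | "assistant" | "stderr";
  timestampMs: number;
  text: string;
}

// Group already-sorted messages by attempt and truncate each to maxChars.
function condenseTranscript(messages: TranscriptMessage[], maxChars = 200): Map<string, string[]> {
  const byAttempt = new Map<string, string[]>();
  for (const m of messages) {
    const text = m.text.length > maxChars ? `${m.text.slice(0, maxChars)}…` : m.text;
    const lines = byAttempt.get(m.attemptKey) ?? [];
    lines.push(`[${m.role}] ${text}`);
    byAttempt.set(m.attemptKey, lines);
  }
  return byAttempt;
}
```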

---

### get_run_events

Return the raw structured event history for a run with optional filtering.

**Input:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `runId` | `string` | required | Run ID |
| `afterSeq` | `number` | - | Only events with `seq` greater than this value |
| `limit` | `number` (1–10000) | `200` | Max events to return |
| `nodeId` | `string` | - | Filter to events for a specific node |
| `types` | `string[]` | - | Filter to specific event types (e.g. `["NodeFinished", "NodeFailed"]`) |
| `sinceTimestampMs` | `number` | - | Only events at or after this timestamp |

**Output:**

```ts
{
  runId: string;
  events: Array<{
    runId: string;
    seq: number;
    timestampMs: number;
    type: string;
    payload: unknown | null;
  }>;
}
```

Paginate via `afterSeq`: pass the `seq` of the last received event to fetch the next page.
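The pagination rule as a loop: keep passing the last `seq` as `afterSeq` until a page comes back shorter than `limit`. In this sketch, `call` stands in for your MCP client's `get_run_events` wrapper (hypothetical), and events within a page are assumed to be in ascending `seq` order.

```ts
interface RunEvent {
  runId: string;
  seq: number;
  timestampMs: number;
  type: string;
  payload: unknown | null;
}

type GetRunEvents = (args: {
  runId: string;
  afterSeq?: number;
  limit: number;
  types?: string[];
}) => Promise<{ runId: string; events: RunEvent[] }>;

async function fetchAllEvents(
  call: GetRunEvents,
  runId: string,
  types?: string[],
  pageSize = 200,
): Promise<RunEvent[]> {
  const all: RunEvent[] = [];
  let afterSeq: number | undefined;
  for (;;) {
    const { events } = await call({ runId, afterSeq, limit: pageSize, types });
    all.push(...events);
    // A short page means there is nothing left to fetch.
    if (events.length < pageSize) return all;
    afterSeq = events[events.length - 1].seq;
  }
}
```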

---

## Usage Examples

### List workflows and start a run

```
> list_workflows {}

{
  "ok": true,
  "data": {
    "workflows": [
      { "id": "bugfix", "displayName": "bugfix", "entryFile": "./workflows/bugfix.tsx", "sourceType": "user" }
    ]
  }
}

> run_workflow { "workflowId": "bugfix", "prompt": "Fix the auth token expiry bug" }

{
  "ok": true,
  "data": {
    "runId": "smi_abc123",
    "launchMode": "background",
    "status": "running",
    ...
  }
}
```

### Watch until complete

```
> watch_run { "runId": "smi_abc123", "timeoutMs": 120000 }

{
  "ok": true,
  "data": {
    "reachedTerminal": true,
    "timedOut": false,
    "finalRun": { "status": "finished", ... }
  }
}
```

### Resolve a pending approval

```
> list_pending_approvals { "runId": "smi_abc123" }

{
  "ok": true,
  "data": {
    "approvals": [
      { "nodeId": "deploy", "iteration": 0, "nodeLabel": "Deploy to production", ... }
    ]
  }
}

> resolve_approval { "action": "approve", "runId": "smi_abc123", "nodeId": "deploy", "decidedBy": "alice", "note": "Looks good" }

{
  "ok": true,
  "data": {
    "action": "approve",
    "approval": { "status": "approved", "decidedAtMs": 1707500100000, ... },
    "run": { "status": "running", ... }
  }
}
```

### Debug a blocked run

```
> explain_run { "runId": "smi_abc123" }

{
  "ok": true,
  "data": {
    "diagnosis": {
      "summary": "Run is waiting for a human approval on node 'deploy'.",
      "blockers": [
        {
          "kind": "approval",
          "nodeId": "deploy",
          "reason": "Node requires human approval before proceeding.",
          "unblocker": "Call resolve_approval with action=approve or action=deny."
        }
      ]
    }
  }
}
```

### Revert a failed attempt

```
> get_node_detail { "runId": "smi_abc123", "nodeId": "analyze" }

{
  "ok": true,
  "data": {
    "detail": {
      "attemptsSummary": { "total": 3, "failed": 2, "succeeded": 1 },
      ...
    }
  }
}

> revert_attempt { "runId": "smi_abc123", "nodeId": "analyze", "attempt": 1 }

{
  "ok": true,
  "data": {
    "success": true,
    "run": { "status": "running", ... }
  }
}
```

---

## Error Codes

Errors follow the structured envelope. Common codes:

| Code | Meaning |
|---|---|
| `RUN_NOT_FOUND` | No run or workflow exists with the given ID |
| `INVALID_INPUT` | Missing required field, failed validation, or ambiguous approval filter |
| `WORKFLOW_MISSING_DEFAULT` | Workflow file has no default export |


===============================================================================

# Smithers Effect API

> Smithers Effect-ts authoring API: build workflows as Effect values without JSX or React.

---

## Effect API

> Build Smithers workflows as first-class graph values, no JSX or React.

The Effect API is the lower-level authoring surface for teams that already model
application logic with `Effect`, `Layer`, and `Schema`. It uses the same Smithers
runtime as JSX: steps are persisted in SQLite, completed work is not re-run on
resume, outputs are schema-validated, and dependencies drive scheduling. The
difference is authoring style: every step, approval, sequence, parallel block,
match, branch, loop, worktree, and scope is an ordinary value you can export,
return from a function, or compose with other graph values.

Use JSX for most workflows. Use the Effect API when your workflow lives inside
an Effect service, you want step bodies to return `Effect` values directly, or
you need a React-free API for generated workflow definitions.

## Minimal workflow

```ts
import { Smithers } from "smithers-orchestrator";
import { Effect, Schema } from "effect";

const inputSchema = Schema.Struct({
  repo: Schema.String,
  sha: Schema.String,
});

const analysisSchema = Schema.Struct({
  summary: Schema.String,
  risk: Schema.Literal("low", "medium", "high"),
});

const reportSchema = Schema.Struct({
  markdown: Schema.String,
});

const G = Smithers.workflow({
  name: "repo-review",
  input: inputSchema,
});

const analyze = G.step("analyze", {
  output: analysisSchema,
  timeout: "2m",
  retry: { maxAttempts: 3, backoff: "exponential", initialDelay: "1s" },
  run: ({ input, heartbeat }) =>
    Effect.gen(function* () {
      heartbeat({ phase: "analyzing" });
      yield* Effect.log(`Reviewing ${input.repo}@${input.sha}`);
      return { summary: "Found one risky migration.", risk: "medium" as const };
    }),
});

const report = G.step("report", {
  needs: { analyze },
  output: reportSchema,
  run: ({ analyze }) => ({
    markdown: `# Review\n\n${analyze.summary}\n\nRisk: ${analyze.risk}`,
  }),
});

export const reviewWorkflow = G.from(G.sequence(analyze, report));

const result = await Effect.runPromise(
  reviewWorkflow
    .execute(
      { repo: "acme/api", sha: "abc123" },
      { runId: "review-abc123" },
    )
    .pipe(Effect.provide(Smithers.sqlite({ filename: "smithers.db" }))),
);
```

`Smithers.workflow(opts)` returns a typed handle `G`. Every constructor
(`G.step`, `G.approval`, `G.sequence`, `G.parallel`, `G.match`, `G.branch`,
`G.loop`, `G.worktree`, `G.scope`) returns a graph value. `G.from(graph)`
finalizes the workflow into something `execute`-able.

`execute()` returns an `Effect`. The success value is the decoded output of the
final graph node: a step output for a single step, the last child for a
sequence, a tuple for a parallel block. If the run stops on an approval or
timer, the success value is the normal `RunResult` with a waiting status.

## Steps and dependencies

Steps are values:

```ts
const analyze = G.step("analyze", {
  output: analysisSchema,
  run: ({ input }) => analyzeRepo(input.repo, input.sha),
});
```

`input` is typed from the workflow's input schema. The step's output type is
inferred from the `output` schema and flows into anything that lists this step
in `needs`:

```ts
const report = G.step("report", {
  needs: { analyze },
  output: reportSchema,
  run: ({ analyze }) => ({
    markdown: renderReport(analyze.summary, analyze.risk),
  }),
});
```

Step IDs are durable. Changing an ID creates a new task and leaves the old
persisted output behind. A step can return a plain value, a `Promise`, or an
`Effect`; Smithers decodes the result with the step's `output` schema before
writing it.

The step context includes `input`, dependency values, `executionId`, `stepId`,
`attempt`, `iteration`, `signal`, `heartbeat(data)`, and `lastHeartbeat`.

## Control flow

Use `G.sequence(...nodes)` for ordered work and
`G.parallel(...nodes, { maxConcurrency })` for concurrent work. `G.parallel`
returns a tuple of child results.

`G.match(source, { when, then, else })` selects between two statically-known
branches based on a completed step's output. Both branches are compiled into
the graph; only the matching branch executes.

`G.branch({ condition, needs, then, else })` is the same shape but the
predicate runs against an arbitrary `needs` context, not a single source step.

`G.loop({ id, children, until, maxIterations, onMaxReached })` repeats a
fragment until the `until` predicate returns true. Nested loops are not
supported. `onMaxReached` accepts `"fail"` or `"return-last"` and controls
behavior when `maxIterations` is exceeded; when omitted, the loop returns the
last iteration's outputs rather than failing.

```ts
export const reviewWorkflow = G.from(
  G.sequence(
    analyze,
    G.match(analyze, {
      when: (analysis) => analysis.risk === "high",
      then: G.approval("approve-high-risk", {
        needs: { analyze },
        request: ({ analyze }) => ({
          title: "Approve high-risk review",
          summary: analyze.summary,
        }),
        onDeny: "fail",
      }),
      // else: report (omitting `else` means the match falls through to the next sibling in the sequence)
    }),
    report,
  ),
);
```

## Worktrees

`G.worktree({ id, path, branch, skipIf, needs, children })` runs `children`
inside a git worktree. The worktree is created before the children execute and
torn down afterward.

```ts
G.worktree({
  id: "review-shard",
  path: "scratch/review",
  children: G.sequence(read, summarize),
});
```

## Reuse

Static reuse is just a graph value:

```ts
const reviewShard = G.sequence(read, summarize);
```

Parameterized reuse is a function returning a graph value:

```ts
const makeReviewShard = (params: { path: string }) => {
  // Bind each step so `summarize` can reference `read` in `needs`.
  const read = G.step("read", {
    output: diffSchema,
    run: ({ input }) => readDiff(input.repo, params.path),
  });
  const summarize = G.step("summarize", {
    needs: { read },
    output: summarySchema,
    run: ({ read }) => summarizeDiff(read),
  });
  return G.sequence(read, summarize);
};
```

Multi-mount reuse is `G.scope(instanceId, fragment)`. The compiler applies
`instanceId.` as a durable ID prefix to every step and approval inside the
fragment. For example, `G.scope('api', makeReviewShard(...))` produces step IDs `api.read` and `api.summarize` in the database. The same fragment can be mounted under multiple scopes without collision:

```ts
G.parallel(
  G.scope("api", makeReviewShard({ path: "packages/api" })),
  G.scope("web", makeReviewShard({ path: "apps/web" })),
);
```

## Cross-workflow fragments

For graph fragments that need to live across workflows with different inputs,
build them with `Smithers.fragment(inputSchema)`:

```ts
const F = Smithers.fragment(diffInputSchema);

// Name the step differently from the readDiff helper it calls; otherwise
// `readDiff` inside `run` would refer to the step value, not the helper.
const diff = F.step("read-diff", {
  output: diffSchema,
  run: ({ input }) => readDiff(input.path),
});

const summarize = F.step("summarize", {
  needs: { diff },
  output: summarySchema,
  run: ({ diff }) => summarizeDiff(diff),
});

export const reviewShard = F.sequence(diff, summarize);
```

`Smithers.fragment` exposes the same constructors as a workflow handle (`step`,
`approval`, `sequence`, `parallel`, `match`, `branch`, `loop`, `worktree`,
`scope`), but no `from`; fragments are values, not workflows. Compile happens
when they're mounted into a real workflow:

```ts
const G = Smithers.workflow({ name: "repo-review", input: workflowInputSchema });

export const reviewWorkflow = G.from(
  G.parallel(
    G.scope("api", reviewShard),
    G.scope("web", reviewShard),
  ),
);
```

A fragment's input schema exists for typing: TypeScript uses it to infer each step's `input` type. At runtime the schema is neither read nor validated; steps receive the host workflow's input directly. The host workflow's input type must therefore be assignable to the fragment's input type, and TypeScript reports an error if you mount a fragment whose input schema requires fields the host workflow's input does not provide.

## Operational notes

- Provide exactly one persistence layer with `Effect.provide(Smithers.sqlite({ filename }))`.
- Keep step IDs stable across releases; use new IDs for materially different work.
- Use `heartbeat()` in long-running steps and honor `signal` in external calls.
- Use `retry`, `retryPolicy`, `timeout`, `skipIf`, and `cache` the same way you
  would on JSX tasks (see JSX Task options for the shared option shape).
- All graph values support `.pipe(...fns)` for future data-last combinators.
- Prefer idempotent step bodies. For external side effects, use `executionId`,
  `stepId`, and `attempt` when constructing idempotency keys.
- `G.match` is graph topology selection: both branches must be statically
  knowable so durable IDs stay stable across resume. It is not Effect's
  `Match` module (which is runtime value pattern matching).
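The idempotency-key advice above can be packaged as a small helper. This sketch hashes `executionId`, `stepId`, `attempt`, and a caller-chosen operation name (a hypothetical extra discriminator such as `"create-invoice"`) into a stable key. It is plain TypeScript, not a Smithers API.

```ts
import { createHash } from "node:crypto";

// Fields available on the step context.
interface StepIds {
  executionId: string;
  stepId: string;
  attempt: number;
}

// Including `attempt` gives each retry its own key. Pass includeAttempt=false
// if the external service should dedupe the side effect across retries.
function idempotencyKey(ids: StepIds, operation: string, includeAttempt = true): string {
  const parts = [ids.executionId, ids.stepId, operation];
  if (includeAttempt) parts.push(String(ids.attempt));
  // NUL separator avoids collisions such as ("a:b", "c") vs ("a", "b:c").
  return createHash("sha256").update(parts.join("\u0000")).digest("hex");
}
```

Inside a step body, destructure `executionId`, `stepId`, and `attempt` from the step context and pass them in. Which variant to use depends on whether a retried attempt should repeat the side effect or reuse the first one.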


===============================================================================

# Smithers Integrations

> Smithers integrations: agent runtimes (Claude Code, Codex, Gemini, Pi), tool surfaces, ecosystem partners.

---

## Integrations

> Three patterns for connecting Smithers to external services.

Smithers ships no first-party clients for Linear, Notion, Slack, or similar services. Treat them as external integrations your app owns and pick one of three wirings: tools on an SDK agent, skills/plugins/MCP on a CLI agent, or an external CLI run inside a task.

## Pattern 1: Pass tools to an SDK agent

Use this when the agent needs judgment but the external calls should stay explicit and reviewable.

```ts
import { ToolLoopAgent as Agent, tool, zodSchema } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// `linearClient` is your own Linear API client; Smithers does not provide one.
const linearGetIssue = tool({
  description: "Fetch a Linear issue",
  inputSchema: zodSchema(z.object({ id: z.string() })),
  execute: async ({ id }) => linearClient.getIssue(id),
});

const opsAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: { linearGetIssue },
});
```

## Pattern 2: Pass a skill / plugin / MCP config to a CLI agent

Use this when your CLI agent already supports external integrations and Smithers should only orchestrate the task.

```ts
import { PiAgent } from "smithers-orchestrator";

const pi = new PiAgent({
  provider: "openai",
  model: "gpt-5.2-codex",
  skill: ["./skills/linear", "./skills/notion"],
});
```

```tsx
<Task id="ticket-review" output={outputs.review} agent={pi}>
  {`Use the Linear skill to inspect ${ctx.input.issueId}, then summarize next actions.`}
</Task>
```

## Pattern 3: Run an external CLI in a task

Use this when the step is deterministic and you do not need the model involved.

```tsx
<Task id="load-linear" output={outputs.linearIssue}>
  {async () => {
    const proc = Bun.spawn(["linear", "issue", "view", ctx.input.issueId, "--json"], {
      stdout: "pipe",
      stderr: "pipe",
    });
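    // Drain both pipes concurrently so a full stderr buffer cannot stall the child.
    const [stdout, stderr, exitCode] = await Promise.all([
      new Response(proc.stdout).text(),
      new Response(proc.stderr).text(),
      proc.exited,
    ]);
    if (exitCode !== 0) throw new Error(stderr || stdout);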
    return JSON.parse(stdout);
  }}
</Task>
```

## Choosing between them

| If you need | Prefer |
|---|---|
| AI judgment over a small integration surface | Pattern 1 (SDK agent with narrow tools) |
| Existing CLI ecosystem support (skills, plugins, MCP) | Pattern 2 (CLI agent) |
| Deterministic sync or publish steps | Pattern 3 (compute task) |


---

## CLI Agents

> Spawn external CLI tools (Claude Code, Codex, Pi, …) and pipe them through the workflow runtime.

CLI-backed agent classes wrap external AI command-line tools and implement the [AI SDK](https://ai-sdk.dev) `Agent` interface. Use them anywhere Smithers accepts an agent, including `<Task>`. Choose them when you want a vendor's full CLI surface (sessions, sandboxes, slash commands, MCP); for API-billed provider wrappers, see SDK Agents.

## Quick Start

```tsx
import { ClaudeCodeAgent, Task, Workflow, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  analysis: z.object({ summary: z.string() }),
});

const claude = new ClaudeCodeAgent({
  model: "claude-sonnet-4-20250514",
  systemPrompt: "You are a careful code reviewer.",
  timeoutMs: 30 * 60 * 1000,
});

export default smithers((ctx) => (
  <Workflow name="review">
    <Task id="analysis" output={outputs.analysis} agent={claude}>
      {`Analyze the codebase and identify potential improvements.`}
    </Task>
  </Workflow>
));
```

## Available agents

```toon
agents[8]{class,cli,defaultModel,hijack,notes}:
  ClaudeCodeAgent,claude,claude-sonnet-4-20250514,native session id,Anthropic Claude Code CLI
  CodexAgent,codex,gpt-5.3-codex,native thread id,OpenAI Codex CLI (codex exec via stdin + JSON stream)
  AntigravityAgent,agy,CLI default,native session id,Google Antigravity CLI
  PiAgent,pi,gpt-5.2-codex,native session id,Pi CLI (text/json/rpc modes + extension UI hooks)
  KimiAgent,kimi,kimi-latest,native session id,Moonshot Kimi CLI (auto-isolates KIMI_SHARE_DIR)
  ForgeAgent,forge,anthropic/claude-sonnet-4-20250514,conversation id,Forge CLI (300+ models)
  AmpAgent,amp,claude-sonnet-4-20250514,thread id,Amp CLI (--execute headless mode)
  OpenCodeAgent,opencode,provider/model string,not yet,OpenCode CLI (opencode run --format json)
```

CLI binaries must be on `PATH`: `claude`, `codex`, `agy`, `pi`, `kimi`, `forge`, `amp`, `opencode`.

## Codex CLI Agent

`CodexAgent` is the Smithers wrapper for OpenAI's `codex` CLI. It runs `codex exec` in non-interactive mode, sends the task prompt over stdin, forces `--json` so Smithers can stream structured progress, and captures the final assistant message via `--output-last-message`.

```tsx
import { CodexAgent, Task, Workflow, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { smithers, outputs } = createSmithers({
  patch: z.object({ summary: z.string(), files: z.array(z.string()) }),
});

const codex = new CodexAgent({
  model: "gpt-5.3-codex",
  sandbox: "workspace-write",
  skipGitRepoCheck: true,
  yolo: true,
});

export default smithers(() => (
  <Workflow name="codex-implementation">
    <Task id="implement" output={outputs.patch} agent={codex}>
      Implement the requested change and summarize the edited files.
    </Task>
  </Workflow>
));
```

### Authentication

- Subscription login: run `codex login` once. For isolated accounts, pass `configDir`; Smithers sets `CODEX_HOME` for that invocation.
- API billing: pass `apiKey` or set `OPENAI_API_KEY`; Smithers forwards it to the spawned `codex` process.
- Account registry: `bunx smithers-orchestrator agents add --provider codex ...` registers a subscription config directory, while `--provider openai-api` registers API-key billing for Codex-compatible providers.

### Structured output

- If the Smithers task has an output schema and `outputSchema` is not set, Smithers writes a temporary OpenAI-compatible JSON Schema file and passes it as `--output-schema`.
- Resume attempts use `codex exec resume <thread-id>` and skip `--output-schema`, matching the Codex CLI's resume command surface.
- Hijack opens native Codex with `codex resume <thread-id> -C <cwd>`.
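
Here is a minimal sketch of the argument handling just described, written as a pure function. It is not Smithers' implementation. `buildCodexArgs`, its input shape, and the flag order are hypothetical, and passing `--json`/`--output-last-message` on resume is an assumption. Only the subcommands and flags themselves come from this section.

```typescript
// Hypothetical sketch: assemble argv for a non-interactive Codex run.
// Flag names come from the docs above; ordering and option shape are assumptions.
interface CodexArgsInput {
  resumeThreadId?: string;   // set on resume attempts
  outputSchemaPath?: string; // temp JSON Schema file derived from the task's output schema
  lastMessagePath: string;   // where codex writes the final assistant message
}

function buildCodexArgs(input: CodexArgsInput): string[] {
  const args = ["exec"];
  if (input.resumeThreadId) {
    // Resume attempts use `codex exec resume <thread-id>`.
    args.push("resume", input.resumeThreadId);
  }
  args.push("--json", "--output-last-message", input.lastMessagePath);
  // The resume command surface does not take --output-schema, so skip it there.
  if (input.outputSchemaPath && !input.resumeThreadId) {
    args.push("--output-schema", input.outputSchemaPath);
  }
  // The prompt is sent over stdin, so no positional prompt argument is added.
  return args;
}

console.log(
  buildCodexArgs({ lastMessagePath: "/tmp/last.txt", outputSchemaPath: "/tmp/schema.json" }).join(" "),
);
// exec --json --output-last-message /tmp/last.txt --output-schema /tmp/schema.json
```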

## Common options

All CLI agents accept the same base option surface:

```ts
type BaseCliAgentOptions = {
  id?: string;                       // Agent instance id (default: random UUID)
  model?: string;                    // Model name passed to --model
  systemPrompt?: string;             // Prepended to the user prompt
  instructions?: string;             // Alias for systemPrompt
  cwd?: string;                      // Working directory (default: tool ctx rootDir or process.cwd())
  env?: Record<string, string>;      // Extra env vars merged with process.env
  yolo?: boolean;                    // Skip permission prompts (default: true)
  timeoutMs?: number;                // Hard wall-clock cap
  idleTimeoutMs?: number;            // Inactivity cap; resets on any stdout/stderr
  maxOutputBytes?: number;           // Truncate captured output
  extraArgs?: string[];              // Additional CLI flags appended to the command
};
```

Per-call timeout override:

```ts
await agent.generate({
  prompt: "do the thing",
  timeout: { totalMs: 15 * 60 * 1000, idleMs: 2 * 60 * 1000 },
});
```
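
The two limits (`timeoutMs`/`timeout.totalMs` and `idleTimeoutMs`/`timeout.idleMs`) can be modeled as shown below. This is a hypothetical sketch, and `TimeoutTracker` is not a Smithers export. It uses caller-supplied millisecond timestamps instead of real timers so the behavior is easy to follow. The total cap counts from the start of the run, while the idle cap counts from the most recent stdout/stderr chunk.

```typescript
// Hypothetical sketch of a hard wall-clock cap plus an inactivity cap
// that resets whenever the child process writes to stdout or stderr.
type TimeoutReason = "total" | "idle" | null;

class TimeoutTracker {
  private readonly startedAt: number;
  private readonly totalMs: number | undefined;
  private readonly idleMs: number | undefined;
  private lastOutputAt: number;

  constructor(startedAt: number, totalMs?: number, idleMs?: number) {
    this.startedAt = startedAt;
    this.totalMs = totalMs;
    this.idleMs = idleMs;
    this.lastOutputAt = startedAt;
  }

  // Call on every stdout/stderr chunk; this is what resets the idle cap.
  onOutput(now: number): void {
    this.lastOutputAt = now;
  }

  // Report which limit, if any, has been exceeded at time `now`.
  check(now: number): TimeoutReason {
    if (this.totalMs !== undefined && now - this.startedAt >= this.totalMs) return "total";
    if (this.idleMs !== undefined && now - this.lastOutputAt >= this.idleMs) return "idle";
    return null;
  }
}

// Same limits as the per-call example: 15 min total, 2 min idle.
const t = new TimeoutTracker(0, 15 * 60_000, 2 * 60_000);
t.onOutput(60_000);
console.log(t.check(170_000)); // null
console.log(t.check(180_000)); // idle
```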

## Per-agent extras

`ClaudeCodeAgent` extends the base with Claude Code-specific session and permission flags. Key additions: `permissionMode`, `sessionId`, `mcpConfig`, `resume`.

```ts
import { ClaudeCodeAgent } from "smithers-orchestrator";
new ClaudeCodeAgent({
  permissionMode?: "acceptEdits" | "bypassPermissions" | "default" | "delegate" | "dontAsk" | "plan";
  allowedTools?: string[]; disallowedTools?: string[]; disableSlashCommands?: boolean;
  addDir?: string[]; file?: string[]; fromPr?: string; fallbackModel?: string;
  appendSystemPrompt?: string; agents?: Record<string, { description?: string; prompt?: string }> | string;
  agent?: string; tools?: string[] | "default" | "";
  betas?: string[]; pluginDir?: string[]; resume?: string; sessionId?: string;
  mcpConfig?: string[]; mcpDebug?: boolean; maxBudgetUsd?: number; jsonSchema?: string;
  configDir?: string; apiKey?: string;
  dangerouslySkipPermissions?: boolean; allowDangerouslySkipPermissions?: boolean; chrome?: boolean; noChrome?: boolean;
  continue?: boolean; forkSession?: boolean; noSessionPersistence?: boolean; replayUserMessages?: boolean;
  debug?: boolean | string; debugFile?: string; ide?: boolean; includePartialMessages?: boolean;
  inputFormat?: "text" | "stream-json"; settingSources?: string; settings?: string; strictMcpConfig?: boolean; verbose?: boolean;
  outputFormat?: "text" | "json" | "stream-json"; // default stream-json
});
```

`CodexAgent` extends the base with OpenAI Codex-specific flags. Key additions: `sandbox`, `config`, `outputSchema`.

```ts
import { CodexAgent } from "smithers-orchestrator";
new CodexAgent({
  sandbox?: "read-only" | "workspace-write" | "danger-full-access";
  fullAuto?: boolean; dangerouslyBypassApprovalsAndSandbox?: boolean;
  config?: Record<string, string | number | boolean | object | null> | string[];
  enable?: string[]; disable?: string[];
  oss?: boolean; localProvider?: string;
  image?: string[]; profile?: string; cd?: string; addDir?: string[];
  skipGitRepoCheck?: boolean; color?: "always" | "never" | "auto";
  outputSchema?: string; outputLastMessage?: string; json?: boolean;
  configDir?: string; apiKey?: string;
});
```

`AntigravityAgent` wraps the Google `agy` CLI. Key additions: `allowedMcpServerNames`, `geminiDir`, `conversation`, `continue`, and `resume`.

```ts
import { AntigravityAgent } from "smithers-orchestrator";
new AntigravityAgent({
  model?: string; sandbox?: boolean;
  yolo?: boolean; dangerouslySkipPermissions?: boolean;
  allowedMcpServerNames?: string[]; allowedTools?: string[];
  conversation?: string; continue?: boolean; resume?: string;
  includeDirectories?: string[];
  extensions?: string[]; listExtensions?: boolean;
  listSessions?: boolean; deleteSession?: string;
  screenReader?: boolean; outputFormat?: "text" | "json" | "stream-json";
  debug?: boolean;
  binary?: string; configDir?: string; geminiDir?: string; apiKey?: string;
  // Deprecated and rejected at runtime: debug, screenReader, outputFormat,
  // extensions, listExtensions, listSessions, deleteSession.
});
```

Current `agy` builds renamed or removed several Gemini-era flags. Smithers treats
the mapping below as a strict runtime contract rather than a best-effort pass-through:

| Smithers option | Emitted `agy` surface |
|---|---|
| `includeDirectories` | `--add-dir` |
| `conversation` / `resume` | `--conversation <id>` |
| `continue` | `--continue` |
| `configDir` / `geminiDir` | `--gemini_dir <dir>` and `GEMINI_DIR=<dir>` |
| prompt text | `-p <prompt>` |

Smithers does not emit `--output-format`, `--include-directories`, `--resume`,
`--screen-reader`, `--debug`, extension flags, session-list flags, or
`--prompt` for Antigravity. Options that would require those removed flags fail
fast with `AGENT_CONFIG_INVALID` and a replacement hint. Plugins are managed
outside workflow launch through `agy plugin`.
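
The mapping table and the fail-fast rule can be sketched as a single function. `buildAgyInvocation` and `AgentConfigInvalid` are hypothetical names, and the argument order is an assumption. The emitted flags, the `GEMINI_DIR` variable, the rejected option names, and the `AGENT_CONFIG_INVALID` code all come from this section.

```typescript
// Hypothetical sketch: map Smithers-style options onto current `agy` flags and
// fail fast on options whose upstream flags were removed.
const REJECTED_OPTIONS = [
  "debug", "screenReader", "outputFormat", "extensions",
  "listExtensions", "listSessions", "deleteSession",
] as const;

type AgyOptions = {
  prompt: string;
  includeDirectories?: string[];
  conversation?: string;
  resume?: string;
  continue?: boolean;
  configDir?: string;
  geminiDir?: string;
} & Partial<Record<(typeof REJECTED_OPTIONS)[number], unknown>>;

class AgentConfigInvalid extends Error {
  readonly code = "AGENT_CONFIG_INVALID";
}

function buildAgyInvocation(opts: AgyOptions): { args: string[]; env: Record<string, string> } {
  for (const key of REJECTED_OPTIONS) {
    if (opts[key] !== undefined) {
      throw new AgentConfigInvalid(`${key} relies on a flag that agy removed; drop it from the agent config`);
    }
  }
  const args: string[] = [];
  const env: Record<string, string> = {};
  for (const dir of opts.includeDirectories ?? []) args.push("--add-dir", dir);
  const conversationId = opts.conversation ?? opts.resume; // both map to --conversation
  if (conversationId) args.push("--conversation", conversationId);
  if (opts.continue) args.push("--continue");
  const configRoot = opts.configDir ?? opts.geminiDir;
  if (configRoot) {
    args.push("--gemini_dir", configRoot); // flag and env var carry the same directory
    env.GEMINI_DIR = configRoot;
  }
  args.push("-p", opts.prompt); // prompt text goes through -p, never --prompt
  return { args, env };
}
```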

`PiAgent` wraps the Pi CLI and adds extension UI hook support. Key additions: `mode`, `onExtensionUiRequest`, `extension`, `thinking`.

```ts
import { PiAgent, type PiExtensionUiRequest, type PiExtensionUiResponse } from "smithers-orchestrator";
new PiAgent({
  provider?: string; apiKey?: string; appendSystemPrompt?: string; mode?: "text" | "json" | "rpc";
  print?: boolean; continue?: boolean; resume?: boolean; session?: string;
  sessionDir?: string; noSession?: boolean;
  models?: string | string[]; listModels?: boolean | string;
  extension?: string[]; skill?: string[]; promptTemplate?: string[]; theme?: string[];
  noExtensions?: boolean; noSkills?: boolean; noPromptTemplates?: boolean; noThemes?: boolean;
  tools?: string[]; noTools?: boolean; files?: string[];
  thinking?: "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
  export?: string; verbose?: boolean;
  onExtensionUiRequest?: (req: PiExtensionUiRequest) => Promise<PiExtensionUiResponse | null> | PiExtensionUiResponse | null;
});
```

`KimiAgent` wraps the Moonshot Kimi CLI with automatic session isolation. Key additions: `thinking`, `agent`, `maxRalphIterations`.

```ts
import { KimiAgent } from "smithers-orchestrator";
new KimiAgent({
  thinking?: boolean; outputFormat?: "text" | "stream-json";
  finalMessageOnly?: boolean; quiet?: boolean;
  agent?: "default" | "okabe"; agentFile?: string;
  workDir?: string; session?: string; continue?: boolean;
  skillsDir?: string; mcpConfigFile?: string[]; mcpConfig?: string[];
  maxStepsPerTurn?: number; maxRetriesPerStep?: number; maxRalphIterations?: number;
  verbose?: boolean; debug?: boolean; configDir?: string;
});
```

`ForgeAgent` wraps the Forge CLI and supports 300+ models via provider/model strings. Key additions: `conversationId`, `provider`, `workflow`.

```ts
import { ForgeAgent } from "smithers-orchestrator";
new ForgeAgent({
  directory?: string; provider?: string; agent?: string;
  conversationId?: string; sandbox?: string; restricted?: boolean;
  verbose?: boolean; workflow?: string; event?: string; conversation?: string;
});
```

`AmpAgent` wraps the Amp CLI in `--execute` headless mode. Key additions: `visibility`, `mcpConfig`, `dangerouslyAllowAll`.

```ts
import { AmpAgent } from "smithers-orchestrator";
new AmpAgent({
  visibility?: "private" | "public" | "workspace" | "group";
  mcpConfig?: string; settingsFile?: string;
  logLevel?: "error" | "warn" | "info" | "debug" | "audit"; logFile?: string;
  dangerouslyAllowAll?: boolean; ide?: boolean; jetbrains?: boolean;
});
```

`OpenCodeAgent` wraps the OpenCode CLI via `opencode run --format json`. Key additions: `agentName`, `continueSession`, `sessionId`. Note: native hijack support is not yet shipped.

```ts
import { OpenCodeAgent } from "smithers-orchestrator";
new OpenCodeAgent({
  model?: string; agentName?: string;
  attachFiles?: string[];
  continueSession?: boolean; sessionId?: string;
  variant?: string;
});
```

## Hijack handoff

Most built-in CLI agents support `bunx smithers-orchestrator hijack RUN_ID`, which relaunches the agent in its native CLI session for interactive takeover.

Smithers persists the native session or conversation id on each task event. On hijack, it waits for a safe boundary between blocking tool calls, then reopens the session via the vendor's resume flag:

| Agent class | Resume flag |
|---|---|
| `ClaudeCodeAgent` | `claude --resume` |
| `CodexAgent` | `codex resume` |
| `AntigravityAgent` | `agy --conversation` |
| `PiAgent` | `pi --session` |
| `KimiAgent` | `kimi --session` |
| `ForgeAgent` | `forge --conversation-id` |
| `AmpAgent` | `amp threads continue` |

When the native session exits cleanly, the workflow resumes in detached mode. `OpenCodeAgent` runs through `opencode run --format json`, but native `bunx smithers-orchestrator hijack` support for OpenCode has not shipped yet. See How it works → Durability and resume.

## Notes

- **Yolo defaults.** `yolo: true` (default) maps to each CLI's "skip approvals" flag (`--dangerously-skip-permissions`, `--dangerously-bypass-approvals-and-sandbox`, `--yolo`, `--dangerously-allow-all`). Set `yolo: false` or use the agent-specific approval option for tighter control.
- **Pi rpc mode** sends prompts as JSON over stdin and is required for `onExtensionUiRequest` callbacks; text/json modes pass the prompt as a positional arg with `files` emitted as `@path`.
- **Kimi share dir.** `KimiAgent` auto-creates an isolated `KIMI_SHARE_DIR` per invocation to prevent `kimi.json` corruption under concurrent runs. Set `env.KIMI_SHARE_DIR` to opt out.
- **Antigravity config.** `AntigravityAgent` launches the `agy` binary and passes `configDir`/`geminiDir` as both `--gemini_dir` and `GEMINI_DIR`, matching Antigravity's `~/.gemini/antigravity-cli` config root. Current `agy` prompts use `-p`, extra directories use `--add-dir`, and native resume uses `--conversation`.
- **Non-idempotent retries.** When a `<Task>` retries, Smithers prepends a warning listing previously-called side-effect tools so the agent can verify external state before re-invoking them.
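
The Kimi share-dir note can be illustrated with a small sketch. `kimiEnv` is hypothetical, and placing the directory under `os.tmpdir()` is an assumption. The text only states that each invocation gets an isolated `KIMI_SHARE_DIR` unless `env.KIMI_SHARE_DIR` is set.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: give every invocation its own share dir so concurrent
// runs never write the same kimi.json, unless the caller pinned one in env.
function kimiEnv(userEnv: Record<string, string> = {}): Record<string, string> {
  if (userEnv.KIMI_SHARE_DIR) {
    return { ...userEnv }; // explicit opt-out: keep the caller's directory
  }
  const shareDir = mkdtempSync(join(tmpdir(), "kimi-share-"));
  return { ...userEnv, KIMI_SHARE_DIR: shareDir };
}
```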

---

## SDK Agents

> Provider-backed AI SDK agent wrappers for Anthropic and OpenAI that work like first-class Smithers agents.

`AnthropicAgent` and `OpenAIAgent` are thin wrappers around the AI SDK `ToolLoopAgent` with class-style ergonomics matching the CLI agents.

## Import

```ts
import {
  AnthropicAgent,
  OpenAIAgent,
  tools,
} from "smithers-orchestrator";
import { stepCountIs } from "ai";
```

## Quick Start

```ts
const claude = new AnthropicAgent({
  model: "claude-opus-4-7",
  tools,
  instructions: "You are a careful planner.",
  stopWhen: stepCountIs(40),
});

const codex = new OpenAIAgent({
  model: "gpt-5.3-codex",
  tools,
  instructions: "You are a precise implementation agent.",
  stopWhen: stepCountIs(40),
});
```

```tsx
{/* outputs comes from createSmithers() */}
<Task id="plan" output={outputs.plan} agent={claude}>
  Analyze the repository and propose a migration plan.
</Task>
```

## Model Input

Both classes accept a model ID string (`"claude-opus-4-7"`, `"gpt-5.3-codex"`) or a prebuilt AI SDK language model instance.

## Options

Constructors forward the standard AI SDK `ToolLoopAgent` settings: `instructions`, `tools`, `stopWhen`, `maxOutputTokens`, `temperature`, `providerOptions`, `prepareCall`. The wrappers' `model` option also accepts a model-ID string and resolves it to the provider's language model automatically.

For `OpenAIAgent`, pass `baseURL` and `apiKey` directly when targeting an OpenAI-compatible endpoint instead of the default OpenAI API. This is the simplest path for local servers such as llama.cpp:

```ts
const local = new OpenAIAgent({
  model: "llama-3.1-8b-instruct",
  baseURL: "http://127.0.0.1:8080/v1",
  apiKey: "none",
  tools,
  instructions: "You are a local coding assistant.",
  stopWhen: stepCountIs(40),
});
```

Set `apiKey: "none"` in the `OpenAIAgent` config when your local server accepts OpenAI-compatible requests but does not require a real key.

Some OpenAI-compatible local servers accept chat requests but do not reliably implement JSON schema structured output. For those servers, keep the output schema on the Smithers task and disable native structured output on the agent so Smithers uses prompt-based JSON extraction instead:

```ts
const local = new OpenAIAgent({
  model: "llama-3.1-8b-instruct",
  baseURL: "http://127.0.0.1:8080/v1",
  apiKey: "none",
  nativeStructuredOutput: false,
  tools,
  instructions: "You are a local coding assistant.",
  stopWhen: stepCountIs(40),
});
```
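
Here is a hypothetical sketch of what prompt-based JSON extraction involves. The model answers in free text, and the runtime recovers the first balanced JSON object before validating it against the task's output schema. Smithers' actual extraction strategy is not documented here and may differ, for example in how it handles fenced blocks or top-level arrays.

```typescript
// Hypothetical sketch: return the first balanced {...} object found in model text.
// Braces inside JSON strings (including escaped quotes) are skipped.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  if (start === -1) throw new Error("no JSON object found");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  throw new Error("unterminated JSON object");
}
```

The parsed value would then go through the task's Zod schema, so malformed or mis-shaped answers still fail validation.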

For advanced provider setup, create the AI SDK OpenAI provider yourself and pass the prebuilt model into `OpenAIAgent`:

```ts
import { createOpenAI } from "@ai-sdk/openai";

const localOpenAI = createOpenAI({
  baseURL: "http://127.0.0.1:8080/v1",
  apiKey: "none",
});

const local = new OpenAIAgent({
  model: localOpenAI("llama-3.1-8b-instruct"),
  tools,
  instructions: "You are a local coding assistant.",
  stopWhen: stepCountIs(40),
});
```

Use the `createOpenAI` path when you need provider-level configuration beyond `baseURL` and `apiKey`; in that form, `apiKey: "none"` belongs in the `createOpenAI` config.

## Hermes

[Hermes](https://github.com/NousResearch/hermes-agent) (Nous Research) exposes an OpenAI-compatible HTTP API, so `HermesAgent` is a convenience subclass of `OpenAIAgent` that points the provider at your Hermes server and disables native structured output by default (a local Hermes server may not honor JSON-schema response formats).

```ts
import { HermesAgent } from "smithers-orchestrator";

const hermes = new HermesAgent({
  baseURL: "http://127.0.0.1:5123/v1", // or set HERMES_BASE_URL
  model: "hermes",                      // whatever model id your server advertises
  tools,
  instructions: "You are a careful implementation agent.",
  stopWhen: stepCountIs(40),
});
```

`baseURL` falls back to the `HERMES_BASE_URL` env var and `apiKey` to `HERMES_API_KEY` (then `"hermes"`). Pass `nativeStructuredOutput: true` if your server does honor JSON-schema output. To use Smithers *from* Hermes instead of running Hermes as a worker, see Agent Support → Hermes.
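
The following hypothetical helper writes out the fallback order above. `resolveHermesConfig` is not part of the public API. Throwing when no base URL is available is an assumption, because the text does not say what `HermesAgent` does in that case.

```typescript
// Hypothetical sketch of the documented fallbacks; not HermesAgent's source.
interface HermesConfigInput {
  baseURL?: string;
  apiKey?: string;
  nativeStructuredOutput?: boolean;
}

function resolveHermesConfig(opts: HermesConfigInput, env: Record<string, string | undefined>) {
  const baseURL = opts.baseURL ?? env.HERMES_BASE_URL;
  // Assumption: no built-in default URL, so fail loudly.
  if (!baseURL) throw new Error("Set baseURL or HERMES_BASE_URL");
  return {
    baseURL,
    apiKey: opts.apiKey ?? env.HERMES_API_KEY ?? "hermes",
    // Off by default: a local Hermes server may ignore JSON-schema response formats.
    nativeStructuredOutput: opts.nativeStructuredOutput ?? false,
  };
}
```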

## Hijack Support

SDK agents do not reopen a provider-native CLI. Smithers persists the agent conversation and reopens it through a Smithers-managed REPL via `bunx smithers-orchestrator hijack RUN_ID`.

Live-run behavior:

- Smithers captures response history after each step via `onStepFinish`.
- `bunx smithers-orchestrator hijack` waits until history is durable, aborts the current agent task (handing it off to the REPL), and opens the REPL.
- On clean REPL exit, Smithers writes updated message history back and resumes the workflow automatically.

Limits:

- Smithers reconstructs the agent from the workflow source on hijack. This means cross-engine hijack is not supported: the REPL will use the same agent class that ran originally.

## CLI vs SDK

| | CLI Agents | SDK Agents |
|---|---|---|
| Billing | Provider subscription / local CLI | API billing |
| Tools | Provider CLI tool ecosystem | Smithers tools sandbox |
| Flexibility | Native CLI flags | AI SDK `providerOptions` |

Pass a raw `ToolLoopAgent` directly if you prefer; the wrappers are convenience, not a separate runtime.

## Example: Dual Setup

```ts
const useCli = process.env.USE_CLI_AGENTS === "1";

export const claude = useCli
  ? new ClaudeCodeAgent({
      model: "claude-opus-4-7",
      dangerouslySkipPermissions: true,
    })
  : new AnthropicAgent({
      model: "claude-opus-4-7",
      tools,
      instructions: "You are a careful planner.",
      stopWhen: stepCountIs(40),
    });
```


---

## Built-in Tools

> Sandboxed file and shell tools for AI agent tasks, with exact input schemas, security policies, and usage examples.

```ts
import { tools, read, write, edit, grep, bash, defineTool, getDefinedToolMetadata } from "smithers-orchestrator";
```

`tools` bundles all five tools keyed by name:

```ts
const { read, write, edit, grep, bash } = tools;
```

The `smithers-orchestrator/tools` subpath also exports lower-level helpers for advanced integrations:

| Export | Purpose |
|---|---|
| `readFileTool`, `writeFileTool`, `editFileTool`, `grepTool`, `bashTool` | Call the underlying implementation directly instead of the AI SDK tool wrapper. |
| `getDefinedToolMetadata(tool)` | Read Smithers metadata (`name`, `sideEffect`, `idempotent`) from a `defineTool()` result. |
| `getToolContext()`, `runWithToolContext(ctx, fn)` | Inspect or provide the task-local tool runtime context. |
| `getToolIdempotencyKey(ctx?)`, `nextToolSeq(ctx)` | Build stable idempotency keys and task-local tool-call sequence numbers. |
| `BASH_TOOL_MAX_*` constants | Upper bounds for bash command length, args, cwd, output bytes, and timeout. |

## Sandboxing

All tools are sandboxed to `rootDir` (defaults to the workflow directory). Paths are resolved relative to this root; escapes via symlinks are rejected.

| Policy | Behavior |
|---|---|
| Path resolution | Relative paths resolve against `rootDir`. Absolute paths must fall within root. |
| Symlinks | Rejected if target is outside sandbox. |
| Output size | Truncated to `maxOutputBytes` (default 200KB). |
| Timeouts | `bash` and `grep` default to 60s; exceeded processes killed with `SIGKILL`. |
| Network | `bash` blocks network commands by default. See bash. |
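
The path rules can be approximated lexically, as in the sketch below. This is a hypothetical sketch. It only resolves and compares paths, whereas the symlink rule also requires resolving the real target (for example with `fs.realpathSync`) before applying the same containment check.

```typescript
import * as path from "node:path";

// Hypothetical lexical sandbox check (symlink resolution omitted).
function resolveInSandbox(rootDir: string, requested: string): string {
  const root = path.resolve(rootDir);
  // Relative paths resolve against root; absolute paths are kept by path.resolve.
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // Anything that climbs out of root (or lands on another drive) is an escape.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes sandbox: ${requested}`);
  }
  return target;
}
```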

## Tool Call Logging

Every invocation is logged to `_smithers_tool_calls`:

| Field | Description |
|---|---|
| `runId` | Workflow run ID |
| `nodeId` | Task node that invoked the tool |
| `iteration` | Loop iteration |
| `attempt` | Retry attempt number |
| `seq` | Sequential call counter within the task |
| `toolName` | `read`, `write`, `edit`, `grep`, or `bash` |
| `inputJson` | Serialized input arguments |
| `outputJson` | Serialized output (truncated if over limit) |
| `startedAtMs` | Start timestamp |
| `finishedAtMs` | End timestamp |
| `status` | `"success"` or `"error"` |
| `errorJson` | Error details (if `"error"`) |
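
After you load rows from `_smithers_tool_calls`, a per-tool summary for one task is a short fold. Querying the table is not covered here. The `ToolCallRow` field types below are assumptions inferred from the column descriptions.

```typescript
// Row shape mirroring the columns above; the exact types are assumptions.
interface ToolCallRow {
  runId: string;
  nodeId: string;
  iteration: number;
  attempt: number;
  seq: number;
  toolName: string;
  inputJson: string;
  outputJson: string | null;
  startedAtMs: number;
  finishedAtMs: number;
  status: "success" | "error";
  errorJson: string | null;
}

// Per-tool call count, error count, and total wall time for one task node.
function summarizeToolCalls(rows: ToolCallRow[], nodeId: string) {
  const summary = new Map<string, { calls: number; errors: number; totalMs: number }>();
  for (const row of rows) {
    if (row.nodeId !== nodeId) continue;
    const s = summary.get(row.toolName) ?? { calls: 0, errors: 0, totalMs: 0 };
    s.calls += 1;
    if (row.status === "error") s.errors += 1;
    s.totalMs += row.finishedAtMs - row.startedAtMs;
    summary.set(row.toolName, s);
  }
  return summary;
}
```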

## defineTool

`defineTool()` wraps custom AI SDK tools with Smithers runtime context, deterministic idempotency keys, and durable tool-call logging.

```ts
import { defineTool } from "smithers-orchestrator";
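import { z } from "zod";

// Your own grocery API client; this shape is hypothetical.
declare const wholeFoods: { placeOrder(req: { sku: string; idempotencyKey: string }): Promise<unknown> };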

const placeOrder = defineTool({
  name: "wholefoods.place_order",
  description: "Place a grocery order",
  schema: z.object({
    sku: z.string(),
  }),
  sideEffect: true,
  idempotent: false,
  async execute(args, ctx) {
    return await wholeFoods.placeOrder({
      sku: args.sku,
      idempotencyKey: ctx.idempotencyKey,
    });
  },
});
```

- `ctx.idempotencyKey` is stable across retries and resumes for the same task iteration.
- `sideEffect: true` opts the tool into Smithers side-effect tracking.
- `idempotent: false` warns resumed/retried agent loops when the tool was called in a previous attempt.
- Every `defineTool()` call is logged to `_smithers_tool_calls`.
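
The sketch below gives some intuition for what "stable across retries and resumes" means. It is a hypothetical key derivation that hashes identifiers that stay fixed for a task iteration and deliberately leaves out the attempt number. Smithers' real derivation is not specified here, so do not depend on this format. A real scheme also has to distinguish several calls within one iteration, which this sketch omits.

```typescript
import { createHash } from "node:crypto";

// Hypothetical derivation: hash only identifiers that stay fixed for one task
// iteration. The attempt number is excluded, so every retry or resume of the
// same iteration produces the same key.
function idempotencyKey(runId: string, nodeId: string, iteration: number): string {
  return createHash("sha256")
    .update(JSON.stringify([runId, nodeId, iteration]))
    .digest("hex");
}

// Attempts 1 and 2 of iteration 0 share a key; iteration 1 gets a different one.
console.log(idempotencyKey("run-1", "notify", 0) === idempotencyKey("run-1", "notify", 0)); // true
```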

### Side Effects and Idempotency

Every custom tool that modifies external state **must** declare `sideEffect: true`. This is how Smithers protects your workflow during retries and resumes. Without it, Smithers treats the tool as a pure read and replays it freely, potentially sending duplicate emails, double-charging payments, or creating duplicate records.

The two flags work together:

| `sideEffect` | `idempotent` | Smithers behavior |
|---|---|---|
| `false` (default) | `true` (default) | Pure read. Safe to replay on retry. No warnings. |
| `true` | `true` | Mutates external state, but calling twice with the same input produces the same result (e.g. an upsert, a PUT request). Safe to replay. No warnings. |
| `true` | `false` | Mutates external state and is **not** safe to replay (e.g. sending an email, placing an order, charging a payment). On retry, Smithers injects a warning listing the tool as already called so the agent can verify external state before calling again. |

With `sideEffect: true` and `idempotent: false`, Smithers does two things on retry:

1. **Warns the agent.** The retry prompt lists which non-idempotent tools were already called.
2. **Provides a stable idempotency key.** `ctx.idempotencyKey` is deterministic for a given task + iteration; pass it to external APIs that support idempotency (Stripe, AWS) to deduplicate.
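
The table and the retry warning can be encoded as shown below. Both functions are hypothetical, and the warning wording is invented for illustration. Only the defaults (`sideEffect: false`, `idempotent: true`) and the rule that only `sideEffect: true, idempotent: false` tools trigger a warning come from the text.

```typescript
// Hypothetical encoding of the sideEffect/idempotent table.
interface ToolMeta {
  name: string;
  sideEffect?: boolean; // default false
  idempotent?: boolean; // default true
}

type ReplayPolicy = "replay-freely" | "warn-before-replay";

function replayPolicy(meta: ToolMeta): ReplayPolicy {
  const sideEffect = meta.sideEffect ?? false;
  const idempotent = meta.idempotent ?? true;
  return sideEffect && !idempotent ? "warn-before-replay" : "replay-freely";
}

// Build a retry warning listing non-idempotent tools called in earlier attempts.
function retryWarning(previouslyCalled: ToolMeta[]): string | null {
  const risky = [
    ...new Set(
      previouslyCalled.filter((t) => replayPolicy(t) === "warn-before-replay").map((t) => t.name),
    ),
  ];
  if (risky.length === 0) return null;
  return `Previous attempt already called non-idempotent tools: ${risky.join(", ")}. Verify external state before calling them again.`;
}
```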

If a tool declares `sideEffect: true, idempotent: false` but its `execute` function omits the `ctx` parameter, Smithers logs a startup warning. This is almost always a bug, because you need `ctx.idempotencyKey` to handle retries safely.

```ts
import { defineTool } from "smithers-orchestrator";
import { z } from "zod";

// Your own mail client; this shape is hypothetical.
declare const mailer: {
  send(msg: { to: string; body: string; idempotencyKey?: string }): Promise<void>;
};

// ✗ Bad: non-idempotent side effect without ctx
const sendEmailUnsafe = defineTool({
  name: "email.send",
  schema: z.object({ to: z.string(), body: z.string() }),
  sideEffect: true,
  idempotent: false,
  async execute(args) {  // ← missing ctx parameter, Smithers warns
    await mailer.send(args);
  },
});

// ✓ Good: uses ctx.idempotencyKey to deduplicate
const sendEmail = defineTool({
  name: "email.send",
  schema: z.object({ to: z.string(), body: z.string() }),
  sideEffect: true,
  idempotent: false,
  async execute(args, ctx) {
    await mailer.send({ ...args, idempotencyKey: ctx.idempotencyKey });
  },
});
```
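
Passing `ctx.idempotencyKey` only helps if the receiving service deduplicates on it. The in-memory `FakeMailer` below is a hypothetical stand-in for such a service. When it sees a repeated key, it returns the original message id instead of sending again, which is what makes a retried `sendEmail` safe.

```typescript
// Hypothetical stand-in for an external API that honors idempotency keys.
class FakeMailer {
  private readonly sent = new Map<string, string>(); // idempotencyKey -> messageId
  private nextId = 1;
  deliveries = 0;

  send(msg: { to: string; body: string; idempotencyKey: string }): string {
    const existing = this.sent.get(msg.idempotencyKey);
    if (existing) return existing; // duplicate request: no second delivery
    this.deliveries += 1;
    const id = `msg-${this.nextId++}`;
    this.sent.set(msg.idempotencyKey, id);
    return id;
  }
}

// Simulate a task retry that reuses the same stable key on both attempts.
function demo() {
  const mailer = new FakeMailer();
  const key = "run-1:notify:0"; // hypothetical stable key
  const first = mailer.send({ to: "a@example.com", body: "hi", idempotencyKey: key });
  const retry = mailer.send({ to: "a@example.com", body: "hi", idempotencyKey: key });
  return { first, retry, deliveries: mailer.deliveries };
}

console.log(demo()); // { first: 'msg-1', retry: 'msg-1', deliveries: 1 }
```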

### What counts as a side effect

For custom tools, the rule is simple: **if you cannot undo it with `git reset`, mark it as a side effect.**

A side effect is any mutation the runtime should not blindly repeat on retry. If a custom tool talks to an external API, writes to a database, sends a message, or triggers a webhook, mark it.

The built-in mutating tools (`write`, `edit`, and `bash`) are all registered as side-effecting. `write` and `edit` are also `idempotent: false`. Git tracks their sandboxed file changes, but replaying a write on retry could overwrite content that diverged since the first call, so they are treated as conservatively as `bash`.

| Tool | Side effect? | Why |
|---|---|---|
| Built-in `read`, `grep` | No | Pure reads |
| Built-in `write`, `edit` | **Yes** | Sandboxed file writes are tracked by git, but replaying on retry could overwrite content that diverged since the first call |
| Built-in `bash` | **Yes** | Arbitrary shell commands may not be safe to repeat |
| Custom tool calling an external API | **Yes** | Mutates state outside the sandbox |
| Custom tool writing to a database | **Yes** | External persistent state |
| Custom tool sending a Slack message | **Yes** | Irreversible external communication |
| Custom tool creating a GitHub PR | **Yes** | External state visible to others |

---

## read

Read a file from the sandbox.

```ts
{ path: string }  // relative to rootDir or absolute
```

Returns file contents as UTF-8. Throws `"File too large"` if size exceeds `maxOutputBytes`.

```ts
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { read, grep } from "smithers-orchestrator";

const codeAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: { read, grep },
});
```

```tsx
{/* outputs comes from createSmithers() */}
<Task id="review" output={outputs.review} agent={codeAgent}>
  Read the file src/auth.ts and identify any security vulnerabilities.
</Task>
```

---

## write

Write content to a file. Creates parent directories as needed.

```ts
{
  path: string      // relative to rootDir or absolute
  content: string
}
```

Returns `"ok"`. Throws `"Content too large"` if content exceeds `maxOutputBytes`. Logs content hash (SHA-256) and byte size; full content is not stored.

```ts
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { write, read } from "smithers-orchestrator";

const writerAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: { write, read },
});
```

---

## edit

Apply a unified diff patch to an existing file.

```ts
{
  path: string    // file to patch
  patch: string   // unified diff format
}
```

Returns `"ok"`. The file must exist. Reads current contents, applies the patch via `applyPatch`, writes back. Throws on size limits (`"Patch too large"`, `"File too large"`) or mismatched context (`"Failed to apply patch"`). Logs patch hash and byte size.

```diff
--- a/src/auth.ts
+++ b/src/auth.ts
@@ -10,2 +10,3 @@
   const token = jwt.sign(payload, secret);
+  logger.info("Token issued", { userId: payload.sub });
   return token;
```
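
When you hand-write patches, it is easy to produce a hunk header whose line counts disagree with its body. Depending on the patch library, such a hunk may be rejected or misapplied. The hypothetical checker below counts context lines (leading space), removed lines (`-`), and added lines (`+`) per hunk and compares them with the `@@ -a,b +c,d @@` header. It assumes a single-file patch.

```typescript
// Hypothetical checker: do each hunk's header counts match its body?
function checkHunkCounts(patch: string): string[] {
  const problems: string[] = [];
  const lines = patch.split("\n");
  lines.forEach((header, i) => {
    const m = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/.exec(header);
    if (!m) return;
    // An omitted count means 1 in unified diff headers.
    const wantOld = m[1] === undefined ? 1 : Number(m[1]);
    const wantNew = m[2] === undefined ? 1 : Number(m[2]);
    let oldCount = 0;
    let newCount = 0;
    for (const line of lines.slice(i + 1)) {
      if (line.startsWith("@@")) break; // next hunk starts
      if (line.startsWith(" ")) { oldCount++; newCount++; } // context: both sides
      else if (line.startsWith("-")) oldCount++;           // removed: old side only
      else if (line.startsWith("+")) newCount++;           // added: new side only
    }
    if (oldCount !== wantOld || newCount !== wantNew) {
      problems.push(`hunk at line ${i + 1}: header says -${wantOld} +${wantNew}, body has -${oldCount} +${newCount}`);
    }
  });
  return problems;
}
```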

---

## grep

Search for a regex pattern using `ripgrep`.

```ts
{
  pattern: string    // regex
  path?: string      // directory or file (default: rootDir)
}
```

Returns matching lines with file paths and line numbers (`rg -n` format). Exit code 1 (no matches) returns empty string. Exit code 2 throws stderr as error. Requires `ripgrep` in PATH.

```
src/auth.ts:15:  if (token.expired()) {
src/auth.ts:42:  validateToken(token);
tests/auth.test.ts:8:  const token = createTestToken();
```
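
If you post-process `grep` results in a compute task, you can parse the `path:line:text` format as shown below. `parseRgOutput` is a hypothetical helper, and its regex is only a sketch. It splits at the first `:<digits>:` and would misparse paths that contain that pattern.

```typescript
// Hypothetical parser for `rg -n` style output lines: path:line:text.
interface GrepMatch {
  path: string;
  line: number;
  text: string;
}

function parseRgOutput(output: string): GrepMatch[] {
  const matches: GrepMatch[] = [];
  for (const raw of output.split("\n")) {
    // Shortest path prefix followed by the first ":<digits>:" separator.
    const m = /^(.+?):(\d+):(.*)$/.exec(raw);
    if (m) matches.push({ path: m[1], line: Number(m[2]), text: m[3] });
  }
  return matches; // an empty string (no matches) yields []
}
```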

---

## bash

Execute a shell command.

```ts
{
  cmd: string                     // executable or command
  args?: string[]                 // arguments
  opts?: { cwd?: string }         // working directory (sandboxed)
}
```

Returns combined stdout and stderr. Working directory defaults to `rootDir`. Timeout: 60s (killed with `SIGKILL` via process group). Non-zero exit codes throw.

### Network Blocking

Controlled by `allowNetwork` in `RunOptions`, `--allow-network` on CLI, or server config. Default: blocked.

When blocked, the command string (executable + args) is checked against these fragments:

| Category | Blocked strings |
|---|---|
| HTTP clients | `curl`, `wget` |
| URL prefixes | `http://`, `https://` |
| Package managers | `npm`, `bun`, `pip` |
| Git remote ops | `git push`, `git pull`, `git fetch`, `git clone`, `git remote` |

Local git commands (`git status`, `git diff`, `git log`) are allowed.
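
The following is a hypothetical reimplementation of the check. It assumes a plain substring match over the joined command string, as the wording above suggests. Substring matching is coarse in this sketch: `bun` also matches `bundle`, and `npm` matches `pnpm`.

```typescript
// Hypothetical reimplementation of the fragment check from the table above.
const BLOCKED_FRAGMENTS = [
  "curl", "wget",
  "http://", "https://",
  "npm", "bun", "pip",
  "git push", "git pull", "git fetch", "git clone", "git remote",
];

function isBlockedNetworkCommand(cmd: string, args: string[] = []): boolean {
  const commandString = [cmd, ...args].join(" ");
  return BLOCKED_FRAGMENTS.some((fragment) => commandString.includes(fragment));
}
```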

```ts
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { bash } from "smithers-orchestrator";

const devAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: { bash },
});
```

```tsx
{/* outputs comes from createSmithers() */}
<Task id="lint" output={outputs.lint} agent={devAgent}>
  Run the linter on src/ and report any issues.
</Task>
```

---

## Using Tools with Agents

Pass tools to an AI SDK agent and assign the agent to a `<Task>`:

```tsx
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { createSmithers, read, write, edit, grep, bash, Task } from "smithers-orchestrator";
import { z } from "zod";

const codeAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: { read, write, edit, grep, bash },
  instructions: "You are a senior engineer. Use the available tools to complete tasks.",
});

const { Workflow, smithers, outputs } = createSmithers({
  result: z.object({ summary: z.string() }),
});

export default smithers((ctx) => (
  <Workflow name="refactor">
    <Task id="refactor" output={outputs.result} agent={codeAgent}>
      {`Refactor the function in ${ctx.input.file} to improve readability.`}
    </Task>
  </Workflow>
));
```

The full bundle works too:

```ts
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { tools } from "smithers-orchestrator";

const fullAgent = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  tools,
});
```

## Configuration

| Option | Default | Description |
|---|---|---|
| `rootDir` | Workflow directory | Sandbox root |
| `allowNetwork` | `false` | Allow network commands in `bash` |
| `maxOutputBytes` | `200000` (200KB) | Max output size per tool |
| `toolTimeoutMs` | `60000` (60s) | Timeout for `bash` and `grep` |

```ts
import { Effect } from "effect";
import { runWorkflow } from "smithers-orchestrator";
import workflow from "./workflow"; // your workflow module (path is illustrative)

const result = await Effect.runPromise(runWorkflow(workflow, {
  input: { file: "src/auth.ts" },
  rootDir: "/home/project",
  allowNetwork: false,
  maxOutputBytes: 500_000,
  toolTimeoutMs: 120_000,
}));
```
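To make the defaults concrete, here is a minimal sketch of how the four options resolve when only some are supplied. `ToolSandboxOptions` and `resolveToolOptions` are hypothetical names used for illustration, not Smithers exports. The default values come from the table above, and the `workflowDir` parameter stands in for the "Workflow directory" default of `rootDir`.

```typescript
// Hypothetical shape mirroring the Configuration table; not a Smithers export.
interface ToolSandboxOptions {
  rootDir?: string;
  allowNetwork?: boolean;
  maxOutputBytes?: number;
  toolTimeoutMs?: number;
}

type ResolvedToolSandboxOptions = Required<ToolSandboxOptions>;

// Fill unspecified options with the documented defaults.
// `workflowDir` stands in for the "Workflow directory" default of rootDir.
function resolveToolOptions(
  opts: ToolSandboxOptions,
  workflowDir: string,
): ResolvedToolSandboxOptions {
  return {
    rootDir: opts.rootDir ?? workflowDir,
    allowNetwork: opts.allowNetwork ?? false,
    maxOutputBytes: opts.maxOutputBytes ?? 200_000, // 200KB
    toolTimeoutMs: opts.toolTimeoutMs ?? 60_000, // 60s, applies to bash and grep
  };
}

// Override only the output cap; everything else falls back to defaults.
const resolved = resolveToolOptions({ maxOutputBytes: 500_000 }, "/home/project/workflows");
console.log(resolved);
```

Explicit values always win over defaults, which matches how the `runWorkflow` call above overrides `maxOutputBytes` and `toolTimeoutMs` while leaving `allowNetwork` at its safe default.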


---

## Common External Tools

> Patterns for calling GitHub, Linear, Notion, Slack, and Obsidian as tools.

For each external service, you have three choices:

1. **OpenAPI tools**: point `createOpenApiTools()` at the service's OpenAPI spec. See OpenAPI tools.
2. **CLI in a task**: if the service has a CLI (`gh`, `linear`, `notion`, `slack`), run it inside a `<Task>` via the `bash` tool. See side-effect tools with idempotency.
3. **Custom `defineTool`**: wrap the service's REST API in a Zod-validated tool. See `defineTool`.

### Example: CLI in a task (Pattern 2)

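```tsx
{/* devAgent: an AI SDK agent with tools: { bash }, as defined above */}
{/* outputs comes from createSmithers(); the `pr` key is illustrative */}
<Task id="open-pr" output={outputs.pr} agent={devAgent}>
  Run: gh pr create --title 'Fix login bug' --body 'Closes #42'
</Task>
```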

Run `gh auth login` once on the host before executing. The `bash` tool executes in the task's sandbox, so credentials set in the environment carry through automatically.

## Quick decision

| Service | Recommended approach |
|---|---|
| GitHub      | `gh` CLI in a task (auth via `gh auth login`) |
| Linear      | `linear` CLI in a task, or OpenAPI tools |
| Notion      | OpenAPI tools (Notion publishes a spec) |
| Slack       | OpenAPI tools or `slack` CLI |
| Obsidian    | `bash` tool with vault path; no API needed |

Always mark side-effecting tools with `sideEffect: true` and use `ctx.idempotencyKey` so retries don't double-fire.
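The retry-safety rule can be illustrated with a small, self-contained sketch: the side effect is recorded under its idempotency key, and a second call with the same key reuses the first result instead of firing again. `IdempotentEffects` and the key string `"run-123:open-pr:0"` are hypothetical. In a workflow the key would come from `ctx.idempotencyKey`, and a real guard needs durable storage; this in-memory map does not survive a process restart.

```typescript
// Hypothetical in-memory guard; a real one would need durable storage
// so that it survives process restarts between retries.
class IdempotentEffects {
  private results = new Map<string, Promise<unknown>>();

  // Run `effect` at most once per key; later calls reuse the first result.
  run<T>(key: string, effect: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing as Promise<T>;
    const pending = effect().catch((err: unknown) => {
      this.results.delete(key); // a failed attempt may be retried later
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}

let prsOpened = 0;
const effects = new IdempotentEffects();

// Hypothetical key; in Smithers this would be ctx.idempotencyKey.
const openPr = () =>
  effects.run("run-123:open-pr:0", async () => {
    prsOpened += 1; // stands in for `gh pr create`
    return `PR #${prsOpened}`;
  });

async function demo(): Promise<[string, string, number]> {
  const first = await openPr();
  const retry = await openPr(); // simulated retry with the same key
  return [first, retry, prsOpened];
}

demo().then(([first, retry, count]) => console.log(first, retry, count));
```

Only successful results are cached: a rejected attempt clears its key so the next retry can try the effect again.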

---

## Ecosystem

> Community projects built on Smithers.

## Burns

Workspace-first local control plane for Smithers. Single UI for authoring, running, and supervising workflows across repositories. Register repos, launch runs, stream events, inspect frames, handle approvals.

- React web app, ElectroBun (a Bun-native Electron alternative) desktop shell, or headless CLI
- AI-assisted workflow authoring via local agent CLIs
- SQLite-backed workspace registry

<Card title="Burns" icon="github" href="https://github.com/l3wi/burns">
  github.com/l3wi/burns
</Card>

## Ralphinho

Multi-agent development workflows. Two independent workflows:

- **Ralphinho** (scheduled-work): decomposes an RFC into work units, runs tier-based quality pipelines (implement, test, review), and lands changes through a merge queue with CI verification.
- **Improvinho** (review-discovery): runs three parallel discovery lenses (refactoring, type safety, architecture) and deduplicates the findings. Optionally pushes them to Linear.

Requires Bun and Jujutsu (`jj`). Supports Claude and Codex agents.

<Card title="Ralphinho" icon="github" href="https://github.com/enitrat/ralphinho">
  github.com/enitrat/ralphinho
</Card>

## Cairo Coder

AI-powered Cairo smart contract generator. RAG pipeline (DSPy) converting natural language to Cairo contracts for Starknet. Uses Smithers with Claude and Codex agents.

<Card title="Cairo Coder" icon="github" href="https://github.com/KasarLabs/cairo-coder">
  github.com/KasarLabs/cairo-coder
</Card>

## Agentix

Opinionated RFC-to-production orchestrator. Multi-phase pipelines: research, plan, implement, test, review. Role-based agents, conflict-aware merge queues, security/performance gates. DDD + BDD + TDD by default.

<Card title="Agentix" icon="github" href="https://github.com/AbdelStark/agentix">
  github.com/AbdelStark/agentix
</Card>

## Era

Generic multi-phase development workflow engine. Runs a Research, Plan, Implementation, Testing, Review, Fix, and Final Review pipeline inside an outer loop. Role-based agents, intelligent caching, dual-layer prompts.

<Card title="Era" icon="github" href="https://github.com/ClementWalter/era">
  github.com/ClementWalter/era
</Card>

## Local Isolated Ralph

Kubernetes-native Smithers workflow runner. Runs workflows as isolated K8s Jobs and CronJobs via k3s/k3d. Sandboxed container execution.

<Card title="Local Isolated Ralph" icon="github" href="https://github.com/SamuelLHuber/local-isolated-ralph">
  github.com/SamuelLHuber/local-isolated-ralph
</Card>

## Publishing Workflow Packs

Use the Workflow Catalog metadata when publishing a pack or success story: workflow IDs, required agents, Gateway scopes, inputs, outputs, write behavior, sandbox runtime, and validation command. That keeps community projects installable by operators who did not author the underlying TSX.
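As an illustration only, the fields listed above can be captured in a typed record and checked before publishing. The `WorkflowPackMetadata` shape, its field names, and `missingMetadata` are hypothetical; they are not the Workflow Catalog's actual schema, and which fields may legitimately be empty (for example Gateway scopes or inputs) is a judgment call made here.

```typescript
// Hypothetical shape covering the fields listed above; the real Workflow
// Catalog schema may name or structure these differently.
interface WorkflowPackMetadata {
  workflowIds: string[];
  requiredAgents: string[];
  gatewayScopes: string[];
  inputs: Record<string, string>; // input name -> description
  outputs: Record<string, string>; // output name -> description
  writeBehavior: string; // e.g. what the workflow writes, and where
  sandboxRuntime: string;
  validationCommand: string;
}

// Return names of fields left empty that an operator would need to guess.
// Gateway scopes and inputs may legitimately be empty, so they are not checked.
function missingMetadata(meta: WorkflowPackMetadata): string[] {
  const missing: string[] = [];
  if (meta.workflowIds.length === 0) missing.push("workflowIds");
  if (meta.requiredAgents.length === 0) missing.push("requiredAgents");
  if (Object.keys(meta.outputs).length === 0) missing.push("outputs");
  if (meta.writeBehavior.trim() === "") missing.push("writeBehavior");
  if (meta.sandboxRuntime.trim() === "") missing.push("sandboxRuntime");
  if (meta.validationCommand.trim() === "") missing.push("validationCommand");
  return missing;
}

// Illustrative pack description with the sandbox runtime left blank.
const pack: WorkflowPackMetadata = {
  workflowIds: ["refactor"],
  requiredAgents: ["claude"],
  gatewayScopes: [],
  inputs: { file: "Path of the file to refactor" },
  outputs: { summary: "Summary of the refactor" },
  writeBehavior: "edits files under rootDir",
  sandboxRuntime: "",
  validationCommand: "bun test",
};

console.log(missingMetadata(pack)); // reports sandboxRuntime as missing
```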

---

## PI Integration

> Use PI as a Smithers workflow CLI backend and understand how PI extensibility composes with Smithers declarative orchestration.

Smithers provides deterministic orchestration (workflow graph, approvals, retries, durable state). PI provides adaptive agent capabilities (providers, models, extensions, skills, prompt templates). Use both when you need deterministic execution with flexible agent behavior.

## Integration Modes

### 1) PI as Workflow Agent

```tsx
import { PiAgent } from "smithers-orchestrator";

const pi = new PiAgent({
  provider: "openai",
  model: "gpt-5.2-codex",
  mode: "text",
});

{/* outputs comes from createSmithers() */}
<Task id="implementation" output={outputs.implementation} agent={pi}>
  {`Implement feature X and explain tradeoffs.`}
</Task>
```

`PiAgent` supports all PI CLI flags: provider/model, tools, extensions, skills, prompt templates, themes, sessions. Text mode uses `--print` by default; JSON/RPC modes set `--mode` and omit `--print`.
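The flag rule in the previous paragraph can be sketched as a small function. `piModeArgs` is a hypothetical helper, not part of `PiAgent`, and passing the mode name as the value of `--mode` (for example `--mode json`) is an assumption about PI's CLI syntax.

```typescript
type PiMode = "text" | "json" | "rpc";

// Hypothetical helper: text mode adds --print; JSON/RPC modes set --mode
// (the "--mode <name>" value syntax is assumed) and leave --print out.
function piModeArgs(mode: PiMode = "text"): string[] {
  return mode === "text" ? ["--print"] : ["--mode", mode];
}

console.log(piModeArgs()); // text mode: --print only
console.log(piModeArgs("json")); // json mode: --mode json, no --print
```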

PI sessions are first-class hijack targets. `bunx smithers-orchestrator hijack RUN_ID --target pi` reopens the PI session for local steering.

### 2) PI Server Client

Drive Smithers server APIs from a PI extension or any Node process via `@smithers-orchestrator/pi-plugin`:

```ts
import { runWorkflow, approve, streamEvents } from "@smithers-orchestrator/pi-plugin";
```

The older `smithers-orchestrator/pi-plugin` and `smithers-orchestrator/pi-extension` subpaths were removed; use the scoped package directly.

### 3) Hybrid: PI Extensibility + Smithers Orchestration

- Keep orchestration in Smithers (`<Sequence>`, `<Parallel>`, `<Branch>`, `<Loop>`).
- Run adaptive logic in PI tasks (extensions/skills/provider overrides).

Patterns:

1. PI skill-driven coding task inside a Smithers `<Task>`.
2. PI extension command that starts/resumes Smithers workflows via server API or `@smithers-orchestrator/pi-plugin`.
3. Smithers workflow output persisted to SQLite and consumed by later PI-assisted tasks.

## Hijacking PI Sessions

PI is a native-session hijack backend.

- Live run: Smithers watches PI's event stream, waits until no blocking tool call is in progress, then hands off the session.
- Finished/cancelled run: Smithers reopens the latest persisted PI session.
- Relaunch uses the stored session ID: `pi --session <id>`.
- Clean exit resumes the workflow automatically.

Session persistence:

- `PiAgent` defaults `noSession` to `true` for one-shot calls.
- For workflow hijack/resume/streaming, Smithers keeps session persistence enabled automatically.
- `mode: "json"` is not required for hijack support.
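The session rules above reduce to a small decision, sketched here under assumptions. `resolveNoSession` and `piRelaunchArgs` are hypothetical helpers, and the `PiCallPurpose` labels are illustrative names for the cases the bullets describe. That Smithers overrides an explicit `noSession: true` for hijack, resume, and streaming is an assumption drawn from "keeps session persistence enabled automatically"; only the `pi --session <id>` relaunch form comes directly from this page.

```typescript
// Illustrative labels for the cases described above.
type PiCallPurpose = "one-shot" | "hijack" | "resume" | "streaming";

// One-shot calls default noSession to true unless the caller overrides it.
// Assumption: hijack/resume/streaming always keep sessions (noSession false).
function resolveNoSession(purpose: PiCallPurpose, requested?: boolean): boolean {
  if (purpose !== "one-shot") return false;
  return requested ?? true;
}

// Relaunch command for a stored session ID: `pi --session <id>`.
function piRelaunchArgs(sessionId: string): string[] {
  return ["pi", "--session", sessionId];
}

console.log(resolveNoSession("one-shot"), resolveNoSession("hijack", true));
console.log(piRelaunchArgs("abc123").join(" "));
```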

## Setup

1. Install PI CLI and add to `PATH`.
2. Configure PI credentials via env/config (prefer over CLI args for API keys).
3. Instantiate `PiAgent` with explicit options in workflows.
4. For server-driven workflows, use `@smithers-orchestrator/pi-plugin`.

```bash
# Verify PI is installed
pi --version
```

## Design Guidance

| Use `PiAgent` tasks when | Use Smithers-native tasks when |
|---|---|
| You need PI capabilities inside deterministic workflows | You need strict reproducibility and narrow tool contracts |
| You want PI calls as auditable workflow steps | |

## Limitations

Smithers does not provide a chat interface for PI. Chat UI integration is the responsibility of the host application using `@smithers-orchestrator/pi-plugin`.


===============================================================================

# Smithers Events

> Smithers event surface: how to subscribe, the event categories, and the full SmithersEvent discriminated union.

---

## Events

> Subscribe to lifecycle events. Full event union lives in Types.

`SmithersEvent` is a discriminated union of every lifecycle event the runtime emits. The full type definition lives in Types; that's the source of truth for field shapes.

## Subscribe via `onProgress`

```ts
import { runWorkflow } from "smithers-orchestrator";
import { Effect } from "effect";
import workflow from "./workflow";

await Effect.runPromise(runWorkflow(workflow, {
  input: { task: "fix bug" },
  onProgress: (event) => {
    if (event.type === "NodeStarted")  console.log(`▶ ${event.nodeId} (attempt ${event.attempt})`);
    if (event.type === "NodeFinished") console.log(`✓ ${event.nodeId}`);
    if (event.type === "NodeFailed")   console.error(`✗ ${event.nodeId}`, event.error);
  },
}));
```

## Read from the NDJSON log

Events append to `.smithers/executions/<runId>/logs/stream.ndjson` (configure with `logDir` / `--log-dir`; disable with `--no-log`).

```bash
tail -f .smithers/executions/<runId>/logs/stream.ndjson | jq .
jq 'select(.type == "NodeFailed")' .smithers/executions/<runId>/logs/stream.ndjson
jq -r .type .smithers/executions/<runId>/logs/stream.ndjson | sort | uniq -c | sort -rn
```

Or with the CLI:

```bash
bunx smithers-orchestrator events RUN_ID --json
bunx smithers-orchestrator events RUN_ID --type tool-call --node analyze
```
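If you prefer TypeScript over jq, the same filtering and the `sort | uniq -c | sort -rn` tally can be computed by parsing each NDJSON line. This is a minimal sketch that works on a string; reading the actual file (for example with `readFileSync` from `node:fs`) is left out so the block stays self-contained, and the sample lines are made up.

```typescript
interface LoggedEvent {
  type: string;
  runId: string;
  timestampMs: number;
  [field: string]: unknown;
}

// Parse NDJSON: one JSON object per non-empty line.
function parseNdjson(text: string): LoggedEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LoggedEvent);
}

// Count events by type, most frequent first (like `sort | uniq -c | sort -rn`).
function countByType(events: LoggedEvent[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up sample in the shape of stream.ndjson lines.
const sample = [
  '{"type":"NodeStarted","runId":"r1","timestampMs":1,"nodeId":"analyze","iteration":0,"attempt":1}',
  '{"type":"NodeFailed","runId":"r1","timestampMs":2,"nodeId":"analyze","iteration":0,"attempt":1,"error":"boom"}',
  '{"type":"NodeStarted","runId":"r1","timestampMs":3,"nodeId":"analyze","iteration":0,"attempt":2}',
].join("\n");

const events = parseNdjson(sample);
console.log(countByType(events));
// Equivalent of jq 'select(.type == "NodeFailed")':
console.log(events.filter((e) => e.type === "NodeFailed"));
```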

## Common fields

```ts
type CommonFields    = { type: string; runId: string; timestampMs: number };
type NodeScoped      = CommonFields & { nodeId: string; iteration: number };
type AttemptScoped   = NodeScoped   & { attempt: number };
```

Every event includes `type`, `runId`, `timestampMs`. Node-scoped events add `nodeId` and `iteration`. Attempt-scoped add `attempt`.
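Because the scoping is additive, narrowing an arbitrary event comes down to checking for the extra fields. The guards below are a sketch built on the three types shown above, restated so the block compiles on its own; `isNodeScoped`, `isAttemptScoped`, and `label` are hypothetical helpers, not Smithers exports.

```typescript
type CommonFields = { type: string; runId: string; timestampMs: number };
type NodeScoped = CommonFields & { nodeId: string; iteration: number };
type AttemptScoped = NodeScoped & { attempt: number };

// Node-scoped events carry nodeId and iteration on top of the common fields.
function isNodeScoped(e: CommonFields): e is NodeScoped {
  const r = e as Partial<NodeScoped>;
  return typeof r.nodeId === "string" && typeof r.iteration === "number";
}

// Attempt-scoped events additionally carry attempt.
function isAttemptScoped(e: CommonFields): e is AttemptScoped {
  return isNodeScoped(e) && typeof (e as Partial<AttemptScoped>).attempt === "number";
}

// Build a log prefix from whatever scope the event carries.
function label(e: CommonFields): string {
  if (isAttemptScoped(e)) return `${e.runId}/${e.nodeId}#${e.iteration}.${e.attempt} ${e.type}`;
  if (isNodeScoped(e)) return `${e.runId}/${e.nodeId}#${e.iteration} ${e.type}`;
  return `${e.runId} ${e.type}`;
}

console.log(label({ type: "RunStarted", runId: "r1", timestampMs: 0 }));
```

The guards are structural, so any variant carrying these fields (for example `RunHijacked`, which has `nodeId`, `iteration`, and `attempt`) is treated as attempt-scoped.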

## Event categories

Used by `bunx smithers-orchestrator events --type <category>` and the metrics layer.

| Category | Events |
|---|---|
| `run` | RunAutoResumed, RunAutoResumeSkipped, RunStarted, RunStatusChanged, RunStateChanged, RunFinished, RunFailed, RunCancelled, RunContinuedAsNew, RunHijackRequested, RunHijacked, RetryTaskStarted, RetryTaskFinished, RunForked, ReplayStarted |
| `frame` | FrameCommitted |
| `node` | NodePending, NodeStarted, TaskHeartbeat, TaskHeartbeatTimeout, NodeFinished, NodeFailed, NodeCancelled, NodeSkipped, NodeRetrying, NodeWaitingApproval, NodeWaitingTimer |
| `approval` | ApprovalRequested, ApprovalGranted, ApprovalAutoApproved, ApprovalDenied |
| `tool-call` | ToolCallStarted, ToolCallFinished |
| `agent` | AgentEvent, AgentTraceEvent, AgentTraceSummary, AgentSessionEvent |
| `output` | NodeOutput |
| `revert` | RevertStarted, RevertFinished, TimeTravelStarted, TimeTravelFinished, TimeTravelJumped |
| `workflow` | WorkflowReloadDetected, WorkflowReloaded, WorkflowReloadFailed, WorkflowReloadUnsafe |
| `scorer` | ScorerStarted, ScorerFinished, ScorerFailed |
| `token` | TokenUsageReported |
| `timer` | TimerCreated, TimerFired, TimerCancelled |
| `memory` | MemoryFactSet, MemoryRecalled, MemoryMessageSaved |
| `openapi` | OpenApiToolCalled |
| `sandbox` | SandboxCreated, SandboxShipped, SandboxHeartbeat, SandboxBundleReceived, SandboxCompleted, SandboxFailed, SandboxDiffReviewRequested, SandboxDiffAccepted, SandboxDiffRejected |
| `snapshot` | SnapshotCaptured |
| `supervisor` | SupervisorStarted, SupervisorPollCompleted |
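When you process events yourself instead of using `--type`, a lookup from event type to category reproduces the grouping in the table. The sketch below copies only a few rows from the table, and `CATEGORY_EVENTS` and `categoryOf` are hypothetical names; extend the record with the remaining rows as needed.

```typescript
// A subset of the category table above.
const CATEGORY_EVENTS: Record<string, readonly string[]> = {
  frame: ["FrameCommitted"],
  approval: ["ApprovalRequested", "ApprovalGranted", "ApprovalAutoApproved", "ApprovalDenied"],
  "tool-call": ["ToolCallStarted", "ToolCallFinished"],
  output: ["NodeOutput"],
  timer: ["TimerCreated", "TimerFired", "TimerCancelled"],
};

// Invert the table once so each lookup is a single map access.
const CATEGORY_BY_TYPE = new Map<string, string>(
  Object.entries(CATEGORY_EVENTS).flatMap(([category, types]) =>
    types.map((type): [string, string] => [type, category]),
  ),
);

// Returns undefined for event types not covered by this subset.
function categoryOf(eventType: string): string | undefined {
  return CATEGORY_BY_TYPE.get(eventType);
}

console.log(categoryOf("ToolCallFinished"), categoryOf("NodeStarted"));
```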

## Built-in metrics

| Event | Metric |
|---|---|
| `RunStarted` | `smithers.runs.total` |
| `NodeStarted` | `smithers.nodes.started` |
| `NodeFinished` | `smithers.nodes.finished` |
| `NodeFailed` | `smithers.nodes.failed` |
| `ApprovalGranted` / `ApprovalDenied` | Approval counters |
| `TokenUsageReported` | Token usage counters per model/agent |

`trackSmithersEvent` from `smithers-orchestrator/observability` exposes this mapping for custom integrations. See Observability for the full OTLP/Prometheus setup.
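For a quick local tally without an OTLP or Prometheus backend, the mapping can be mirrored with plain counters. This is a hypothetical sketch, not `trackSmithersEvent` itself. It uses the metric names from the table where the table gives them; the approval and token counter names (`approvals.granted`, `tokens.input{model=...}`, and so on) are made up because the table does not name them, and tokens are keyed by model only, not model and agent.

```typescript
// Loose event shape: only the fields this sketch reads.
interface MetricEvent {
  type: string;
  model?: string;
  inputTokens?: number;
  outputTokens?: number;
}

// Metric names from the table above; approval names are hypothetical.
const COUNTER_FOR: Record<string, string> = {
  RunStarted: "smithers.runs.total",
  NodeStarted: "smithers.nodes.started",
  NodeFinished: "smithers.nodes.finished",
  NodeFailed: "smithers.nodes.failed",
  ApprovalGranted: "approvals.granted",
  ApprovalDenied: "approvals.denied",
};

function tally(events: MetricEvent[]): Map<string, number> {
  const counters = new Map<string, number>();
  const add = (name: string, by: number) => counters.set(name, (counters.get(name) ?? 0) + by);
  for (const e of events) {
    const name = COUNTER_FOR[e.type];
    if (name) add(name, 1);
    if (e.type === "TokenUsageReported" && e.model) {
      // Per-model token counters; names are hypothetical.
      add(`tokens.input{model=${e.model}}`, e.inputTokens ?? 0);
      add(`tokens.output{model=${e.model}}`, e.outputTokens ?? 0);
    }
  }
  return counters;
}

console.log(tally([{ type: "RunStarted" }, { type: "NodeStarted" }, { type: "NodeFailed" }]));
```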

---

## Event Types

> Current SmithersEvent variants, event categories, and top-level fields.

`SmithersEvent` is the discriminated union emitted by the runtime, persisted to the event log, passed to `runWorkflow({ onProgress })`, streamed by Gateway, and filtered by `bunx smithers-orchestrator events --type <category>`.

Every variant has `type`, `runId`, and `timestampMs`. Node-scoped variants add `nodeId` and `iteration`. Attempt-scoped variants add `attempt`.

For subscription examples, CLI usage, and built-in metrics, see Events.

## Status Types

```ts
type RunStatus =
  | "running"
  | "waiting-approval"
  | "waiting-event"
  | "waiting-timer"
  | "finished"
  | "continued"
  | "failed"
  | "cancelled";

type RunState =
  | "running"
  | "waiting-approval"
  | "waiting-event"
  | "waiting-timer"
  | "recovering"
  | "stale"
  | "orphaned"
  | "failed"
  | "cancelled"
  | "succeeded"
  | "unknown";
```
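A common need is telling whether a run is parked or done. The helpers below are a hypothetical sketch over the `RunStatus` union above, restated so the block stands alone. Treating `"continued"` as terminal is an assumption based on `RunContinuedAsNew` handing work off to a `newRunId`.

```typescript
type RunStatus =
  | "running"
  | "waiting-approval"
  | "waiting-event"
  | "waiting-timer"
  | "finished"
  | "continued"
  | "failed"
  | "cancelled";

// Parked on an approval, an external event, or a timer.
function isWaiting(status: RunStatus): boolean {
  return status === "waiting-approval" || status === "waiting-event" || status === "waiting-timer";
}

// Assumption: "continued" ends this run ID because work moves to newRunId.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(["finished", "continued", "failed", "cancelled"]);

function isTerminal(status: RunStatus): boolean {
  return TERMINAL.has(status);
}

console.log(isWaiting("waiting-timer"), isTerminal("running"));
```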

## Event Variants

Discriminate on the `type` field, e.g. `event.type === "NodeFailed"`, to handle specific lifecycle moments.

| Event | Top-level fields |
|---|---|
| `SupervisorStarted` | `runId`, `pollIntervalMs`, `staleThresholdMs`, `timestampMs` |
| `SupervisorPollCompleted` | `runId`, `staleCount`, `resumedCount`, `skippedCount`, `durationMs`, `timestampMs` |
| `RunAutoResumed` | `runId`, `lastHeartbeatAtMs`, `staleDurationMs`, `timestampMs` |
| `RunAutoResumeSkipped` | `runId`, `reason`, `timestampMs` |
| `RunStarted` | `runId`, `timestampMs` |
| `RunStatusChanged` | `runId`, `status`, `timestampMs` |
| `RunStateChanged` | `runId`, `before`, `after`, `timestampMs` |
| `RunFinished` | `runId`, `timestampMs` |
| `RunFailed` | `runId`, `error`, `timestampMs` |
| `RunCancelled` | `runId`, `timestampMs` |
| `RunContinuedAsNew` | `runId`, `newRunId`, `iteration`, `carriedStateSize`, `ancestryDepth?`, `timestampMs` |
| `RunHijackRequested` | `runId`, `target?`, `timestampMs` |
| `RunHijacked` | `runId`, `nodeId`, `iteration`, `attempt`, `engine`, `mode`, `resume?`, `cwd`, `timestampMs` |
| `SandboxCreated` | `runId`, `sandboxId`, `runtime`, `configJson`, `timestampMs` |
| `SandboxShipped` | `runId`, `sandboxId`, `runtime`, `bundleSizeBytes`, `timestampMs` |
| `SandboxHeartbeat` | `runId`, `sandboxId`, `remoteRunId?`, `progress?`, `timestampMs` |
| `SandboxBundleReceived` | `runId`, `sandboxId`, `bundleSizeBytes`, `patchCount`, `hasOutputs`, `timestampMs` |
| `SandboxCompleted` | `runId`, `sandboxId`, `remoteRunId?`, `runtime`, `status`, `durationMs`, `timestampMs` |
| `SandboxFailed` | `runId`, `sandboxId`, `runtime`, `error`, `timestampMs` |
| `SandboxDiffReviewRequested` | `runId`, `sandboxId`, `patchCount`, `totalDiffLines`, `timestampMs` |
| `SandboxDiffAccepted` | `runId`, `sandboxId`, `patchCount`, `timestampMs` |
| `SandboxDiffRejected` | `runId`, `sandboxId`, `reason?`, `timestampMs` |
| `FrameCommitted` | `runId`, `frameNo`, `xmlHash`, `timestampMs` |
| `NodePending` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `NodeStarted` | `runId`, `nodeId`, `iteration`, `attempt`, `timestampMs` |
| `TaskHeartbeat` | `runId`, `nodeId`, `iteration`, `attempt`, `hasData`, `dataSizeBytes`, `intervalMs?`, `timestampMs` |
| `TaskHeartbeatTimeout` | `runId`, `nodeId`, `iteration`, `attempt`, `lastHeartbeatAtMs`, `timeoutMs`, `timestampMs` |
| `NodeFinished` | `runId`, `nodeId`, `iteration`, `attempt`, `timestampMs` |
| `NodeFailed` | `runId`, `nodeId`, `iteration`, `attempt`, `error`, `timestampMs` |
| `NodeCancelled` | `runId`, `nodeId`, `iteration`, `attempt?`, `reason?`, `timestampMs` |
| `NodeSkipped` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `NodeRetrying` | `runId`, `nodeId`, `iteration`, `attempt`, `timestampMs` |
| `NodeWaitingApproval` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `NodeWaitingTimer` | `runId`, `nodeId`, `iteration`, `firesAtMs`, `timestampMs` |
| `ApprovalRequested` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `ApprovalGranted` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `ApprovalAutoApproved` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `ApprovalDenied` | `runId`, `nodeId`, `iteration`, `timestampMs` |
| `ToolCallStarted` | `runId`, `nodeId`, `iteration`, `attempt`, `toolCallId`, `toolName`, `seq`, `timestampMs` |
| `ToolCallFinished` | `runId`, `nodeId`, `iteration`, `attempt`, `toolCallId`, `toolName`, `seq`, `status`, `timestampMs` |
| `NodeOutput` | `runId`, `nodeId`, `iteration`, `attempt`, `text`, `stream`, `timestampMs` |
| `AgentEvent` | `runId`, `nodeId`, `iteration`, `attempt`, `engine`, `event`, `timestampMs` |
| `RetryTaskStarted` | `runId`, `nodeId`, `iteration`, `resetDependents`, `resetNodes`, `timestampMs` |
| `RetryTaskFinished` | `runId`, `nodeId`, `iteration`, `resetNodes`, `success`, `error?`, `timestampMs` |
| `RevertStarted` | `runId`, `nodeId`, `iteration`, `attempt`, `jjPointer`, `timestampMs` |
| `RevertFinished` | `runId`, `nodeId`, `iteration`, `attempt`, `jjPointer`, `success`, `error?`, `timestampMs` |
| `TimeTravelStarted` | `runId`, `nodeId`, `iteration`, `attempt`, `jjPointer?`, `timestampMs` |
| `TimeTravelFinished` | `runId`, `nodeId`, `iteration`, `attempt`, `jjPointer?`, `success`, `vcsRestored`, `resetNodes`, `error?`, `timestampMs` |
| `TimeTravelJumped` | `runId`, `fromFrameNo`, `toFrameNo`, `caller?`, `timestampMs` |
| `WorkflowReloadDetected` | `runId`, `changedFiles`, `timestampMs` |
| `WorkflowReloaded` | `runId`, `generation`, `changedFiles`, `timestampMs` |
| `WorkflowReloadFailed` | `runId`, `error`, `changedFiles`, `timestampMs` |
| `WorkflowReloadUnsafe` | `runId`, `reason`, `changedFiles`, `timestampMs` |
| `ScorerStarted` | `runId`, `nodeId`, `scorerId`, `scorerName`, `timestampMs` |
| `ScorerFinished` | `runId`, `nodeId`, `scorerId`, `scorerName`, `score`, `timestampMs` |
| `ScorerFailed` | `runId`, `nodeId`, `scorerId`, `scorerName`, `error`, `timestampMs` |
| `TokenUsageReported` | `runId`, `nodeId`, `iteration`, `attempt`, `model`, `agent`, `inputTokens`, `outputTokens`, `cacheReadTokens?`, `cacheWriteTokens?`, `reasoningTokens?`, `timestampMs` |
| `SnapshotCaptured` | `runId`, `frameNo`, `contentHash`, `timestampMs` |
| `RunForked` | `runId`, `parentRunId`, `parentFrameNo`, `branchLabel?`, `timestampMs` |
| `ReplayStarted` | `runId`, `parentRunId`, `parentFrameNo`, `restoreVcs`, `timestampMs` |
| `MemoryFactSet` | `runId`, `namespace`, `key`, `timestampMs` |
| `MemoryRecalled` | `runId`, `namespace`, `query`, `resultCount`, `timestampMs` |
| `MemoryMessageSaved` | `runId`, `threadId`, `role`, `timestampMs` |
| `OpenApiToolCalled` | `runId`, `operationId`, `method`, `path`, `durationMs`, `status`, `timestampMs` |
| `TimerCreated` | `runId`, `timerId`, `firesAtMs`, `timerType`, `timestampMs` |
| `TimerFired` | `runId`, `timerId`, `firesAtMs`, `firedAtMs`, `delayMs`, `timestampMs` |
| `TimerCancelled` | `runId`, `timerId`, `timestampMs` |
| `AgentTraceEvent` | `runId`, `nodeId`, `iteration`, `attempt`, `trace`, `timestampMs` |
| `AgentTraceSummary` | `runId`, `nodeId`, `iteration`, `attempt`, `summary`, `timestampMs` |
| `AgentSessionEvent` | `runId`, `nodeId`, `iteration`, `attempt`, `transcript`, `timestampMs` |
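To show how discriminating on `type` narrows field access, here is a sketch with a handful of variants transcribed from the table into a local union. `MiniEvent` and `describe` are hypothetical, and `error` is typed `unknown` because the table lists the field but not its type. With the full `SmithersEvent` union from Types, the same `switch` gives each case its own variant's fields.

```typescript
type Base = { runId: string; timestampMs: number };

// A few variants transcribed from the table above.
type MiniEvent =
  | (Base & { type: "NodeStarted"; nodeId: string; iteration: number; attempt: number })
  | (Base & { type: "NodeFailed"; nodeId: string; iteration: number; attempt: number; error: unknown })
  | (Base & { type: "NodeWaitingTimer"; nodeId: string; iteration: number; firesAtMs: number })
  | (Base & {
      type: "RunContinuedAsNew";
      newRunId: string;
      iteration: number;
      carriedStateSize: number;
      ancestryDepth?: number;
    });

function describe(e: MiniEvent): string {
  switch (e.type) {
    case "NodeStarted":
      return `${e.nodeId} attempt ${e.attempt} started`;
    case "NodeFailed":
      return `${e.nodeId} attempt ${e.attempt} failed: ${String(e.error)}`;
    case "NodeWaitingTimer":
      return `${e.nodeId} waits until ${new Date(e.firesAtMs).toISOString()}`;
    case "RunContinuedAsNew":
      return `${e.runId} continued as ${e.newRunId}`;
    default: {
      // Compile-time check that every variant is handled.
      const unreachable: never = e;
      return unreachable;
    }
  }
}

console.log(describe({ type: "RunContinuedAsNew", runId: "r1", newRunId: "r2", iteration: 3, carriedStateSize: 10, timestampMs: 0 }));
```

The `never` assignment in `default` makes the compiler flag any variant added to the union but not handled in the `switch`.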

## Agent Event Payload

This type describes the `.event` field of the `AgentEvent` variant above. For structured traces, prefer `AgentTraceEvent`, `AgentTraceSummary`, and `AgentSessionEvent`.

`AgentEvent.event` is the normalized CLI-agent event payload:

```ts
type AgentCliEvent =
  | { type: "started"; engine: string; title: string; resume?: string; detail?: Record<string, unknown> }
  | {
      type: "action";
      engine: string;
      phase: "started" | "updated" | "completed";
      entryType?: "thought" | "message";
      action: { id: string; kind: AgentCliActionKind; title: string; detail?: Record<string, unknown> };
      message?: string;
      ok?: boolean;
      level?: "debug" | "info" | "warning" | "error";
    }
  | { type: "completed"; engine: string; ok: boolean; answer?: string; error?: string; resume?: string; usage?: Record<string, unknown> };
```
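A consumer of `AgentEvent.event` often wants the final outcome and the list of completed actions. The reducer below is a sketch over the `AgentCliEvent` type above, restated so the block compiles alone. `AgentCliActionKind` is not defined on this page, so `string` stands in for it; `AgentSummary` and `summarize` are hypothetical names, and keeping the last `resume` value seen is an illustrative choice.

```typescript
type AgentCliActionKind = string; // stand-in; the real type lives in Types

type AgentCliEvent =
  | { type: "started"; engine: string; title: string; resume?: string; detail?: Record<string, unknown> }
  | {
      type: "action";
      engine: string;
      phase: "started" | "updated" | "completed";
      entryType?: "thought" | "message";
      action: { id: string; kind: AgentCliActionKind; title: string; detail?: Record<string, unknown> };
      message?: string;
      ok?: boolean;
      level?: "debug" | "info" | "warning" | "error";
    }
  | { type: "completed"; engine: string; ok: boolean; answer?: string; error?: string; resume?: string; usage?: Record<string, unknown> };

interface AgentSummary {
  completedActions: string[]; // action titles, in completion order
  ok?: boolean; // undefined until a "completed" event arrives
  answer?: string;
  error?: string;
  resume?: string; // last resume value from a started/completed event
}

function summarize(events: AgentCliEvent[]): AgentSummary {
  const summary: AgentSummary = { completedActions: [] };
  for (const e of events) {
    if (e.type === "action" && e.phase === "completed") summary.completedActions.push(e.action.title);
    if (e.type === "completed") {
      summary.ok = e.ok;
      summary.answer = e.answer;
      summary.error = e.error;
    }
    if (e.type !== "action" && e.resume) summary.resume = e.resume;
  }
  return summary;
}

console.log(summarize([{ type: "completed", engine: "claude", ok: true, answer: "Done" }]));
```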
