cladding — Unified Governance for AI-Coupled Engineering



The LLM writes the code — cladding owns what comes before and after.
Just as the name suggests, cladding is the verification layer wrapped around your host LLM.


The official reference implementation of the Ironclad standard.
It feeds the project's intent to the host LLM (Claude Code · Codex · Gemini · Cursor) before work begins,
and verifies the result with 36 detectors and a 15-stage gate after the work is done: a division of labor toward the same goal.

[Figure: how cladding wraps the host LLM in a collaborative structure: before (inject intent) · after (verify) · record (feedback loop)]

This loop aims at one thing —
turning the AI's "it's done" from a claim into a proof.

Intent is preserved as a record · drift is blocked automatically · completion is proven by a verification signature.
So you can ship AI-written code with the same trust you give code a human wrote.

For you, the developer, this means less time spent reviewing AI code, the reasoning behind the code still on record six months later,
and no more judging "is it really done?" by gut feeling before you ship.

How cladding works with your host LLM

cladding does not write code. The one writing code is always the host LLM. What cladding handles is the two things LLMs are bad at — making them remember the exact intent when they start and verifying the result mechanically when they finish.

BEFORE — INJECT INTENT
So the LLM starts with the right context
  • Project map injection — at the start of every conversation, "how many features, what's in progress, the last verification result" is handed to the LLM automatically
  • Extract only the intent that matters — only the why, related features, and acceptance criteria for the feature you're working on now (it does not dump the whole spec)
  • Apply project rules — the team's forbidden and preferred patterns go in as standing instructions every time
AFTER — VERIFY THE RESULT
Block the LLM's output when it drifts from the spec
  • 15-stage verification gate — type · lint · tests · coverage · architecture · secrets · E2E · evidence, all in one pass
  • 36 drift checks: automatic cross-checks, in every direction, that spec ↔ code ↔ tests stay aligned
  • An implementation-blind grader — a separate agent that cannot read the code grades it with tests written from the spec alone
  • Run the deliverable directly — actually executing it to block the "tests pass but the program won't run" situation
RECORD — INPUT FOR THE NEXT TURN
Verification results flow back into the LLM's context
  • Verification signature — a code state that passed every check is stored in the repo as a signature saying "this was verified at this point"
  • Audit ledger — every verification run, completion attempt, and block is recorded with who, when, and what the result was
  • Repair card — if you try to end a conversation while a decisive check (drift · architecture · secrets) is failing, it stops you once and carries the failure summary into the start of the next conversation automatically
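The BEFORE step's "extract only the intent that matters" can be pictured as a filter over the spec: pick the current feature, pull its why, the features it depends on, and its acceptance criteria, and leave the rest out. The sketch below is minimal and assumes a hypothetical in-memory feature shape (`id`, `why`, `dependsOn`, `acceptance`); cladding's real spec schema and injection format are not described here and may differ.

```typescript
// Hypothetical feature shape for illustration; not cladding's actual spec schema.
interface Feature {
  id: string;
  title: string;
  why: string;
  status: "planned" | "in_progress" | "done";
  dependsOn: string[];
  acceptance: string[];
}

interface IntentSlice {
  why: string;
  related: string[];    // titles of the features this one depends on
  acceptance: string[]; // acceptance criteria the work must satisfy
}

// Return only the context needed for the feature being worked on,
// instead of dumping the whole spec into the LLM's context.
function intentSlice(features: Feature[], currentId: string): IntentSlice {
  const byId = new Map(features.map((f) => [f.id, f] as const));
  const current = byId.get(currentId);
  if (!current) throw new Error(`unknown feature ${currentId}`);
  const related = current.dependsOn
    .map((id) => byId.get(id)?.title)
    .filter((t): t is string => t !== undefined);
  return { why: current.why, related, acceptance: current.acceptance };
}

// A one-line project map in the spirit of "how many features, what's in progress".
function projectMap(features: Feature[]): string {
  const done = features.filter((f) => f.status === "done").length;
  const active = features.filter((f) => f.status === "in_progress").map((f) => f.id);
  return `${features.length} features, ${done} done, in progress: ${active.join(", ") || "none"}`;
}
```

The point of the slice is size: the LLM sees one feature's intent plus its direct neighbors, which keeps the injected context small and specific.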

While this loop runs, you just develop in natural language as usual — there are no commands to memorize.

Real-time intervention (map injection · instant block · stop block) works fully in Claude Code. On Codex · Gemini · Cursor the same verification runs through in-conversation tool calls and git · CI gates.

done is not declared, it's earned

The chronic problem in AI coding is that "it's done" gets declared without verification. In cladding, a feature's "status: done" is not a value you write but a value you earn.

[Figure: one scene: a hook blocks the LLM's done declaration, a RED gate feeds back as a repair card, and done is earned only when GREEN]
① If the AI tries to write the completion flag itself → it is blocked on the spot ("earn completion through verification") — real-time in Claude Code; on other hosts the gate · CI plays the same role
② If the AI requests completion → it runs all 9 decisive stages (type · lint · drift · architecture · secrets · tests · coverage · spec conformance · deliverable run) and records done only when they all pass, auto-reverting if even one fails — the E2E · evidence stages are handled by CI's full 15 stages
③ The moment it passes, a verification signature is left behind — committable proof that "this code was verified at this point"
④ If you try to end a conversation while a failure remains → it stops you once (a second attempt with the same failure is let through, but recorded in the audit ledger rather than passing silently) and carries the repair card into the next conversation
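Steps ② and ③ amount to one rule: "done" is written only by the gate, and only when every decisive stage passes; otherwise the status falls back to in progress. The sketch below is minimal and hypothetical: the stage checks are stand-in functions, the status is an in-memory value, and the "signature" is simply a SHA-256 digest of file paths and contents. It is not cladding's internal API or its actual attestation format.

```typescript
import { createHash } from "node:crypto";

// The 9 decisive stages named in step ②; each check is a hypothetical stand-in.
type StageName =
  | "type" | "lint" | "drift" | "architecture" | "secrets"
  | "tests" | "coverage" | "spec_conformance" | "deliverable_run";

type Check = () => boolean;

interface CompletionResult {
  status: "done" | "in_progress";
  failed: StageName[];
  signature?: string; // present only when every stage passed
}

// Digest of the verified code state: paths sorted so the result is order-independent.
function signState(files: Record<string, string>): string {
  const h = createHash("sha256");
  const entries = Object.entries(files).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [path, content] of entries) {
    h.update(path).update("\0").update(content).update("\0");
  }
  return h.digest("hex");
}

// Completion is earned, not declared: run every stage, then decide.
function requestCompletion(
  checks: Record<StageName, Check>,
  files: Record<string, string>,
): CompletionResult {
  const failed = (Object.keys(checks) as StageName[]).filter((s) => !checks[s]());
  if (failed.length > 0) {
    // Auto-revert: a single failing stage means the status cannot be "done".
    return { status: "in_progress", failed };
  }
  return { status: "done", failed, signature: signState(files) };
}
```

Running every stage before deciding (rather than stopping at the first failure) means the failure list can feed a repair card with everything that needs fixing.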

The limits are disclosed openly too: bypass paths exist that the instant block can't see, and those are caught by after-the-fact verification (the gate · drift checks). The instant block is the first line of defense, after-the-fact verification the second — and neither is a guarantee on its own.
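Step ④'s stop-once behavior can be modeled as a small state machine: the first exit attempt with a failing decisive check (drift · architecture · secrets) is blocked and a repair card is queued; a repeat attempt with the same failure is allowed but written to an append-only ledger. This is a minimal sketch with hypothetical names (`StopGuard`, `onSessionEnd`, `RepairCard`); real stop hooks are host-specific and cladding's ledger format is not described here.

```typescript
// Decisive checks whose failure triggers the stop block.
type DecisiveCheck = "drift" | "architecture" | "secrets";

interface RepairCard {
  failing: DecisiveCheck[];
  summary: string;
}

interface LedgerEntry {
  readonly at: string; // ISO timestamp
  readonly who: string;
  readonly event: string;
}

class StopGuard {
  private lastBlocked: string | null = null; // failure set blocked on the previous attempt
  private readonly ledger: LedgerEntry[] = [];
  pendingCard: RepairCard | null = null;     // would be injected at the next session start

  // Returns true if the session may end.
  onSessionEnd(who: string, failing: DecisiveCheck[], now: Date): boolean {
    if (failing.length === 0) {
      this.lastBlocked = null;
      this.pendingCard = null; // nothing left to hand off
      return true;
    }
    const key = [...failing].sort().join(",");
    this.pendingCard = { failing, summary: `failing: ${key}` };
    if (this.lastBlocked === key) {
      // Second attempt with the same failure: let it through, but record it.
      this.ledger.push(Object.freeze({ at: now.toISOString(), who, event: `exit with ${key}` }));
      this.lastBlocked = null;
      return true;
    }
    this.lastBlocked = key;
    this.ledger.push(Object.freeze({ at: now.toISOString(), who, event: `blocked exit: ${key}` }));
    return false;
  }

  entries(): readonly LedgerEntry[] {
    return this.ledger.slice(); // a copy, so callers cannot rewrite history
  }
}
```

Allowing the second exit avoids trapping the user in an unending session, while the ledger entry keeps the decision auditable.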

What changes

How behavior differs between a typical AI coding environment and a cladding environment in the same situation.

Situation | Typical AI coding | cladding
--- | --- | ---
When code drifts from the spec | Fixed if caught in review | Auto-detected right after editing (alert) · "done" can't pass while drifted
When the AI says "it's done" | No choice but to trust its word | done earned only when the gate is GREEN
When you end a session in a failed state | Ends as-is, forgotten next time | Stops the exit once and hands off a repair card
Two people add a feature at once | Merge conflict | hash-8 ID · separate files → 0 conflicts
Who verifies AI-written code? | The AI that wrote it self-verifies (risky) | An implementation-blind grader + mechanical gate
When you switch AI tools | Reconfigure per tool | 1 spec → auto-wired to 4 hosts

How it works

Spec → Code → Tests circulates as one cycle — the spec records the why, the gate verifies, and detectors block drift.

[Figure: the Spec → Code → Tests cycle, guarded by a 15-stage verification and 36 drift detectors]

1. Spec — the single source of intent (SSoT)

The spec records the why (what is being built and why). A 4-tier single source of truth — intent on top, artifacts below.

Tier | Role | Who edits | Enforcement
--- | --- | --- | ---
A (Spec) | Intent (what to build) | Defined by humans | Sealed; the LLM must not edit
B (Design) | Design (how to build it) | Humans edit freely | Checked for consistency with A
C (Derived) | Artifacts (code · tests) + attestation (verification signature) | LLM · humans | Auto-regenerated from the code
D (Audit) | Audit record (what happened) | Append-only | Immutable

A takes precedence over every tier below — if the spec (A) and the code (C) differ, the wrong one is the code. If intent (A) wavers, everything wavers, so it is sealed off where the LLM can't touch it.
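The edit rules in the table can be enforced as a write guard that decides, per path and actor, whether a change is allowed. This is a minimal sketch that assumes a hypothetical path-to-tier mapping (`spec/` for A, `design/` for B, `audit/` for D, everything else C); cladding's actual repository layout and hook logic may differ, and B's consistency check against A is not modeled.

```typescript
type Tier = "A" | "B" | "C" | "D";
type Actor = "human" | "llm";
type Op = "write" | "append";

// Hypothetical mapping from repository paths to tiers.
function tierOf(path: string): Tier {
  if (path.startsWith("spec/")) return "A";
  if (path.startsWith("design/")) return "B";
  if (path.startsWith("audit/")) return "D";
  return "C";
}

const rules: Record<Tier, (actor: Actor, op: Op) => boolean> = {
  A: (actor) => actor === "human",     // sealed: the LLM must not edit intent
  B: () => true,                       // editable; consistency with A is checked elsewhere (not modeled)
  C: () => true,                       // code and tests: LLM and humans
  D: (_actor, op) => op === "append",  // append-only: existing entries are never rewritten
};

function mayEdit(path: string, actor: Actor, op: Op): boolean {
  return rules[tierOf(path)](actor, op);
}
```

Keeping the rules in one table makes the precedence order explicit: the closer a tier is to intent, the narrower the set of writers.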

Sharding keeps it multi-developer safe: each feature lives in its own file, spec/features/<slug>-<hash>.yaml, with an 8-character hash ID (e.g. F-d86375d8). Even if two people create a new feature at the same time, they get different files and different IDs, so there are zero merge conflicts. Details in Hash-based feature IDs.
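Conflict-free IDs follow from deriving each ID from something unique to its creation event rather than from a shared counter that two branches would both increment. This is a minimal sketch that assumes the 8 characters are the first 8 hex digits of a SHA-256 over the slug plus a random nonce; the README does not say what cladding actually hashes.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical derivation: hash(slug + random nonce), keep 8 hex characters.
function newFeatureId(slug: string, nonce: Buffer = randomBytes(16)): { id: string; path: string } {
  const hash = createHash("sha256").update(slug).update(nonce).digest("hex").slice(0, 8);
  return { id: `F-${hash}`, path: `spec/features/${slug}-${hash}.yaml` };
}
```

Eight hex characters give 32 bits, so under this assumption a collision among 171 features has a probability of roughly 171·170/2 / 2^32, about 3 in a million. The ID_COLLISION detector in the spec hygiene group is the backstop for that remaining case.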

[Figure: 4-tier SSoT: A (Spec) → B (Design) → C (Derived + attestation) → D (Audit); A takes precedence over B]

2. Gate — the 15-stage Iron Law

To be recognized as "done", you must pass the strict gate (the 9 decisive stages out of 15), and CI runs the full 15 stages including E2E · evidence. The same check engine runs in bundles tied to when it fires: a fast 3 stages at commit (when the git hook is installed), 9 stages at push · completion, and all 15 in CI. Only the depth differs; the check logic is identical.
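The bundling idea can be written down as data: one stage catalog, and bundles that select from it. This is a minimal sketch assuming each bundle is a filtered view of the same list. The decisive set is taken from the nine stages named in the completion step (which leaves out 1.4 Commit); the 3-stage commit bundle is not modeled because the README does not say which three stages it runs.

```typescript
interface StageDef {
  id: string;
  name: string;
  decisive: boolean; // part of the 9-stage strict gate
}

// The 15 stages from the gate table.
const STAGES: StageDef[] = [
  { id: "1.1", name: "Type", decisive: true },
  { id: "1.2", name: "Lint", decisive: true },
  { id: "1.3", name: "Drift", decisive: true },
  { id: "1.4", name: "Commit", decisive: false },
  { id: "1.5", name: "Arch", decisive: true },
  { id: "1.6", name: "Secret", decisive: true },
  { id: "2.1", name: "Unit", decisive: true },
  { id: "2.2", name: "Coverage", decisive: true },
  { id: "2.3", name: "Spec conformance", decisive: true },
  { id: "2.4", name: "Deliverable smoke", decisive: true },
  { id: "3.1", name: "Smoke", decisive: false },
  { id: "3.2", name: "Perf", decisive: false },
  { id: "3.3", name: "Visual", decisive: false },
  { id: "4.1", name: "Audit", decisive: false },
  { id: "4.2", name: "UAT", decisive: false },
];

// Commit bundle intentionally omitted (its 3 stages are unspecified).
type Bundle = "push" | "completion" | "ci";

function stagesFor(bundle: Bundle): string[] {
  const chosen = bundle === "ci" ? STAGES : STAGES.filter((s) => s.decisive);
  return chosen.map((s) => s.id);
}
```

Because every bundle draws from one catalog, a check can change once and take effect at every depth, which is what "the check logic is identical" implies.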

[Figure: 15-stage Iron Law gate: static (6) · tests and conformance (4) · E2E (3) · evidence (2); attestation signed when GREEN]
Stage | What it checks
--- | ---
1.1 Type · 1.2 Lint | Type errors · code style
1.3 Drift | spec ↔ code drift across 36 detectors
1.4 Commit · 1.5 Arch · 1.6 Secret | Clean working tree · architecture invariants · API key exposure
2.1 Unit · 2.2 Coverage | Unit tests pass · coverage drop blocked
2.3 Spec conformance · 2.4 Deliverable smoke | The implementation-blind grader's tests pass · the declared deliverable actually runs (blocks the "tests pass but the deliverable won't run" vacuous green)
3.1 Smoke · 3.2 Perf · 3.3 Visual | E2E core behavior · performance budget · UI visual regression
4.1 Audit · 4.2 UAT | At least 1 piece of evidence per AC (acceptance criterion) · at least 1 piece of evidence per done feature
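Stage 2.4 guards against a vacuous green, where unit tests pass but the artifact itself fails to start. This is a minimal sketch of the idea, assuming the declared deliverable is a command plus arguments and that "runs" means it exits with status 0 within a timeout; how cladding declares deliverables and what it counts as success are not described in this README.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical deliverable declaration: a command and its arguments.
interface Deliverable {
  command: string;
  args: string[];
}

// True only if the process starts and exits with status 0 before the timeout.
function deliverableRuns(d: Deliverable, timeoutMs = 10_000): boolean {
  const r = spawnSync(d.command, d.args, { timeout: timeoutMs, stdio: "ignore" });
  // r.error is set when the binary cannot be spawned or the timeout kills it.
  return r.error === undefined && r.status === 0;
}
```

Executing the artifact, rather than inspecting it, catches failures that no unit test exercises, such as a missing entry point or a crash at startup.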

3. Detector — 36 drift detectors

Automatically detects drift in every direction among spec · code · test. Full catalog: detector catalog.

Direction | What it catches | Count | Representative detectors
--- | --- | --- | ---
spec ↔ code | In the spec but not the code, or code straying from the spec | 10 | MISSING_IMPLEMENTATION, AC_DRIFT, DELIVERABLE_INTEGRITY
code ↔ test | Code exists but no tests · coverage drop · secrets | 6 | MISSING_TESTS, COVERAGE_DROP, HARDCODED_SECRET
spec ↔ test | Spec ACs not verified by tests · false status | 5 | UNTESTED_AC, STATUS_DRIFT, SPEC_CONFORMANCE
spec hygiene | Integrity of the spec itself (ID collision · dependency cycle) | 8 | ID_COLLISION, SLUG_CONFLICT, DEPENDENCY_CYCLE
environment integrity | Build environment · meta files | 3 | HARNESS_INTEGRITY, META_INTEGRITY
verification freshness | Whether code changed after the verification signature | 1 | STALE_ATTESTATION (new in 0.6.0)
governance · docs | Policy violations · documentation drift | 3 | ABSENCE_OF_GOVERNANCE, PROJECT_CONTEXT_DRIFT
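Several of these detectors reduce to set comparisons between what the spec declares and what the code and tests show. This is a minimal sketch of four of them (MISSING_IMPLEMENTATION, UNTESTED_AC, STATUS_DRIFT, STALE_ATTESTATION) over hypothetical inputs: feature IDs referenced in code, AC IDs exercised by tests, and stored vs. current state digests. The real detectors' inputs and matching rules are not described here.

```typescript
// Hypothetical inputs; the real detectors' data model is not documented in this README.
interface SpecFeature {
  id: string;
  status: "planned" | "in_progress" | "done";
  acIds: string[];
}

interface Observed {
  implementedFeatureIds: Set<string>; // e.g. feature tags found in source
  testedAcIds: Set<string>;           // e.g. AC references found in tests
  attestedDigest: string | null;      // digest stored with the last signature
  currentDigest: string;              // digest of the code as it is now
}

interface Finding {
  detector: string;
  subject: string;
}

function detectDrift(spec: SpecFeature[], obs: Observed): Finding[] {
  const findings: Finding[] = [];
  for (const f of spec) {
    // spec → code: work has started, but nothing in the code refers to the feature
    if (f.status !== "planned" && !obs.implementedFeatureIds.has(f.id)) {
      findings.push({ detector: "MISSING_IMPLEMENTATION", subject: f.id });
    }
    // spec → test: acceptance criteria with no test behind them
    const untested = f.acIds.filter((ac) => !obs.testedAcIds.has(ac));
    for (const ac of untested) findings.push({ detector: "UNTESTED_AC", subject: ac });
    // false status: "done" claimed while some AC is untested
    if (f.status === "done" && untested.length > 0) {
      findings.push({ detector: "STATUS_DRIFT", subject: f.id });
    }
  }
  // verification freshness: code changed after the last signature
  if (obs.attestedDigest !== null && obs.attestedDigest !== obs.currentDigest) {
    findings.push({ detector: "STALE_ATTESTATION", subject: "repository" });
  }
  return findings;
}
```

Each detector emits a named finding rather than a boolean, so one pass can report drift in several directions at once.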

4. Cycle — the lifecycle of one feature

Define → sync → implement → earn. You earn "done" only by passing every check.

[Figure: lifecycle of one feature: define → sync → implement → earn; completion earned when all checks pass, auto-reverted on failure]

Multi-Agent — separating the maker from the verifier

The making agent and the verifying agent are separated, so no agent can approve its own work by itself. 0.6.0's blind-author goes one step further — the agent that writes the tests has no tool to read the implementation at all (no Read/Grep granted). "Written without seeing the implementation" becomes a structural fact, not a promise. This separation maps directly onto regulatory · audit standards (EU AI Act · SOX).
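The blind-author guarantee is structural because it comes from which tools a persona is granted, not from an instruction the agent could ignore. This is a minimal sketch with a hypothetical grant table: the persona names follow the privilege-separation diagram, and the blind-author's lack of Read/Grep comes from the text, but the other tool names (including `ReadSpec`) and grants are illustrative assumptions.

```typescript
type Persona = "orchestrator" | "planner" | "developer" | "reviewer" | "blind-author" | "observability";
type Tool = "Read" | "Grep" | "Write" | "Edit" | "Bash" | "ReadSpec"; // ReadSpec is hypothetical

// Illustrative grants: only "blind-author has no Read/Grep" is taken from the README.
const GRANTS: Record<Persona, ReadonlySet<Tool>> = {
  orchestrator: new Set<Tool>(["ReadSpec"]),
  planner: new Set<Tool>(["Read", "Grep", "ReadSpec"]),
  developer: new Set<Tool>(["Read", "Grep", "Write", "Edit", "Bash", "ReadSpec"]),
  reviewer: new Set<Tool>(["Read", "Grep", "ReadSpec"]),
  "blind-author": new Set<Tool>(["ReadSpec", "Write"]), // writes tests from the spec alone
  observability: new Set<Tool>(["Read"]),
};

// A tool call is allowed only if the persona was granted that tool.
function authorize(persona: Persona, tool: Tool): boolean {
  return GRANTS[persona].has(tool);
}

// No persona may approve work it produced itself.
function mayApprove(author: Persona, approver: Persona): boolean {
  return author !== approver;
}
```

Denying the capability at the tool layer is what turns "written without seeing the implementation" into something that can be checked, not just promised.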

[Figure: persona privilege separation: the orchestrator distributes work; planner, developer, and reviewer do the work; blind-author writes tests without seeing the implementation; observability observes]

Ecosystem

cladding sits at the junction of three existing categories.

[Figure: ecosystem Venn diagram with cladding at the junction of three categories: SDD · runners · multi-agent governance]

Differences from adjacent tools

cladding's distinction is the combination — tying the core of the above categories into one verification loop.

Install

Two steps — install the infrastructure → create the project spec.

Step 1 — Install the infrastructure (npm)

npm install -g cladding   # install the cladding CLI
cd <project>                # move into your project
clad setup                  # auto-wire AI tools (Claude / Codex / Gemini / Cursor)

A single clad setup auto-detects the installed AI tools and wires them all — no per-tool configuration needed.

Where clad setup wires (4 hosts · 5 connection points)
Host (when detected) | Wire location | Auto-activation
--- | --- | ---
Claude Code (~/.claude/) | ~/.claude/plugins/cladding | claude plugin marketplace add + install
Codex CLI skills (~/.agents/) | ~/.agents/skills/cladding-* | (automatic on Codex restart)
Codex CLI MCP server (~/.codex/) | [mcp_servers.cladding] in ~/.codex/config.toml | (the TOML entry itself)
Gemini CLI (~/.gemini/) | ~/.gemini/extensions/cladding | gemini extensions link
Cursor (~/.cursor/) | mcpServers.cladding in ~/.cursor/mcp.json | (the JSON entry itself)
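The detection step can be pictured as mapping each host's config directory to its wire location, one row per connection point in the table above. This is a minimal sketch that assumes detection simply means "the directory exists"; it takes the existence check as a parameter so it can run without touching a real home directory, and it only computes a plan. It does not write files or run the activation commands.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

interface WireRule {
  host: string;
  dir: string;     // host config directory under $HOME
  target: string;  // file or directory to wire, relative to dir
  entry?: string;  // config key, when wiring means adding an entry to a file
}

interface Wire {
  host: string;
  path: string;
  entry?: string;
}

// Mirrors the table above: 4 hosts, 5 connection points.
const WIRING: WireRule[] = [
  { host: "Claude Code", dir: ".claude", target: "plugins/cladding" },
  { host: "Codex CLI (skills)", dir: ".agents", target: "skills/cladding-*" },
  { host: "Codex CLI (MCP)", dir: ".codex", target: "config.toml", entry: "[mcp_servers.cladding]" },
  { host: "Gemini CLI", dir: ".gemini", target: "extensions/cladding" },
  { host: "Cursor", dir: ".cursor", target: "mcp.json", entry: "mcpServers.cladding" },
];

// Keep only the hosts whose config directory exists, and resolve full paths.
function wiringPlan(exists: (dir: string) => boolean, home: string = homedir()): Wire[] {
  return WIRING
    .filter((w) => exists(join(home, w.dir)))
    .map((w) => ({ host: w.host, path: join(home, w.dir, w.target), entry: w.entry }));
}
```

In practice `exists` could be `fs.existsSync`; computing the plan as a pure function also explains why re-running setup is harmless: the same directories produce the same plan.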

clad setup auto-invokes each host's activation command when the claude / gemini binary is on PATH. It's safe to re-run after an upgrade or after installing a new AI tool.

Verification level (honest disclosure): Claude Code is verified across all features through real-usage campaigns (including real-time intervention). Codex · Gemini CLI have their wiring automated + basic behavior confirmed. Cursor wires automatically but is not yet real-usage verified — to be updated as verification lands.

About the MCP server. All 4 hosts wire cladding as an MCP server; only the wire location differs. MCP is not something the user calls directly: there is no /mcp slash command and no manual connection step. Each host's AI calls cladding's features on its own in response to natural-language requests, while all the user types is a single /cladding:init followed by ordinary conversation.

Step 2 — Init (create the project spec)

Run once, from inside the AI tool, in the project directory:

[inside the AI tool] /cladding:init "B2B payments SaaS"

The project's spec.yaml and related docs are created — once per project.

To raise enforcement: clad init --with-hook (installs pre-commit + pre-push git hooks) · clad init --with-ci (scaffolds the CI gate — real enforcement lives in CI).

Three init scenarios

Starting situation | Command | What happens
--- | --- | ---
When you only have an idea | /cladding:init "I'm going to build a B2B payments SaaS" | LLM analyzes the domain → auto-generates spec · docs · policy + 2–3 follow-up questions
When you have a planning doc | /cladding:init docs/plan.md | Recognizes the file path → auto-loads its content and uses it as intent
Adopting into an existing project | /cladding:init "apply cladding to this project" | Auto-scans the existing code → combines observed patterns + intent
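The three scenarios differ in where the intent comes from. In cladding the host LLM interprets the request, so the routing below is only a hypothetical, rule-based approximation (`classifyInit` is not a real cladding function): an argument that names an existing file is treated as a planning doc, a repository that already has source code is treated as an adoption, and anything else is treated as a fresh idea.

```typescript
type InitMode = "plan-doc" | "adopt-existing" | "idea";

// Hypothetical routing; the real decision is made by the host LLM.
function classifyInit(
  arg: string,
  fileExists: (p: string) => boolean,
  hasSourceFiles: boolean,
): InitMode {
  if (fileExists(arg)) return "plan-doc";        // load the file's content as intent
  if (hasSourceFiles) return "adopt-existing";   // scan code, combine patterns + intent
  return "idea";                                 // generate spec, docs, policy from the idea
}
```

Checking the file path first mirrors the second row of the table: a path argument is taken literally before any interpretation of the text.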

One init and you're done

Init once and that's it — after that you develop as usual. cladding runs the before/after loop in the background, so there are no commands to memorize.

Upgrade

npm update -g cladding     # 1. install the new version
cd <your project>          # 2. once per project
clad update                # 3. tidy up for the new version

Your code, spec.yaml, and docs are left untouched, so updating is safe; if a stricter new version finds something to flag, it only reports it (it won't block or fix anything).

Status

version: v0.6.0 (2026-06)
conformance: L4
tests: 1384/1384, all pass
gate: 15 stages · 36 detectors
features: 171 (170 done · self-spec)

134 test files · coverage drop blocked by the COVERAGE_DROP detector · install via a single npm path (npm install -g cladding)

The road to Ironclad 1.0 — 1.0 locks only when two independent implementations pass the L4 verification set (GOVERNANCE § 1). cladding is the first.

Docs

License

MIT. LICENSE · related: Ironclad (the standard being implemented) · harness-boot (seed).