cladding — Unified Governance for AI-Coupled Engineering

AI-generated code, held to the same bar as human code.

Reference implementation of the Ironclad standard. 27 detectors and a 13-stage gate verify, on every commit, that the code your AI assistant wrote still matches the spec.

| Setup | Traps caught | Rate |
|---|---|---|
| Vanilla AI coding | 2/8 | 25% |
| cladding | 8/8 | 100% |

Same spec · same model · event-sourcing store benchmark

Why

The why fades after 3 months

The reason an AI assistant wrote code a certain way doesn't survive in the code alone.

→ spec/features/*.yaml becomes the permanent record of why
AI context survives time: six months later, the AI reconstructs intent straight from the spec (new hires get the same entry point)

AI gives a different answer each time

The same spec produces code with inconsistent patterns and structure.

→ the spec becomes the fixed reference against which every commit is checked
Enterprise-ready consistency: code style and patterns stay aligned across teams and PRs

AI hallucination

Generated code calls APIs, functions, or options that don't exist.

→ 27 detectors + 13-stage gate block hallucinated code on every commit
Production incidents prevented up front — CI auto-rejects hallucinated code before it merges

What you get

How a vanilla AI coding environment and a cladding environment behave when the same situation comes up.

| Situation | Vanilla AI coding | cladding |
|---|---|---|
| Code drifts from spec | fixed if a reviewer notices | auto-blocked on every commit |
| Two devs build the same feature in parallel | merge conflicts | hash-based IDs route to separate files → 0 conflicts |
| Who verifies AI-written code? | the AI that wrote it (risky) | a separate reviewer agent, duties split |
| Switching AI tools (Claude → Cursor) | reconfigure per tool | one spec → mirrored across 4 hosts |
| Spec authority | the AI reinterprets it each time | the sealed spec is the single source of truth |

The 8/8 vs 2/8 result at the top of this page comes from an early benchmark (details); larger-scale measurements are in progress.

How it works

Spec → Code → Tests runs as a single cycle — the spec captures the why, Iron Law verifies the implementation, and Drift Detection blocks anything that no longer matches.

[Figure: Spec → Code → Tests as a single cycle (one feature's lifecycle)]

1. Spec — SSoT, single source of intent

The spec is where the why (what we're building and why) lives. A 4-tier (A/B/C/D) Single Source of Truth — intent on top, implementation below.

| Tier | Role | Who edits | Authority |
|---|---|---|---|
| A (Spec) | intent (what to build) | humans only | sealed · LLMs cannot edit |
| B (Design) | design (how to build it) | humans freely | checked against A |
| C (Derived) | implementation (code · tests) | LLMs and humans | regenerated by reading the code |
| D (Audit) | audit log (what actually happened) | append-only | immutable |

A outranks B — if code and spec disagree, the code is wrong. The spec is sealed because changing the why shakes everything downstream, so LLMs are kept out.
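
The edit rules in the tier table can be read as a small permission check. The sketch below is illustrative only: the type names and `canEdit` are hypothetical, not part of cladding's API, and it assumes that tier B is human-only and that anyone may append to tier D while nobody may modify existing entries.

```typescript
// Hypothetical model of the 4-tier edit rules (names are illustrative, not cladding's API).
type Tier = "A" | "B" | "C" | "D";
type Actor = "human" | "llm";
type Operation = "modify" | "append";

function canEdit(tier: Tier, actor: Actor, op: Operation): boolean {
  switch (tier) {
    case "A": // Spec: sealed, humans only
      return actor === "human";
    case "B": // Design: humans edit freely (assumption: LLMs do not)
      return actor === "human";
    case "C": // Derived: LLMs and humans
      return true;
    case "D": // Audit: append-only, existing entries are immutable
      return op === "append";
  }
}

console.log(canEdit("A", "llm", "modify"));   // an LLM trying to rewrite the sealed spec
console.log(canEdit("C", "llm", "modify"));   // an LLM regenerating derived code
console.log(canEdit("D", "human", "modify")); // anyone trying to rewrite the audit log
```

Keeping the rule in one function makes the "LLMs are kept out of A" guarantee easy to test in isolation.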

Sharded · multi-dev safe. spec/features/<slug>-<hash6>.yaml puts each feature in its own file with a 6-char hash ID (e.g. F-5f6b45). Two devs creating new features at the same time land in different files with different IDs, so there are zero merge conflicts. Details: Hash-based feature IDs.
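
To make the sharding idea concrete, here is a minimal sketch of how a `<slug>-<hash6>.yaml` path and an `F-<hash6>` ID could be derived. The README does not say which hash cladding uses or what it hashes, so FNV-1a and the `seed` parameter are assumptions, and `newFeatureRef` and `slugify` are hypothetical helpers.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; a stand-in for whatever hash cladding actually uses.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Lowercase, replace runs of non-alphanumerics with "-", trim leading/trailing "-".
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

interface FeatureRef {
  id: string;
  path: string;
}

// `seed` is hypothetical: any value unique per creation (author plus timestamp, a nonce, ...).
function newFeatureRef(title: string, seed: string): FeatureRef {
  const hex = ("00000000" + fnv1a32(title + "\n" + seed).toString(16)).slice(-8);
  const hash6 = hex.slice(0, 6);
  return { id: "F-" + hash6, path: "spec/features/" + slugify(title) + "-" + hash6 + ".yaml" };
}

// Two developers create a feature with the same title at the same moment.
const fromAlice = newFeatureRef("Refund API", "alice@2026-05-01T10:00:00Z");
const fromBob = newFeatureRef("Refund API", "bob@2026-05-01T10:00:00Z");
console.log(fromAlice.path);
console.log(fromBob.path);
```

Because each creation hashes to its own suffix, the two files do not collide and neither developer edits a shared index file.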

[Figure: 4-tier SSoT. A (Spec) → B (Design) → C (Derived) → D (Audit); A outranks B]

2. Code — Iron Law (required) gate

Every change has to clear all 13 stages — typically called from CI, a git pre-push hook, or manual clad check. Each stage ships with its own unit tests.

[Figure: 13-stage Iron Law gate. Every change must clear static(6) · test(2) · e2e(3) · evidence(2) wherever clad check runs (CI / git hook / manual)]

| Stage | What it checks |
|---|---|
| 1.1 Type · 1.2 Lint | type errors · code style |
| 1.3 Drift | spec ↔ code mismatches across 27 detectors |
| 1.4 Commit · 1.5 Arch · 1.6 Secret | clean working tree · architecture invariants (forbidden imports, etc.) · leaked API keys |
| 2.1 Unit · 2.2 Cov | unit tests pass · project coverage threshold |
| 3.1 Smoke · 3.2 Perf · 3.3 Visual | end-to-end critical paths · performance budgets · visual regression |
| 4.1 Audit · 4.2 UAT | every AC (acceptance criteria) has at least one piece of evidence · every status=done feature has at least one piece of evidence |
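
The "clear all 13 stages" rule can be sketched as a simple aggregate over the stage list. This is not cladding's implementation: `runGate` and `StageCheck` are hypothetical, and the sketch assumes the gate evaluates every stage and reports all failures, whereas the real gate may stop at the first one.

```typescript
type StageGroup = "static" | "test" | "e2e" | "evidence";

interface Stage {
  id: string;
  name: string;
  group: StageGroup;
}

// Stage IDs and names are taken from the table above.
const STAGES: Stage[] = [
  { id: "1.1", name: "Type", group: "static" },
  { id: "1.2", name: "Lint", group: "static" },
  { id: "1.3", name: "Drift", group: "static" },
  { id: "1.4", name: "Commit", group: "static" },
  { id: "1.5", name: "Arch", group: "static" },
  { id: "1.6", name: "Secret", group: "static" },
  { id: "2.1", name: "Unit", group: "test" },
  { id: "2.2", name: "Cov", group: "test" },
  { id: "3.1", name: "Smoke", group: "e2e" },
  { id: "3.2", name: "Perf", group: "e2e" },
  { id: "3.3", name: "Visual", group: "e2e" },
  { id: "4.1", name: "Audit", group: "evidence" },
  { id: "4.2", name: "UAT", group: "evidence" },
];

// Hypothetical hook: returns true when the given stage passes.
type StageCheck = (stage: Stage) => boolean;

interface GateResult {
  passed: boolean;
  failed: string[];
}

// The gate passes only if every stage passes.
function runGate(check: StageCheck): GateResult {
  const failed = STAGES.filter((s) => !check(s)).map((s) => s.id + " " + s.name);
  return { passed: failed.length === 0, failed };
}

// Example: everything is green except the drift stage.
const result = runGate((stage) => stage.id !== "1.3");
console.log(result.passed, result.failed);
```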

3. Tests — 27 drift detectors

Seven categories of mismatch across spec · code · test, all caught automatically. Full catalog: src/stages/detectors/README.md.

| Category | What it catches | Count | Representative detectors |
|---|---|---|---|
| spec ↔ code drift | something in the spec missing from code, or in code with nothing in the spec | 6 | UNMAPPED_ARTIFACT, MISSING_IMPLEMENTATION, AC_DRIFT |
| code ↔ test | code without tests · coverage falling below threshold | 6 | MISSING_TESTS, COVERAGE_DROP, HARDCODED_SECRET |
| spec ↔ test | an AC in the spec that no test actually verifies | 4 | UNTESTED_AC, STATUS_DRIFT, STALE_EVIDENCE |
| spec maintenance | spec hygiene: slug collisions, ID duplicates | 4 | SLUG_CONFLICT, ID_COLLISION |
| environment integrity | build environment and meta-file integrity | 3 | HARNESS_INTEGRITY, META_INTEGRITY |
| architecture · capability | code that breaks the architecture or capability shape declared in the spec | 2 | ARCHITECTURE_FROM_SPEC, CAPABILITIES_FEATURE_MAPPING |
| governance · policy | code that breaks an ai_hints policy (e.g. forbidden patterns) | 2 | AI_HINTS_FORBIDDEN_PATTERN, ABSENCE_OF_GOVERNANCE |
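
As an illustration of what a single detector does, the sketch below mimics the idea behind UNTESTED_AC: report every acceptance criterion that no test claims to cover. The data shapes (`FeatureSpec`, `TestCase`, the `coversAc` field, the `featureId/acId` key format) are assumptions made for the example, not cladding's real schema.

```typescript
interface AcceptanceCriterion {
  id: string;
  text: string;
}

interface FeatureSpec {
  id: string;
  acceptanceCriteria: AcceptanceCriterion[];
}

// Hypothetical: each test declares which ACs it verifies, as "<featureId>/<acId>".
interface TestCase {
  name: string;
  coversAc: string[];
}

interface Finding {
  detector: string;
  featureId: string;
  acId: string;
}

function detectUntestedAc(features: FeatureSpec[], tests: TestCase[]): Finding[] {
  // Collect every AC key that at least one test covers.
  const covered = new Set<string>();
  for (const t of tests) {
    for (const key of t.coversAc) covered.add(key);
  }
  // Any AC in the spec that is not in the covered set is a finding.
  const findings: Finding[] = [];
  for (const f of features) {
    for (const ac of f.acceptanceCriteria) {
      if (!covered.has(f.id + "/" + ac.id)) {
        findings.push({ detector: "UNTESTED_AC", featureId: f.id, acId: ac.id });
      }
    }
  }
  return findings;
}

const findings = detectUntestedAc(
  [
    {
      id: "F-5f6b45",
      acceptanceCriteria: [
        { id: "AC-1", text: "refund succeeds for a settled payment" },
        { id: "AC-2", text: "refund is rejected for an unsettled payment" },
      ],
    },
  ],
  [{ name: "refund succeeds", coversAc: ["F-5f6b45/AC-1"] }],
);
console.log(findings);
```

An empty result corresponds to drift = 0 for this detector; any finding would block the change.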

4. Cycle — one feature's lifecycle

The 4 steps that wrap Spec → Code → Test into a single cycle. Merge if drift is 0, block otherwise.

[Figure: One feature's lifecycle. Define → Sync → Implement → Verify; merge if drift=0, block otherwise]

Multi-Agent Workflow

cladding is a 5-agent system working in concert. Each agent has a clear role under CQS (Command-Query Separation — the agents that do are kept apart from the agents that verify), so no agent can sign off on its own work. This is the foundation that maps cleanly to compliance regimes (EU AI Act · K-AI Framework · SOX).
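
The "no agent can sign off on its own work" rule reduces to a separation-of-duties check. The sketch below is a hypothetical model: the persona names come from this README, but `canSignOff`, the `Change` shape, and the assumption that only the reviewer persona approves are illustrative.

```typescript
type Persona = "orchestrator" | "librarian" | "specialist" | "reviewer" | "observability";

interface Agent {
  id: string;
  persona: Persona;
}

interface Change {
  id: string;
  authorAgentId: string;
}

// Assumption: only a reviewer may approve, and never a change it authored itself.
function canSignOff(change: Change, agent: Agent): boolean {
  return agent.persona === "reviewer" && agent.id !== change.authorAgentId;
}

const change: Change = { id: "change-1", authorAgentId: "specialist-1" };
console.log(canSignOff(change, { id: "specialist-1", persona: "specialist" })); // the author itself
console.log(canSignOff(change, { id: "reviewer-1", persona: "reviewer" }));     // a separate reviewer
```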

[Figure: 5 personas with CQS. The orchestrator dispatches, librarian/specialist/reviewer act, observability watches metrics]

Ecosystem

cladding sits at the intersection of three existing categories.

[Figure: Ecosystem Venn diagram. cladding sits at the intersection of SDD · Runners · Multi-agent Governance]

How cladding differs from the neighbors

cladding's edge is the combination: it folds the strongest parts of all three categories into one verification loop.

Install

Two steps: install the infrastructure, then create the project spec.

Step 1 — Install the infrastructure

Pick the route that fits how you work — both land in the same place:

(a) npm — for terminal / CI users

npm install -g cladding   # install the cladding CLI
cd <project>                # go to your project
clad setup                  # connect your AI tools (Claude / Codex / Gemini)

(b) Marketplace — for AI-tool plugin users

  1. Open the plugin marketplace inside your AI tool (Claude Code · Codex CLI · Gemini CLI)
  2. Search for cladding and install it
  3. No clad setup needed — the plugin manifest wires everything

Where clad setup connects (5 host channels)

| Host (when detected) | Wired location | Auto-activation |
|---|---|---|
| Claude Code (~/.claude/) | ~/.claude/plugins/cladding | claude plugin marketplace add + install |
| Codex CLI skills (~/.agents/) | ~/.agents/skills/cladding-* | (auto on Codex restart) |
| Codex CLI MCP server (~/.codex/) | [mcp_servers.cladding] in ~/.codex/config.toml | (TOML entry itself) |
| Gemini CLI (~/.gemini/) | ~/.gemini/extensions/cladding | gemini extensions link |
| Cursor (~/.cursor/) | mcpServers.cladding in ~/.cursor/mcp.json | (JSON entry itself) |

clad setup invokes the per-host activation commands automatically when claude / gemini binaries are on PATH. Safe to re-run after a cladding upgrade or after installing another AI tool.
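
The host table can be read as a lookup from "which tool directories exist" to "which channels get wired". The sketch below models only that mapping: `channelsToWire` is hypothetical, it assumes the presence of the marker directory is the detection signal, and the caller supplies the list of existing directories so the example touches no real files.

```typescript
interface HostChannel {
  host: string;
  marker: string;   // directory whose presence signals the host (assumption)
  wiredAt: string;  // wired location, copied from the table above
}

const CHANNELS: HostChannel[] = [
  { host: "Claude Code", marker: "~/.claude/", wiredAt: "~/.claude/plugins/cladding" },
  { host: "Codex CLI skills", marker: "~/.agents/", wiredAt: "~/.agents/skills/cladding-*" },
  { host: "Codex CLI MCP server", marker: "~/.codex/", wiredAt: "[mcp_servers.cladding] in ~/.codex/config.toml" },
  { host: "Gemini CLI", marker: "~/.gemini/", wiredAt: "~/.gemini/extensions/cladding" },
  { host: "Cursor", marker: "~/.cursor/", wiredAt: "mcpServers.cladding in ~/.cursor/mcp.json" },
];

// Returns the channels whose marker directory is among the directories that exist.
function channelsToWire(existingDirs: string[]): HostChannel[] {
  const present = new Set<string>(existingDirs);
  return CHANNELS.filter((c) => present.has(c.marker));
}

// A machine with Claude Code and Cursor installed, but no Codex or Gemini.
const wired = channelsToWire(["~/.claude/", "~/.cursor/"]);
console.log(wired.map((c) => c.host + " -> " + c.wiredAt));
```

Re-running with a longer directory list models the "safe to re-run after installing another AI tool" behavior: newly detected hosts simply appear in the result.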

About the MCP server. Every host gets cladding wired as an MCP server — only the wire location differs. Claude Code and Gemini CLI auto-start it through the plugin/extension manifest's mcpServers field; Codex through ~/.codex/config.toml [mcp_servers.cladding]; Cursor through ~/.cursor/mcp.json. You never invoke MCP directly — no /mcp slash, no manual server-connect step. The AI in each host calls cladding's tools (clad_create_feature, etc.) in response to natural-language requests; you keep typing /cladding:init plus normal chat.

Benchmark. v0.4.0 measurements show ~60% consistency improvement and ~50% LOC reduction vs unguided AI coding on a fixed task, with 100% drift detection across a 5-iteration dev cycle. Full methodology and honest caveats (some of the consistency gain is the "more-specific-prompt" effect, not exclusively cladding) in docs/benchmarks/v0.4.0-consistency-bench.md.

Step 2 — Init (create the project spec)

Inside your project, run it once from your AI tool:

[inside your AI tool] /cladding:init "B2B payment SaaS"

This creates spec.yaml and the 4-tier docs. One-time per project.

Three init scenarios

/cladding:init takes a natural-language intent and picks the right path on its own. Same command, three starting points.

| Starting point | Command | What happens |
|---|---|---|
| An idea, nothing else | /cladding:init "I want to build a B2B payment SaaS" | LLM infers the domain → spec · docs · policies generated, with 2–3 follow-up questions printed |
| A planning doc | /cladding:init docs/plan.md | cladding detects the file path, loads its contents, and uses them as the intent (absolute and relative paths both work) |
| Adopting into an existing project | /cladding:init "apply cladding to this project" | scans the existing code (≥3 source files trigger it) → observed patterns are merged with the intent |
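
One way to picture how a single command can cover all three starting points is a small routing function. This is a guess at the shape of the decision, not cladding's actual logic: `chooseInitPath` and `InitContext` are hypothetical, and the precedence (file path first, then the ≥3 source files rule, then plain idea) is an assumption.

```typescript
type InitPath = "planning-doc" | "existing-project" | "idea";

interface InitContext {
  // Hypothetical hooks supplied by the caller so the sketch stays free of file-system access.
  fileExists: (path: string) => boolean;
  sourceFileCount: number;
}

function chooseInitPath(arg: string, ctx: InitContext): InitPath {
  // The argument names an existing file: treat its contents as the intent.
  if (ctx.fileExists(arg)) return "planning-doc";
  // At least 3 source files already present: scan them and merge with the intent.
  if (ctx.sourceFileCount >= 3) return "existing-project";
  // Otherwise the argument is a free-form idea.
  return "idea";
}

const emptyRepo: InitContext = { fileExists: (p) => p === "docs/plan.md", sourceFileCount: 0 };
console.log(chooseInitPath("I want to build a B2B payment SaaS", emptyRepo));
console.log(chooseInitPath("docs/plan.md", emptyRepo));
console.log(chooseInitPath("apply cladding to this project", { fileExists: () => false, sourceFileCount: 12 }));
```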

Init once, then carry on

cladding's goal is to be the infrastructure that prevents spec ↔ code drift: after init, you just keep coding. The AI references the spec while it writes, and clad check runs automatically in CI or from a git hook to block anything that drifts. No extra commands to remember.

Status

| Metric | Value | Note |
|---|---|---|
| version | v0.4.0 | 2026-05 |
| conformance | L4 | top tier · self-declared |
| tests | 973/973 | all pass |
| coverage | 93.89%+ | enforced |
| features | 136 | spec'd |

100 test files · installable from the Claude Code · OpenAI Codex · Gemini CLI marketplaces

Road to Ironclad 1.0 — 1.0 locks when two independent implementations pass the L4 conformance fixtures (GOVERNANCE § 1). cladding is the first one.

Docs

License

MIT. LICENSE · Related: Ironclad (the standard cladding implements) · harness-boot (the seed project).