How RCS, MCS, and AMBER relate
The three layers do different things on the same codebase. They are complementary, not successive checks of one another.
1. RCS Bootstrap
Question: "Where are the capability boundaries and what are they called?"
How: Deterministic cluster detection (routes, folder prefixes, import cohesion), then an LLM names and describes each cluster.
Writes: .amber/capabilities.md
2. MCS Verify
Question: "What does the code actually say? Are claims about each cap backed by evidence?"
How: Generates its own typed claims from the code (AST/static analysis), attributes them to AMBER capabilities ("caps") by file, then runs them through 4 verification layers.
Writes: .mcs/
3. Promote → AMBER
Question: "Which MCS-verified facts should enrich the registry?"
How: Additive merge of verified numeric/policy/integration/behavioral claims into AMBER body / sla / dependencies / risks (contradictions → risks).
Writes: Back into .amber/capabilities.md + audit YAML.
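Of the three cluster signals in step 1, folder prefixes are the easiest to picture. A minimal sketch, assuming a hypothetical `cluster_by_prefix` helper that groups paths by their first two directory segments. The route and import-cohesion signals and the LLM naming pass are not modeled.

```python
from collections import defaultdict


def cluster_by_prefix(paths: list[str], depth: int = 2) -> dict[str, list[str]]:
    """Group file paths by their first `depth` directory segments.

    Hypothetical helper: illustrates only the folder-prefix signal.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        # Drop the file name and keep at most `depth` leading directories.
        dirs = path.strip("/").split("/")[:-1]
        key = "/".join(dirs[:depth]) or "."  # root-level files share "."
        clusters[key].append(path)
    return dict(clusters)
```

A real bootstrap would presumably merge or split these groups using the route and import-cohesion signals before any naming happens.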
Important nuance: MCS does not directly grade RCS's naming or descriptions. It generates independent claims from code. But the two validate each other indirectly:
- Bad RCS boundaries → many unknown claims per cap (no proof in that cap's files).
- Hallucinated RCS description with numeric claims that disagree with code → MCS flags them as contradicted.
- Cap with no MCS-verified claims → low confidence → RCS likely over-clustered or mis-named that cap.
- Cap with high MCS confidence (≥85%) → RCS boundary was correct and code backs the description.
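The indirect signals above can be checked mechanically once claims are attributed to caps by file. This sketch assumes a hypothetical `Claim` record carrying the cited file and its decision, plus a file-to-cap map; the 50% unknown threshold is an illustrative choice, not a documented one.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    file: str      # file the claim cites as evidence
    decision: str  # verified | inferred | unknown | rejected | contradicted


def attribute_by_file(claims: list[Claim], file_to_cap: dict[str, str]) -> dict[str, list[Claim]]:
    """Bucket each claim under the cap that owns its cited file."""
    by_cap: dict[str, list[Claim]] = defaultdict(list)
    for claim in claims:
        by_cap[file_to_cap.get(claim.file, "unattributed")].append(claim)
    return by_cap


def boundary_warnings(claims: list[Claim], file_to_cap: dict[str, str],
                      max_unknown_ratio: float = 0.5) -> dict[str, str]:
    """Flag caps whose decision mix hints at an RCS boundary or naming problem."""
    warnings: dict[str, str] = {}
    for cap, cap_claims in attribute_by_file(claims, file_to_cap).items():
        counts = Counter(c.decision for c in cap_claims)  # missing keys count as 0
        if counts["contradicted"] > 0:
            warnings[cap] = "description disagrees with code"
        elif counts["unknown"] / len(cap_claims) > max_unknown_ratio:
            warnings[cap] = "likely bad boundary (mostly unknown)"
        elif counts["verified"] == 0:
            warnings[cap] = "no verified claims (over-clustered or mis-named?)"
    return warnings
```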
Think of it this way: RCS writes the table of contents, MCS checks whether each chapter has the cited evidence, and Promote pulls the strongest evidence into the chapter summaries.
7 Claim Types
Each MCS claim is typed so the verification rules can scale strictness by signal strength.
structural: A file/symbol/route exists.
Pure existence claim, verified by AST presence. Low semantic risk.
behavioral: Users can do X.
Action claim. Needs the mutation + UI + flow signal triple.
numeric: A constant equals N.
Exact-value claim. Must match a literal within ±5 lines of the cited line.
policy: Only admins may X.
Rule claim. Needs role guard + flow gate + flow evidence.
integration: Uses Stripe / Supabase / …
External-system claim. Needs import + client + call site.
business_inference: Capability supports …
Highest-risk synthesis claim, backed only by multiple weak signals.
risk: Offline support is not provable.
Negative-evidence claim, flagged when an expected artifact is missing.
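The evidence requirements listed for each claim type can be encoded as a lookup table, with the numeric rule as a separate window check. A sketch under assumptions: the signal names (`mutation`, `role_guard`, and so on) are hypothetical labels, and `numeric_matches` reads "cited line ±5" as a five-line window on either side of the cited line.

```python
import re

# Hypothetical signal names; the real MCS rule set may differ.
REQUIRED_SIGNALS: dict[str, set[str]] = {
    "structural": {"ast_presence"},
    "numeric": {"literal_match"},
    "behavioral": {"mutation", "ui", "flow"},
    "policy": {"role_guard", "flow_gate", "flow_evidence"},
    "integration": {"import", "client", "call_site"},
}


def missing_signals(claim_type: str, found: set[str]) -> set[str]:
    """Return the required signals absent from `found`.

    business_inference (several weak signals) and risk (absence of an
    artifact) have no fixed signal set, so they are rejected here.
    """
    if claim_type not in REQUIRED_SIGNALS:
        raise ValueError(f"no fixed signal set for claim type {claim_type!r}")
    return REQUIRED_SIGNALS[claim_type] - found


def numeric_matches(source_lines: list[str], cited_line: int, expected: float,
                    window: int = 5) -> bool:
    """True if `expected` appears as a standalone literal within `window`
    lines of the 1-based `cited_line`."""
    lo = max(0, cited_line - 1 - window)
    hi = min(len(source_lines), cited_line + window)
    # Reject matches glued to word characters or dots, so 5 does not match 50 or 1.5.
    pattern = re.compile(rf"(?<![\w.]){re.escape(str(expected))}(?![\w.])")
    return any(pattern.search(line) for line in source_lines[lo:hi])
```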
5 Decision States
Output of the 4-layer verifier (static · semantic · contradiction · negative-evidence).
verified: All proofs passed every layer. Publishable as fact.
inferred: Static passes; semantic was skipped (skeleton tier) or weak. Publish with reservation.
unknown: Insufficient evidence; couldn't decide either way.
rejected: Proofs do not support the claim. Don't publish.
contradicted: Code evidence directly contradicts the claim. Permanent; self-correction may NOT soften it.
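One way to fold layer outcomes into these five states, assuming a hypothetical `decide` function where `None` means a layer could not decide or was skipped. The precedence (contradiction, then static, then semantic) is an assumption, and the negative-evidence layer is left out for brevity.

```python
from enum import Enum
from typing import Optional


class Decision(str, Enum):
    VERIFIED = "verified"
    INFERRED = "inferred"
    UNKNOWN = "unknown"
    REJECTED = "rejected"
    CONTRADICTED = "contradicted"


def decide(static_ok: Optional[bool], semantic_ok: Optional[bool],
           contradicted: bool) -> Decision:
    """Combine layer outcomes. None = undecided (static) or skipped/weak (semantic)."""
    if contradicted:
        return Decision.CONTRADICTED      # code directly disagrees
    if static_ok is None:
        return Decision.UNKNOWN           # not enough evidence either way
    if not static_ok:
        return Decision.REJECTED
    if semantic_ok is None:
        return Decision.INFERRED          # e.g. skeleton tier: semantic skipped
    return Decision.VERIFIED if semantic_ok else Decision.REJECTED


def self_correct(previous: Decision, proposed: Decision) -> Decision:
    """Contradicted is permanent: a later pass may not soften it."""
    if previous is Decision.CONTRADICTED:
        return previous
    return proposed
```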
3 Scan Tiers
Trade compute cost against verification depth.
deep: full content + LLM semantic check
Reads each file's full source and runs the adversarial-prompt semantic verifier using the API key from your Settings. Claims can reach verified. ~30-90s on small repos; consumes tokens.
skeleton: Wave-15 compressed, no LLM
Strips function bodies via the AST chunker (8-12× compression). The static, contradiction, and negative-evidence layers still run; the semantic layer is skipped, so claims cap at inferred. ~10s, zero token cost. Use for large repos and CI gating.
cached: reuse previous decisions where hashes match
Replays the last run for unchanged files (by content hash) and re-evaluates only files that changed. Fastest option for incremental scans on dev branches.
Free alternative to deep: download the Claude Code prompt, vote on the semantic decisions in Claude Code, and paste the resulting JSON back. Same end state, zero API spend.
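The cached tier's replay rule can be sketched with a content hash per file. This assumes a hypothetical cache layout mapping each path to the SHA-256 of its last-scanned source plus the decisions produced then. A real implementation would also need to invalidate claims whose evidence spans several files, which this per-file sketch ignores.

```python
import hashlib


def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_cached_scan(files: dict[str, str],
                     cache: dict[str, tuple[str, list]]) -> tuple[dict[str, list], list[str]]:
    """Split files into replayed decisions and files needing re-evaluation.

    files: path -> current source text
    cache: path -> (hash at last run, decisions from last run)  # hypothetical layout
    """
    replayed: dict[str, list] = {}
    to_rescan: list[str] = []
    for path, source in files.items():
        entry = cache.get(path)
        if entry is not None and entry[0] == content_hash(source):
            replayed[path] = entry[1]  # unchanged: reuse last decisions
        else:
            to_rescan.append(path)     # new or changed: evaluate again
    return replayed, to_rescan
```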
Confidence Score
Per-cap confidence is a weighted ratio of decisions. Higher = more of the cap's claims are grounded in passed proofs.
conf = (verified·1.0 + inferred·0.6 + unknown·0.2 − contradicted·0.5) / totalClaims
≥ 85%: Publishable for sales / arc42 / compliance. Most claims verified, no contradictions.
60-85%: Skeleton-tier ceiling (no semantic). Sufficient for internal dashboards and navigation. Run the deep tier to go higher.
< 60%: Many unknowns. Either the cap is genuinely under-documented in code, or the scan tier was too shallow. Investigate before publishing.
Any score with contradictions > 0: Treat as red regardless of %. A contradicted claim means published copy disagrees with the code. Fix the doc or fix the code before promoting.
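The formula and bands above translate directly into code. A sketch, assuming `totalClaims` counts every decision state (so rejected claims dilute the score without adding to it); the band labels `publishable`, `internal`, `investigate`, and `red` are hypothetical names for the four rows.

```python
def confidence(counts: dict[str, int]) -> float:
    """conf = (verified*1.0 + inferred*0.6 + unknown*0.2 - contradicted*0.5) / totalClaims"""
    total = sum(counts.values())  # assumption: includes rejected claims
    if total == 0:
        return 0.0
    score = (counts.get("verified", 0) * 1.0
             + counts.get("inferred", 0) * 0.6
             + counts.get("unknown", 0) * 0.2
             - counts.get("contradicted", 0) * 0.5)
    return score / total


def band(counts: dict[str, int]) -> str:
    """Map a cap's decision counts to one of the four bands."""
    if counts.get("contradicted", 0) > 0:
        return "red"  # any contradiction overrides the percentage
    conf = confidence(counts)
    if conf >= 0.85:
        return "publishable"
    if conf >= 0.60:
        return "internal"
    return "investigate"
```

For example, 9 verified and 1 contradicted scores (9 - 0.5) / 10 = 85%, yet `band` still returns `red` because of the contradiction.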