MMR Reference

What MMR is

Multi-Model Review runs your changes past several independent AI code reviewers ("channels"), then reconciles their findings into a single de-duplicated list and a verdict that gates the work. No channel ever sees another channel's output — agreement between them is what raises confidence, and disagreement is what surfaces ambiguity.

The core idea in five moves

  1. Resolve a diff: from a PR, staged changes, a branch range, or a piped diff.
  2. Dispatch channels: each channel is a separate subprocess given the same prompt; channels run in parallel and in isolation (packages/mmr/src/commands/review.ts:636).
  3. Parse: each channel's raw output is parsed into a common Finding shape.
  4. Reconcile: findings are grouped by a stable key, de-duplicated, and scored for agreement and confidence (packages/mmr/src/core/reconciler.ts:43).
  5. Verdict: the severity gate, combined with channel health, yields pass, degraded-pass, blocked, or needs-user-decision (packages/mmr/src/types.ts:25).

Two layers, one mental model. The mmr CLI is the engine that dispatches the built-in channels and computes the verdict. The scaffold run review-pr / review-code wrappers sit on top: they add a Superpowers code-reviewer agent channel via mmr reconcile, handle auth recovery, and drive the fix loop.

End-to-end flow

A single mmr review … --sync run walks the whole pipeline. Channels fan out in parallel; everything converges at reconciliation.

[Diagram: end-to-end flow. Resolve diff (--pr / --staged / --diff / --base) → Build prompt (+ focus, criteria) → in parallel: codex, gemini, claude, grok, doc-conformance (opt-in) → Parse Finding → Reconcile (dedupe + score) → Verdict (gate + exit code)]

Compensating passes (see Degraded mode below) are injected after the first dispatch round for any channel that was unavailable, then folded back into the same reconcile step.

The mmr review command

One command, several input modes. Pick the flag that matches your target; everything else is control and output options.

Flag                            Group    Description
--diff <path|->                 input    Read a unified diff from a file, or from stdin with -. Highest-priority input mode.
--pr <n>                        input    Fetch the PR diff via gh pr diff.
--staged                        input    Review staged changes (git diff --cached).
--base <ref> [--head <ref>]     input    Review a branch range (git diff base...head; head defaults to HEAD).
(no input flag)                 input    Fall back to unstaged working-tree changes (git diff).
--focus <text>                  control  Free-text focus areas appended to every channel prompt.
--fix-threshold <P0|P1|P2|P3>   control  Severity gate: unacknowledged findings at or above this severity block. Defaults to the configured fix_threshold (P2 unless overridden in .mmr.yaml).
--channels <names…>             control  Run only these channels, overriding config defaults. Abstract channels are filtered out.
--timeout <seconds>             control  Per-channel timeout override.
--template <name>               control  Use a named review-criteria template from config.
--format <json|text|markdown>   output   Output format. Default json.
--sync                          mode     Run the full pipeline (dispatch → parse → reconcile → verdict) and return results. Without it, dispatch is fire-and-forget.
--dry-run                       mode     Resolve the diff and assemble the prompt without dispatching any channel.
--session <id>                  rounds   Link this run into a multi-round session. The id must match ^[A-Za-z0-9_-]+$ and must not be a reserved name (packages/mmr/src/commands/sessions.ts:15).
--round <n>                     rounds   1-based round counter within a session.
--max-rounds <n>                rounds   Hard cap on rounds. Defaults to 5 when --session is set without it.
--accept-new-acks               trust    Trust acknowledgment files newly introduced by the diff.
--trust-project-acks            trust    Trust working-tree project acks in non-Git / untrusted modes.
--trust-project-config          trust    Trust the working-tree .mmr.yaml in untrusted modes.
--config-base-ref <ref>         trust    Load .mmr.yaml and acks from a trusted Git ref instead of HEAD.
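The input table says --diff is the highest-priority mode, but it does not say how the other input flags rank against each other. The sketch below assumes they follow the table order (--pr, then --staged, then --base). It only maps options to the source mmr would read from. `InputOptions`, `DiffSource`, and `resolveInput` are hypothetical names, not mmr's source.

```typescript
// Hypothetical sketch of input-mode resolution (not mmr's implementation).
// Assumption: after --diff, precedence follows the flag-table order.
interface InputOptions {
  diff?: string; // file path, or "-" for stdin
  pr?: number;
  staged?: boolean;
  base?: string;
  head?: string; // only meaningful together with base
}

type DiffSource =
  | { kind: "stdin" }
  | { kind: "file"; path: string }
  | { kind: "command"; argv: string[] };

function resolveInput(opts: InputOptions): DiffSource {
  if (opts.diff !== undefined) {
    // --diff always wins over the other input flags.
    return opts.diff === "-" ? { kind: "stdin" } : { kind: "file", path: opts.diff };
  }
  if (opts.pr !== undefined) {
    return { kind: "command", argv: ["gh", "pr", "diff", String(opts.pr)] };
  }
  if (opts.staged) {
    return { kind: "command", argv: ["git", "diff", "--cached"] };
  }
  if (opts.base !== undefined) {
    // Three-dot range: changes on head since it diverged from base.
    const head = opts.head ?? "HEAD";
    return { kind: "command", argv: ["git", "diff", `${opts.base}...${head}`] };
  }
  // No input flag: unstaged working-tree changes.
  return { kind: "command", argv: ["git", "diff"] };
}
```

Under these assumptions, passing both --diff - and --pr 1 reads stdin. A bare --base main becomes git diff main...HEAD.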

Copy-paste commands by target

# PR review (full pipeline, JSON out)
mmr review --pr 123 --sync --format json

# Staged changes before commit
mmr review --staged --sync --format json

# All tracked uncommitted changes (no untracked)
git diff HEAD | mmr review --diff - --sync --format json

# Branch range
mmr review --base main --head "$BRANCH" --sync --format json

# A single file's current contents, as an "all-added" diff
(diff -u /dev/null path/to/file.ts || true) | mmr review --diff - --sync --format json

# Only specific channels (e.g. just grok + claude)
mmr review --pr 123 --channels grok claude --sync --format json

Other subcommands

Command                                                 Purpose
mmr reconcile <job-id> --channel <name> --input <data>  Inject an external channel's findings (e.g. the Superpowers agent) into an existing job and re-run the results pipeline. Input is a file, - for stdin, or inline JSON (packages/mmr/src/commands/reconcile.ts:17).
mmr status <job-id>                                     Per-channel status and elapsed time. Exit 0 = all complete, 1 = running, 2 = a channel failed, 5 = not found.
mmr results <job-id> [--raw]                            Re-run parse → reconcile → format on a completed job. The exit code reflects the verdict.
mmr jobs <list|prune>                                   List jobs, or prune old ones per job_retention_days.
mmr sessions <start|list|show|end> <id>                 Manage multi-round review sessions (stored under ~/.mmr/sessions/).
mmr config <init|show|validate…>                        Scaffold and inspect .mmr.yaml (including OSS-runtime example blocks).
mmr ack <add|list|rm|prune>                             Sticky acknowledgments: silence a finding by its stable key so it stops blocking across rounds.
# Capture a job_id from a review, then fold in an agent channel:
mmr reconcile "$JOB_ID" --channel superpowers --input findings.json
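mmr reconcile accepts --input as a file path, - for stdin, or inline JSON. This section does not say how mmr tells a path apart from inline JSON. The hypothetical `classifyReconcileInput` below assumes the simplest rule: a leading [ or { means inline JSON. It exists only to illustrate the three input forms.

```typescript
// Hypothetical helper; the "[ or { means inline JSON" rule is an assumption.
type ReconcileInput =
  | { source: "stdin" }
  | { source: "inline"; json: unknown }
  | { source: "file"; path: string };

function classifyReconcileInput(arg: string): ReconcileInput {
  if (arg === "-") return { source: "stdin" };
  const trimmed = arg.trimStart();
  if (trimmed.startsWith("[") || trimmed.startsWith("{")) {
    // Malformed inline JSON throws a SyntaxError instead of being read as a path.
    return { source: "inline", json: JSON.parse(trimmed) };
  }
  return { source: "file", path: arg };
}
```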

Channel architecture

A channel is pure config data — there is no per-channel code. The dispatcher runs whatever command the channel defines, hands it the prompt, and parses its output with the configured parser. Adding a channel is normally a .mmr.yaml edit, not a code change.

The channel config shape

channels:
  <name>:
    enabled: true                 # run by default?
    command: "codex exec"         # whitespace-split, spawned WITHOUT a shell
    flags: ["--ephemeral"]        # appended after the command tokens
    env: { KEY: value }           # extra environment
    prompt_delivery: stdin        # stdin (default) | prompt-file
    prompt_wrapper: "{{prompt}}"  # template wrapped around the prompt
    output_parser: default        # default | gemini | doc-conformance | {kind:…}
    stderr: capture               # capture | suppress | passthrough
    timeout: 300                  # seconds (falls back to defaults.timeout)
    auth: { check, timeout, failure_exit_codes, recovery }
    extends: base-channel         # inherit from another channel (≤4 levels)
    abstract: false               # template-only; never dispatched directly
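Because extends chains are plain data, resolving them is a small graph walk. This is a minimal sketch built on three assumptions:
- "≤4 levels" counts extends hops.
- A child's keys shallowly override its parent's.
- abstract applies only to the channel that declares it, so a concrete child of an abstract base is dispatchable.

`ChannelConfig`, `resolveChannel`, and `dispatchable` are illustrative names.

```typescript
// Illustrative subset of the channel shape; not mmr's actual types.
interface ChannelConfig {
  enabled?: boolean;
  command?: string;
  flags?: string[];
  timeout?: number;
  extends?: string;
  abstract?: boolean;
}

// Assumption: "≤4 levels" counts extends hops from the channel to its root.
const MAX_EXTENDS_DEPTH = 4;

function resolveChannel(name: string, all: Record<string, ChannelConfig>): ChannelConfig {
  const chain: ChannelConfig[] = []; // chain[0] is the channel itself; the last entry is the root
  const seen = new Set<string>();
  let current: string | undefined = name;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`extends cycle at "${current}"`);
    seen.add(current);
    const cfg: ChannelConfig | undefined = all[current];
    if (cfg === undefined) throw new Error(`unknown channel "${current}"`);
    chain.push(cfg);
    if (chain.length - 1 > MAX_EXTENDS_DEPTH) {
      throw new Error(`"${name}" extends more than ${MAX_EXTENDS_DEPTH} levels`);
    }
    current = cfg.extends;
  }
  // Apply the root first so each child overrides its parent (assumed shallow merge).
  const merged: ChannelConfig = {};
  for (const cfg of [...chain].reverse()) Object.assign(merged, cfg);
  delete merged.extends;
  merged.abstract = chain[0].abstract ?? false; // assumed: abstract is not inherited
  return merged;
}

// Channels that would run by default: enabled after inheritance and not abstract.
function dispatchable(all: Record<string, ChannelConfig>): string[] {
  return Object.keys(all).filter((n) => {
    const c = resolveChannel(n, all);
    return c.enabled === true && !c.abstract;
  });
}
```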

Built-in channels

Why grok is different. codex/gemini/claude all read the prompt from stdin. Grok's CLI requires the prompt as an argument and ignores stdin, so its channel uses prompt_delivery: prompt-file — the dispatcher writes the prompt to a temp file and passes its path via the {{prompt_file}} placeholder. Grok wraps its reply in a JSON .text field, which the parser unwraps before extracting findings.
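Since channels are spawned without a shell, dispatching comes down to building an argv and deciding where the prompt goes. The sketch below shows both delivery modes. `buildInvocation` is a hypothetical helper. It assumes prompt_wrapper applies in both modes and that {{prompt_file}} is substituted only inside flags.

```typescript
// Hypothetical subset of a channel config, enough to build an invocation.
interface ChannelSpec {
  command: string; // whitespace-split; no shell involved
  flags?: string[];
  prompt_delivery?: "stdin" | "prompt-file";
  prompt_wrapper?: string; // template containing {{prompt}}
}

interface Invocation {
  file: string; // first token of `command`
  args: string[]; // remaining command tokens, then flags
  stdin?: string; // set for stdin delivery
  promptFileBody?: string; // set for prompt-file delivery: what the temp file would contain
}

// split/join replaces every occurrence and, unlike String.replace, ignores "$" patterns.
const fill = (template: string, token: string, value: string): string =>
  template.split(token).join(value);

function buildInvocation(ch: ChannelSpec, prompt: string, promptFilePath: string): Invocation {
  const body = fill(ch.prompt_wrapper ?? "{{prompt}}", "{{prompt}}", prompt);
  const [file, ...rest] = ch.command.trim().split(/\s+/);
  const flags = (ch.flags ?? []).map((f) => fill(f, "{{prompt_file}}", promptFilePath));
  const args = [...rest, ...flags];
  return (ch.prompt_delivery ?? "stdin") === "prompt-file"
    ? { file, args, promptFileBody: body }
    : { file, args, stdin: body };
}
```

The resulting file and args could then go to Node's child_process.spawn, whose shell option defaults to false. The stdin text, when present, would be written to the child. That step is left out so the sketch has no side effects.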

The defaults, commands, and parsers below are the built-in presets (packages/mmr/src/config/defaults.ts:32).

Channel          Default  Strength                                        Prompt delivery  Parser
codex            enabled  Correctness, security, API contracts            stdin            default
gemini           enabled  Architecture, broad-context reasoning           stdin            gemini
claude           enabled  Plan alignment, code quality, testing           stdin            default
grok             enabled  Independent second opinion (xAI; proprietary)   prompt-file      unwrap $.text → default
doc-conformance  opt-in   PRD/stories/standards conformance (LLM-graded)  stdin            doc-conformance
# codex
command: codex exec
flags: [--skip-git-repo-check, -s, read-only, --ephemeral]
auth.check: codex login status        # local file check (fast, 5s)
recovery: codex login
output_parser: default
stderr: suppress

# gemini
command: gemini                       # NO -p: gemini reads stdin natively
flags: [--output-format, json]
env: { NO_BROWSER: "true" }
auth.check: NO_BROWSER=true gemini -p "respond with ok" -o json   # LLM round-trip, 20s
recovery: gemini -p "hello"
output_parser: gemini                 # unwraps { "response": "…" }
timeout: 360

# claude
command: claude -p
flags: [--output-format, json]
auth.check: claude -p "respond with ok"   # LLM round-trip, 20s
recovery: claude login
output_parser: default

# grok
command: grok
prompt_delivery: prompt-file
flags: [--prompt-file, "{{prompt_file}}", --output-format, json]
auth.check: grok models                # lists models / login state (no round-trip)
recovery: grok login
output_parser: { kind: unwrap-jsonpath, wrap: "$.text", then: default }

Grok is proprietary (xAI), not open-source — it joins the standard set mechanically as a 4th CLI channel. Disable it with channels_disabled: ["grok"].

# doc-conformance
enabled: false                         # opt-in: runs up to 3 LLM calls (~3 min)
command: scaffold observe audit --profile=full --scope=all --output-mode=mmr-findings
output_parser: doc-conformance         # expects a JSON array of findings
timeout: 240

Enable with --channels doc-conformance or in .mmr.yaml.

The dispatcher

Adding a new channel: where it's clean vs. hard-coded.

Clean (pure .mmr.yaml, no code):
- a new subprocess channel (command + flags + auth + output_parser);
- output reshaping via the unwrap-jsonpath or regex-findings parser kinds;
- disabling and timeout overrides;
- pointing the compensator at a different channel;
- HTTP-endpoint channels (kind: http), which are already supported via dispatchHttpChannel (packages/mmr/src/config/schema.ts:144).

Needs code: a brand-new named parser must be registered in core/parser.ts (packages/mmr/src/core/parser.ts:257). The COMPENSATING_FOCUS map also carries per-channel focus text in code. A channel without an entry there falls back gracefully.
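The split between config-only and code-required parsers can be modeled as a registry of named parsers plus structural kinds composed from them. This sketch makes three simplifications:
- The default parser is approximated as "parse the outermost [...] span in the text".
- Only $.key.key paths are supported.
- regex-findings is omitted.

`resolveParser`, `selectPath`, and `NAMED_PARSERS` are hypothetical names.

```typescript
type Parser = (raw: string) => unknown[];
type ParserConfig = string | { kind: "unwrap-jsonpath"; wrap: string; then: ParserConfig };

// Stand-in for mmr's default parser (assumption): parse the outermost [...] span.
const defaultParser: Parser = (raw) => {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end < start) return [];
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  return Array.isArray(parsed) ? parsed : [];
};

// Named parsers live in code: a new name means editing this table (cf. core/parser.ts).
const NAMED_PARSERS: Record<string, Parser> = { default: defaultParser };

// Minimal JSONPath subset: "$" followed by dotted keys, e.g. "$.text" or "$.response".
function selectPath(value: unknown, path: string): unknown {
  if (!path.startsWith("$")) throw new Error(`unsupported path: ${path}`);
  let cur: unknown = value;
  for (const key of path.slice(1).split(".").filter((k) => k !== "")) {
    if (typeof cur !== "object" || cur === null) return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

function resolveParser(cfg: ParserConfig): Parser {
  if (typeof cfg === "string") {
    if (!Object.prototype.hasOwnProperty.call(NAMED_PARSERS, cfg)) {
      throw new Error(`parser "${cfg}" is not registered`);
    }
    return NAMED_PARSERS[cfg];
  }
  // Structural kind: composed from existing parsers, so it needs no new code.
  const inner = resolveParser(cfg.then);
  return (raw) => {
    const unwrapped = selectPath(JSON.parse(raw), cfg.wrap);
    return typeof unwrapped === "string" ? inner(unwrapped) : [];
  };
}
```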

Scaffold wrappers

Direct mmr review runs the built-in CLI channels. The scaffold run wrappers add orchestration on top.

Wrapper                                  Target          Adds on top of mmr review
scaffold run review-pr                   A PR (--pr)     Auth checks, the Superpowers code-reviewer agent channel via mmr reconcile, consensus/verdict handling, the 3-strike-per-finding round bookkeeping, and an optional Beads issue bridge.
scaffold run review-code                 Local pre-push  Synthesizes a "delivery candidate" diff (committed + staged + unstaged) and gathers file and standards context for the file-blind CLIs. It then adds the same agent channel and round bounding. Untracked files are not covered.
scaffold run post-implementation-review  Full codebase   Two phases, a systemic review plus per-story functional review via parallel agents, with its own report under docs/reviews/. See its own doc for the exact channel layout.

Foreground only. The wrappers' manual fallback runs Codex, Gemini, Claude, and Grok as foreground Bash calls when the mmr CLI isn't available — never in the background. Background execution produces empty output.

Findings, reconciliation & verdicts

The Finding shape

Every channel's output parses into this common shape (packages/mmr/src/types.ts:45).

{
  "id": "F-001",
  "category": "security",
  "severity": "P0",
  "location": "src/auth.ts:42",
  "description": "…",
  "suggestion": "…"
}

The location above (src/auth.ts:42) is illustrative. After reconciliation, each finding also carries:
- confidence
- sources[]
- agreement
- a stable finding_key
- a description_shingle (for fuzzy cross-round matching)
- acknowledged

(packages/mmr/src/types.ts:54)
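For TypeScript consumers of --format json output, these shapes can be written as types. This sketch is derived from the field lists in this section, not copied from types.ts. The field types are assumptions, in particular the string unions for agreement and confidence (taken from the agreement table) and description_shingle left as unknown. `isFinding` is a hypothetical runtime guard for untrusted channel output.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

interface Finding {
  id: string; // e.g. "F-001"
  category: string; // e.g. "security"
  severity: Severity;
  location: string; // e.g. "src/auth.ts:42"
  description: string;
  suggestion: string;
}

// Fields added by reconciliation; their exact types here are assumptions.
interface ReconciledFinding extends Finding {
  confidence: "high" | "medium" | "low";
  sources: string[]; // channel names that reported the issue
  agreement: "consensus" | "majority" | "unique";
  finding_key: string;
  description_shingle: unknown; // representation not specified in this section
  acknowledged: boolean;
}

const SEVERITIES: readonly string[] = ["P0", "P1", "P2", "P3"];
const STRING_FIELDS = ["id", "category", "location", "description", "suggestion"] as const;

// Runtime check before treating parsed channel output as a Finding.
function isFinding(x: unknown): x is Finding {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  const sev = o["severity"];
  return STRING_FIELDS.every((k) => typeof o[k] === "string") && typeof sev === "string" && SEVERITIES.includes(sev);
}
```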

Stable identity (finding_key)

finding_key = sha1( normLocation | category | sha1(normDescription) | sha1(normSuggestion) )

Line numbers are stripped from the location and severity is excluded, so the same issue reported as P1 and as P2 collapses to one key (packages/mmr/src/core/stable-id.ts:115).

Exact keys are backed by a fuzzy match. Each description is broken into character 5-gram shingles, and two findings match when the Jaccard similarity of their shingle sets is at least 0.7. Within a run, findings are grouped by this shingle overlap (packages/mmr/src/core/reconciler.ts:83). Across rounds, the ack store applies the same threshold, so a re-worded finding still matches a prior ack (packages/mmr/src/core/ack-store.ts:8).
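The identity scheme can be sketched with Node's built-in crypto module. The exact normalization rules are not spelled out here, so three pieces of this sketch are assumptions:
- `normLocation` drops a trailing :line or :line:col.
- `normText` lowercases and collapses whitespace.
- The fuzzy check only runs when location and category already match.

All function names are illustrative.

```typescript
import { createHash } from "node:crypto";

const sha1 = (s: string): string => createHash("sha1").update(s).digest("hex");

// Assumed normalizations; the real rules live in stable-id.ts and may differ.
const normLocation = (loc: string): string => loc.trim().replace(/(:\d+)+$/, ""); // drop :line[:col]
const normText = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

interface KeyFields {
  location: string;
  category: string;
  description: string;
  suggestion: string;
}

// Severity is not an input, so the same issue at P1 and at P2 gets one key.
function findingKey(f: KeyFields): string {
  return sha1(
    [normLocation(f.location), f.category, sha1(normText(f.description)), sha1(normText(f.suggestion))].join("|"),
  );
}

// Character 5-gram shingles of the normalized text.
function shingles(text: string, n = 5): Set<string> {
  const t = normText(text);
  if (t.length < n) return new Set([t]);
  const out = new Set<string>();
  for (let i = 0; i + n <= t.length; i++) out.add(t.slice(i, i + n));
  return out;
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const s of a) if (b.has(s)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

const FUZZY_THRESHOLD = 0.7;

// Exact key match first, then a fuzzy description match (same location and category assumed).
function sameIssue(a: KeyFields, b: KeyFields): boolean {
  if (findingKey(a) === findingKey(b)) return true;
  return (
    normLocation(a.location) === normLocation(b.location) &&
    a.category === b.category &&
    jaccard(shingles(a.description), shingles(b.description)) >= FUZZY_THRESHOLD
  );
}
```

The same `jaccard` test covers both uses described above: grouping findings within a run, and matching a re-worded finding against a prior ack.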

Agreement & confidence

Agreement and confidence are derived per group during reconciliation (packages/mmr/src/core/reconciler.ts:114).

Sources  Condition                 Agreement  Confidence
2+       same severity             consensus  high
2+       severities differ         majority   medium
1        severity is P0            unique     high
1        source is compensating-*  unique     low
1        anything else             unique     medium
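The table reads naturally as a small decision function. This sketch assumes the rows are checked top to bottom, which matters for overlapping cases such as a compensating pass that reports a P0. It also assumes compensating sources are named with a `compensating-` prefix. `Report` and `score` are illustrative names.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";
type Agreement = "consensus" | "majority" | "unique";
type Confidence = "high" | "medium" | "low";

// One channel's report of an issue inside a reconciled group.
interface Report {
  source: string; // channel name; compensating passes assumed to be "compensating-*"
  severity: Severity;
}

// Rows of the table, checked top to bottom.
function score(group: Report[]): { agreement: Agreement; confidence: Confidence } {
  if (group.length === 0) throw new Error("empty group");
  const sources = new Set(group.map((r) => r.source));
  if (sources.size >= 2) {
    const severities = new Set(group.map((r) => r.severity));
    return severities.size === 1
      ? { agreement: "consensus", confidence: "high" }
      : { agreement: "majority", confidence: "medium" };
  }
  const only = group[0];
  if (only.severity === "P0") return { agreement: "unique", confidence: "high" };
  if (only.source.startsWith("compensating-")) return { agreement: "unique", confidence: "low" };
  return { agreement: "unique", confidence: "medium" };
}
```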

The gate & the four verdicts

The gate passes when every unacknowledged finding is below the fix_threshold (packages/mmr/src/core/reconciler.ts:229). The threshold defaults to P2 (packages/mmr/src/config/defaults.ts:16). Severity tiers run P0 (highest) → P1 → P2 → P3 (lowest), so under the default threshold any unacknowledged P0, P1, or P2 finding blocks.

The verdict is derived from the gate result plus channel health, checked in this order:
1. Zero channels completed → needs-user-decision.
2. Otherwise, a failed gate → blocked.
3. Otherwise, some channels incomplete → degraded-pass.
4. Otherwise → pass.

(packages/mmr/src/core/reconciler.ts:247) Because the no-completed-channels case short-circuits first, it outranks blocked.

Verdict              Condition                                                              Exit
pass                 Gate passed, all channels completed                                    0
degraded-pass        Gate passed, but some channels failed / timed out / weren't installed  0
blocked              An unacknowledged finding sits at or above the threshold               2
needs-user-decision  No channel completed (can't make a determination)                      3
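The gate rule and the verdict branch order fit in a few lines. This sketch reduces channel health to a count of completed channels and uses hypothetical names (`gatePasses`, `computeVerdict`, `EXIT_CODE`). The exit codes are the ones in the table.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";
type Verdict = "pass" | "degraded-pass" | "blocked" | "needs-user-decision";

interface GateFinding {
  severity: Severity;
  acknowledged: boolean;
}

// Lower rank = more severe: P0 (highest) → P3 (lowest).
const RANK: Record<Severity, number> = { P0: 0, P1: 1, P2: 2, P3: 3 };

// The gate passes when every unacknowledged finding is less severe than the threshold.
function gatePasses(findings: GateFinding[], threshold: Severity = "P2"): boolean {
  return findings.every((f) => f.acknowledged || RANK[f.severity] > RANK[threshold]);
}

// Branch order from the text: "no channel completed" is checked before the gate.
function computeVerdict(
  findings: GateFinding[],
  completedChannels: number,
  totalChannels: number,
  threshold: Severity = "P2",
): Verdict {
  if (completedChannels === 0) return "needs-user-decision";
  if (!gatePasses(findings, threshold)) return "blocked";
  if (completedChannels < totalChannels) return "degraded-pass";
  return "pass";
}

const EXIT_CODE: Record<Verdict, number> = {
  pass: 0,
  "degraded-pass": 0,
  blocked: 2,
  "needs-user-decision": 3,
};
```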

Proceed only on pass or degraded-pass. On blocked or needs-user-decision, surface the verdict and findings — don't merge automatically.

Degraded mode, compensation & auth

A channel is "degraded" when it's not_installed (no binary), auth_failed, timeout, skipped, or failed. The review doesn't stop — it compensates and tells you how to recover.

Channel  Auth check                   Recovery
codex    codex login status           codex login
gemini   gemini -p "respond with ok"  gemini -p "hello"
claude   claude -p "respond with ok"  claude login
grok     grok models                  grok login
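The following sketches degraded-mode bookkeeping. Any status other than completed counts as degraded, per the list above. Each degraded channel gets a compensating pass. Auth failures on built-in channels get the recovery command from the table. The `compensating-<name>` naming, the status union, and `planCompensation` are assumptions for illustration.

```typescript
type ChannelStatus = "completed" | "not_installed" | "auth_failed" | "timeout" | "skipped" | "failed";

// Recovery commands from the table above.
const RECOVERY: Record<string, string> = {
  codex: "codex login",
  gemini: 'gemini -p "hello"',
  claude: "claude login",
  grok: "grok login",
};

interface CompensationPlan {
  degraded: string[];
  compensatingPasses: string[]; // hypothetical "compensating-<name>" naming
  hints: string[];
}

function planCompensation(statuses: Record<string, ChannelStatus>): CompensationPlan {
  const degraded = Object.keys(statuses).filter((name) => statuses[name] !== "completed");
  const hints = degraded
    .filter((name) => statuses[name] === "auth_failed" && Object.prototype.hasOwnProperty.call(RECOVERY, name))
    .map((name) => `${name}: auth failed; try \`${RECOVERY[name]}\``);
  return { degraded, compensatingPasses: degraded.map((name) => `compensating-${name}`), hints };
}
```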

Configuration (.mmr.yaml)

Config is layered: built-in defaults → ~/.mmr/config.yaml → project .mmr.yaml → CLI flags. Arrays replace; objects deep-merge.

version: 1
defaults:
  fix_threshold: P2          # gate severity
  timeout: 300               # default per-channel timeout (s)
  parallel: true
channels_disabled: ["grok"]  # opt OUT of a built-in (e.g. no grok installed)
channels:
  doc-conformance:
    enabled: true            # opt IN to a default-off channel
  # Bring-your-own model via channel inheritance:
  qwen-local:
    command: ollama run
    flags: ["qwen2.5-coder:32b", "--format", "json"]
    output_parser: { kind: unwrap-jsonpath, wrap: "$.response", then: default }
    auth: { check: "ollama list", timeout: 5, failure_exit_codes: [1], recovery: "ollama serve" }
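The layering rule stated above (arrays replace; objects deep-merge) can be sketched as a fold over the layers, from built-in defaults to CLI flags. `mergeLayer` and `mergeAll` are hypothetical names. Modeling CLI flags as one more object layer is an assumption.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

const isObject = (v: Json): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Later layer wins: objects merge key by key; arrays and scalars replace wholesale.
function mergeLayer(base: Json, over: Json): Json {
  if (!isObject(base) || !isObject(over)) return over;
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(over)) {
    out[key] = Object.prototype.hasOwnProperty.call(out, key) ? mergeLayer(out[key], value) : value;
  }
  return out;
}

// Layers in precedence order: built-in defaults, ~/.mmr/config.yaml, .mmr.yaml, CLI flags.
const mergeAll = (layers: Json[]): Json => layers.reduce<Json>(mergeLayer, {});
```

Under this rule a project channels_disabled: ["grok"] replaces, rather than extends, any list set in ~/.mmr/config.yaml.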

Trust boundary. When reviewing a diff, project .mmr.yaml and acks should be read from the diff's base ref, not the working tree — otherwise a PR could add a channel that exfiltrates secrets or self-acknowledge its own findings. Use --config-base-ref / the --trust-project-* flags to control this in untrusted (e.g. CI) contexts.
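Reading config from a Git ref instead of the working tree can be done with git show <ref>:<path>. Below is a minimal sketch using Node's execFileSync, which runs git without a shell. `gitShowArgs` and `readTrustedFile` are hypothetical helpers. Rejecting refs that begin with - (so a ref cannot be parsed as a git option) is a hardening choice of this sketch, not documented mmr behavior.

```typescript
import { execFileSync } from "node:child_process";

// argv for `git show <ref>:<path>`; refuse refs that git could parse as options.
function gitShowArgs(ref: string, path: string): string[] {
  if (ref === "" || ref.startsWith("-")) {
    throw new Error(`refusing suspicious ref "${ref}"`);
  }
  return ["show", `${ref}:${path}`];
}

// Content of `path` at `ref`, or undefined if git cannot show it (missing file, bad ref).
function readTrustedFile(ref: string, path: string): string | undefined {
  const args = gitShowArgs(ref, path); // validation errors propagate to the caller
  try {
    return execFileSync("git", args, { encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] });
  } catch {
    return undefined;
  }
}
```

For example, readTrustedFile(baseRef, ".mmr.yaml"), with baseRef taken from --config-base-ref, would return the base-branch config even if the PR edits the working-tree copy.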