Knowledge Freshness

What this system does

Knowledge entries under content/knowledge/ declare a volatility tier and a list of sources. A daily cron prefilters at most ten entries that are due — by cadence or by a changed source hash — runs a grounded LLM audit against the prefetched source bodies, opens one PR per drifted entry, and gates that PR on five checks. In parallel, downstream agents emit knowledge_gap_signal events when they hit a topic the KB does not cover; Lens I aggregates those signals into P1/P2 audit findings, suppressing any topic an entry already covers.

Two arms, two outcomes: the refresh arm re-audits entries the KB already has; the gap arm surfaces topics it lacks.

Both terminate in a human-merged PR.

| Surface | Value | Notes |
| --- | --- | --- |
| Volatility tiers | 3 | fast-moving / evolving / stable |
| Audit verdicts | 4 | current / minor-drift / major-drift / superseded |
| Daily audit ceiling | 10 | set by --max=10 in the cron workflow; not a yaml knob |
| PR gates | 5 | 4 blocking + 1 advisory |
| Signal window | 90 days | rolling; drives Lens I aggregation |

Two subsystems, one config file. Knowledge Freshness and the separate Build Observability system both read .scaffold/observability.yaml. This guide documents Knowledge Freshness; Lens I is the seam where the two meet (it lives in the observability audit but reasons about the KB).

How a gap closes

The full lifecycle, end to end:

  1. Downstream agents emit signals; they accumulate in the rolling 90-day window.
  2. A topic's signal count and distinct-project count cross the threshold.
  3. Lens I emits a P1/P2 finding.
  4. An operator adds content/knowledge/<category>/<slug>.md.
  5. The next audit's knowledge index covers the slug and Lens I suppresses the bucket — the finding disappears.

Signals are not purged when the entry is added. The window is rolling, so yesterday's signals still aggregate tomorrow; suppression filters the emit step, not the aggregation step (src/observability/checks/lens-i-knowledge-gaps.ts:155). Signals only fade as they age out of the 90-day window naturally.

System map

[Diagram: system map. Refresh arm nodes: cron (09:00 UTC daily), audit-prefilter (--max=10), audit-run-entry (grounded LLM), audit-apply (--open-pr), 5 PR gates, human merge, VERSION bump. Gap arm nodes: gap-signal-tail (90 pipeline steps), scaffold observe event knowledge_gap_signal, ledger (activity.jsonl), Lens I (90-day window), finding (P1 / P2), operator adds an entry whose name: matches the bucket, PR. The 3-tier --knowledge-root resolver feeds Lens I, which suppresses covered topics.]

Three real hooks sit beside the two arms: the phase-audit hook (runs Lens H only, never Lens I), the doc-conformance MMR channel (routes Lens I findings into MMR), and the --fix flow (initial + verifier + postfix audit). They are covered below.

Doc drift on MMR-in-cron. Three docs frame MMR-in-cron differently. The parent spec's locked decision #3 is authoritative: a native knowledge-freshness MMR channel is deferred to Phase 5. The cron today runs only inline gates. Two interim paths give reviewers MMR signal on a freshness PR: (1) the built-in doc-conformance MMR channel (disabled by default; enable with mmr review --channels=doc-conformance); (2) the manual mmr review --diff - command in From candidate to merged PR.

Frontmatter, signals, and resolution

Frontmatter schema

Every knowledge entry's frontmatter is a Zod-validated object with four freshness-relevant fields. The schema is the source of truth and runs as Gate 1 of the PR CI (src/validation/knowledge-frontmatter-validator.ts:42-50); runtime readers tolerate missing optional fields.

| Field | Type | Default | Validation | Read by |
| --- | --- | --- | --- | --- |
| name | string | required | regex /^[a-z][a-z0-9-]*$/ | assembly-loader, Lens I suppression |
| description | string | required | warns if > 200 chars | assembly-loader (TOC), audit prompt |
| topics | string[] | [] | any string | assembly-loader (auto-selection) |
| volatility | enum | evolving | stable\|evolving\|fast-moving | prefilter cadence |
| last-reviewed | ISO date | null | YYYY-MM-DD & real calendar date | prefilter cadence |
| version-pin | string | null | any string (e.g. "OWASP Top 10 2021") | audit prompt; superseded verdict signals it must advance manually |
| sources[] | object[] | [] | each: url (SSRF-checked at fetch), anchor (optional, starts with #), retrieved (ISO date), hash (sha256) | prefilter (hash + cadence), audit runner (prefetch) |

name vs. gap-topic regex. An entry name must start with a letter (/^[a-z][a-z0-9-]*$/), but Lens I gap topics allow a leading digit (/^[a-z0-9]+(-[a-z0-9]+)*$/). So a gap signalled for a topic like 3d-rendering cannot be suppressed by an entry of the same name — pick a letter-leading name (and list the numeric form under topics) when closing such a gap.
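As a minimal sketch (not the shipped code), the mismatch can be checked directly. The two regexes are copied from the paragraph above; the helper name topicIsSuppressibleByName is hypothetical.

```typescript
// Regexes as quoted in this guide.
const ENTRY_NAME_RE = /^[a-z][a-z0-9-]*$/;
const GAP_TOPIC_RE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

/** Hypothetical helper: can this gap topic be used verbatim as an entry name? */
function topicIsSuppressibleByName(topic: string): boolean {
  return GAP_TOPIC_RE.test(topic) && ENTRY_NAME_RE.test(topic);
}

// A letter-leading topic satisfies both patterns.
const letterLeading = topicIsSuppressibleByName("retry-with-jitter");
// A digit-leading topic is a valid gap topic but not a valid entry name.
const digitLeading = topicIsSuppressibleByName("3d-rendering");
```

Here letterLeading is true and digitLeading is false, which is why the digit-leading form has to be listed under topics instead.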

Anchor semantics. Put fragments in anchor, never inside url. The audit fetches url + (anchor ?? '') and hashes that body; the coverage check (src/knowledge-freshness/audit-apply.ts:82-101) matches the same combined string. Splitting prevents hash drift from spurious URL re-encodings and lets two sources at the same base URL with different #anchors be tracked independently.

Cadence model

Three tiers, three windows — 14 / 60 / 180 days for fast-moving / evolving / stable (src/knowledge-freshness/audit-prefilter.ts:5-7). An entry with no last-reviewed always counts as due. Sources with a changed hash also become candidates regardless of age, but the hash check only runs for entries still inside their cadence window.
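The cadence rule can be sketched as a small pure function. This is an illustration under assumptions, not the code in audit-prefilter.ts: the names CADENCE_DAYS and isDueByCadence are hypothetical, and the strict "older than the window" comparison follows the ageDays > window form used later in this guide.

```typescript
type Volatility = "fast-moving" | "evolving" | "stable";

// Window lengths in days, as stated above.
const CADENCE_DAYS: Record<Volatility, number> = {
  "fast-moving": 14,
  evolving: 60,
  stable: 180,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Hypothetical helper: is an entry due by cadence alone (no hash check)? */
function isDueByCadence(
  volatility: Volatility,
  lastReviewed: string | null, // YYYY-MM-DD, or null when never reviewed
  today: Date,
): boolean {
  if (lastReviewed === null) return true; // unreviewed entries always count as due
  const ageDays = Math.floor(
    (today.getTime() - new Date(lastReviewed).getTime()) / MS_PER_DAY,
  );
  return ageDays > CADENCE_DAYS[volatility];
}
```

An entry reviewed 15 days ago is due under fast-moving but not under evolving or stable; those in-window entries are the ones the hash check still covers.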

Which tier does an entry belong in?

| Provenance | Change frequency | Recommended tier |
| --- | --- | --- |
| vendor SDK / API docs | quarterly or faster | fast-moving |
| standards / RFCs, vendor docs | yearly-ish | evolving |
| canonical pattern reference | multi-year | stable |

Rule of thumb: if a version bump often breaks downstream guidance, lean fast-moving; if drift is extremely rare, stable; otherwise evolving (the default).

Adding a new entry to the KB

  1. Choose a category directory under content/knowledge/<category>/. Many categories exist today (backend, core, cli, research, web-app, web3, …); prefer placing into an existing one. Creating a new category is a separate PR.
  2. File name = entry slug + .md. The basename must match the name: field (e.g. retry-with-jitter.md → name: retry-with-jitter). Lens I's suppression match reads name: only, not the filename; a mismatch silently breaks suppression.
  3. Required frontmatter: name, description. Add volatility + sources[] if you want the cron to audit it — an entry with no sources[] is skipped by the prefilter (src/knowledge-freshness/audit-prefilter.ts:17).
  4. Validate locally: make validate-knowledge.
  5. Confirm the prefilter will pick it up. A fresh entry has no last-reviewed, so it should appear at priority 100:
    node dist/index.js knowledge-freshness audit-prefilter --max=10 \
      | jq '.[] | select(.name=="<your-new-slug>")'
    
    The daily ceiling is 10, so a flood of new entries may queue past the first day.

Gap-signal payload

A gap signal is a ledger event validated by src/observability/engine/event-schemas.ts:191-220 (payload allow-list at src/observability/engine/event-schemas.ts:12):

{
  "event_id":    "<uuid>",
  "worktree_id": "<sha>",
  "actor_label": "agent | bot | …",
  "branch":      "<branch>",
  "task_id":     null,
  "type":        "knowledge_gap_signal",
  "ts":          "<ISO-8601>",
  "payload": {
    "topic":         "<kebab-slug>",
    "source":        "agent_search",
    "project_id":    "<sha256-hex>",
    "step_name":     "tech-stack",
    "agent_excerpt": "…"
  }
}

topic is ≤80 chars matching /^[a-z0-9]+(-[a-z0-9]+)*$/; source ∈ {agent_search, lessons, manual}; project_id is 64-char sha256 hex (or the literal lessons when source=lessons); step_name and agent_excerpt (≤200 chars) are optional.
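The payload constraints above can be restated as a plain checker. This is a sketch, not the validator in event-schemas.ts: the names GapSignalPayload and checkGapPayload are hypothetical, and accepting upper-case hex in project_id is an assumption.

```typescript
interface GapSignalPayload {
  topic: string;
  source: string;
  project_id: string;
  step_name?: string;
  agent_excerpt?: string;
}

const TOPIC_RE = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const SHA256_HEX_RE = /^[0-9a-f]{64}$/i; // assumption: hex case is not enforced
const ALLOWED_SOURCES = ["agent_search", "lessons", "manual"];

/** Hypothetical checker: returns the names of failing fields, empty when valid. */
function checkGapPayload(p: GapSignalPayload): string[] {
  const problems: string[] = [];
  if (p.topic.length > 80 || !TOPIC_RE.test(p.topic)) problems.push("topic");
  if (ALLOWED_SOURCES.indexOf(p.source) === -1) problems.push("source");
  // The literal "lessons" is accepted as project_id only when source is lessons.
  const lessonsLiteral = p.source === "lessons" && p.project_id === "lessons";
  if (!lessonsLiteral && !SHA256_HEX_RE.test(p.project_id)) problems.push("project_id");
  if (p.agent_excerpt !== undefined && p.agent_excerpt.length > 200) {
    problems.push("agent_excerpt");
  }
  return problems;
}
```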

Suppressing emission in tests/CI. Set SCAFFOLD_GAP_SIGNAL_QUIET=1. The assembly-time tail (src/core/assembly/gap-signal-tail.ts) then renders no emission template into the pipeline step. Default is always-on (locked decision #9) — catch gaps everywhere they occur.

KnowledgeRootResolution shape

The resolver returns a three-field record that threads through the audit run (src/observability/knowledge-index.ts:275-291):

export interface KnowledgeRootResolution {
  /** Validated absolute path to a knowledge directory, or null. */
  root: string | null
  /** Pre-loaded index Set, populated by the validator. Null when root is null.
      Lens I reads this directly — no re-walk. */
  index: Set<string> | null
  /** Audit trail of what was tried. Lens I uses this to compose a precise
      warn-once message when root is null. */
  attempts: KnowledgeRootAttempt[]
}

From candidate to merged PR

The cron is a thin bash loop — the brains live in three CLI subcommands and a meta-prompt that runs a grounded LLM against pre-fetched source bodies.

Prefilter

An entry becomes a candidate when (1) it has at least one source, AND (2) either its last-reviewed is older than the cadence window, OR a source's prefetched hash differs from the stored one. Candidates sort by descending score: unreviewed entries score 100, overdue entries score 50 + ageDays (so the oldest rank highest, and any entry reviewed more than 50 days ago outranks an unreviewed one), and in-window hash changes score 75. The top --max win (src/knowledge-freshness/audit-prefilter.ts:14-72):

for (const e of entries) {
  if (e.sources.length === 0) continue            // no sources = no audit
  if (!e.lastReviewed)        { select = true; priority = 100 }
  else if (ageDays > window)  { select = true; priority = 50 + ageDays }
  else {
    // hash check — Promise.all over a small per-entry list (1-3 sources)
    if (anyHashChanged) { select = true; priority = 75 }
  }
}
candidates.sort((a, b) => b.priority - a.priority)
return candidates.slice(0, max)

The hash check is a tiebreaker, not a baseline. Entries already past their window are selected immediately at priority 50 + ageDays — no network cost. The hash check only runs in the else branch (still inside the window), runs Promise.all over the entry's 1–3 sources, and swallows fetch errors so a slow upstream doesn't crash the cron.
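For experimenting with the ordering, the selection rules can be restated as a synchronous function. This is a sketch under assumptions, not the shipped prefilter: EntryState, Candidate, and prefilterSketch are hypothetical names, and anyHashChanged is taken as a precomputed input where the real check fetches the sources.

```typescript
interface EntryState {
  name: string;
  sourceCount: number;
  ageDays: number | null;  // null when last-reviewed is missing
  windowDays: number;      // 14, 60, or 180 depending on tier
  anyHashChanged: boolean; // assumed precomputed for this sketch
}

interface Candidate {
  name: string;
  priority: number;
}

/** Hypothetical restatement of the scoring rules described above. */
function prefilterSketch(entries: EntryState[], max: number): Candidate[] {
  const candidates: Candidate[] = [];
  for (const e of entries) {
    if (e.sourceCount === 0) continue; // no sources = no audit
    if (e.ageDays === null) {
      candidates.push({ name: e.name, priority: 100 }); // never reviewed
    } else if (e.ageDays > e.windowDays) {
      candidates.push({ name: e.name, priority: 50 + e.ageDays }); // overdue
    } else if (e.anyHashChanged) {
      candidates.push({ name: e.name, priority: 75 }); // in-window hash drift
    }
  }
  candidates.sort((a, b) => b.priority - a.priority);
  return candidates.slice(0, max);
}
```

With these scores, a stable entry last reviewed 200 days ago (250) ranks above a never-reviewed entry (100), while a fast-moving entry 20 days old (70) ranks below an in-window hash change (75).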

Audit verdicts

The meta-prompt at content/tools/knowledge-audit-entry.md instructs the LLM to read pre-fetched source bodies (no web tool available) and emit one of four verdicts. Every verdict opens a PR — the dry-run apply runs first so gates can inspect the proposed diff, then --open-pr creates the branch.

| Verdict | What the PR contains |
| --- | --- |
| current | Frontmatter-only: bumps last-reviewed, sources[*].hash, sources[*].retrieved so the entry exits the queue. |
| minor-drift | Frontmatter persistence + findings table as commentary. applyVerdictToEntry refuses any proposed_changes on this verdict (src/knowledge-freshness/audit-apply.ts:54-58); no body edits. |
| major-drift | Body edits land via proposed_changes (H2-heading-anchored splices). Gate 4 blocks if a stable entry's diff exceeds 20% churn without the override label. |
| superseded | A new edition shipped; version-pin must advance. last-reviewed does not advance (src/knowledge-freshness/audit-apply.ts:103-118); only hash/retrieved update, so the entry stays due until a human re-audits. Prevents a known-stale entry from looking fresh. |
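Two of the rules in the verdict table lend themselves to a small sketch. The function names are hypothetical, and the assumption that every non-superseded verdict advances last-reviewed once all declared sources are covered is an interpretation of this guide, not a quote from audit-apply.ts.

```typescript
type Verdict = "current" | "minor-drift" | "major-drift" | "superseded";

/** Hypothetical guard: minor-drift verdicts may not carry body edits. */
function rejectsProposedChanges(verdict: Verdict, proposedChangeCount: number): boolean {
  return verdict === "minor-drift" && proposedChangeCount > 0;
}

/**
 * Hypothetical rule: last-reviewed never advances on superseded, and
 * otherwise advances only when every declared source was covered (assumption).
 */
function advancesLastReviewed(verdict: Verdict, allSourcesCovered: boolean): boolean {
  if (verdict === "superseded") return false;
  return allSourcesCovered;
}
```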

PR generation

Branch: knowledge-freshness/<entry>-<YYYY-MM-DD>. renderPrBody renders a summary, the verdict fields, a findings table, the sources, and any preserve warnings (it does not embed the raw verdict JSON). Each candidate gets its own PR off origin/main: the cron runs git checkout main between iterations and restores the entry between the dry-run apply (for gates) and the final --open-pr call. PRs do not stack; failures isolate per-candidate.

VERSION bump on merge

A dedicated workflow (.github/workflows/knowledge-freshness-version-bump.yml:16) fires on PR closed (merged-only) when the source branch starts with knowledge-freshness/ or the PR carries the knowledge-freshness label. It computes the next SemVer from the PR title and body, writes content/knowledge/VERSION, commits with the prefix chore(knowledge): (deliberately not knowledge-freshness/*) so the commit doesn't re-trigger itself, then git pull --rebase before pushing. Bump rules (src/knowledge-freshness/bump-version.ts:26-45):

| Match | Bump | Notes |
| --- | --- | --- |
| BREAKING CHANGE: anywhere in title, or start-of-line in body | major | Wins over every other prefix |
| feat(knowledge): / feat(knowledge-freshness): title prefix | minor | Case-sensitive |
| chore(knowledge): / chore(knowledge-freshness): title prefix | patch | Used by the bump commit itself |
| Anything else (including fix(knowledge):) | patch | Logs a ::notice:: for unrecognized prefixes |

The start-of-line anchor on the BREAKING CHANGE body match (/^BREAKING CHANGE:/m) is deliberate — a freshness PR's body embeds an LLM-generated findings table whose evidence excerpts could otherwise mention "BREAKING CHANGE:" and trigger an accidental major bump.
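The bump rules can be sketched in a few lines. This is an illustration of the table and the anchored regex, not the contents of bump-version.ts; classifyBump and nextVersion are hypothetical names, and the ::notice:: logging is omitted.

```typescript
type Bump = "major" | "minor" | "patch";

const MINOR_PREFIXES = ["feat(knowledge):", "feat(knowledge-freshness):"];

/** Hypothetical classifier following the bump table. */
function classifyBump(title: string, body: string): Bump {
  // Title match is unanchored; the body match must start a line.
  if (title.includes("BREAKING CHANGE:") || /^BREAKING CHANGE:/m.test(body)) {
    return "major";
  }
  // Prefix comparison is case-sensitive.
  if (MINOR_PREFIXES.some((prefix) => title.startsWith(prefix))) return "minor";
  return "patch"; // chore(...) prefixes and everything else
}

/** Apply a bump to a MAJOR.MINOR.PATCH string. */
function nextVersion(current: string, bump: Bump): string {
  const [major, minor, patch] = current.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

A findings-table row such as "| evidence | mentions BREAKING CHANGE: upstream |" does not start a line with the marker, so it leaves the bump at whatever the title prefix implies.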

MMR corroboration (manual)

The cron does not dispatch MMR today — the workflow only runs inline gates. To corroborate a freshness PR locally:

git diff origin/main...HEAD -- 'content/knowledge/**/*.md' \
  | mmr review --diff - --focus knowledge-freshness --sync --format json

A native knowledge-freshness MMR channel is the Phase 5 plan. See the MMR guide for the channel architecture.

The five PR gates

The cron's GITHUB_TOKEN-opened PRs don't fire downstream workflows, so the cron also runs the gate code inline (same CLI surface). Human-opened freshness PRs get gated by the workflow at .github/workflows/knowledge-freshness-gates.yml:17.

| # | Gate | What it checks | Mode | Source |
| --- | --- | --- | --- | --- |
| 1 | Frontmatter validator | Zod schema parse over every entry (excludes README). Strict calendar-date refinement; SSRF guard on source URLs. | blocking | src/validation/knowledge-frontmatter-validator.ts:42-50 |
| 2 | Source link-check | Every sources[*].url returns 2xx. Operates on the changed-files list via --files-from. | blocking | .github/workflows/knowledge-freshness-gates.yml:117-123 |
| 3 | Unsourced-claims lint | New normative claims must have a sources[] entry. Runs even when 1/2 failed. | advisory | .github/workflows/knowledge-freshness-gates.yml:126-135 |
| 4 | Anti-over-rewrite | Stable entries reject diffs deleting >20% of lines unless the override:anti-over-rewrite label is applied. Cron-opened knowledge-freshness/* branches only. | blocking | .github/workflows/knowledge-freshness-gates.yml:137-152 |
| 5 | Deep Guidance preserved | Literal ## Deep Guidance heading must survive, because the assembly engine pulls just that section. | blocking | .github/workflows/knowledge-freshness-gates.yml:154-160 |

Spec drift on the Gate 4 override. The parent spec describes the override as a marker in the PR description; the shipped mechanism (.github/workflows/knowledge-freshness-gates.yml:148-152) reads a maintainer-applied PR label (override:anti-over-rewrite) via --pr-labels. The shipped behavior is authoritative; the spec text is stale.
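A minimal sketch of the Gate 4 decision, assuming the 20% threshold is measured against the entry's pre-change body line count. blocksOverRewrite and parseLabels are hypothetical names; only the label string and the --pr-labels CSV input come from this guide.

```typescript
const OVERRIDE_LABEL = "override:anti-over-rewrite";
const MAX_DELETED_FRACTION = 0.2;

/** Parse the comma-separated value passed as --pr-labels. */
function parseLabels(csv: string): string[] {
  return csv
    .split(",")
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
}

/** Hypothetical check: true when Gate 4 would block the change. */
function blocksOverRewrite(
  volatility: string,
  bodyLineCount: number, // assumption: lines in the body before the change
  deletedLineCount: number,
  prLabels: string[],
): boolean {
  if (volatility !== "stable") return false; // the gate covers stable entries only
  if (prLabels.includes(OVERRIDE_LABEL)) return false; // maintainer-applied override
  return deletedLineCount > bodyLineCount * MAX_DELETED_FRACTION;
}
```

Because the cron passes an empty label list, a cron-opened PR that crosses the threshold stays blocked until a maintainer applies the label.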

Anti-tamper checkout (known gap). The gate workflow builds the gate code from HEAD, not from origin/main (.github/workflows/knowledge-freshness-gates.yml:42-53). The hardening — build from base, overlay only PR HEAD's content/knowledge/ — is deferred because the bootstrap PR introduced the gate code itself. Risk is mitigated by mandatory PR review until a follow-up flips the checkout strategy.

Lens I — gap detection + suppression

Lens I runs under --scope=docs and --scope=all (src/observability/checks/lens-i-knowledge-gaps.ts:43). It collects signals from the ledger (rolling 90-day window, src/observability/checks/lens-i-knowledge-gaps.ts:52) plus synthetic signals from tasks/lessons.md, buckets them by normalized topic, applies the threshold matrix, and suppresses buckets whose topic an entry already covers.

Where Lens I sits in the taxonomy. "Lens" is scaffold's name for an audit check function inside scaffold observe audit. The full set is A–I; Lens I (I-knowledge-gaps) is this one. The other seven plus Lens H are documented in the Build Observability guide.

Threshold matrix

The rules (src/observability/checks/lens-i-knowledge-gaps.ts:148-149):

| signal_count | distinct_projects | Severity |
| --- | --- | --- |
| ≥ 5 | ≥ 3 | P1 |
| ≥ 3 | ≥ 2 | P2 |
| below both | | no finding |
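Read as code, the matrix is two conjunctive checks evaluated from strictest to loosest. The function name severityFor is hypothetical; the thresholds are the ones in the table.

```typescript
type Severity = "P1" | "P2" | null;

/** Hypothetical restatement of the threshold matrix; null means no finding. */
function severityFor(signalCount: number, distinctProjects: number): Severity {
  if (signalCount >= 5 && distinctProjects >= 3) return "P1";
  if (signalCount >= 3 && distinctProjects >= 2) return "P2";
  return null;
}
```

Both columns must clear a row for it to apply, so ten signals from two projects is still P2 and five signals from one project yields no finding.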

Topic normalization

Lens I normalizes the raw topic before bucketing, then validates the result. Two distinct steps: normalizeTopic (src/observability/checks/lens-i-lessons-scanner.ts:32-38) always produces a (possibly empty) string; isValidTopic (src/observability/checks/lens-i-lessons-scanner.ts:114-116) decides whether to accept it. Normalization lowercases, strips apostrophes, replaces every other non-slug run with a single hyphen, collapses repeats, and trims. The validator additionally enforces ≤ 80 chars and /^[a-z0-9]+(-[a-z0-9]+)*$/. So Agent Eval Harnesses! → agent-eval-harnesses (valid), but !!! → `` (the empty string, rejected).
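The two steps can be approximated as follows. This is a re-implementation sketch from the prose description, not the code in lens-i-lessons-scanner.ts; the names sketchNormalizeTopic and sketchIsValidTopic are hypothetical, and treating the curly apostrophe like the straight one is an assumption.

```typescript
/** Hypothetical normalizer following the steps described above. */
function sketchNormalizeTopic(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/['’]/g, "")        // strip apostrophes (curly form is an assumption)
    .replace(/[^a-z0-9]+/g, "-") // every other non-slug run becomes one hyphen
    .replace(/-+/g, "-")         // collapse repeated hyphens
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

/** Hypothetical validator: length cap plus the gap-topic pattern. */
function sketchIsValidTopic(topic: string): boolean {
  return topic.length <= 80 && /^[a-z0-9]+(-[a-z0-9]+)*$/.test(topic);
}

const normalized = sketchNormalizeTopic("Agent Eval Harnesses!");
const emptied = sketchNormalizeTopic("!!!");
```

Here normalized is "agent-eval-harnesses", which validates, and emptied is the empty string, which does not.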

What the lessons.md scanner sees

Lens I synthesizes signals from tasks/lessons.md at audit time (read inline, no ledger writes) via two passes per non-fenced line — code-fenced blocks are skipped (src/observability/checks/lens-i-lessons-scanner.ts:4):

  1. Explicit marker: <!-- gap-topic: <slug> --> (slug must already be kebab-case; the marker regex enforces it).
  2. Heuristic phrases (case-insensitive): "would have helped to have a guide on X", "missing knowledge entry for X", "no knowledge entry for X" / "no kb entry for X", "missing knowledge: X".

Captured topics run through the same normalizeTopic / isValidTopic. Synthetic signals carry project_id: "lessons" and are excluded from the distinct-projects count by the aggregator's delete('lessons') rule (decision #6) — they corroborate but don't independently satisfy the threshold.
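The aggregation and suppression steps can be sketched together. Everything here is illustrative: Signal, Bucket, aggregate, and unsuppressed are hypothetical names, and counting lessons-derived signals toward signal_count (but not toward distinct projects) is this sketch's reading of "corroborate".

```typescript
interface Signal {
  topic: string;
  projectId: string;
}

interface Bucket {
  topic: string;
  signalCount: number;
  distinctProjects: number;
}

/** Hypothetical aggregator: one bucket per normalized topic. */
function aggregate(signals: Signal[]): Bucket[] {
  const byTopic = new Map<string, { count: number; projects: Set<string> }>();
  for (const s of signals) {
    const bucket = byTopic.get(s.topic) ?? { count: 0, projects: new Set<string>() };
    bucket.count += 1;
    bucket.projects.add(s.projectId);
    byTopic.set(s.topic, bucket);
  }
  const buckets: Bucket[] = [];
  byTopic.forEach((b, topic) => {
    const projects = new Set(b.projects);
    projects.delete("lessons"); // synthetic signals never count as a project
    buckets.push({ topic, signalCount: b.count, distinctProjects: projects.size });
  });
  return buckets;
}

/** Suppression filters what is emitted; the buckets themselves are untouched. */
function unsuppressed(buckets: Bucket[], knowledgeIndex: Set<string>): Bucket[] {
  return buckets.filter((b) => !knowledgeIndex.has(b.topic));
}
```

Keeping suppression as a separate filter mirrors the lifecycle described earlier: signals keep aggregating after an entry is added, and only the emitted findings change.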

3-tier --knowledge-root resolution

Lens I must know where the KB lives to skip already-covered topics. The resolver (resolveKnowledgeRoot at src/observability/knowledge-index.ts:326-379) tries three tiers in order:

| Tier | Source | On failure |
| --- | --- | --- |
| 1 | --knowledge-root CLI flag (resolved against process.cwd()) | hard error before any lens runs (KnowledgeRootCliInvalidError) |
| 2 | lenses.I-knowledge-gaps.knowledge_root in yaml (resolved against cwd) | soft-fail; records {outcome: 'invalid', reason} in the attempts trail |
| 3 | auto-detect: findScaffoldKnowledgeRoot walks parents for package.json#name === '@zigrivers/scaffold' (src/observability/knowledge-index.ts:164-178) | returns null if no install is found |

The sharp asymmetry is intentional: an operator who typed a --knowledge-root gets a hard error on a bad path; yaml and auto-detect soft-fail so suppression degrades gracefully. The most instructive case: yaml invalid + auto-detect found — the trail records the yaml failure and the auto-detect success, root is the auto-detect path, and a one-line stderr note points at the stale yaml. This is what an operator sees when npm update -g @zigrivers/scaffold moved the install out from under a pinned yaml path.
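The hard-fail versus soft-fail asymmetry can be sketched with the validator and auto-detect step injected as functions. This is not resolveKnowledgeRoot: the Attempt shape (beyond outcome 'invalid' and reason), the outcome names 'ok' and 'not-found', and the plain Error standing in for KnowledgeRootCliInvalidError are all assumptions, and the index field is omitted for brevity.

```typescript
interface Attempt {
  tier: "cli" | "yaml" | "auto-detect";
  path: string | null;
  outcome: "ok" | "invalid" | "not-found"; // only 'invalid' is named in this guide
  reason?: string;
}

interface Resolution {
  root: string | null;
  attempts: Attempt[];
}

/** Hypothetical three-tier resolver with injected validation and detection. */
function resolveRootSketch(
  cliPath: string | undefined,
  yamlPath: string | undefined,
  autoDetect: () => string | null,
  isValid: (path: string) => boolean,
): Resolution {
  const attempts: Attempt[] = [];
  if (cliPath !== undefined) {
    // Tier 1: an explicit flag fails hard on a bad path.
    if (!isValid(cliPath)) throw new Error(`invalid --knowledge-root: ${cliPath}`);
    attempts.push({ tier: "cli", path: cliPath, outcome: "ok" });
    return { root: cliPath, attempts };
  }
  if (yamlPath !== undefined) {
    if (isValid(yamlPath)) {
      attempts.push({ tier: "yaml", path: yamlPath, outcome: "ok" });
      return { root: yamlPath, attempts };
    }
    // Tier 2: record the failure and fall through.
    attempts.push({ tier: "yaml", path: yamlPath, outcome: "invalid", reason: "failed validation" });
  }
  // Tier 3: auto-detect, which may find nothing.
  const detected = autoDetect();
  attempts.push({ tier: "auto-detect", path: detected, outcome: detected === null ? "not-found" : "ok" });
  return { root: detected, attempts };
}
```

In the stale-yaml case the trail holds two attempts, the invalid yaml path followed by the successful auto-detect, which is the information a warning message needs.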

Warning policy

| Key | Status | When emitted |
| --- | --- | --- |
| lens-i:no-root | active | Lens I runs, no root resolved, lens enabled. Per-audit deduped via warnedKeys: Set<string>. If yaml failed validation, the message gains a clause quoting the bad path + reason. |
| lens-i:index-load-failed | reserved | Never emitted today; validateKnowledgeRoot exercises the loader at resolution time, foreclosing this path. |
| (none) | no-warn | Lens I disabled; the resolver runs but no warning surfaces (decisions #4 / #11). |

emitOnceForAudit (src/observability/knowledge-index.ts:251-259) reads a caller-provided Set created fresh in each runAudit (src/observability/engine/api.ts:114), so the --fix flow's three internal audits each get their own dedup scope.
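The per-audit dedup scope amounts to a Set owned by the caller. The sketch below is not emitOnceForAudit itself: the name emitOnceSketch, its signature, and the sink array standing in for stderr are hypothetical.

```typescript
/** Hypothetical warn-once helper; returns true when the message was emitted. */
function emitOnceSketch(
  warnedKeys: Set<string>,
  key: string,
  message: string,
  sink: string[], // stand-in for stderr in this sketch
): boolean {
  if (warnedKeys.has(key)) return false;
  warnedKeys.add(key);
  sink.push(message);
  return true;
}

// Each audit creates its own Set, so a warning can repeat across audits
// but never within one.
const output: string[] = [];
const firstAudit = new Set<string>();
emitOnceSketch(firstAudit, "lens-i:no-root", "no knowledge root resolved", output);
emitOnceSketch(firstAudit, "lens-i:no-root", "no knowledge root resolved", output);
const secondAudit = new Set<string>();
emitOnceSketch(secondAudit, "lens-i:no-root", "no knowledge root resolved", output);
```

After these three calls output holds two messages: one per audit.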

What a Lens I finding looks like

A single finding excerpt from the audit sidecar (docs/audits/<id>.json):

{
  "id": "a3f2c1d4...",
  "lens_id": "I-knowledge-gaps",
  "severity": "P2",
  "title": "Knowledge base lacks coverage for \"agent-eval-harnesses\" — 4 signals across 2 projects",
  "source_doc": "",
  "evidence": {
    "kind": "knowledge_gap",
    "topic": "agent-eval-harnesses",
    "signal_count": 4,
    "distinct_project_count": 2,
    "distinct_projects": ["a3f2...", "1c4e..."],
    "first_seen": "2026-04-12T09:00:00Z",
    "last_seen": "2026-05-21T14:30:00Z",
    "example_excerpts": ["No knowledge entry for agent eval harnesses"]
  },
  "confidence": "medium",
  "fix_hint": {
    "kind": "edit_doc",
    "target": "content/knowledge/<category>/agent-eval-harnesses.md",
    "prompt": "Propose a new knowledge entry for \"agent-eval-harnesses\". Evidence: 4 signals from 2 projects in the last 90 days."
  }
}

Phase audits don't trigger Lens I. The phase-boundary hook (StateManager.markCompleted → runPhaseAudit at src/observability/engine/phase-audit.ts:63) fires only Lens H-cross-doc (lensIds: ['H-cross-doc'] at src/observability/engine/phase-audit.ts:77). Lens I never runs at phase boundaries. A phase-audit run that surfaces zero findings does not mean Lens I is happy; it means Lens I never ran. To see Lens I findings, invoke scaffold observe audit --scope=docs (or --scope=all) explicitly, or run it through --fix.

The allowlist

Out-of-allowlist sources warn but don't block (decision #4). Bare hostnames match subdomains; host/path entries additionally require the URL path to start with the prefix; github_repos is locked to specific owner/repo.

The off-allowlist warning is advisory and is surfaced by the frontmatter-validation path (validateKnowledgeFile), not by a gate. Gate 3 (lint-unsourced) is a separate advisory check that flags nearby links not covered by the entry's declared sources[] domains. Off-allowlist sources still get fetched, hashed, and audited — they just warn. It is not a security boundary: the SSRF guard (src/knowledge-freshness/source-url-validator.ts) runs independently, so a new host never unlocks private-IP fetches. The editorial bar is: "would the maintainers want this URL to be the verbatim grounding for a P0/P1 finding?"
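The two host-matching forms can be sketched with the standard URL class. The function names are hypothetical, github_repos matching is left out, and applying subdomain matching to host/path entries as well is an assumption drawn from the word "additionally".

```typescript
/** Hypothetical matcher for one allowlist entry (bare host or host/path). */
function entryMatches(entry: string, url: string): boolean {
  const parsed = new URL(url);
  const slash = entry.indexOf("/");
  const host = slash === -1 ? entry : entry.slice(0, slash);
  const pathPrefix = slash === -1 ? "" : entry.slice(slash);
  // Bare hostnames match the host itself and any subdomain of it.
  const hostOk = parsed.hostname === host || parsed.hostname.endsWith("." + host);
  // host/path entries also require the URL path to start with the prefix.
  return hostOk && parsed.pathname.startsWith(pathPrefix);
}

/** True when any entry matches; a false result means warn, not block. */
function isAllowlisted(url: string, entries: string[]): boolean {
  return entries.some((entry) => entryMatches(entry, url));
}
```

The leading dot in the subdomain test matters: without it, an entry such as owasp.org would also match an unrelated host that merely ends in the same characters.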

Most-cited hosts

Counted live from every entry's sources[*].url at build time.

| Host | Citations |
| --- | --- |
| martinfowler.com | 37 |
| developer.mozilla.org | 24 |
| modelcontextprotocol.io | 19 |
| owasp.org | 17 |
| developer.android.com | 15 |
| the-turing-way.netlify.app | 15 |
| developer.apple.com | 14 |
| developer.chrome.com | 14 |
| sre.google | 12 |
| w3.org | 12 |
| microservices.io | 11 |
| rfc-editor.org | 11 |
| ethereum.org | 10 |
| consensys.github.io | 9 |
| docs.openzeppelin.com | 9 |

The full allowlist

Every host plus its category, and the pinned GitHub repos.

47 allowlisted hosts and 3 GitHub repos. Out-of-list sources warn (they do not block).

| Host | Category |
| --- | --- |
| ai.google.dev | ai-ml |
| anthropic.com | ai-ml |
| docs.wandb.ai | ai-ml |
| mlflow.org | ai-ml |
| modelcontextprotocol.io | ai-ml |
| platform.openai.com | ai-ml |
| spec.graphql.org | api |
| spec.openapis.org | api |
| developer.chrome.com | browser-ext |
| docs.aws.amazon.com | cloud-ops |
| opentelemetry.io | cloud-ops |
| sre.google | cloud-ops |
| aicpa-cima.com | compliance |
| aicpa.org | compliance |
| eur-lex.europa.eu | compliance |
| pcisecuritystandards.org | compliance |
| www.finra.org | compliance |
| www.sec.gov | compliance |
| developer.android.com | mobile |
| developer.apple.com | mobile |
| adr.github.io | patterns |
| agilealliance.org | patterns |
| conventionalcommits.org | patterns |
| google.github.io | patterns |
| martinfowler.com | patterns |
| microservices.io | patterns |
| thoughtworks.com | patterns |
| the-turing-way.netlify.app | research |
| nist.gov | security |
| openid.net | security |
| owasp.org | security |
| consensys.github.io | smart-contracts |
| docs.openzeppelin.com | smart-contracts |
| docs.safe.global | smart-contracts |
| ethereum.org | smart-contracts |
| swcregistry.io | smart-contracts |
| ietf.org/rfc | standards |
| www.iso.org | standards |
| www.rfc-editor.org | standards |
| docs.pact.io | testing |
| docs.astral.sh | tooling |
| git-scm.com | tooling |
| peps.python.org | tooling |
| www.postgresql.org | tooling |
| developer.mozilla.org | web-standards |
| tr.designtokens.org | web-standards |
| www.w3.org | web-standards |

GitHub repos: modelcontextprotocol/specification, steveyegge/beads, joelparkerhenderson/architecture-decision-record

KB inventory

Totals over content/knowledge/, broken down per category.

278 entries across 20 categories:

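| Category | Entries |
| --- | --- |
| core | 35 |
| game | 25 |
| research | 25 |
| backend | 22 |
| review | 20 |
| web-app | 17 |
| web3 | 14 |
| data-science | 13 |
| browser-extension | 12 |
| data-pipeline | 12 |
| library | 12 |
| mcp-server | 12 |
| ml | 12 |
| mobile-app | 12 |
| cli | 10 |
| validation | 7 |
| product | 6 |
| execution | 5 |
| tools | 4 |
| finalization | 3 |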

How to expand the allowlist

Adding a host is a one-line PR to docs/knowledge-freshness/authoritative-sources.yaml:

 hosts:
   - owasp.org
+  - developers.cloudflare.com
   - nist.gov
  1. Pick the form. Bare hostname for vendor docs whose path layout changes; host/path prefix for shared-tenancy hosts where you only trust a sub-path; owner/repo under github_repos: for specific GitHub repos. Skip www. (bare entries auto-match subdomains).
  2. Verify the host is live: curl -sI https://<host>/<path> should return 2xx (or a 3xx that ultimately resolves).
  3. Mirror the category in CATEGORY_MAP in scripts/build-freshness-reference.mjs — otherwise the regenerated allowlist table shows the new host as other.
  4. Open a normal PR. Allowlist additions are not a separate trust delegation; any maintainer can review.

Anthropic vs DeepSeek (cron uses DeepSeek)

The cron switched to DeepSeek HTTP to remove the local claude CLI dependency from CI. Local audits keep using whichever provider is configured. Precedence is resolved by resolveProvider (src/knowledge-freshness/providers/index.ts:36):

  1. --provider <name> — explicit flag, operator override
  2. KNOWLEDGE_FRESHNESS_PROVIDER env var
  3. A single API key in env — inferred
  4. Both API keys present → error (ambiguous)
  5. No env, claude on PATH → anthropic (subprocess uses keychain)
  6. Nothing → error (no provider configured)
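The precedence list can be sketched as a pure function over an environment object. This is not resolveProvider: the names ProviderEnv and pickProviderSketch are hypothetical, the error messages are invented for illustration, and rejecting unknown explicit values is an assumption. The final check reflects the rule in this guide that anthropic always needs the claude CLI on PATH.

```typescript
type Provider = "anthropic" | "deepseek";

interface ProviderEnv {
  KNOWLEDGE_FRESHNESS_PROVIDER?: string;
  ANTHROPIC_API_KEY?: string;
  DEEPSEEK_API_KEY?: string;
}

/** Steps 1 to 6 of the precedence list, without the claude CLI check. */
function choose(flag: string | undefined, env: ProviderEnv, claudeOnPath: boolean): Provider {
  const explicit = flag ?? env.KNOWLEDGE_FRESHNESS_PROVIDER; // steps 1 and 2
  if (explicit !== undefined) {
    if (explicit === "anthropic" || explicit === "deepseek") return explicit;
    throw new Error(`unknown provider: ${explicit}`); // assumption
  }
  const hasAnthropic = Boolean(env.ANTHROPIC_API_KEY);
  const hasDeepseek = Boolean(env.DEEPSEEK_API_KEY);
  if (hasAnthropic && hasDeepseek) throw new Error("ambiguous: both API keys present"); // step 4
  if (hasAnthropic) return "anthropic"; // step 3
  if (hasDeepseek) return "deepseek";   // step 3
  if (claudeOnPath) return "anthropic"; // step 5
  throw new Error("no provider configured"); // step 6
}

/** Hypothetical resolver: precedence first, then the claude CLI requirement. */
function pickProviderSketch(
  flag: string | undefined,
  env: ProviderEnv,
  claudeOnPath: boolean,
): Provider {
  const chosen = choose(flag, env, claudeOnPath);
  if (chosen === "anthropic" && !claudeOnPath) {
    throw new Error("anthropic requires the claude CLI on PATH");
  }
  return chosen;
}
```

Passing the environment in as an argument keeps the sketch testable; the cron's configuration corresponds to the case where KNOWLEDGE_FRESHNESS_PROVIDER is deepseek.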

Anthropic (subprocess): claude -p --tools "" (empty-tools disables WebFetch so the model can only read the prefetched bodies). Requires the claude CLI on PATH regardless of how the provider was chosen: the resolver throws (src/knowledge-freshness/providers/index.ts:44-56) if anthropic is picked via flag, env, or API-key inference and claude isn't installed. ANTHROPIC_API_KEY alone is not sufficient. Source: src/knowledge-freshness/providers/anthropic.ts.

DeepSeek (HTTP): no subprocess; works in CI without the Claude CLI.

  • Auth: requires DEEPSEEK_API_KEY.
  • Default model: deepseek-v4-flash.
  • Override: set KNOWLEDGE_FRESHNESS_DEEPSEEK_MODEL to deepseek-v4-pro. Other values throw at dispatcher-build time (src/knowledge-freshness/providers/deepseek.ts:54-58).
  • Thinking mode: hardcoded thinking: { type: 'disabled' }.
  • URL: hardcoded to https://api.deepseek.com/chat/completions; project-local config cannot redirect (decision #7 invariant).

Why the DeepSeek URL is hardcoded. An untrusted project's .scaffold/observability.yaml could otherwise redirect the LLM dispatcher at an attacker-controlled host that captures DEEPSEEK_API_KEY from request headers. Hardcoding closes that exfiltration path — the same threat model that hardcodes Lens H's claude -p command in the Build Observability audit.

The cron wires DeepSeek explicitly (.github/workflows/knowledge-freshness-audit.yml:70):

env:
  DEEPSEEK_API_KEY:              ${{ secrets.DEEPSEEK_API_KEY }}
  KNOWLEDGE_FRESHNESS_PROVIDER:  deepseek
  GH_TOKEN:                      ${{ secrets.GITHUB_TOKEN }}

A missing DEEPSEEK_API_KEY fails the run loudly at preflight rather than silently exiting 0 with zero PRs.

Every command that touches the system

All commands ship in the published CLI.

Refresh-arm commands

| Command | Purpose |
| --- | --- |
| scaffold knowledge-freshness audit-prefilter [--max=N] | Walk content/knowledge/, apply cadence + hash check, print a JSON candidate array. --max default 10 (src/cli/commands/knowledge-freshness-audit-prefilter.ts:18); the CLI emits only { name, path } per candidate (src/cli/commands/knowledge-freshness-audit-prefilter.ts:43). |
| scaffold knowledge-freshness audit-run-entry <path> | Pre-fetch each source through SSRF guards, dispatch the grounded audit, print verdict JSON. --provider (anthropic or deepseek) overrides env precedence. |
| scaffold knowledge-freshness audit-apply <path> <verdict.json> [--open-pr] | Patch frontmatter + apply proposed_changes by H2 heading. The wrapper re-fetches every checked URL and computes its own sha256 (src/knowledge-freshness/audit-apply.ts:82-101), so persisted hashes are deterministic, not the LLM's claim. Refuses to advance last-reviewed unless every declared source is covered. |
| make validate-knowledge | Gate 1: runs the Zod validator over every entry (README excluded). |
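The prefilter's selection rule (due by cadence or by a changed source hash, capped at --max) can be sketched as below. The Entry shape is hypothetical: cadenceDays stands in for whatever interval each volatility tier implies, a single hash stands in for the per-source hashes, and taking the first N due entries is an assumption about ordering.

```typescript
// Hypothetical entry shape; the real frontmatter fields may differ.
interface Entry {
  name: string;
  path: string;
  lastReviewed: string; // ISO date, e.g. "2025-01-01"
  cadenceDays: number;  // assumed review interval for the entry's volatility tier
  storedHash: string;   // hash persisted at the last review
  currentHash: string;  // hash of the source as fetched now
}

interface Candidate {
  name: string;
  path: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An entry is a candidate if its cadence has elapsed OR its source hash changed.
function prefilter(entries: Entry[], today: Date, max = 10): Candidate[] {
  const due = entries.filter((e) => {
    const ageDays = (today.getTime() - new Date(e.lastReviewed).getTime()) / MS_PER_DAY;
    const cadenceDue = ageDays >= e.cadenceDays;
    const hashChanged = e.storedHash !== e.currentHash;
    return cadenceDue || hashChanged;
  });
  // Emit only { name, path }, matching the CLI's documented output shape.
  return due.slice(0, max).map(({ name, path }) => ({ name, path }));
}
```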

Gap-arm commands

| Command | Purpose |
| --- | --- |
| scaffold observe audit --lens I-knowledge-gaps [--knowledge-root <path>] [--fix] | Run the gap-detection lens against the local ledger + tasks/lessons.md. --knowledge-root overrides yaml + auto-detect for suppression. --fix dispatches the fix flow; the override threads through all three audits. |
| scaffold observe event knowledge_gap_signal --topic=<slug> --source=<…> --project-id=<sha> … | Write one validated gap signal to the ledger. Used by the assembly-time tail and by operators backfilling synthetic signals. |
| scaffold observe ack <prefix-or-id> | Acknowledge (or reopen) a finding so it stops surfacing. Use when a Lens I topic is deliberately out of scope. |

The --fix flow (runFixFlow at src/observability/engine/fix-flow.ts:71) runs a three-audit loop: (1) the initial audit produces a fix plan; (2) for each blocking finding, dispatch a fix agent then re-audit just that finding (the verifier); (3) one postfix audit runs everything for the final report. The --knowledge-root override threads into all three (decision #20) so suppression is consistent throughout.
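The three-audit loop can be sketched with the audit and the fix agent injected as plain functions. This is not runFixFlow itself: the types, the synchronous signatures, and the report shape are hypothetical, and retries (per_finding_max_attempts) are left out for brevity.

```typescript
interface Finding {
  id: string;
  blocking: boolean;
}

interface AuditOptions {
  knowledgeRoot: string | undefined; // the --knowledge-root override, if any
  onlyFindingId?: string;            // set when re-auditing a single finding
}

type Audit = (opts: AuditOptions) => Finding[];
type FixAgent = (finding: Finding) => void;

interface FixReport {
  verified: string[];   // findings that disappeared after their fix
  unresolved: string[]; // findings still present after their fix
  final: Finding[];     // result of the postfix audit
}

function runFixFlowSketch(audit: Audit, fix: FixAgent, knowledgeRoot: string | undefined): FixReport {
  // (1) Initial audit: its blocking findings form the fix plan.
  const plan = audit({ knowledgeRoot }).filter((f) => f.blocking);

  const verified: string[] = [];
  const unresolved: string[] = [];
  for (const finding of plan) {
    fix(finding);
    // (2) Verifier: re-audit just this finding, with the same knowledge root.
    const stillPresent = audit({ knowledgeRoot, onlyFindingId: finding.id })
      .some((f) => f.id === finding.id);
    if (stillPresent) {
      unresolved.push(finding.id);
    } else {
      verified.push(finding.id);
    }
  }

  // (3) Postfix audit: run everything once more for the final report.
  const final = audit({ knowledgeRoot });
  return { verified, unresolved, final };
}
```

Passing knowledgeRoot into every audit call is the point of the sketch: if only the first audit saw the override, the verifier and the postfix audit could suppress a different set of topics.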

Gate-side subcommands (also runnable locally for triage)

| Command | Gate | Purpose |
| --- | --- | --- |
| knowledge-freshness link-check [<path>] [--files-from <json>] | 2 | HTTP-HEAD every sources[*].url; 2xx passes, else exit 1. |
| knowledge-freshness lint-unsourced [<path>] [--files-from <json>] [--diff <patch>] | 3 | Heuristic scan for normative language in new lines without a sources[] reference. Advisory: prints findings but always exits 0. |
| knowledge-freshness anti-over-rewrite [--files-from <json>] [--diff <patch>] [--pr-labels <csv>] | 4 | For each changed stable entry, compare deleted-line count to 20% of the body; exit 1 if crossed without override:anti-over-rewrite. The cron passes --pr-labels "" (it can't self-apply labels). |
| knowledge-freshness deep-guidance-check [<path>] [--files-from <json>] | 5 | Assert each changed entry still contains a ## Deep Guidance heading (case-sensitive). |
| knowledge-freshness bump-version --title <str> --body <str> | none | Pure-function dry-run of deriveBumpKind + bumpSemver; prints bump: and next: lines parsed by the version-bump workflow. |
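Gate 4's rule can be sketched as follows. The ChangedEntry shape is hypothetical, the body is assumed to be measured in lines, and "crossed" is assumed to mean strictly more than 20%; only the stable-tier scope, the 20% figure, and the override label come from the table above.

```typescript
interface ChangedEntry {
  path: string;
  tier: "fast-moving" | "evolving" | "stable";
  bodyLines: number;    // assumed unit: lines in the entry body
  deletedLines: number; // lines the diff removes from that body
}

const OVERRIDE_LABEL = "override:anti-over-rewrite";
const MAX_DELETED_FRACTION = 0.2;

function antiOverRewrite(
  entries: ChangedEntry[],
  prLabelsCsv: string,
): { exitCode: 0 | 1; offenders: string[] } {
  // Parse the --pr-labels CSV; an empty string yields no labels.
  const labels = prLabelsCsv.split(",").map((l) => l.trim()).filter((l) => l.length > 0);
  if (labels.some((l) => l === OVERRIDE_LABEL)) return { exitCode: 0, offenders: [] };

  // Only stable entries are checked; other tiers may be rewritten freely.
  const offenders = entries
    .filter((e) => e.tier === "stable" && e.deletedLines > e.bodyLines * MAX_DELETED_FRACTION)
    .map((e) => e.path);
  return { exitCode: offenders.length > 0 ? 1 : 0, offenders };
}
```

Since the cron always passes an empty label list, a cron-opened PR that trips this gate can only pass after a human applies the override label.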

Operations cheat sheet

An entry's audit failed in the cron

The cron logs "audit failed for <name> — moving on" and continues; the entry stays in tomorrow's queue. Possible causes: a provider auth failure (for example a rotated key), a source URL that now returns 404, any other fetch/HTTP error, a source body over the 5 MiB fetch-and-hash cap, a dispatcher error, or an LLM timeout. A source body over the 96 KiB embed cap is not a failure: it is truncated and flagged truncated: true. Reproduce locally:

DEEPSEEK_API_KEY=sk-… node dist/index.js knowledge-freshness \
  audit-run-entry content/knowledge/<cat>/<name>.md
# read stderr to see if it's a URL issue or a provider issue
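The two size caps behave differently, which the following sketch makes explicit. The function name and error text are hypothetical, both caps are assumed to be inclusive, and the real fetcher likely enforces the 5 MiB cap while streaming rather than on a fully buffered body.

```typescript
const EMBED_CAP_BYTES = 96 * 1024;       // 96 KiB: larger bodies are truncated
const FETCH_CAP_BYTES = 5 * 1024 * 1024; // 5 MiB: larger bodies fail the audit

interface EmbeddedSource {
  body: Uint8Array;
  truncated: boolean;
}

// Hypothetical helper: decide what reaches the LLM prompt for one source body.
function prepareSourceBody(body: Uint8Array): EmbeddedSource {
  if (body.byteLength > FETCH_CAP_BYTES) {
    // Hard failure: the entry's audit errors out and is retried on a later run.
    throw new Error(`source body exceeds fetch cap (${body.byteLength} bytes)`);
  }
  if (body.byteLength > EMBED_CAP_BYTES) {
    // Soft limit: keep the first 96 KiB and flag the truncation.
    return { body: body.subarray(0, EMBED_CAP_BYTES), truncated: true };
  }
  return { body, truncated: false };
}
```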

Lens I keeps surfacing a topic the KB already covers

Suppression didn't match. Either the resolver returned root: null (look for [Lens I] knowledge-root not located in stderr) or the entry's name: doesn't normalize to the same slug as the bucket topic — the match is exact and post-normalize.

scaffold observe audit --lens I-knowledge-gaps --json \
  --knowledge-root /path/to/content/knowledge \
  | jq '.findings[] | select(.lens_id=="I-knowledge-gaps")'
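# print the name: key from the leading frontmatter block, wherever it sits in that block
awk '/^---/ { n++; next } n == 1 && /^name:/' content/knowledge/<cat>/<slug>.md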

Downstream auto-detect can't find the KB

findScaffoldKnowledgeRoot walks parent directories from the CLI install's module location, looking for a package.json whose name is '@zigrivers/scaffold'. Symlinked or repackaged installs may never reach a matching package.json, so auto-detect finds nothing. Pin the path via the tier-2 yaml:

lenses:
  I-knowledge-gaps:
    knowledge_root: /opt/homebrew/lib/node_modules/@zigrivers/scaffold/content/knowledge

Yaml knowledge_root stops working after an upgrade

The yaml tier soft-fails and records the reason in the attempts trail; Lens I appends it to the warning. Validation requires all four: the path exists, is a directory, contains a <path>/VERSION marker, and loadKnowledgeIndex runs without throwing (an empty index is OK). The usual cause after an upgrade is a moved install path:

find / -name VERSION -path '*content/knowledge*' 2>/dev/null

Then update lenses.I-knowledge-gaps.knowledge_root to the new path.

A source URL fetches in curl but the cron rejects it

The SSRF guard re-resolves the hostname at fetch time and rejects any IP in a non-globally-routable range (RFC1918, link-local, loopback, CGNAT, ULA, IPv4-mapped IPv6, …). Common cause: an internal DNS view returning a private IP for an outwardly-public hostname.

node -e 'require("node:dns").promises.lookup("<host>", { all: true }).then(console.log)'

Fix: move the source to a globally-routable host, or remove it. Allowlisting does not bypass the SSRF guard.
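To check a lookup result by hand, the IPv4 part of the rule can be approximated as below. This is a partial sketch, not the guard's implementation: it covers only the IPv4 ranges named above (RFC1918, link-local, loopback, CGNAT), omits the IPv6 cases (ULA, IPv4-mapped), and assumes that a hostname is rejected if any resolved address is blocked.

```typescript
type Ipv4 = [number, number, number, number];

function parseIpv4(ip: string): Ipv4 | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) return null;
  const octets: Ipv4 = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.every((o) => o <= 255) ? octets : null;
}

// Returns true when the address must be rejected. Unparseable input is
// rejected too, so the sketch fails closed.
function isBlockedIpv4(ip: string): boolean {
  const octets = parseIpv4(ip);
  if (octets === null) return true;
  const [a, b] = octets;
  if (a === 10) return true;                         // 10.0.0.0/8 (RFC1918)
  if (a === 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12 (RFC1918)
  if (a === 192 && b === 168) return true;           // 192.168.0.0/16 (RFC1918)
  if (a === 169 && b === 254) return true;           // 169.254.0.0/16 (link-local)
  if (a === 127) return true;                        // 127.0.0.0/8 (loopback)
  if (a === 100 && b >= 64 && b <= 127) return true; // 100.64.0.0/10 (CGNAT)
  return false;
}

// Assumption: one blocked address is enough to reject the hostname.
function allAddressesAllowed(addresses: string[]): boolean {
  return addresses.length > 0 && addresses.every((ip) => !isBlockedIpv4(ip));
}
```

Feeding it the addresses printed by the dns lookup above shows quickly whether a split-horizon DNS view is the cause.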

--knowledge-root resolves to a path you didn't expect

Auto-detect may pick a stale npm-global install. The successful-resolution path doesn't log its attempts trail today (only the failure path warns), so pin and compare:

scaffold observe audit --lens I-knowledge-gaps --json \
  --knowledge-root /path/you/expected/content/knowledge \
  | jq '.findings[] | select(.lens_id=="I-knowledge-gaps") | .evidence.topic'
# compare against the unset behavior; if the lists differ, auto-detect picked a different KB

Fix: pin lenses.I-knowledge-gaps.knowledge_root in .scaffold/observability.yaml. A pinned yaml path takes precedence over auto-detect.
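The precedence order (flag, then yaml, then auto-detect) and the attempts trail can be sketched like this. All names and shapes are hypothetical, validation is injected as a function, and soft-failing the flag tier the same way as the yaml tier is an assumption; this page only documents the soft-fail for yaml.

```typescript
type Tier = "flag" | "yaml" | "auto-detect";

interface Attempt {
  tier: Tier;
  path: string;
  reason: string; // why this tier's path was rejected
}

interface Resolution {
  root: string | null;
  tier: Tier | null;
  attempts: Attempt[]; // failed tiers, in the order they were tried
}

interface Candidates {
  flag?: string;       // --knowledge-root
  yaml?: string;       // lenses.I-knowledge-gaps.knowledge_root
  autoDetect?: string; // result of the parent-directory walk
}

// Returns null when the path is valid, otherwise a human-readable reason.
type Validate = (path: string) => string | null;

function resolveKnowledgeRoot(c: Candidates, validate: Validate): Resolution {
  const ordered: Array<[Tier, string | undefined]> = [
    ["flag", c.flag],
    ["yaml", c.yaml],
    ["auto-detect", c.autoDetect],
  ];
  const attempts: Attempt[] = [];
  for (const [tier, path] of ordered) {
    if (path === undefined) continue;
    const reason = validate(path);
    if (reason === null) return { root: path, tier, attempts };
    attempts.push({ tier, path, reason }); // soft-fail: record and fall through
  }
  return { root: null, tier: null, attempts };
}
```

In this model a stale yaml path does not stop resolution; it falls through to auto-detect, which is how an upgrade can silently switch the audit to a different KB.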

Config reference

Everything operator-tunable lives in .scaffold/observability.yaml. Anything outside this list is hardcoded (decision #7 invariant) so an untrusted project can't redirect dispatch commands or LLM URLs.

lenses:
  I-knowledge-gaps:
    knowledge_root: /path/to/content/knowledge   # tier-2 resolver override

disabled_lenses: [I-knowledge-gaps]              # opt-out

phase_audit:
  enabled: true                                  # default
  timeout_s: 60
  detached: false                                # fire-and-forget when true

fix:
  dispatcher_command: "claude -p"                # default
  timeout_s: 300
  per_finding_max_attempts: 3

The daily audit ceiling is NOT in yaml. The parent spec's decision #8 reads "10 grounded audits per day; configurable via .scaffold/observability.yaml", but the yaml knob was never implemented. The ceiling is the --max=10 flag in .github/workflows/knowledge-freshness-audit.yml:67; the CLI default at src/cli/commands/knowledge-freshness-audit-prefilter.ts:18 is the only fallback. To lower it for your fork, edit the workflow — nothing in yaml will help.

Roadmap and known divergences

Phase 5 (planned)

Known divergences

The reference page's own audits surfaced these doc-vs-code mismatches; the code is ground truth: