# ai-toolkit

> Professional-grade Claude Code toolkit: skills, agents, machine-enforced constitution, quality hooks.

## Documentation

- [README](README.md): Installation, usage, and feature overview
- [CHANGELOG](CHANGELOG.md): Version history
- [ARCHITECTURE](app/ARCHITECTURE.md): System design
- [CONSTITUTION](app/constitution.md): Safety rules

## Knowledge Base

- [Best Practices](kb/best-practices/README.md)
- [No Hardcoded Counts in Secondary Docs](kb/best-practices/no-hardcoded-counts.md)
- [Plan: Deep Coverage v3.0 — 100% Native Surface Utilization](kb/history/completed/deep-coverage-v3-20260423.md)
- [Plan: Ecosystem Deep Sweep — All 12 Supported Tools](kb/history/completed/ecosystem-deep-sweep-20260423.md)
- [Plan: Enterprise Config Inheritance — Multi-Repo Governance with `extends`](kb/history/completed/enterprise-config-inheritance-plan-20260412.md)
- [Spike: F2 MCP Context Trim — Hook Feasibility & Path Decision](kb/history/completed/f2-mcp-trim-spike-20260504.md)
- [Plan: Offline-First SLM Profile — Lightweight Mode for Local Models](kb/history/completed/offline-slm-profile-plan-20260411.md)
- [Plan: Output & Token Discipline](kb/history/completed/output-token-discipline-plan-20260504.md)
- [How-To Guides](kb/howto/README.md)
- [Plan: Cloud Security Pack — Multi-Cloud Audit](kb/planning/cloud-security-pack-plan.md)
- [PRD: MCP Context Trim v4.0](kb/planning/mcp-context-trim-v4-prd.md)
- [SOP: Ecosystem Sync](kb/procedures/ecosystem-sync-sop.md)
- [SOP: Claude Toolkit Maintenance](kb/procedures/maintenance-sop.md)
- [SOP: Release Preparation](kb/procedures/release-preparation-sop.md)
- [SOP: Release Verification](kb/procedures/release-verification-sop.md)
- [Agents Catalog](kb/reference/agents-catalog.md)
- [Anti-Pattern Registry Format](kb/reference/anti-pattern-registry-format.md)
- [AI Toolkit Architecture](kb/reference/architecture-overview.md)
- [Config Benchmark](kb/reference/benchmark-config.md)
- [CI Integration](kb/reference/ci-integration.md)
- [Claude Ecosystem Benchmark Snapshot](kb/reference/claude-ecosystem-benchmark-snapshot.md)
- [Claude Ecosystem Expansion Foundations](kb/reference/claude-ecosystem-expansion-foundations.md)
- [CLI Reference](kb/reference/cli-reference.md)
- [AI Toolkit - Codex CLI Compatibility](kb/reference/codex-cli-compatibility.md)
- [Ecosystem Comparison](kb/reference/comparison.md)
- [Plan: Competitive Features — ai-toolkit](kb/reference/competitive-features-implementation.md)
- [Distribution Model](kb/reference/distribution-model.md)
- [Enterprise Config Inheritance Guide](kb/reference/enterprise-config-guide.md)
- [Extension API Reference](kb/reference/extension-api.md)
- [Global Install Model](kb/reference/global-install-model.md)
- [Hierarchical Override Pattern](kb/reference/hierarchical-override-pattern.md)
- [Hooks Catalog](kb/reference/hooks-catalog.md)
- [External Integrations](kb/reference/integrations.md)
- [Language Plugin Packs](kb/reference/language-packs.md)
- [Language Rules System](kb/reference/language-rules.md)
- [Manifest-Driven Install System](kb/reference/manifest-install.md)
- [MCP Editor Compatibility](kb/reference/mcp-editor-compatibility.md)
- [MCP Server Templates](kb/reference/mcp-templates.md)
- [Medplum Documentation Map](kb/reference/medplum-docs-map.md)
- [Merge-Friendly Install Model](kb/reference/merge-friendly-install-model.md)
- [AI Toolkit - opencode Compatibility](kb/reference/opencode-compatibility.md)
- [Plugin Pack Conventions](kb/reference/plugin-pack-conventions.md)
- [Quick Wins Implementation Summary](kb/reference/quick-wins-implementation-summary.md)
- [Skill Templates](kb/reference/skill-templates.md)
- [Skills Catalog](kb/reference/skills-catalog.md)
- [Skills Unification Model](kb/reference/skills-unification.md)
- [Usage Statistics](kb/reference/stats.md)
- [Supported Tools Registry](kb/reference/supported-tools-registry.md)
- [Config Sync](kb/reference/sync.md)
- [Unique Features & Differentiators](kb/reference/unique-features.md)
- [Windows Support](kb/reference/windows-support.md)
- [Troubleshooting](kb/troubleshooting/README.md)

## Skills

- **a11y-validate**: Accessibility validator: WCAG 2.1 AA, EN 301 549, EAA. Triggers: a11y, accessibility, WCAG, EAA, ARIA, contrast, keyboard, screen reader.
- **agent-creator**: Creates new specialized agents with frontmatter, tools, delegation. Triggers: new agent, create agent, agent scaffold, specialized agent.
- **analyze**: Analyzes code quality, complexity, patterns across codebase. Triggers: quality report, hotspot scan, code analysis, architecture signal.
- **api-patterns**: REST/GraphQL API design: naming, versioning, pagination, idempotency, OpenAPI. Triggers: API design, REST, GraphQL, OpenAPI, Swagger, idempotency, rate limit.
- **app-builder**: App scaffolding: Next.js, Vite, Nuxt, Astro, FastAPI, Django, Laravel, RN, Flutter. Triggers: scaffold, bootstrap, new project, starter, dashboard, mobile app.
- **architecture-audit**: Audits codebase for architectural friction, shallow modules; proposes RFCs. Triggers: improve architecture, shallow modules, deepen modules, reduce coupling.
- **architecture-decision**: Architecture decisions in ADR/RFC/RFD format: context, constraints, options, recommendation. Triggers: ADR, RFC, RFD, trade-offs, design choice, pick between, evaluate approach.
- **biz-scan**: Scans codebase for revenue opportunities, KPIs, monetization gaps. Triggers: business metrics, KPI, analytics gaps, monetization, revenue.
- **brand-voice**: Direct technical voice for docs, README, user-facing text. Concise/strict modes. Triggers: documentation, README, content, output-mode, voice, prose style.
- **briefing**: Executive daily briefing aggregating reports from all agents into decision-focused summary. Triggers: briefing, daily summary, status across system, executive update.
- **build**: Builds project with auto-detected toolchain (npm, poetry, cargo, go, flutter, Docker). Triggers: build, compile, bundle, produce artifacts.
- **chaos**: Injects controlled faults for resilience testing on non-prod. Triggers: chaos, fault injection, latency injection, dependency kill, resilience test.
- **ci**: Detect/generate/debug CI pipeline config (GitHub Actions, GitLab CI). Triggers: CI setup, build pipeline, GitHub Actions config, debug CI, GitLab CI.
- **ci-cd-patterns**: CI/CD: GitHub Actions, GitLab CI, Jenkins, caching, blue-green, canary. Triggers: CI, CD, pipeline, GitHub Actions, workflow YAML, release, canary, rollout.
- **clean-code**: Code quality: meaningful names, SRP, DRY, small functions, guard clauses, refactoring. Triggers: clean code, naming, code smell, SRP, DRY, long function, god class, dead code.
- **command-creator**: Creates new Claude Code slash commands with frontmatter and validation. Triggers: new slash command, create command, command scaffold.
- **commit**: Creates Conventional Commits with pre-commit validation. Triggers: commit, conventional commit, git commit, message.
- **content-moderation-patterns**: Content moderation with Claude: pre-filter vs LLM-classify, categories, thresholds, HITL. Triggers: moderation, safety filter, policy enforcement, content classifier.
- **council**: 4-perspective decision evaluation for architecture choices. Triggers: council, evaluate decision, pros cons, multi-angle, alternatives.
- **cpp-rules**: C++ coding rules: style, patterns, security, testing. Triggers: .cpp, .cc, .cxx, .hpp, .h, CMakeLists.txt, Makefile, GoogleTest, clang-tidy.
- **csharp-patterns**: C#/.NET: LINQ, async/await, DI, records, nullable refs, ASP.NET Core, EF Core, MediatR. Triggers: C#, .NET, dotnet, ASP.NET, EF Core, LINQ, record type, IServiceCollection.
- **csharp-rules**: C#/.NET coding rules: style, patterns, security, testing. Triggers: .cs, .csproj, .sln, ASP.NET, ASP.NET Core, EF Core, LINQ, NUnit, xUnit, dotnet.
- **cve-scan**: Scans deps for known CVEs via native audit (npm, pip, composer, cargo, go, bundler, dart). Triggers: CVE scan, vulnerability scan, npm audit, pip audit.
- **dart-rules**: Dart/Flutter coding rules: style, patterns, security, testing. Triggers: .dart, pubspec.yaml, Flutter, Riverpod, Bloc, widget, StatelessWidget, StatefulWidget.
- **database-patterns**: DB schema design and query tuning: normalization, indexing, N+1, transactions, EXPLAIN. Triggers: schema, index, slow query, N+1, PostgreSQL, MySQL, EXPLAIN, deadlock, query plan.
- **debug**: Systematic debugging via logs, health checks, hypothesis-driven investigation. Triggers: debug, error, trace root cause, fix bug, reproduce symptom, investigation.
- **deploy**: Deploys with pre-flight checks and health verification. Triggers: deploy, deployment, ship, release, push to prod.
- **design-an-interface**: Generates and compares parallel interface designs (Ousterhout 'Design It Twice'). Triggers: design API, interface options, compare modules, design it twice.
- **design-engineering**: UI craftsmanship: animation rules, easing, micro-interactions, state polish. Triggers: animation, transition, ease-out, motion, micro-interaction, hover, loading state, UI polish.
- **docker-devops**: Docker/K8s: Dockerfile, multi-stage, compose, manifests, Helm. Triggers: Docker, Dockerfile, container, Kubernetes, k8s, compose, Helm, pod.
- **docs**: Generates/updates README, API docs, architecture notes. Triggers: docs, README, API docs, architecture note, documentation.
- **documentation-standards**: KB conventions: YAML frontmatter, 5-category taxonomy (reference/howto/procedures/troubleshooting/best-practices). Triggers: kb/, SOP, runbook, howto, frontmatter, knowledge base.
- **ecommerce-patterns**: E-commerce: cart, checkout, payments (Stripe/Adyen), order state, inventory, promos, tax. Triggers: cart, checkout, SKU, payment, Stripe, Shopify, Medusa, Magento, coupon, refund.
- **evaluate**: Evaluates RAG retrieval and LLM-as-judge metrics (faithfulness, relevancy, context precision). Triggers: measure RAG quality, knowledge gap, RAG eval, golden dataset.
- **evolve**: Analyzes agent/skill failures, drafts prompt/permission fixes. Triggers: improve agent, refine skill, system prompt, optimize agent.
- **explain**: Explains code/architecture with Mermaid diagrams and sequence flows. Triggers: what does X do, how does Y work, explain code, sequence diagram.
- **explore**: Explores codebase structure, stack, and architecture. Triggers: explore codebase, project structure, stack overview, architecture map.
- **fix**: Applies targeted fix to known bug/lint error, verifies with same command that surfaced it. Triggers: fix, apply fix, fix bug, fix lint, targeted fix.
- **flutter-patterns**: Flutter/Dart: widgets, state mgmt (Riverpod/Bloc), navigation, platform channels. Triggers: Flutter, Dart, widget, Riverpod, Bloc, pubspec, hot reload.
- **git-mastery**: Advanced Git: rebase, bisect, reflog, cherry-pick, worktrees, LFS. Triggers: rebase, bisect, cherry-pick, reflog, force push, merge conflict, worktree.
- **golang-rules**: Go coding rules: style, patterns, security, testing. Triggers: .go, go.mod, go.sum, Gin, Echo, Gorilla, testing, gofmt.
- **grill-me**: Stress-tests a plan via Socratic questioning down each decision branch. Triggers: stress-test, grill me, validate assumptions, challenge plan, socratic review.
- **health**: Service/infra health via liveness/readiness checks, resource usage, quick diagnostics. Triggers: health check, services up, system status, infra health, degraded service.
- **hipaa-validate**: HIPAA validator: PHI exposure, audit logging, encryption, access control, BAA refs. Triggers: HIPAA, PHI, healthcare compliance, audit log, BAA.
- **hook-creator**: Create new Claude Code lifecycle hook (PreToolUse/PostToolUse/Stop/SessionStart) with bash + hooks.json. Triggers: create hook, lifecycle hook, PreToolUse, PostToolUse, hook event.
- **index**: Reindexes KB for semantic search via vector store (Qdrant). Triggers: reindex KB, rebuild index, vector reindex, refresh embeddings.
- **instinct-review**: Reviews/promotes/removes instincts from `.claude/instincts/*.md`. Triggers: instinct review, curate instincts, manage instincts, promote instinct.
- **introspect**: Agent self-debugging and recovery. Use when stuck in loops, making repeated errors, or quality degrades. Triggers: introspect, self-debug, stuck, loop, why failing.
- **java-patterns**: Java: Spring Boot, CompletableFuture, records, sealed types, JPA/Hibernate, virtual threads. Triggers: Java, Spring, JPA, Hibernate, Maven, Gradle, virtual thread, sealed class.
- **java-rules**: Java coding rules: style, patterns, security, testing. Triggers: .java, pom.xml, build.gradle, Spring, Spring Boot, JPA, Hibernate, JUnit, Maven, Gradle.
- **json-mode-patterns**: Structured JSON output from Claude: tool-use-as-JSON, schema, parsing, partial recovery. Triggers: JSON mode, structured output, schema validation, JSON parsing.
- **kotlin-patterns**: Kotlin: coroutines, Flow, sealed/data classes, null safety, Ktor, Compose, KMP. Triggers: Kotlin, coroutine, Flow, suspend, Ktor, Jetpack Compose, KMP, kotlinx.
- **kotlin-rules**: Kotlin coding rules: style, patterns, security, testing. Triggers: .kt, .kts, build.gradle.kts, Ktor, Jetpack Compose, coroutines, kotlinx.
- **lint**: Runs linter+typechecker with auto-detected toolchain (ruff/mypy, eslint/tsc, phpstan, golangci-lint, clippy). Triggers: lint, typecheck, static analysis.
- **mcp-builder**: Builds production MCP servers via 4-phase methodology: research, implement, test, evaluate. Triggers: build MCP, new MCP, MCP integration, MCP server scaffold.
- **mcp-patterns**: MCP server design: tool schemas, resources, stdio/SSE, capability negotiation. Triggers: MCP, Model Context Protocol, JSON-RPC, stdio, SSE, Claude Desktop.
- **medplum-rules**: Medplum (FHIR healthcare) coding rules: style, patterns, security, testing. Triggers: medplum.config.mts, medplum.config.ts, FHIR, Medplum, Bot, Subscription, Questionnaire.
- **mem-search**: Searches past coding sessions for observations, decisions, context. Triggers: mem-search, recall session, past work, prior decisions, session history.
- **migrate**: Run/create DB migrations (Alembic, Prisma, Laravel, Django, Flyway, Drizzle); checks backup. Triggers: apply migration, rollback, generate migration.
- **migration-patterns**: Zero-downtime DB migrations: expand-contract, double-write, backfill, blue-green. Triggers: migration, schema change, backfill, ALTER TABLE, online DDL.
- **model-routing-patterns**: Multi-model pipelines (Haiku/Sonnet/Opus): cost routing, escalation, fallback chains. Triggers: model routing, Haiku, Sonnet, Opus, escalation, fallback chain.
- **night-watch**: Autonomous maintenance (dep updates, dead code, small refactors) in isolated branch, off-hours. Triggers: night watch, autonomous maintenance, dep updates.
- **observability-patterns**: Observability: structured logs, metrics (RED/USE), tracing, SLO/SLI. Triggers: logging, metrics, Prometheus, Grafana, OpenTelemetry, trace, monitoring.
- **onboard**: Sets up ai-toolkit in a project: symlinks, CLAUDE.md, intent interview. Triggers: onboard, setup project, install ai-toolkit, migrate project.
- **orchestrate**: Coordinates multiple specialized agents in parallel. Triggers: orchestrate, multi-agent, parallel agents, coordinate agents.
- **panic**: Emergency kill switch — halts all agents via lockfile gate. Triggers: panic, stop everything, kill switch, halt agents, agents looping.
- **performance-profiling**: Performance: golden signals, p50/p95/p99, flame graphs, load testing. Triggers: performance, slow, latency, p99, flame graph, bottleneck, memory leak.
- **persona**: Switches engineering persona at runtime. Triggers: persona, switch role, backend-lead, frontend-lead, devops-eng, junior-dev.
- **php-rules**: PHP coding rules: style, patterns, security, testing. Triggers: .php, composer.json, Laravel, Symfony, PHPUnit, PSR-12, Composer.
- **plan**: Breaks features/goals into phased plans with task lists, agent assignments, dependencies. Triggers: plan feature, implementation roadmap, break down task, project phases.
- **plugin-creator**: Creates opt-in plugin packs with manifests + module scaffolding for Claude/Codex. Triggers: new plugin, plugin pack, plugin scaffold.
- **pr**: Creates GitHub PR after pre-flight checks (lint/typecheck/tests), structured summary from commits. Triggers: pr, pull request, create PR, ready to merge.
- **prd-to-issues**: Splits a PRD into vertical-slice GitHub issues with HITL/AFK tagging and dependencies. Triggers: PRD to issues, create tickets, break down PRD, work items.
- **prd-to-plan**: Converts PRD into phased plan via tracer-bullet vertical slices. Triggers: PRD to plan, break down PRD, implementation plan, tracer bullets, phased plan.
- **predict**: Analyzes diffs for regression risk and blast radius, generates risk-scored impact report. Triggers: PR review, code change risk, breaking change, blast radius, regression check.
- **prompt-caching-patterns**: Anthropic API prompt caching: TTL, breakpoints, stacking, invalidation, hit rate. Triggers: prompt caching, cache_control, cache breakpoint, cache TTL, hit rate.
- **python-rules**: Python coding rules: style, patterns, security, testing. Triggers: .py, .pyi, pyproject.toml, requirements.txt, Pipfile, FastAPI, Django, Flask, pytest, SQLAlchemy, ruff, mypy.
- **qa-session**: Interactive QA: user reports bugs conversationally, agent files GitHub issues. Triggers: QA session, report bug, file issue, conversational QA, bug intake.
- **rag-patterns**: RAG: embeddings, chunking, hybrid search (BM25+vector), reranking, CRAG, multi-hop. Triggers: RAG, embedding, pgvector, Qdrant, Pinecone, Weaviate, reranker, semantic search.
- **refactor**: Refactors code for quality and maintainability. Triggers: refactor, clean up, restructure, improve code, modernize.
- **refactor-plan**: Creates detailed refactor plan with tiny commits via interview, files as GitHub RFC. Triggers: refactor plan, refactoring RFC, incremental refactor, safe steps.
- **repeat**: Runs prompt/slash command on recurring interval until done or limit. Triggers: repeat, recurring task, poll status, run every N minutes, interval.
- **research-mastery**: Hierarchical retrieval: KB → MCP/Context7 → web. Triggers: research, fact-check, verify, synthesize, cross-reference, multi-source, cite sources.
- **review**: Reviews code for quality, security, correctness. Triggers: code review, quality review, security review, review PR, review branch.
- **rollback**: Rolls back git commit, DB migration, or deploy to known-good with safety + health checks. Triggers: rollback, revert deploy, revert migration, rollback commit, git revert.
- **ruby-patterns**: Ruby/Rails: blocks, metaprogramming, ActiveRecord, Sidekiq, RSpec, Sorbet, Hanami. Triggers: Ruby, Rails, ActiveRecord, Sidekiq, RSpec, Gemfile, bundler, Hanami, Sorbet.
- **ruby-rules**: Ruby coding rules: style, patterns, security, testing. Triggers: .rb, Gemfile, .gemspec, Rails, ActiveRecord, Sidekiq, RSpec, Sorbet, rubocop.
- **rust-patterns**: Rust: ownership, lifetimes, async (Tokio), Result/anyhow/thiserror, traits, unsafe. Triggers: Rust, borrow checker, lifetime, Tokio, cargo, trait, impl, Result, unsafe, clippy.
- **rust-rules**: Rust coding rules: style, patterns, security, testing. Triggers: .rs, Cargo.toml, Cargo.lock, Tokio, Axum, Serde, clippy, cargo test.
- **security-patterns**: App security: OWASP, authN/authZ, input validation, secrets, TLS, CSRF/XSS/SQLi, JWT, CSP. Triggers: security, OWASP, auth, JWT, CSRF, XSS, SQL injection, secrets, TLS, CSP, CORS.
- **seo-validate**: SEO validator: meta/OG, Schema.org, hreflang, Core Web Vitals, crawlability. Triggers: SEO, meta tags, Schema.org, hreflang, LCP, INP, CLS, Core Web Vitals, sitemap, crawlability.
- **skill-audit**: Scans skills/agents for security risks: dangerous patterns, secrets, excessive perms. Triggers: skill audit, security scan, agent audit, dangerous pattern.
- **skill-creator**: Creates new skills from templates via guided workflow. Triggers: new skill, create skill, skill scaffold, skill template.
- **subagent-development**: Executes plans via fresh subagents per task with two-stage review (spec → quality). Triggers: subagent execution, execute plan, fresh agent per task, spec compliance review.
- **swarm**: Runs tasks via Map-Reduce, Consensus, or Relay swarms. Triggers: swarm, map-reduce, consensus swarm, relay swarm, parallel agents.
- **swift-patterns**: Swift/iOS: SwiftUI, Combine, async/await, actors, SPM, Core Data, UIKit interop. Triggers: Swift, SwiftUI, Combine, iOS, Xcode, actor, Core Data, @MainActor, @State.
- **swift-rules**: Swift coding rules: style, patterns, security, testing. Triggers: .swift, Package.swift, .xcodeproj, SwiftUI, Combine, async/await, XCTest.
- **tdd**: TDD with red-green-refactor loop and vertical slices. Triggers: TDD, test-first, red-green-refactor, test-driven development.
- **test**: Runs project test suite with coverage, auto-detects framework (pytest, vitest, jest, flutter, go, cargo, phpunit). Triggers: run tests, test suite, coverage report.
- **testing-patterns**: Testing strategy: pyramid, AAA, mocks/fakes/stubs, flaky tests, coverage. Triggers: test, fixture, mock, stub, e2e, TDD, Playwright, Cypress, flaky, coverage, property-based.
- **triage-issue**: Bug triage: explores codebase for root cause, files GitHub issue with TDD fix plan. Triggers: triage, investigate bug, fix plan, root cause, file issue, bug report.
- **typescript-patterns**: TypeScript types: generics, discriminated unions, Zod, satisfies, branded types. Triggers: TypeScript, TS, generics, Zod, satisfies, strict mode.
- **typescript-rules**: TypeScript/JavaScript coding rules: style, patterns, security, testing. Triggers: .ts, .tsx, .js, .jsx, package.json, tsconfig.json, React, Next.js, Vue, Vite, Vitest, Jest, ESLint.
- **ubiquitous-language**: Extracts DDD ubiquitous language glossary, flags ambiguities, saves to UBIQUITOUS_LANGUAGE.md. Triggers: define domain terms, build glossary, harden terminology, DDD, domain model.
- **verification-before-completion**: Forces verification commands before success claims. Evidence before assertions. Triggers: complete, fixed, passing, done, ready, verified.
- **workflow**: Starts and manages autonomous agent workflows. Triggers: workflow, start workflow, autonomous agents, agent pipeline.
- **write-a-prd**: Creates PRD via interactive interview + codebase exploration + module design. Triggers: write PRD, product requirements, plan new feature, PRD interview.

## Agents

- **ai-engineer**: AI/ML integration specialist. Use for LLM integration, vector databases, RAG pipelines, embeddings, AI agent orchestration, document indexing, semantic search, hybrid retrieval, and answer generation. Triggers: ai, ml, llm, embedding, vector, rag, agent, openai, anthropic, search, retrieval, indexing, chunking, reranking.
- **backend-specialist**: Expert backend architect for Node.js, Python, PHP, and modern serverless systems. Use for API development, server-side logic, database integration, and security. Triggers: backend, server, api, endpoint, database, auth, fastapi, express, laravel.
- **business-intelligence**: Opportunity Discovery agent. Scans data models and code to identify missing business metrics, KPIs, and opportunities for value creation.
- **chaos-monkey**: Resilience testing agent. Use to inject faults, latency, and failures into the system to verify robustness and recovery mechanisms.
- **chief-of-staff**: Executive Summary agent. Aggregates reports from all other agents to reduce noise and present a single, actionable daily briefing to the user.
- **code-archaeologist**: Legacy code investigation and understanding specialist. Trigger words: legacy code, code archaeology, dead code, technical debt, dependency analysis, refactoring, code history
- **code-reviewer**: Code review and security audit expert. Use for security reviews, Devil's Advocate analysis, quality audits, best practices validation. Triggers: review, security, audit, quality, best practices, vulnerability.
- **command-expert**: CLI commands and shell scripting specialist. Trigger words: bash, shell, CLI, script, automation, command line, build script, deployment script
- **data-analyst**: Data analysis and visualization expert. Use for SQL queries, data exploration, analytics, reporting, and insights. Triggers: data, analysis, sql, query, visualization, metrics, dashboard, pandas, report.
- **data-scientist**: Statistical analysis and data insights specialist. Use for statistical analysis, data visualization, EDA, A/B testing, and predictive modeling. Triggers: statistics, visualization, eda, analysis, hypothesis testing, ab test.
- **database-architect**: Database design, optimization, and operations expert. Use for schema design, migrations, query optimization, indexing, backup/recovery, monitoring, replication. Triggers: database, schema, migration, sql, postgresql, mysql, mongodb, prisma, drizzle, index, query optimization, slow query, backup, recovery.
- **debugger**: Root cause analysis expert. Use for cryptic errors, stack traces, intermittent failures, silent bugs, and systematic debugging. Triggers: debug, error, exception, traceback, bug, failure, root cause.
- **devops-implementer**: Infrastructure implementation expert. Use for writing Terraform, Ansible, Docker, and shell scripts based on approved architecture notes and implementation summaries. Triggers: terraform, ansible, docker, kubernetes, shell, infrastructure, deployment, configuration.
- **documenter**: Documentation and KB expert. Use for architecture notes, runbooks, changelogs, KB updates, how-to guides, API docs, READMEs, tutorials, SOP creation, KB organization, content quality review. Triggers: document, documentation, architecture-note, runbook, changelog, howto, readme, kb, sop, technical writing.
- **explorer-agent**: Codebase exploration and discovery agent. Use for mapping project structure, finding dependencies, understanding architecture, and research. Does NOT write code - only reads and analyzes.
- **fact-checker**: Claim verification expert. Use for verifying facts, source validation, RAG result accuracy checking. Triggers: fact check, verify, accuracy, claim, source validation.
- **frontend-specialist**: Senior Frontend Architect for React, Next.js, Vue, and modern web systems. Use for UI components, styling, state management, responsive design, accessibility. Triggers: component, react, vue, ui, ux, css, tailwind, responsive, nextjs.
- **game-developer**: Game development across all platforms (PC, Web, Mobile, VR/AR). Use for Unity, Godot, Unreal, Phaser, Three.js. Covers game mechanics, multiplayer, optimization, 2D/3D graphics.
- **incident-responder**: Production incident response expert. Use for P1-P4 incidents, outages, emergency fixes, and postmortem documentation. Triggers: incident, outage, production down, emergency, P1, alert, monitoring.
- **infrastructure-architect**: System design expert. Use for architectural decisions, architecture notes, trade-off analysis, technology selection. Triggers: architecture, design, decision, trade-off, scalability, infrastructure planning.
- **infrastructure-validator**: Deployment validation expert. Use for deployment verification, health checks, testing, rollback procedures. Triggers: validate, deploy, deployment, health check, smoke test, rollback.
- **llm-ops-engineer**: LLM operations expert. Use for LLM caching, fallback strategies, cost optimization, observability, and reliability. Triggers: llm, language model, openai, ollama, caching, fallback, token, cost.
- **mcp-specialist**: MCP server design, implementation, client configuration, and integration troubleshooting. Triggers: mcp, model context protocol, json-rpc, sse, stdio, mcp server, mcp config, mcp integration, mcp connection, claude desktop, mcp client.
- **mcp-testing-engineer**: MCP protocol testing expert. Use for MCP server testing, protocol compliance, transport validation, integration testing. Triggers: mcp test, protocol compliance, mcp validation, transport testing.
- **meta-architect**: Self-Optimization agent. Analyzes system performance and mistakes to update agent definitions and instructions. The only agent allowed to modify .claude/agents/*.
- **ml-engineer**: Machine learning systems specialist. Use for model training, data pipelines, MLOps, and model deployment. Triggers: ml, machine learning, model training, mlops, tensorflow, pytorch, scikit-learn.
- **mobile-developer**: Expert in React Native, Flutter, and native mobile development. Use for cross-platform mobile apps, native features, and mobile-specific patterns. Triggers: mobile, react native, flutter, ios, android, app store, expo, swift, kotlin.
- **night-watchman**: Autonomous maintenance agent. Use for automated dependency updates, dead code removal, refactoring, and project hygiene tasks. Typically scheduled to run off-hours.
- **nlp-engineer**: Natural Language Processing specialist. Use for text processing, NER, text classification, information extraction, and language model fine-tuning. Triggers: nlp, ner, tokenization, text classification, sentiment, spacy, transformers.
- **orchestrator**: Multi-agent coordination and task orchestration. Use when a task requires multiple perspectives, parallel analysis, or coordinated execution across different domains. Invoke for complex tasks benefiting from security, backend, frontend, testing, and DevOps expertise combined.
- **performance-optimizer**: Performance optimization expert. Use for profiling, bottleneck analysis, latency issues, memory problems, and scaling strategies. Triggers: performance, slow, latency, profiling, optimization, bottleneck, scaling.
- **predictive-analyst**: Precognition agent. Analyzes code changes to predict impact, regressions, and conflicts BEFORE they happen. Uses dependency graphs and historical data.
- **product-manager**: Product management and value maximization expert. Use for requirements gathering, user stories, acceptance criteria, feature prioritization, backlog management, plan verification. Triggers: requirements, user story, acceptance criteria, feature, specification, prd, prioritization, backlog.
- **project-planner**: Smart project planning agent. Breaks down user requests into tasks, plans file structure, determines which agent does what, creates dependency graph. Use when starting new projects or planning major features.
- **prompt-engineer**: LLM prompt design and optimization specialist. Trigger words: prompt, LLM, chain-of-thought, few-shot, system prompt, prompt engineering, token optimization
- **qa-automation-engineer**: Test automation and QA specialist. Use for E2E testing, API testing, performance testing, and CI/CD test integration. Triggers: e2e, playwright, cypress, selenium, api test, performance test, automation.
- **search-specialist**: Information retrieval and search optimization specialist. Trigger words: search, query, semantic search, information retrieval, relevance, ranking, search optimization
- **security-architect**: Proactive security design expert. Use for Threat Modeling, architecture security reviews, and designing secure systems (AuthN/AuthZ, Crypto).
- **security-auditor**: Security expert. Use for OWASP Top 10, CVE analysis, security audits, penetration testing, vulnerability assessment, hardening. Triggers: security, owasp, cve, vulnerability, audit, hardening, penetration, pentest, injection test, api security.
- **seo-specialist**: Search engine + generative engine optimization specialist. Trigger words: SEO, GEO, AEO, search engine, meta tags, structured data, Core Web Vitals, sitemap, robots.txt, schema.org, llms.txt, ChatGPT visibility, Claude citation, Perplexity ranking, AI Overviews, topical authority, chunk architecture, semantic triples, query fan out
- **system-governor**: The Guardian of the Constitution. Validates all evolutionary changes and enforces immutable rules. Has VETO power.
- **tech-lead**: Technical authority for code quality, architecture patterns, and stack decisions. Use for code reviews, technological disputes, and standards enforcement.
- **technical-researcher**: Deep technical investigation and multi-source research synthesis specialist. Trigger words: technical research, feasibility study, root cause analysis, API investigation, compatibility research, comparison matrix, synthesize, aggregate, report, executive summary, gap analysis, findings, multi-source, cross-reference
- **test-engineer**: Testing expert. Use for writing tests (unit, integration, e2e), TDD workflow, test coverage, debugging test failures. Triggers: test, pytest, unittest, coverage, tdd, testing, mock, fixture.

---

## kb/best-practices/README.md

---
title: "Best Practices"
service: ai-toolkit
category: best-practices
tags: [best-practices, guidelines]
last_updated: "2026-03-25"
---

# Best Practices

Guidelines and recommendations. Guides will be added here as they are created.

---

## kb/best-practices/no-hardcoded-counts.md

---
title: "No Hardcoded Counts in Secondary Docs"
category: best-practices
service: ai-toolkit
tags: [counts, documentation, maintenance, skills, agents, hooks, tests]
created: "2026-04-07"
last_updated: "2026-04-07"
description: "Counts (skills, agents, hooks, tests) should only appear in README.md and manifest.json. All other docs must NOT contain hardcoded numbers to avoid drift."
---

# No Hardcoded Counts in Secondary Docs

## Rule

Hardcoded counts (skills, agents, hooks, tests, plugins) are allowed ONLY in:
- **README.md** — badges, "What You Get" table, comparison table
- **manifest.json** — module descriptions
- **package.json** — description field

All other files (CLAUDE.md, ARCHITECTURE.md, KB docs, plugin.json, llms.txt, AGENTS.md, copilot-instructions, rules, GEMINI.md) must NOT contain hardcoded counts like "90 skills" or "44 agents".

## Why

Every time a skill, agent, or hook is added/removed, dozens of files need updating. This causes constant drift and stale counts that erode trust. Consolidating to 2-3 files makes maintenance tractable.

## How to Apply

- In secondary docs, use relative language: "all agents", "the full skill set", "available hooks"
- If a doc MUST reference scale, use: "see README.md for current counts"
- `validate.py` checks counts only in README.md badges — that's sufficient
- When adding skills/agents/hooks, update ONLY: README.md badges + manifest.json descriptions
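
The rule can be checked mechanically in secondary docs. The following is a minimal sketch, not the logic of `validate.py`: the regex, the `ALLOWED_FILES` set, and the function name are assumptions made for illustration, and files are matched by bare filename only.

```python
import re

# Hypothetical pattern: a number followed by one of the counted nouns.
COUNT_PATTERN = re.compile(
    r"\b\d+\+?\s+(?:skills?|agents?|hooks?|tests?|plugins?)\b", re.IGNORECASE
)

# Files where hardcoded counts are allowed, per the rule above.
ALLOWED_FILES = {"README.md", "manifest.json", "package.json"}


def find_hardcoded_counts(filename: str, text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) pairs for counts in a secondary doc."""
    if filename in ALLOWED_FILES:
        return []
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in COUNT_PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A check like this will also flag legitimate numbers (for example "3 tests" in a how-to), so it is better suited to a warning than a hard failure.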

## Anti-Pattern

```markdown
# BAD — hardcoded count in ARCHITECTURE.md
Shared AI development toolkit — 90 skills, 44 agents

# GOOD — no count
Shared AI development toolkit with multi-platform support
```

---

## kb/history/completed/deep-coverage-v3-20260423.md

---
title: "Plan: Deep Coverage v3.0 — 100% Native Surface Utilization Across 12 Tools"
category: planning
service: ai-toolkit
doc_type: plan
status: completed
tags: [v3, deep-coverage, ecosystem, generators, hooks, skills, subagents, commands, profile-full]
created: "2026-04-23"
last_updated: "2026-04-23"
completed: "2026-04-23"
completion: "100%"
description: "Ship v3.0.0 where every supported editor exposes the full ai-toolkit surface it is capable of hosting natively: hooks, subagents, custom commands, skill pointers. Introduce --profile full. Skip the 2.13.0 interim release and fold the completed deep sweep into 3.0.0."
---

# Plan: Deep Coverage v3.0 — 100% Native Surface Utilization

**Status:** :green_circle: COMPLETED (2026-04-23)
**Invocation:** continuation of `ecosystem-deep-sweep-2026-04-23` — same orchestration model
**Estimated effort:** 10-15h orchestrated (~3h wall-clock across 4 parallel buckets + consolidation)
**Deliverable:** v3.0.0 release where every editor's native surface is fully utilized; `--profile full` available

---

## 1. Objective

After the 2026-04-23 deep sweep closed the doc-drift gap, **v3.0.0 closes the capability-utilization gap**: each editor now exposes the full ai-toolkit surface it can host natively.

Chosen definition of "100% coverage": **each editor works at 100% of its native capability** (compat-read counts toward coverage). No cargo-cult duplication. No global writes to `~/.cursor/`, `~/.augment/rules/`, etc.

---

## 2. Policy decisions (immutable constraints for all buckets)

| # | Decision | Rule |
|---|----------|------|
| 1 | Skill propagation | `.claude/skills/` canonical. Cursor/Windsurf/opencode → compat-read (nothing). Augment/Gemini/Antigravity → **pointer skill** (1 file per editor). Codex → native `.agents/skills/` mirror |
| 2 | Global writes | Only `~/.claude/`. Cursor/Windsurf/opencode get global coverage via compat-read. Augment/Gemini/Roo require `--local` |
| 3 | Surface activation | **`--profile full`** turns on every native surface. `standard` stays close to today's defaults but adds two uncontroversial gap-fillers (Copilot wiring + Gemini hooks). `minimal` unchanged |
| 4 | Default behavior | `--editors <name>` alone uses `standard`. Users who want the full stack pass `--profile full` |
| 5 | Version | Skip 2.13. Ship everything (completed sweep + v3 work) as **3.0.0** with migration notes |

---

## 3. What's missing → what each bucket delivers

### Bucket 1 — Hooks generators (backend-specialist)

**Owned files**
- New: `scripts/generate_gemini_hooks.py` (writes `.gemini/settings.json` hooks merge)
- New: `scripts/generate_cursor_hooks.py` (writes `.cursor/hooks.json`)
- New: `scripts/generate_windsurf_hooks.py` (writes `.windsurf/hooks.json`)
- New: `scripts/generate_augment_hooks.py` (writes `~/.augment/settings.json` hooks merge)
- New: `tests/test_hooks_per_editor.bats` (≥20 tests covering all 4 generators)

**Must-haves**
- All generators reuse `~/.softspark/ai-toolkit/hooks/*.sh` scripts (no duplicate shell code).
- Preserve user-authored hook entries; mark our entries with `_source: ai-toolkit`.
- Idempotent on regeneration.
- Event mapping informed by each editor's docs (Claude Code events ↔ target editor events).
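
The "preserve user entries, tag ours, stay idempotent" requirements can be sketched as a pure merge function. This is an illustrative sketch only: the `{"hooks": {event: [entry, ...]}}` shape and the `merge_hooks` name are assumptions, and each editor's real hooks schema may differ.

```python
SOURCE_TAG = "ai-toolkit"


def merge_hooks(existing: dict, generated: dict) -> dict:
    """Merge generated hook entries into an existing hooks config.

    Assumed shape (hypothetical): {"hooks": {event_name: [entry, ...]}}.
    """
    # Keep every non-hook setting the user already has.
    merged = {k: v for k, v in existing.items() if k != "hooks"}
    merged_hooks: dict[str, list] = {}
    old_hooks = existing.get("hooks", {})
    new_hooks = generated.get("hooks", {})
    for event in list(old_hooks) + [e for e in new_hooks if e not in old_hooks]:
        # Keep user-authored entries; drop our previous entries so reruns don't duplicate.
        user_entries = [
            e for e in old_hooks.get(event, []) if e.get("_source") != SOURCE_TAG
        ]
        ours = [{**e, "_source": SOURCE_TAG} for e in new_hooks.get(event, [])]
        entries = user_entries + ours
        if entries:
            merged_hooks[event] = entries
    merged["hooks"] = merged_hooks
    return merged
```

Because our old entries are removed before the new ones are appended, running the merge twice yields the same result as running it once.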

### Bucket 2 — Native agents + custom commands (ai-engineer)

**Owned files**
- New: `scripts/generate_augment_agents.py` (`.augment/agents/*.md` with YAML frontmatter: name, description, model, color, tools, disabled_tools)
- New: `scripts/generate_augment_commands.py` (`.augment/commands/*.md` from user-invocable skills)
- New: `scripts/generate_cursor_agents.py` (`.cursor/agents/*.md` mirroring Claude Code agents)
- New: `scripts/generate_gemini_commands.py` (`.gemini/commands/*.toml` custom slash commands)
- New: `tests/test_native_surfaces.bats` (≥25 tests)

**Must-haves**
- Filter: only `user-invocable: true` skills become custom commands.
- `ai-toolkit-*` prefix everywhere for install/uninstall sweep.
- Do not touch files without our prefix.
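
A rough sketch of the filter and the prefix-scoped sweep follows. The function names and the dict representation of skill frontmatter are hypothetical, and treating a missing `user-invocable` field as "not invocable" is a conservative assumption rather than documented behavior.

```python
from pathlib import Path

PREFIX = "ai-toolkit-"


def command_filenames(skills: list[dict]) -> list[str]:
    """Return prefixed command filenames for user-invocable skills only."""
    return [
        f"{PREFIX}{skill['name']}.md"
        for skill in skills
        if skill.get("user-invocable") is True
    ]


def sweep_generated(directory: Path) -> list[str]:
    """Delete only files carrying our prefix; return the removed names."""
    removed = []
    for path in sorted(directory.glob(f"{PREFIX}*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Scoping the sweep to the prefix is what makes uninstall safe: a user's own `my-command.md` in the same directory is never matched by the glob.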

### Bucket 3 — Skill pointers + Codex mirror (ai-engineer)

**Owned files**
- New: `scripts/generate_gemini_skills.py` (`.gemini/skills/ai-toolkit-skill-catalogue/SKILL.md` — pointer)
- New: `scripts/generate_augment_skills.py` (`.augment/skills/ai-toolkit-skill-catalogue/SKILL.md` — pointer)
- New: `scripts/generate_codex_skills.py` (Codex-native mirror to `.agents/skills/<name>/SKILL.md`)
- New: `tests/test_skills_native.bats` (≥15 tests)

**Must-haves**
- Pointer pattern same as Antigravity: 1 file per editor referencing `~/.claude/skills/<name>` and listing the catalogue.
- Codex mirror does not filter on `user-invocable: false`: knowledge skills and task skills are both mirrored, because Codex reads them all.
- Codex skills use the upstream `.agents/skills` discovery path.

### Bucket 4 — Install wiring + profile full + docs (devops-implementer)

**Owned files**
- `scripts/install_steps/ai_tools.py` — wire in all new generators from buckets 1-3
- `scripts/install.py` — parse `--profile full`, propagate to `_create_local_ai_tool_configs`
- `scripts/config_validator.py` — ensure `full` profile is accepted (already present, verify)
- `README.md` — "What's New in v3.0.0" + migration notes
- `CHANGELOG.md` — v3.0.0 entry
- `kb/reference/global-install-model.md` — document profile semantics
- `kb/reference/supported-tools-registry.md` — per-tool "generators by profile" column
- `kb/procedures/maintenance-sop.md` — profile table update
- `package.json` version bump → `3.0.0`
- `package-lock.json` sync
- `tests/test_install_profiles.bats` (≥15 tests covering minimal/standard/strict/full × 3 editors)

**Must-haves**
- `standard` profile: Copilot directory mode ON, Gemini hooks ON (non-breaking additions).
- `full` profile: everything from `standard` + all native surfaces from buckets 1-3.
- Migration note: users on `standard` today get Copilot instructions/prompts and Gemini hooks automatically after upgrading (an acceptable breaking change for a major bump, and documented).

---

## 4. Success criteria

- [ ] 13 new Python generators (6+4+3, minus wiring)
- [ ] ≥75 new bats tests across buckets
- [ ] `npm test` green
- [ ] `python3 scripts/validate.py --strict` 0/0
- [ ] `python3 scripts/ecosystem_doctor.py --check` exit 0
- [ ] `ai-toolkit install --local --editors all --profile full` produces every native surface per editor
- [ ] `ai-toolkit install --local --editors all --profile standard` is still minimally invasive (no subagents, no hooks for non-Claude editors) **except** Copilot + Gemini hooks (both documented in the migration notes)
- [ ] README test badge bumped
- [ ] CHANGELOG v3.0.0 entry
- [ ] All docs updated (registry, install model, maintenance SOP)
- [ ] Single atomic commit
- [ ] Tag `v3.0.0` ready (push held for user confirmation)

---

## 5. Safety rails

- **Do not commit during bucket work.** Orchestrator consolidates.
- **Do not touch files outside your bucket's ownership list.**
- **Preserve user files** via `ai-toolkit-*` prefix on generated artifacts.
- **Do not write to global editor paths** (`~/.cursor/`, `~/.augment/rules/`, etc.) — policy decision 2.
- **Do not change `standard` profile in ways that break existing users**, beyond the two documented additions (Copilot directory mode + Gemini hooks).
- **Test per bucket locally before reporting.** Bucket reports must include a "tests green" line.

---

## 6. Consolidation steps (orchestrator)

1. Merge all 4 bucket registry deltas into `scripts/ecosystem_tools.json`
2. Run `python3 scripts/ecosystem_doctor.py --update`
3. Run `npm run generate:all`
4. Bump `package.json` → `3.0.0`; sync `package-lock.json`
5. Update `README.md` badge + "What's New"
6. Update `CHANGELOG.md` with v3.0.0 entry (include migration notes)
7. Run `python3 scripts/validate.py --strict` (must pass 0/0)
8. Run `npm test` (all green)
9. Run `python3 scripts/ecosystem_doctor.py --check` (exit 0)
10. Move this plan doc to `kb/history/completed/deep-coverage-v3-20260423.md`
11. Single commit + tag `v3.0.0`
12. Hold push pending user confirmation

---

## 7. Related

- `kb/history/completed/ecosystem-deep-sweep-20260423.md` — predecessor plan (doc-drift closure)
- `kb/reference/global-install-model.md` — install scope semantics
- `kb/reference/supported-tools-registry.md` — tool registry
- `scripts/config_validator.py` — `VALID_PROFILES` already includes `full`

---

## kb/history/completed/ecosystem-deep-sweep-20260423.md

---
title: "Plan: Ecosystem Deep Sweep — All 12 Supported Tools"
category: planning
service: ai-toolkit
doc_type: plan
status: completed
tags: [ecosystem, editors, generators, deep-sweep, orchestrate, drift, integration]
created: "2026-04-23"
last_updated: "2026-04-23"
completed: "2026-04-23"
completion: "100%"
description: "Orchestrate-ready plan for a deep per-tool documentation sweep across all 12 supported tools (Claude Code + 11 editors). Each agent owns 2-3 tools: fetches docs, diffs against our generators, proposes minimal patches. Consolidation step collects results into a single changeset."
---

# Plan: Ecosystem Deep Sweep — All 12 Supported Tools

**Status:** :green_circle: COMPLETED (2026-04-23)
**Invocation:** `/orchestrate deep ecosystem sweep per kb/planning/ecosystem-deep-sweep-2026-04-23.md`
**Estimated effort:** 4-6 hours orchestrated (1-1.5 h per agent in parallel)
**Deliverable:** Per-tool drift report + concrete generator/skill patches + updated registry

---

## 1. Objective

For every supported tool in `scripts/ecosystem_tools.json`:

1. Read the current official documentation end-to-end (not just landing page)
2. Identify every feature that ai-toolkit could integrate with but does not currently
3. Classify each gap using the ecosystem-sync SOP taxonomy (class A-F)
4. Produce minimal, reviewable patches for class B/D/E/F gaps
5. Update the registry (`ecosystem_tools.json`) with new capability markers and config paths
6. Refresh the snapshot (`benchmarks/ecosystem-doctor-snapshot.json`)

**Explicit non-goals:** complete feature parity, deep refactor of generators, adding new editors to the roster.

---

## 2. Parallelization Strategy

12 tools → **4 agents × 3 tools each** by affinity and complexity:

| Agent | Role | Tools | Rationale |
|-------|------|-------|-----------|
| `backend-specialist` | Deep CLI / config analysis | `claude-code`, `codex-cli`, `opencode` | CLI + config.toml + agents/commands/plugins — backend integration depth |
| `frontend-specialist` | Editor UI integrations | `cursor`, `windsurf`, `google-antigravity` | Editor-embedded AI, rule files, MCP-via-UI |
| `devops-implementer` | Pipeline + rules tools | `github-copilot`, `cline`, `roo-code` | Rules directories, MCP JSON variants, mode configs |
| `ai-engineer` | LLM-native tools | `gemini-cli`, `aider`, `augment` | Pure LLM workflows, minimal IDE coupling |

Each agent works **in parallel**, independent file scopes (different generators). Cross-file coordination only at the registry update (single JSON file).

---

## 3. Per-Tool Task Template

Every agent applies the **same 7-step protocol** per tool in their bucket:

### Step 1 — Baseline our current integration

Read these files (read-only):
- `scripts/generate_<tool>_*.py` — every generator targeting this tool
- `scripts/ecosystem_tools.json` — the tool's registry entry
- `kb/reference/supported-tools-registry.md` — human docs section
- `benchmarks/ecosystem-doctor-snapshot.json` — last-seen headings/markers/version

Produce a 3-line summary: "we currently generate X, Y, Z for this tool".

### Step 2 — Fetch official docs

Primary URL is in `ecosystem_tools.json::urls.docs`. Additionally fetch:
- `urls.release_notes` — recent changes (last 6 months)
- `urls.changelog` — if distinct from release notes
- Any deep-link from the docs landing page that corresponds to an integration surface (rules, hooks, MCP, agents, commands, plugins, config schema)

Use `WebFetch` (for general) or `gh api` (for GitHub-hosted docs like Codex CLI, opencode).

### Step 3 — Extract the feature surface

For the current version of the tool, enumerate:
- Config file paths (the tool's OWN paths, not ours)
- Rule / instruction / prompt formats
- Hook / lifecycle event names (if any)
- MCP config target path (if supported)
- Agent / custom-mode / preset concepts (if any)
- Slash command / CLI subcommand surface
- Supported model providers (note, do not integrate)
- Authentication / API-key mechanisms

Produce a structured markdown table: `Feature | Since version | Stable? | Our integration?`

### Step 4 — Diff against our output

For each feature in the table, compare against:
- What our `generate_<tool>_*.py` produces
- What fields are in our registry's `capability_markers`

Mark each row with one of:
- `✅ supported` — we already emit / track it
- `⚠️ partial` — we emit a subset; specific sub-feature missing
- `❌ missing` — we do not support at all
- `➖ out of scope` — tool has it, but not applicable to ai-toolkit's mission

### Step 5 — Classify each gap

For each `⚠️` / `❌` row, assign one of the SOP drift classes:

| Class | Name | Action |
|-------|------|--------|
| A | Cosmetic | No code change; update snapshot only |
| B | New feature — integrate | Patch generator(s), add tests |
| C | New feature — not adopted | Note in registry, no code |
| D | Deprecation | Migration warning in generator + CHANGELOG |
| E | Feature promoted to default | Simplify generator; keep fallback comment |
| F | Newly globally available | New generator / extended generator |
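
The class table lends itself to a small lookup. The sketch below is illustrative (the `DriftClass` enum and helper names are not part of the toolkit); it encodes only what the plan states, namely that classes B, D, E, and F receive patches while A and C do not.

```python
from enum import Enum


class DriftClass(Enum):
    A = "Cosmetic"
    B = "New feature, integrate"
    C = "New feature, not adopted"
    D = "Deprecation"
    E = "Feature promoted to default"
    F = "Newly globally available"


# Classes that require a generator patch.
PATCH_CLASSES = {DriftClass.B, DriftClass.D, DriftClass.E, DriftClass.F}


def needs_patch(drift: DriftClass) -> bool:
    return drift in PATCH_CLASSES


def gaps_to_patch(gaps: dict[str, DriftClass]) -> list[str]:
    """Return feature names whose class requires a patch, sorted for stable reports."""
    return sorted(name for name, drift in gaps.items() if needs_patch(drift))
```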

### Step 6 — Produce patches (class B/D/E/F only)

For every class B/D/E/F gap:
1. Edit the relevant generator in `scripts/generate_<tool>_*.py`
2. If a new capability marker emerges, add to `ecosystem_tools.json::capability_markers`
3. If a new config path emerges, add to `ecosystem_tools.json::config_paths`
4. If a hook event or skill frontmatter field emerges (for Claude Code), update:
   - `app/skills/hook-creator/SKILL.md` (hooks table)
   - `app/skills/skill-creator/SKILL.md` (frontmatter reference)
   - `scripts/validate.py` (allowlist)
5. Add a bats test under `tests/test_<tool>.bats` covering the new output
6. Update the tool's section in `kb/reference/supported-tools-registry.md`

**Constraints on patches:**
- One generator change per logical feature (no "big bang" commits)
- Preserve existing output format for backward compatibility
- New output opt-in via flag if it would change existing user-visible state
- Every new capability marker must pass the doctor's probe on the live docs page

### Step 7 — Report

Each agent emits a single markdown report with:
- Feature matrix table (step 3+4+5 combined)
- List of patches applied (files changed, bats tests added)
- List of class B/D/E/F gaps NOT patched (with reason: "out of scope", "requires user decision", "blocker")
- Registry diff (before/after for the tool's JSON entry)

---

## 4. Consolidation (after all agents finish)

Run in order:

1. Merge registry entries — single edit to `ecosystem_tools.json` combining all 12 per-tool updates
2. Regenerate human registry doc: manually update `kb/reference/supported-tools-registry.md` from JSON
3. `python3 scripts/ecosystem_doctor.py --update` — baseline new capability markers
4. `python3 scripts/validate.py --strict` — must pass
5. `npm test` — must pass (includes the newly added bats tests per tool)
6. `python3 scripts/ecosystem_doctor.py --check` — exit 0
7. Regenerate downstream artifacts:
   ```bash
   npm run generate:all
   ```
8. Collect all per-agent reports into `kb/learnings/ecosystem-sweep-2026-04-23.md`

---

## 5. Success Criteria

- [ ] All 12 tools covered (no "skipped for time" items)
- [ ] Every class B/D/E/F gap has either a patch OR a documented reason for deferral
- [ ] Registry `capability_markers` list grew for at least 6 of 12 tools (signals real gap coverage)
- [ ] `validate.py --strict`: 0 errors, 0 warnings
- [ ] `npm test`: all green (including new per-tool bats tests)
- [ ] `ecosystem_doctor.py --check`: exit 0 after snapshot refresh
- [ ] Single consolidated commit per agent-bucket, plus one final consolidation commit

---

## 6. Known Traps (from prior ecosystem work)

- **SPA docs** (Cursor, Antigravity, some Augment pages): `urllib` gets an empty HTML skeleton. Agents should note this and do a **manual browser visit** or use a JS-aware fetcher. Do not treat "0 headings" as "nothing new".
- **GitHub docs** rate-limit aggressively on repeated reads. Space out fetches or use `gh api`.
- **Feature gates** vary by user plan. Copilot Business, Individual, and Enterprise expose different surfaces. Integrate with the OSS surface; document gated features as class C (not adopted).
- **Version skew** on config schemas. A setting that existed in v1.x may be deprecated in v2.x. When docs reference "available since v1.5" and we don't know what version users run, default to generating the newer form with a comment.
- **Markdown vs MDC**: Cursor uses `.mdc`, Claude Code uses `.md`, Cline uses `.md` in `.clinerules/`, Roo uses `.md` in `.roo/rules/`. Don't assume one format fits all.
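
The SPA trap can be guarded against in code. The sketch below counts headings in fetched HTML and reports an empty skeleton as "needs manual check" rather than "no changes". It uses only the standard library; the class name, function name, and status strings are hypothetical and are not the doctor's actual probe.

```python
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Counts h1-h6 start tags in fetched HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.headings = 0

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.headings += 1


def probe_status(html: str) -> str:
    counter = HeadingCounter()
    counter.feed(html)
    # Zero headings is treated as "unknown", never as "nothing new".
    return "ok" if counter.headings > 0 else "needs-manual-check"
```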

---

## 7. Orchestrate Invocation

In a fresh Claude Code session (to avoid context rot from this session):

```
/orchestrate deep ecosystem sweep for ai-toolkit per kb/planning/ecosystem-deep-sweep-2026-04-23.md

Spawn 4 agents in parallel:
- backend-specialist: claude-code, codex-cli, opencode
- frontend-specialist: cursor, windsurf, google-antigravity
- devops-implementer: github-copilot, cline, roo-code
- ai-engineer: gemini-cli, aider, augment

Each agent follows the 7-step per-tool protocol in section 3.
After all 4 report, run consolidation (section 4) and produce the sweep summary.
```

---

## 8. Deliverables (per agent)

Each agent's final output to orchestrator:
1. **Feature matrix** — one table per assigned tool (step 3+4+5)
2. **Patch log** — list of commits staged (not committed yet — orchestrator consolidates)
3. **Registry delta** — proposed JSON diff for `ecosystem_tools.json`
4. **Gaps not patched** — with rationale (out-of-scope, blocker, deferred)
5. **Test additions** — bats test file names + test count

Orchestrator's final output:
1. Consolidated commit with message `feat(ecosystem): deep sweep 2026-04-23 — N class B/F integrations`
2. Version bump decision (minor if any class B/F, patch if only class A updates)
3. `kb/learnings/ecosystem-sweep-2026-04-23.md` — retrospective noting which tools needed most work (informs priority for next sweep)
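
The version bump decision reduces to a one-line rule. A minimal sketch, with hypothetical function names: the plan only specifies "minor if any class B/F, patch if only class A", so falling back to `patch` for other mixes (for example D or E alone) is an assumption of this sketch.

```python
def decide_bump(applied_classes: list[str]) -> str:
    """Return 'minor' if any class B or F integration landed, else 'patch'."""
    return "minor" if any(c in {"B", "F"} for c in applied_classes) else "patch"


def bump_version(version: str, bump: str) -> str:
    """Apply a 'minor' or 'patch' bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```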

---

## 9. Safety Rails

- **Do not** silently upgrade default behavior — every user-visible change lands behind a flag OR goes through a minor version bump with CHANGELOG mention
- **Do not** rewrite generators wholesale — incremental additions only
- **Do not** commit during the sweep — orchestrator consolidates at the end
- **Do not** modify files outside the tool's scope (e.g., backend-specialist touching frontend-specialist's files requires a handoff)
- **Do** preserve existing symlinks and file-path expectations — the installer depends on them

---

## 10. Related

- [Ecosystem Sync SOP](../procedures/ecosystem-sync-sop.md) — the process this plan instantiates
- [Supported Tools Registry](../reference/supported-tools-registry.md) — source of truth for tool list
- `scripts/ecosystem_doctor.py` — drift detector consumed by orchestrator consolidation
- `scripts/ecosystem_tools.json` — registry file edited by every agent

---

## 11. Retrospective — 2026-04-23

### Execution summary

- 4 parallel agents, 3 tools each — full 12/12 coverage, one consolidation pass.
- 161 new bats tests (679 → 840); validate.py 0 errors / 0 warnings; `ecosystem_doctor --check` exit 0.
- 14 files modified, 12 new test files, 2 registry docs updated, 1 snapshot rebaselined.

### What worked

- **Bucket-level file ownership** eliminated merge conflicts entirely. Agents that flagged cross-bucket edits (`ecosystem_tools.json`, registry markdown) correctly left them for the orchestrator.
- **The 7-step protocol** caught high-impact bugs we would have shipped otherwise — Windsurf rules missing `trigger:` frontmatter (silent invisibility to Cascade), Aider's default `attribute-co-authored-by: true` violating our own git policy, Roo modes lacking `whenToUse` (invisible to Orchestrator).
- **SPA-wall compensation patterns** (Antigravity bundle strings, Cursor/Windsurf llms.txt mirrors, GitHub release notes as fallback) were reusable across buckets.

### What surprised us

- **Claude Code 2.1.x grew ~14 new hook events** and 3 new handler types since our last sync. Our validate.py allowlist was the bottleneck, not any generator.
- **Cross-editor compat reads**: Cursor, Windsurf, and opencode now natively read `.claude/skills/` and `.claude/agents/` — we get skill/agent discovery in those editors "for free" without emitting duplicates. Saved ~300 generated files.
- **Copilot tier-gating is heavy**: half of the upstream surface (custom agents, repo MCP, org instructions) is Business/Enterprise-only and was classified as C (documented non-integration).
- **Test #755 regression** from the Codex `PermissionRequest` addition: the test counted `guard-destructive.sh` occurrences with `== 1`. Fixed by updating the expected count to 2 with a comment explaining why base hooks legitimately register it twice now.

### Open items flagged for future passes

1. **Native `.agents/skills/*/SKILL.md` emission** (class B) — writes the Codex skill catalog to the upstream discovery path.
2. **`.opencode/skills/` duplication** — deferred indefinitely; `.claude/skills/` fallback already works.
3. **New generators needed**: `generate_gemini_hooks.py`, `generate_augment_agents.py`, `generate_augment_commands.py`, `generate_augment_hooks.py`.
4. **Cross-editor hooks unification**: Cursor and Windsurf each shipped their own hooks file (`.cursor/hooks.json` and `.windsurf/hooks.json`); worth a dedicated shared-schema pass rather than per-editor copies.
5. **Roo `.roomodes` YAML variant** — upstream-preferred; deferred until a YAML multi-line helper is added.
6. **Copilot install wiring**: new `.github/instructions/` and `.github/prompts/` directories are emitted when `generate_copilot.py` is called with a target dir, but `install_steps/ai_tools.py` doesn't invoke that path yet. Wire behind minor bump.

### Process refinements for next sweep

- **Add a "class B/F deferred" register**: buckets produced these ad-hoc; a structured list in the plan would make prioritization for the next sweep trivial.
- **Cross-bucket test impact**: adding per-tool bats tests inflates the test count and trips the README badge validator. Next time, bump the badge at the start of consolidation, not at the end.
- **Search docs via llms.txt first** when the vendor publishes one — bypasses SPA walls with zero fallback logic.

---

## kb/history/completed/enterprise-config-inheritance-plan-20260412.md

---
title: "Plan: Enterprise Config Inheritance — Multi-Repo Governance with extends"
category: planning
service: ai-toolkit
tags:
  - enterprise
  - multi-repo
  - config-inheritance
  - extends
  - governance
  - team-management
  - monorepo
doc_type: plan
status: completed
created: "2026-04-10"
last_updated: "2026-04-11"
completion: "100%"
description: "Configuration inheritance system for ai-toolkit. Enables organizations to define a shared base config (agents, rules, hooks, profiles, constitution overrides) published as an npm package or local path, which individual projects extend via an `extends` field. Changes to the base config propagate automatically on `ai-toolkit update`. Targets enterprises managing 10-100+ repositories with uniform AI governance."
---

# Plan: Enterprise Config Inheritance — Multi-Repo Governance with `extends`

**Status:** Completed
**Completion:** 100%
**Created:** 2026-04-10
**Origin:** Organizations adopting ai-toolkit across 10-100+ repositories face a config synchronization problem — updating a rule or policy requires touching every repository individually. The `extends` pattern (popularized by ESLint, TypeScript, Prettier) solves this by establishing a single source of truth that projects inherit from.
**Estimated Effort:** 5-7 weeks (1 person) — MVP (core engine + install integration) shippable in ~3.5 weeks

---

## 1. Objective

Create a configuration inheritance system where projects can extend a shared base config published as an npm package, a Git URL, or a local path. The base config defines organizational defaults (which agents to enable, which rules to enforce, which hooks to require, persona presets, and constitution amendments). Individual projects can override or supplement the base, creating a layered governance model.

**Key design principles:**
- **Familiar pattern** — mirrors ESLint's `extends`, TypeScript's `extends`, and Prettier's shared configs
- **npm-first distribution** — base configs are regular npm packages (e.g., `@mycompany/ai-toolkit-config`). Resolver shells out to `npm pack` CLI (respects `.npmrc` auth) — no hand-rolled npm client, preserves stdlib-only constraint
- **Single extends in v1** — `"extends": "string"` only. Multi-base merge (`"extends": [...]`) deferred to v2 to avoid merge-ordering complexity (ESLint's multi-extends is a known source of confusion)
- **Layered merge** — base → project, with explicit override semantics (`override: true` required for safety-critical overrides)
- **Constitution immutable** — base constitution articles cannot be modified by projects, period. Projects can only ADD new articles (article 6+). No weakening detection heuristics — absolute immutability is simpler and safer
- **Offline-capable** — resolved at `install`/`update` time, not at runtime
- **Backward-compatible** — projects without `extends` work exactly as today (no breaking changes)
- **Audit trail** — `state.json` records which base config was resolved and what was overridden
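
To make the "single extends" and source-type principles concrete, here is a small sketch of how an `extends` value might be validated and classified before resolution. The prefix heuristics and function names are assumptions for illustration; the actual resolver in `scripts/config_resolver.py` may decide differently.

```python
def validate_extends(value: object) -> str:
    """v1 accepts a single non-empty string; lists are deferred to v2."""
    if not isinstance(value, str) or not value.strip():
        raise TypeError('"extends" must be a non-empty string in v1')
    return value.strip()


def classify_extends(value: str) -> str:
    """Classify an `extends` string as 'local', 'git', or 'npm' (heuristics are assumptions)."""
    if value.startswith(("./", "../", "/", "~")):
        return "local"
    if value.startswith(("git+", "git@", "https://", "ssh://")) or value.endswith(".git"):
        return "git"
    # Anything else is treated as an npm package name, scoped or unscoped.
    return "npm"
```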

---

## 1a. Functional Requirements

| ID | Requirement | Priority | Success Metric |
|----|-------------|----------|----------------|
| FR1 | Resolve `extends` from npm package, git URL, local path | Must | 4 source types work |
| FR2 | Deep merge base → project config with layered semantics | Must | Merge engine handles dict, list, scalar types |
| FR3 | Constitution immutability — Articles I-V cannot be modified | Must | 100% block rate on modification attempts |
| FR4 | Override validation with `override: true` + `justification` | Must | Missing justification → error |
| FR5 | `enforce` block constraints (minHookProfile, requiredPlugins, forbidOverride, requiredAgents) | Must | All 4 constraint types enforced |
| FR6 | Install/update integration — resolve extends during install | Must | `install --local` detects `.softspark-toolkit.json` |
| FR7 | `config diff` command — show project vs base differences | Must | All merge layers visible |
| FR8 | `config validate` command — schema + enforcement validation | Must | Exit 0/1 for pass/fail |
| FR9 | `config init` — interactive project config setup | Should | Guided flow produces valid `.softspark-toolkit.json` |
| FR10 | `config create-base` — scaffold npm base config package | Should | Ready-to-publish package with `package.json` |
| FR11 | Lock file for reproducible installs | Should | Identical resolved config across team members |
| FR12 | Audit trail in `state.json` | Should | Resolved version + overrides recorded |
| FR13 | CI enforcement command (`config check`) | Could | Exit 0/1 for governance compliance |
| FR14 | Multi-base extends (`"extends": [...]`) | Won't (v2) | Deferred — merge ordering complexity |
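
FR3 and FR4 can be illustrated with two small guards. This is a sketch under stated assumptions, not the `_merge_constitution()` implementation: articles are modeled as a plain `{article_id: text}` mapping, and `ConfigError`, `merge_constitution`, and `check_override` are hypothetical names.

```python
class ConfigError(ValueError):
    """Raised when a project config violates a governance rule."""


def merge_constitution(base: dict[str, str], project: dict[str, str]) -> dict[str, str]:
    """Projects may add articles but never redefine one from the base."""
    clashes = sorted(set(base) & set(project))
    if clashes:
        raise ConfigError(f"immutable articles cannot be modified: {', '.join(clashes)}")
    return {**base, **project}


def check_override(entry: dict) -> None:
    """An override must carry `override: true` and a non-empty justification."""
    if entry.get("override") is not True:
        raise ConfigError("safety-critical override requires override: true")
    if not str(entry.get("justification", "")).strip():
        raise ConfigError("override requires a justification")
```

Note that the guard rejects any redefinition of a base article, even one with identical text, which matches the "absolute immutability" principle and avoids weakening-detection heuristics.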

---

## 2. Architecture Overview

```
Organization Level (published once, consumed by all repos):
═══════════════════════════════════════════════════════════

  @mycompany/ai-toolkit-config (npm package)
  ├── ai-toolkit.config.json       ← base configuration
  ├── rules/
  │   ├── code-review-policy.md    ← company-specific rules
  │   └── deployment-checklist.md
  ├── agents/
  │   └── compliance-auditor.md    ← company-specific agent
  └── package.json

Project Level (per-repository):
══════════════════════════════

  my-service/
  ├── .softspark-toolkit.json             ← project config with "extends"
  ├── .claude/
  │   ├── CLAUDE.md                ← generated (base + project merged)
  │   └── settings.json            ← generated (base hooks + project hooks merged)
  └── ...

Merge Pipeline:
══════════════

  @mycompany/ai-toolkit-config     ← Layer 0: organizational defaults
          │
          ▼
  ai-toolkit defaults (manifest.json) ← Layer 1: toolkit defaults
          │
          ▼
  .softspark-toolkit.json                  ← Layer 2: project overrides
          │
          ▼
  Resolved Configuration             ← Final: CLAUDE.md, settings.json, etc.
```

### Config Resolution Order

```
1. Load base config from "extends" (npm package, git URL, or local path)
2. Merge with ai-toolkit defaults (manifest.json profiles)
3. Apply project-level overrides from .softspark-toolkit.json
4. Validate merged config (constitution immutability, schema validation)
5. Generate output files (CLAUDE.md, settings.json, agent symlinks, etc.)
```
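
The merge in steps 2 and 3 can be sketched as a recursive deep merge over an ordered list of layers, where later layers win. This is an illustration rather than the code in `scripts/config_merger.py`: the list semantics (append without duplicates) and the function names are assumptions, and the caller decides the layer order.

```python
from copy import deepcopy
from functools import reduce


def deep_merge(lower: dict, upper: dict) -> dict:
    """Merge `upper` over `lower`: dicts recurse, lists append new items, scalars replace."""
    result = deepcopy(lower)
    for key, value in upper.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            result[key] = deep_merge(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Assumed list semantics: keep existing order, append unseen items.
            result[key] = current + [item for item in value if item not in current]
        else:
            result[key] = deepcopy(value)
    return result


def resolve(layers: list[dict]) -> dict:
    """Fold the layers left to right; inputs are never mutated."""
    return reduce(deep_merge, layers, {})
```

Keeping the merge pure (no mutation of the inputs) makes it straightforward to diff any two layers afterwards, which is what a `config diff` style command needs.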

---

## 3. Progress Tracking

| # | Feature | Priority | Status | Est. Time | Notes |
|---|---------|----------|--------|-----------|-------|
| 1.1 | `.softspark-toolkit.json` schema definition | P0 | **Done** | 1d | `scripts/schemas/ai-toolkit-config.schema.json` |
| 1.2 | Config resolver (npm, git, local path) | P0 | **Done** | 3d | `scripts/config_resolver.py` (~330 LOC) |
| 1.3 | Merge engine (layered merge with override semantics) | P0 | **Done** | 3d | `scripts/config_merger.py` (~340 LOC) |
| 1.4 | Constitution immutability guard | P0 | **Done** | 1d | In config_merger.py `_merge_constitution()` |
| 2.1 | Install/update integration | P0 | **Done** | 2d | `install.py` + `ai_tools.py` — auto-detect, resolve, merge, inject |
| 2.2 | `ai-toolkit config diff` command | P0 | **Done** | 1.5d | `scripts/config_cli.py` `cmd_diff()` |
| 2.3 | `ai-toolkit config validate` command | P0 | **Done** | 1d | `scripts/config_cli.py` `cmd_validate()` |
| 2.4 | `ai-toolkit config init` command | P1 | **Done** | 1.5d | Interactive + flag-driven, validates extends |
| 2.5 | `ai-toolkit config create-base` command | P1 | **Done** | 2d | `scripts/config_scaffold.py` — full npm package scaffold |
| 3.1 | Audit trail in state.json | P1 | **Done** | 1d | `install_state.py` extends field + `.softspark-toolkit-extends.json` |
| 3.2 | Lock file (`.softspark-toolkit.lock.json`) | P1 | **Done** | 1.5d | `scripts/config_lock.py` — generate/consume/staleness check |
| 3.3 | Base config scaffolder (npm package template) | P1 | **Done** | 1.5d | Part of `config_scaffold.py` `create_base_package()` |
| 3.4 | CI enforcement (`ai-toolkit config check`) | P2 | **Done** | 1d | `config_cli.py` `cmd_check()` — JSON output, exit codes |
| 4.1 | Tests | P1 | **Done** | 3d | 39 tests: resolver (7), merger (13), CLI (10), install integration (9) |
| 4.2 | Documentation | P1 | **Done** | 3d | `kb/reference/enterprise-config-guide.md` — comprehensive guide |

**Phasing (MVP-first):**
- **MVP Phase 1 (week 1-2):** Core engine — schema (1.1), resolver (1.2), merge engine (1.3), constitution guard (1.4)
- **MVP Phase 2 (week 2-3):** Integration + diff — install integration (2.1), `config diff` (2.2), `config validate` (2.3), tests for above (~3.5 weeks = shippable MVP)
- **Phase 3 (week 4-5):** CLI polish — `config init` (2.4), `config create-base` (2.5), scaffolder (3.3)
- **Phase 4 (week 5-6):** Enterprise — audit trail (3.1), lock file (3.2), CI enforcement (3.4) (**gate behind real enterprise feedback**)
- **Phase 5 (week 6-7):** Tests + documentation (4.1, 4.2) (3d docs — all 9 docs per CLAUDE.md rules)

> **Demand validation gate:** Ship MVP (Phases 1-2), announce, measure adoption. Only build Phase 4 (lock file, CI enforcement, audit trail) in response to confirmed enterprise demand.

---

## 4. Dependency Graph

```
                     MVP Phase 1: Core Engine (week 1-2)
                     ====================================
Schema definition (1.1) ──────┐
                              ├──► Merge engine (1.3)
Config resolver (1.2) ────────┤
                              └──► Constitution guard (1.4)

                     MVP Phase 2: Integration + Diff (week 2-3)
                     ============================================
Install integration (2.1) ──┐
                            ├──► config diff (2.2)
                            └──► config validate (2.3)
                            └──► MVP tests → SHIP

                     ═══ DEMAND VALIDATION GATE ═══

                     Phase 3: CLI Polish (week 4-5)
                     ===============================
                            ├──► config init (2.4)
                            └──► create-base (2.5) + scaffolder (3.3)

                     Phase 4: Enterprise (week 5-6)
                     ================================
Audit trail (3.1) ──┐
                    ├──► Lock file (3.2)
                    └──► CI enforcement (3.4)

                     Phase 5: Polish (week 6-7)
                     ===========================
                            └──► Full tests + docs (4.1, 4.2)
```

---

## 5. Detailed Implementation

### Phase 1: Core Engine (week 1-2)

#### 1.1 Configuration Schema (`.softspark-toolkit.json`)

> **v1 scope:** The full schema below shows the target state. v1 implements only: `extends`, `profile`, `agents`, `rules`, `constitution`, and `enforce`. See section 6a for the v1/v2 field breakdown.

**Project-level config file:**

```json
{
  "$schema": "https://softspark.github.io/ai-toolkit/schemas/ai-toolkit-config.json",

  "extends": "@mycompany/ai-toolkit-config",

  "profile": "standard",
  "persona": "backend-lead",
  "hookProfile": "strict",

  "agents": {
    "enabled": ["backend-specialist", "test-engineer", "debugger"],
    "disabled": ["game-developer", "mobile-developer"],
    "custom": ["./agents/compliance-auditor.md"]
  },

  "skills": {
    "disabled": ["/deploy", "/rollback"],
    "custom": ["./skills/internal-deploy/"]
  },

  "rules": {
    "inject": ["./rules/code-review-policy.md"],
    "remove": []
  },

  "plugins": {
    "required": ["security-pack", "memory-pack"],
    "forbidden": []
  },

  "languages": ["typescript", "python"],

  "editors": ["cursor", "windsurf", "copilot"],

  "constitution": {
    "amendments": [
      {
        "article": 6,
        "title": "Data Sovereignty",
        "text": "All code generation must comply with GDPR. No personal data in prompts. No PII in generated code comments."
      }
    ]
  },

  "overrides": {
    "hooks": {
      "quality-check": {
        "override": true,
        "justification": "Company uses custom lint pipeline via Jenkins",
        "replacement": "skip"
      }
    }
  }
}
```

**Base config (`ai-toolkit.config.json` in npm package):**

```json
{
  "$schema": "https://softspark.github.io/ai-toolkit/schemas/ai-toolkit-base-config.json",
  "name": "@mycompany/ai-toolkit-config",
  "version": "2.1.0",
  "description": "MyCompany standard AI coding config",

  "extends": null,

  "profile": "strict",
  "persona": "backend-lead",
  "hookProfile": "strict",

  "agents": {
    "enabled": ["backend-specialist", "test-engineer", "code-reviewer", "security-auditor", "debugger", "documenter"],
    "disabled": ["game-developer"],
    "custom": ["./agents/compliance-auditor.md"]
  },

  "rules": {
    "inject": [
      "./rules/code-review-policy.md",
      "./rules/deployment-checklist.md",
      "./rules/data-handling-policy.md"
    ]
  },

  "plugins": {
    "required": ["security-pack"]
  },

  "languages": ["typescript"],

  "constitution": {
    "amendments": [
      {
        "article": 6,
        "title": "Data Sovereignty",
        "text": "All code generation must comply with GDPR. No personal data in prompts."
      },
      {
        "article": 7,
        "title": "Audit Compliance",
        "text": "All AI-generated code changes must be logged to the company audit system. The governance-capture hook must remain enabled."
      }
    ]
  },

  "enforce": {
    "minHookProfile": "standard",
    "requiredPlugins": ["security-pack"],
    "forbidOverride": ["constitution", "guard-destructive", "guard-path"],
    "requiredAgents": ["security-auditor"]
  }
}
```

**`enforce` section:** Base configs can define non-overridable constraints:
- `minHookProfile` — projects cannot go below this profile
- `requiredPlugins` — must be installed in all projects
- `forbidOverride` — these components cannot be overridden
- `requiredAgents` — must be enabled in all projects
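
The three checkable constraints above can be sketched as a single function. This is an illustrative sketch, not the shipped validator: the function name `check_enforce`, the profile ranking `minimal < standard < strict`, and the default of `standard` for a missing `hookProfile` are assumptions.

```python
from __future__ import annotations

# Assumed ordering of hook profiles, weakest to strictest (hypothetical).
HOOK_PROFILE_RANK = {"minimal": 0, "standard": 1, "strict": 2}


def check_enforce(enforce: dict, resolved: dict) -> list[str]:
    """Return human-readable violations of a base config's `enforce` block."""
    violations: list[str] = []

    minimum = enforce.get("minHookProfile")
    actual = resolved.get("hookProfile", "standard")  # assumed default
    if minimum and HOOK_PROFILE_RANK[actual] < HOOK_PROFILE_RANK[minimum]:
        violations.append(f"hookProfile '{actual}' is below minimum '{minimum}'")

    installed = set(resolved.get("plugins", {}).get("required", []))
    for plugin in enforce.get("requiredPlugins", []):
        if plugin not in installed:
            violations.append(f"required plugin '{plugin}' is missing")

    enabled = set(resolved.get("agents", {}).get("enabled", []))
    for agent in enforce.get("requiredAgents", []):
        if agent not in enabled:
            violations.append(f"required agent '{agent}' is not enabled")

    return violations


enforce = {
    "minHookProfile": "standard",
    "requiredPlugins": ["security-pack"],
    "requiredAgents": ["security-auditor"],
}
resolved = {
    "hookProfile": "minimal",
    "plugins": {"required": ["security-pack"]},
    "agents": {"enabled": ["debugger"]},
}
violations = check_enforce(enforce, resolved)
```

`forbidOverride` is left out here because it is checked against the project's `overrides` block rather than the resolved config.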

---

#### 1.2 Config Resolver

**Resolution sources:**

| Source | Syntax | Resolution |
|--------|--------|------------|
| npm package | `"extends": "@mycompany/ai-toolkit-config"` | `npm pack --pack-destination /tmp` + extract |
| npm with version | `"extends": "@mycompany/ai-toolkit-config@^2.0.0"` | Version resolution via npm |
| Git URL | `"extends": "git+https://github.com/myco/ai-config.git"` | `git clone --depth 1` to cache |
| Local path | `"extends": "../shared-config"` | Resolve relative to project root |
| ~~Multiple bases~~ | ~~`"extends": ["@mycompany/base", "@mycompany/typescript-extra"]`~~ | Deferred to v2 — multi-base merge ordering is a complexity trap |
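
As a rough illustration of how the syntaxes in the table can be told apart, the sketch below classifies an `extends` string and splits an npm spec into name and version range. The function name `classify_extends` and its return shape are hypothetical, and stripping the `git+` prefix is an assumption about what the resolver passes to `git clone`.

```python
from __future__ import annotations


def classify_extends(value: str) -> tuple[str, str, str | None]:
    """Return (kind, target, version_range) for an `extends` string."""
    if value.startswith("git+"):
        return ("git", value[len("git+"):], None)
    if value.startswith((".", "/")):
        return ("local", value, None)
    if value.startswith(("@", "npm:")):
        spec = value.removeprefix("npm:")
        # A scoped name starts with "@", so only an "@" after position 0
        # separates the package name from the version range.
        name, sep, version = spec[1:].rpartition("@")
        if sep:
            return ("npm", spec[0] + name, version)
        return ("npm", spec, None)
    raise ValueError(f"Unknown extends source: {value}")
```

For example, `classify_extends("@mycompany/ai-toolkit-config@^2.0.0")` yields the package name and the range `^2.0.0` separately, so the range can be handed to npm for version resolution.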

**Cache directory:** `~/.softspark/ai-toolkit/config-cache/`
```
~/.softspark/ai-toolkit/config-cache/
  @mycompany/
    ai-toolkit-config/
      2.1.0/
        ai-toolkit.config.json
        rules/
        agents/
```
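
The layout above maps a package name and an exact version to a directory. A minimal sketch of that mapping follows; the helper name `cache_dir` and the rejection of `.`/`..` segments are assumptions, not confirmed behavior of `config_resolver.py`.

```python
from pathlib import Path

CACHE_ROOT = Path.home() / ".softspark" / "ai-toolkit" / "config-cache"


def cache_dir(package: str, version: str, root: Path = CACHE_ROOT) -> Path:
    """Map a package name and exact version to its cache directory."""
    # A scoped name such as "@mycompany/ai-toolkit-config" becomes two segments.
    segments = [*package.split("/"), version]
    if any(segment in ("", ".", "..") for segment in segments):
        raise ValueError(f"Unsafe cache path segment in {package!r} / {version!r}")
    return root.joinpath(*segments)
```

Keying the directory on the exact version (not the requested range) lets several versions of the same base config coexist in the cache.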

**Resolution algorithm:**
```python
MAX_EXTENDS_DEPTH = 5


def resolve_extends(extends_value: str, project_root: str,
                    _visited: set[str] | None = None) -> list[BaseConfig]:
    """Resolve extends chain into ordered list of base configs.

    v1: single string only. Multi-base (list) deferred to v2.
    Cycle detection uses a visited set shared across recursive calls.
    """
    if _visited is None:
        _visited = set()
    # Set iteration order is arbitrary, so sort for a stable error message
    seen = ', '.join(sorted(_visited))
    if extends_value in _visited:
        raise ConfigError(
            f"Circular extends detected: {extends_value} already in chain "
            f"({seen}). Check your base config's 'extends' field."
        )
    if len(_visited) >= MAX_EXTENDS_DEPTH:
        raise ConfigError(
            f"Extends chain too deep (max {MAX_EXTENDS_DEPTH} levels). Visited: {seen}"
        )
    _visited.add(extends_value)

    configs = []
    for source in [extends_value]:  # v2: support list[str]
        if source.startswith(('@', 'npm:')):
            config = resolve_npm(source)
        elif source.startswith('git+'):
            config = resolve_git(source)
        elif source.startswith(('.', '/')):
            config = resolve_local(source, project_root)
        else:
            raise ConfigError(f"Unknown extends source: {source}")

        # Recursive: base config may also have "extends"; share the visited set
        if config.extends:
            configs.extend(resolve_extends(config.extends, config.root, _visited))

        configs.append(config)

    return configs
```

**Max recursion depth:** 5 levels, which bounds the length of an extends chain. Circular extends are detected separately via the visited set.

**Offline handling:** If the npm/git source is unavailable:
1. Check cache (`~/.softspark/ai-toolkit/config-cache/`)
2. If cached version found → use with warning: "Using cached config v2.1.0 (offline)"
3. If not cached → error with instructions: "Run `ai-toolkit config update` when online"
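
The fallback steps above can be sketched as follows. This is an assumption-laden outline: `load_base_config` and the `fetch` callable are hypothetical names, `fetch` is assumed to raise `OSError` when the source is unreachable, and the newest cached version is picked by plain string order rather than semver.

```python
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Callable


def load_base_config(package: str, cache_root: Path,
                     fetch: Callable[[str], Path]) -> Path:
    """Return a base config directory, falling back to the cache when offline."""
    try:
        return fetch(package)
    except OSError:
        package_dir = cache_root.joinpath(*package.split("/"))
        cached = (
            sorted(p for p in package_dir.iterdir() if p.is_dir())
            if package_dir.is_dir() else []
        )
        if not cached:
            raise RuntimeError(
                f"'{package}' is not cached. "
                "Run `ai-toolkit config update` when online."
            ) from None
        newest = cached[-1]  # string order; a real resolver would compare semver
        warnings.warn(f"Using cached config v{newest.name} (offline)")
        return newest
```

Injecting `fetch` keeps the network call out of the fallback logic, so the offline path can be tested without a registry.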

---

#### 1.3 Merge Engine

**Layered deep merge with explicit override semantics:**

```python
def merge_configs(base: dict, project: dict) -> dict:
    """Merge project config over base config with rules."""
    merged = {}
    # Enforcement data lives at the top level of the base config
    required_agents = base.get('enforce', {}).get('requiredAgents', [])
    base_name = base.get('name', 'unknown')

    # dict.fromkeys keeps first-seen order, so merged keys are deterministic
    for key in dict.fromkeys([*base, *project]):
        base_val = base.get(key)
        proj_val = project.get(key)

        if proj_val is None:
            merged[key] = base_val
        # Guarded keys are checked even when the base does not define them
        elif key == 'constitution':
            merged[key] = merge_constitution(base_val or {}, proj_val)
        elif key == 'agents':
            merged[key] = merge_agents(base_val or {}, proj_val,
                                       required_agents, base_name)
        elif key == 'rules':
            merged[key] = merge_rules(base_val or {}, proj_val)
        elif key == 'overrides':
            merged[key] = validate_overrides(base, proj_val)
        elif base_val is None:
            merged[key] = proj_val
        elif isinstance(base_val, dict) and isinstance(proj_val, dict):
            merged[key] = merge_configs(base_val, proj_val)
        elif isinstance(base_val, list) and isinstance(proj_val, list):
            merged[key] = list(dict.fromkeys(base_val + proj_val))  # ordered union
        else:
            merged[key] = proj_val  # project wins for scalars

    return merged
```

**Agent merge rules:**
```python
def merge_agents(base: dict, project: dict,
                 required_agents: list[str] | None = None,
                 base_name: str = 'unknown') -> dict:
    """Merge agent configs: project can enable/disable but not remove base-required.

    `base` and `project` are the `agents` sections; `required_agents` comes
    from the base config's top-level `enforce.requiredAgents`.
    """
    required = set(required_agents or [])
    merged_enabled = set(base.get('enabled', []))

    # Project can add agents
    merged_enabled.update(project.get('enabled', []))

    # Project can disable agents (unless base enforces them)
    for agent in project.get('disabled', []):
        if agent in required:
            raise ConfigError(
                f"Cannot disable '{agent}' — required by base config '{base_name}'. "
                f"Contact your team lead to request an exemption."
            )
        merged_enabled.discard(agent)

    return {
        'enabled': sorted(merged_enabled),
        'custom': base.get('custom', []) + project.get('custom', [])
    }
```

**Override validation:**
```python
def validate_overrides(base: dict, overrides: dict) -> dict:
    """Validate project overrides against base enforcement rules.

    Overrides are nested by category, e.g. overrides['hooks']['quality-check'].
    """
    forbidden = set(base.get('enforce', {}).get('forbidOverride', []))
    base_name = base.get('name', 'unknown')

    for category, entries in overrides.items():
        for name, override in entries.items():
            key = f"{category}.{name}"
            # forbidOverride may name a whole category or a single component
            if category in forbidden or name in forbidden:
                raise ConfigError(
                    f"Cannot override '{key}' — forbidden by base config '{base_name}'.\n"
                    f"Forbidden overrides: {', '.join(sorted(forbidden))}\n"
                    f"Contact your team lead to request an exemption."
                )
            if not override.get('override'):
                raise ConfigError(
                    f"Override for '{key}' requires explicit 'override: true' + 'justification' field.\n"
                    f"This ensures intentional deviation from organizational defaults."
                )
            if not override.get('justification'):
                raise ConfigError(
                    f"Override for '{key}' requires a 'justification' field explaining why.\n"
                    f"Example: \"Company uses custom lint pipeline via Jenkins\""
                )

    return overrides
```

---

#### 1.4 Constitution Immutability Guard

**Core rule:** Base constitution articles are absolutely immutable. Projects can only ADD new articles.

The guard uses no weakening-detection heuristic (character count, semantic analysis), because such heuristics produce false positives and are gameable. Instead, the rule is simple and absolute: if an article number exists in the base, it cannot be modified by the project.

```python
def merge_constitution(base: dict, project: dict) -> dict:
    """Merge constitution — additions only, no modifications."""
    base_amendments = {a['article']: a for a in base.get('amendments', [])}
    proj_amendments = {a['article']: a for a in project.get('amendments', [])}

    # Toolkit articles I-V are always immutable
    IMMUTABLE_ARTICLES = {1, 2, 3, 4, 5}

    merged = dict(base_amendments)

    for article_num, amendment in proj_amendments.items():
        if article_num in IMMUTABLE_ARTICLES:
            raise ConfigError(
                f"Cannot modify Constitution Article {article_num} — immutable.\n"
                f"Articles I-V are defined by ai-toolkit and cannot be overridden.\n"
                f"You can ADD new articles (article 6+)."
            )
        if article_num in base_amendments:
            # Base articles are immutable — projects cannot modify them
            raise ConfigError(
                f"Cannot modify Constitution Article {article_num} — "
                f"defined by base config '{base.get('name', 'unknown')}'.\n"
                f"Base articles are immutable. You can ADD new articles "
                f"with a higher article number."
            )
        merged[article_num] = amendment

    return {'amendments': list(merged.values())}
```

---

### MVP Phase 2: Integration + Diff (week 2-3)

#### 2.1 Install/Update Integration

**Modified `install.py` flow:**

```python
# During install --local:
# 1. Check for .softspark-toolkit.json in project root
# 2. If found and has "extends":
#    a. Resolve base config(s)
#    b. Merge base → project
#    c. Validate merged config
#    d. Generate files from merged config
# 3. If not found: proceed with current behavior (backwards compatible)
```

**CLI flags:**
```bash
ai-toolkit install --local                           # auto-detect .softspark-toolkit.json
ai-toolkit install --local --config ./custom.json    # explicit config file
ai-toolkit update --local                            # re-resolve extends + update
ai-toolkit update --local --refresh-base             # force re-fetch base config
```

---

#### 2.2 `ai-toolkit config diff`

**Show differences between project config and base:**

```bash
ai-toolkit config diff

# Output:
# Base: @mycompany/ai-toolkit-config@2.1.0
#
# Profile:     strict (base) → standard (project) ⚠ OVERRIDE
# Persona:     backend-lead (base) → frontend-lead (project)
# Hook Profile: strict (base) → strict (inherited)
#
# Agents:
#   + frontend-specialist     (project adds)
#   - game-developer          (base disables)
#   = security-auditor        (base requires, cannot disable)
#
# Rules:
#   + ./rules/api-standards.md  (project adds)
#   = code-review-policy.md     (inherited from base)
#
# Constitution:
#   = Articles I-V              (immutable)
#   = Article 6: Data Sovereignty (inherited from base)
#   + Article 8: API Standards    (project adds)
#
# Overrides:
#   quality-check: SKIP (justification: "Custom Jenkins pipeline")
```

---

#### 2.3 `ai-toolkit config validate`

```bash
ai-toolkit config validate

# Checks:
# ✓ .softspark-toolkit.json schema valid
# ✓ extends: @mycompany/ai-toolkit-config@2.1.0 resolved
# ✓ No forbidden overrides
# ✓ Required plugins installed: security-pack
# ✓ Required agents enabled: security-auditor
# ✓ Constitution articles I-V intact
# ✓ Hook profile meets minimum: standard ≥ standard
# ✓ All custom rule files exist
# ✓ All custom agent files exist
```

---

### Phase 3: CLI Polish (week 4-5)

#### 2.4 `ai-toolkit config init`

**Interactive project config setup:**

```bash
ai-toolkit config init

# Flow:
# 1. "Does your organization have a shared ai-toolkit config? [y/n]"
#    → y: "npm package name or git URL:" → resolves + validates
#    → n: creates minimal .softspark-toolkit.json without extends
# 2. "Which profile? [minimal/standard/strict]" → default from base or standard
# 3. "Which persona? [none/backend-lead/frontend-lead/devops-eng/junior-dev]"
# 4. Auto-detect languages from project
# 5. Auto-detect editors from project files
# 6. Write .softspark-toolkit.json
# 7. Run ai-toolkit install --local
```

---

#### 2.5 `ai-toolkit config create-base`

**Scaffold a base config package:**

```bash
ai-toolkit config create-base @mycompany/ai-toolkit-config

# Creates:
# @mycompany-ai-toolkit-config/
# ├── package.json          (name, version, files, peerDependencies)
# ├── ai-toolkit.config.json (base config with sane defaults)
# ├── rules/                (empty, ready for company rules)
# ├── agents/               (empty, ready for company agents)
# └── README.md             (setup instructions)
```

**Generated `package.json`:**
```json
{
  "name": "@mycompany/ai-toolkit-config",
  "version": "1.0.0",
  "description": "Shared ai-toolkit configuration for MyCompany",
  "main": "ai-toolkit.config.json",
  "files": ["ai-toolkit.config.json", "rules/", "agents/"],
  "peerDependencies": {
    "@softspark/ai-toolkit": ">=1.5.0"
  },
  "keywords": ["ai-toolkit", "config", "shared"]
}
```

---

### Phase 4: Enterprise Features (week 5-6)

#### 3.1 Audit Trail

**`state.json` additions:**
```json
{
  "extends": {
    "source": "@mycompany/ai-toolkit-config",
    "version": "2.1.0",
    "resolved_at": "2026-04-10T10:30:00Z",
    "hash": "sha256:abc123...",
    "overrides_applied": [
      {
        "key": "hooks.quality-check",
        "action": "skip",
        "justification": "Custom Jenkins pipeline"
      }
    ]
  }
}
```
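
A minimal sketch of how such a record could be assembled is shown below. The function name `build_extends_record` is hypothetical, and hashing the raw bytes of the base config file is an assumption about what `hash` covers; the `action` value is taken from each override's `replacement` field.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone


def build_extends_record(source: str, version: str, config_bytes: bytes,
                         overrides: dict, now: datetime | None = None) -> dict:
    """Build the `extends` entry for state.json from a resolved base config."""
    now = now or datetime.now(timezone.utc)
    applied = [
        {
            "key": f"{category}.{name}",
            "action": entry.get("replacement"),
            "justification": entry.get("justification"),
        }
        for category, entries in overrides.items()
        for name, entry in entries.items()
    ]
    return {
        "source": source,
        "version": version,
        "resolved_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: the hash covers the base config file's raw bytes.
        "hash": "sha256:" + hashlib.sha256(config_bytes).hexdigest(),
        "overrides_applied": applied,
    }
```

Passing `now` explicitly keeps the record reproducible in tests; callers would normally omit it.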

---

#### 3.2 Lock File (`.softspark-toolkit.lock.json`)

**Purpose:** Pin the exact resolved version of base configs for reproducible installs across team members and CI.

```json
{
  "lockfileVersion": 1,
  "resolved": {
    "@mycompany/ai-toolkit-config": {
      "version": "2.1.0",
      "resolved": "https://registry.npmjs.org/@mycompany/ai-toolkit-config/-/ai-toolkit-config-2.1.0.tgz",
      "integrity": "sha512-abc123...",
      "cached": "~/.softspark/ai-toolkit/config-cache/@mycompany/ai-toolkit-config/2.1.0/"
    }
  },
  "generated_at": "2026-04-10T10:30:00Z",
  "ai_toolkit_version": "1.5.1"
}
```

**Behavior:**
- `ai-toolkit install --local` → uses lock file if present (like `npm ci`)
- `ai-toolkit update --local` → re-resolves and updates lock file (like `npm install`)
- `ai-toolkit update --local --refresh-base` → force re-fetch ignoring cache
- `.softspark-toolkit.lock.json` should be committed to git (team synchronization)
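
When installing from the lock file, the fetched package can be checked against the recorded `integrity` value. The sketch below assumes the field uses the npm-style format (`sha512-` followed by the base64 digest) computed over the package tarball; the helper names are hypothetical.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an npm-style integrity string: 'sha512-' + base64(digest)."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify_locked(tarball: bytes, lock_entry: dict) -> bool:
    """Check fetched tarball bytes against the lock entry's `integrity` field."""
    return sri_sha512(tarball) == lock_entry["integrity"]
```

A mismatch would indicate that the cached or downloaded package differs from the one the lock file was generated against, which is the condition a lock-file install should refuse.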

---

#### 3.4 CI Enforcement

**`ai-toolkit config check` — for CI pipelines:**

```bash
ai-toolkit config check

# Exit codes:
# 0 — project complies with base config
# 1 — violations found (missing required plugins, forbidden overrides, etc.)
# 2 — .softspark-toolkit.json not found
```

**GitHub Actions example:**
```yaml
- name: AI Toolkit Governance Check
  run: |
    npx @softspark/ai-toolkit config check
    npx @softspark/ai-toolkit config validate --strict
```

**What it checks:**
1. Required plugins are installed
2. Required agents are enabled
3. No forbidden overrides applied without exemption
4. Hook profile meets minimum
5. Constitution articles intact
6. Lock file up-to-date (warn if stale)
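
The exit-code contract can be sketched as a thin wrapper around whatever performs the checks listed above. Here `config_check` and the injected `find_violations` callable are hypothetical, and the JSON output shape is an assumption; only the exit codes 0/1/2 and the config file name come from this plan.

```python
import json
from pathlib import Path

EXIT_OK, EXIT_VIOLATIONS, EXIT_NO_CONFIG = 0, 1, 2


def config_check(project_root: Path, find_violations) -> int:
    """Run governance checks and return the process exit code."""
    config_path = project_root / ".softspark-toolkit.json"
    if not config_path.is_file():
        return EXIT_NO_CONFIG
    config = json.loads(config_path.read_text(encoding="utf-8"))
    violations = find_violations(config)
    # Assumed output shape; the real command's JSON schema may differ.
    print(json.dumps({"ok": not violations, "violations": violations}))
    return EXIT_VIOLATIONS if violations else EXIT_OK
```

A CLI entry point would pass the return value to `sys.exit`, so a CI step fails on any non-zero code.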

---

## 6. File Summary

| File | Action | LOC (est.) | Description |
|------|--------|------------|-------------|
| `scripts/config_resolver.py` | CREATE | ~400 | Resolve extends (npm, git, local path) |
| `scripts/config_merger.py` | CREATE | ~350 | Layered merge engine |
| `scripts/config_validator.py` | CREATE | ~200 | Schema + enforcement validation |
| `scripts/config_scaffold.py` | CREATE | ~250 | create-base scaffolder |
| `scripts/config_diff.py` | CREATE | ~200 | Diff viewer |
| `scripts/config_check.py` | CREATE | ~150 | CI enforcement checker |
| `scripts/install.py` | EDIT | +80 | Integrate extends resolution |
| `bin/ai-toolkit.js` | EDIT | +40 | Register config subcommands |
| `manifest.json` | EDIT | +10 | Schema references |
| `kb/reference/enterprise-config-guide.md` | CREATE | ~300 | Enterprise setup guide |
| `kb/reference/base-config-template/` | CREATE | ~200 | Scaffolded base config files |
| `tests/test_config_resolver.bats` | CREATE | ~150 | Resolution tests |
| `tests/test_config_merger.bats` | CREATE | ~200 | Merge + override tests |
| `tests/test_config_immutability.bats` | CREATE | ~100 | Constitution guard tests |
| `tests/test_config_cli.bats` | CREATE | ~150 | CLI command tests |
| **Total** | | **~2780** | |

---

## 6a. Schema Scope (v1 vs v2)

v1 ships with a minimal schema. Each additional field adds merge logic, validation, diff output, and test surface. Expand based on real usage, not speculation.

| Field | v1 | v2 | Rationale |
|-------|----|----|-----------|
| `extends` | single string | array (multi-base) | Multi-base merge ordering is complex |
| `profile` | yes | — | Core governance knob |
| `agents` | yes | — | Most common customization |
| `rules` | yes | — | Rule injection is existing feature |
| `constitution` | yes | — | Key differentiator |
| `enforce` | yes | — | Non-overridable constraints |
| `skills` | — | yes | Less commonly customized at org level |
| `plugins` | — | yes | Depends on plugin maturity |
| `languages` | — | yes | Auto-detected, rarely org-level |
| `editors` | — | yes | Auto-detected, rarely org-level |
| `overrides` | — | yes | Complex, needs real-world feedback |
| `hookProfile` / `persona` | — | yes | Low demand signal |

---

## 6b. Non-Functional Requirements

| Category | Requirement |
|----------|-------------|
| **Performance** | `install --local` with extends resolution < 5s (cached), < 15s (first fetch). Config merge < 100ms. |
| **Offline** | Cached configs used when registry unavailable, with clear warning. |
| **Security** | No secret exposure in config files or audit trail. npm auth via `.npmrc` (user-managed). `execFile` for npm CLI (no shell injection). |
| **Error messages** | Every validation error includes: what failed, which config layer caused it, and what to do (e.g., "Contact your team lead to request an exemption"). |
| **Backward compatibility** | 100% — projects without `.softspark-toolkit.json` work exactly as today. Zero behavioral changes for existing users. |
| **Maintainability** | Each new schema field requires: merge logic, validation, diff output, test. Budget 0.5d per new field. |
| **Quality gates** | `ruff check scripts/config_*.py` (0 errors), `mypy --strict scripts/config_*.py` (0 errors). Run before every commit. |
| **Type safety** | 100% public API type hints (all function signatures). >60% internal. Use `TypedDict` for config schemas, `dataclass` for resolved configs. |

---

## 7. Success Criteria (Overall)

| Metric | Target |
|--------|--------|
| Extends sources (v1) | 4 (npm, npm+version, git URL, local path) — single string only |
| Merge depth | 5 levels max (recursive extends) |
| Config schema | JSON Schema validated |
| Constitution protection | 100% (Articles I-V immutable) |
| Override justification | Required for all overrides |
| Enforce constraints | 4 types (minHookProfile, requiredPlugins, forbidOverride, requiredAgents) |
| Backward compatibility | 100% (projects without .softspark-toolkit.json work as today) |
| CI enforcement | Exit code 0/1 for governance compliance |
| Lock file | Reproducible installs across team members |
| Scaffold command | Ready-to-publish npm package template |
| Tests | 30+ |
| Offline resolution | Cached configs with warning |

---

## 8. Risks and Mitigation

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| npm registry unavailable during install | Low | Medium | Cache + offline fallback with warning |
| Circular extends chain | Low | High | Max depth 5 + visited set for cycle detection |
| Base config breaks project | Medium | High | Lock file pins exact version; `ai-toolkit config diff` shows changes before update |
| Override abuse (teams bypass governance) | Medium | Medium | `enforce.forbidOverride` + CI check + justification requirement |
| Config schema too restrictive | Medium | Medium | Start with minimal enforcement, expand based on enterprise feedback |
| Multiple base configs conflict | — | — | Deferred to v2 (single extends only in v1) |
| Private npm registry authentication | Medium | Low | Use existing npm auth (`.npmrc`), document setup |
| Git URL resolution slow | Low | Low | `--depth 1` clone, cache aggressively |

---

## 9. Pre-Mortem

1. **"Config file fatigue"** — developers already have `.eslintrc`, `tsconfig.json`, `.prettierrc`. Another `.softspark-toolkit.json` may feel like bloat. Mitigation: file is optional, all features work without it. The DX gain (organizational governance without per-repo updates) justifies the file.
2. **"Base config never gets updated"** — team lead creates base config, nobody maintains it. Mitigation: `ai-toolkit config check` in CI catches drift; lock file staleness warnings.
3. **"Override justification is annoying"** — developers will write "needed" as justification. Mitigation: CI check can enforce minimum justification length (>20 chars); code review culture catches low-effort justifications.
4. **"Merge semantics are confusing"** — "does project override or extend the base agent list?" Mitigation: explicit semantics documented in schema; `ai-toolkit config diff` shows exactly what happened.
5. **"Enterprise teams want RBAC on overrides"** — who can approve overrides? Mitigation: v1 uses justification text + code review; v2 could integrate with GitHub CODEOWNERS for override approval.
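
The minimum-length mitigation from item 3 could look like the sketch below. The function name `weak_justifications` is hypothetical, and it assumes overrides are nested as `category -> component -> entry`, as in the project config example.

```python
MIN_JUSTIFICATION_CHARS = 20  # threshold from the ">20 chars" mitigation


def weak_justifications(overrides: dict) -> list:
    """Return 'category.name' keys whose justification is not longer than the minimum."""
    weak = []
    for category, entries in overrides.items():
        for name, entry in entries.items():
            text = (entry.get("justification") or "").strip()
            if len(text) <= MIN_JUSTIFICATION_CHARS:
                weak.append(f"{category}.{name}")
    return weak
```

A length check only filters out the laziest entries; the code-review step mentioned above remains the real safeguard.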

---

## 10. Market Positioning

**Target users:**
1. **Engineering managers** — enforce AI coding standards across 20+ repos without touching each one
2. **Security teams** — ensure constitution + security-auditor agent is always enabled
3. **Platform teams** — distribute company-specific agents, rules, and plugins via npm
4. **Compliance officers** — audit trail of what AI governance rules are active in each project

**Competitive advantage:** No existing AI coding toolkit supports configuration inheritance. This is a unique enterprise feature that transforms ai-toolkit from a developer tool into an organizational governance platform.

**Revenue potential:** Enterprise teams are the primary audience for paid support/consulting around ai-toolkit. Config inheritance is the feature that makes enterprise adoption manageable.

---

## 11. Next Actions

**MVP (ship first, ~3.5 weeks):**
1. [x] Approve plan
2. [x] Define `.softspark-toolkit.json` JSON Schema — v1 scope only (1.1)
3. [x] Implement config resolver (npm, git, local) with caching (1.2)
4. [x] Implement merge engine with override validation (1.3)
5. [x] Implement constitution immutability guard (1.4)
6. [x] Integrate into install.py flow (2.1)
7. [x] Create `config diff` viewer (2.2) — primary debugging tool
8. [x] Create `config validate` checker (2.3)
9. [x] Tests for above (4.1 partial)
10. [x] **Ship MVP → announce → measure adoption**

**Post-MVP:**
11. [x] Create `config init` interactive command (2.4)
12. [x] Create `config create-base` scaffolder (2.5)
13. [x] Add audit trail to state.json (3.1)
14. [x] Implement lock file generation + resolution (3.2)
15. [x] Create base config npm package template (3.3)
16. [x] Create CI enforcement command `config check` (3.4)
17. [x] Full tests + documentation (4.1, 4.2)

**All 17 items completed — 2026-04-11.**

---

## 12. Future (v2)

| Feature | Rationale |
|---------|-----------|
| Multi-base extends (`"extends": [...]`) | Needs real-world feedback on merge ordering UX |
| v1 deferred schema fields (skills, plugins, languages, editors, overrides, hookProfile, persona) | Expand based on actual enterprise requests |
| RBAC on overrides (GitHub CODEOWNERS integration) | v1 uses justification + code review |
| Semantic constitution analysis | Character-count heuristics removed in v1; revisit only if absolute immutability proves too restrictive |
| `ai-toolkit config audit` (full governance report) | Depends on audit trail maturity |

---

## 13. Cross-Plan Dependencies

This plan shares modification targets with the Offline SLM plan:

| Shared File | This Plan | Offline SLM Plan |
|-------------|-----------|-----------------|
| `scripts/install.py` | +80 LOC (extends resolution) | +30 LOC (offline-slm profile) |
| `manifest.json` | +10 LOC (schema refs) | +5 LOC (offline-slm profile) |
| `bin/ai-toolkit.js` | +40 LOC (config subcommands) | +10 LOC (compile-slm command) |

**If implementing in parallel:** coordinate merge order for shared files. Recommended sequence: Offline SLM (smallest changes) → Enterprise Config.

---

**Last Updated:** 2026-04-10

---

## kb/history/completed/f2-mcp-trim-spike-20260504.md

---
title: "Spike: F2 MCP Context Trim — Hook Feasibility & Path Decision"
category: planning
service: ai-toolkit
tags:
  - mcp
  - hooks
  - claude-code
  - spike
  - feasibility
doc_type: spike
status: completed
created: "2026-05-04"
last_updated: "2026-05-04"
completed: "2026-05-04"
shipped_in: "v3.2.0 (decision only — implementation deferred to v4.0)"
description: "Spike conclusion for Feature 2 of the output-token-discipline plan. Determines whether Claude Code hooks can modify MCP tool descriptions before they reach the LLM. Result: hooks operate per-call, not on tool list metadata. Full feature requires an MCP proxy server (multi-day scope). Outcome: F2 deferred to v4.0 with own dedicated PRD."
---

# Spike: F2 MCP Context Trim — Hook Feasibility & Path Decision

## Question

Can Claude Code's hook system modify MCP **tool descriptions** that get included in the model's system prompt, or do hooks only intercept individual tool **calls**?

## Method

Reviewed local sources only (RAG MCP offline at spike time):

1. `app/skills/hook-creator/SKILL.md` — exhaustive list of supported hook events and their data shapes
2. `app/hooks/guard-destructive.sh`, `app/hooks/guard-path.sh` — actual examples reading `tool_input` from stdin
3. `app/skills/mcp-builder/SKILL.md` — MCP server-side conventions
4. `~/.claude/.mcp.json` — user's installed MCP servers (Context7, sequential-thinking, filesystem, rag-mcp, memory, jira-mcp)

## Findings

### Hook events that touch tool data

| Event | Modifies tool list? | Modifies tool input? | Notes |
|-------|--------------------|--------------------|-------|
| `PreToolUse` | no | no (only block via exit 2) | Reads `tool_input.*`, decides allow/deny |
| `PostToolUse` | no | no | Sees result for logging / feedback |
| `PermissionRequest` | no | yes (`updatedInput`) | Can rewrite a single call's args |
| `Elicitation` | no | n/a | Intercepts MCP UI prompts, not tool list |
| `SessionStart` | no | n/a | Context injection only |
| `InstructionsLoaded` | no | n/a | Verifies CLAUDE.md presence |

**No event exposes the MCP `tools/list` response or the system-prompt tool catalog**. The tool catalog is materialized once per MCP server connection from the server's own `tools/list` reply.
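The per-call scope of `PreToolUse` can be illustrated with a minimal sketch, written here in Python rather than the shell used by the toolkit's own guard hooks. Only the "read `tool_input` from stdin, exit 2 to block" contract comes from the table above; the `command` field name, the deny-list, and the function names are assumptions for illustration.

```python
import json

# Hypothetical deny-list; the real guard hooks define their own patterns.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force")


def decide(payload: dict) -> int:
    """Return a hook exit code: 0 allows the call, 2 blocks it."""
    command = payload.get("tool_input", {}).get("command", "")
    if any(pattern in command for pattern in BLOCKED_SUBSTRINGS):
        return 2
    return 0


def run_hook(stdin_text: str) -> int:
    """Decode the JSON a hook receives on stdin and decide on that one call."""
    # The payload describes a single tool call; no tool catalog is present.
    return decide(json.loads(stdin_text))
```

A real hook would pass `sys.stdin.read()` to `run_hook` and exit with the returned code. Nothing in this flow offers a place to rewrite tool descriptions, which is the limitation the spike identified.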

### Why this matters

The compression target was the bulk of MCP tool descriptions sitting in every model turn's system prompt. Examples from the user's installed servers:

- `dart-mcp-server` — ~30 tools with multi-paragraph descriptions
- `filesystem` — verbose paths and example sections
- `pencil` — "IMPORTANT" stanzas repeated across tools
- `jira-mcp` — long `Use this tool to...` boilerplate

With ~100 tools across 7 servers in this user's config, that is easily 8–15k tokens of pure description text in every turn. This is real waste, but Claude Code does not let a hook touch it.

### What would actually work

To compress MCP tool descriptions before they reach the LLM, exactly two architectures are viable:

1. **Local MCP proxy server** between Claude Code and each target server. The proxy re-implements `tools/list` to rewrite descriptions on the fly while passing through `tools/call`. Requires JSON-RPC 2.0 over stdio + SSE per server, per-server config in `~/.claude/.mcp.json`, and a process supervisor for the proxies. Multi-day scope. Failure mode: a buggy proxy breaks all MCP-dependent skills.
2. **Source-side fork**: ship pre-trimmed copies of common MCP servers (`@softspark/jira-mcp-trim`, etc.) — high maintenance burden, doesn't help users with custom servers.

Neither is a "minimal change" by the standards of this plan.
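The core of option 1, rewriting a `tools/list` reply while leaving everything else untouched, might look roughly like the sketch below. It assumes the reply is a JSON-RPC result carrying a `tools` array whose entries have a `description` field; the function name and the plain truncation rule are placeholders, not the compression heuristics from the original F2 design.

```python
import copy


def trim_tools_list(response: dict, max_chars: int = 200) -> dict:
    """Return a copy of a tools/list response with shortened descriptions."""
    trimmed = copy.deepcopy(response)  # never mutate the upstream reply
    for tool in trimmed.get("result", {}).get("tools", []):
        description = tool.get("description", "")
        if len(description) > max_chars:
            # Placeholder heuristic: hard cut plus an explicit marker.
            tool["description"] = description[:max_chars].rstrip() + "..."
    return trimmed
```

A proxy would apply this only to `tools/list` replies and forward `tools/call` traffic unchanged. The transport, supervision, and failure handling around it make up the multi-day scope.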

## Decision (final, 2026-05-04)

**Drop F2 from v3.2.0 entirely. Defer the full MCP proxy approach to v4.0** with its own dedicated PRD and architecture spike.

The spike originally surfaced a smaller "F2-lite observability tool" alternative (read-only inventory + suggestions). After review, the user chose to drop both options from v3.2.0:

- v3.2.0 ships F1 + F3 + F3.5 only (output modes, token telemetry, default statusline)
- F2 work — including any observability-first prototype — moves wholesale to v4.0 milestone
- Reasoning: keep v3.2.0 release scope tight; v4.0 owns MCP-cost story end-to-end with proper proxy architecture

## Alternatives considered

| Option | Pros | Cons | Verdict |
|--------|------|------|---------|
| Build full MCP proxy in v3.2.0 | Achieves original compression goal | Multi-day work, single-bug-breaks-all-MCP failure mode, would block release | Rejected — too big for current release |
| Pre-install rewrite of `.mcp.json` | One-shot, no runtime cost | MCP spec sources descriptions from server, not config — wouldn't actually take effect | Rejected — does not work |
| F2-lite observability tool in v3.2.0 | Low risk, gives users data | Not the original target; partial value; mixes two milestones | Rejected by user — keep v3.2.0 focused |
| **Defer F2 entirely to v4.0** | Clean release boundaries; v4.0 owns MCP story end-to-end with full proxy scope | Token waste in MCP descriptions stays invisible to users until v4.0 | **Selected** |

## What was delivered in v3.2.0 (F1 + F3 + F3.5)

The output-discipline goal is partially addressed by what shipped:

- **Output modes** (F1) cut conversational response tokens 60–80% on the shipped fixture set
- **Real token telemetry** (F3) lets users see actual cost per session — including the MCP description overhead, even if they cannot yet trim it
- **Default statusline** (F3.5) surfaces that cost continuously

Users now have visibility into the MCP-description waste this spike identified, even though automated compression has to wait for v4.0.

## What goes into v4.0

Tracked as an active PRD: [`kb/planning/mcp-context-trim-v4-prd.md`](../../planning/mcp-context-trim-v4-prd.md). It carries forward:

1. Compression heuristics from the original F2 design (migrated out of the archived plan into the live PRD)
2. Local MCP proxy server architecture: JSON-RPC 2.0 over stdio + SSE per server, process supervisor, per-server config in `~/.softspark/ai-toolkit/mcp-proxy/`
3. Rollback / opt-out story — a buggy proxy must not break MCP-dependent skills
4. Failure mode — proxy down → fall through to direct MCP server, with telemetry warning
5. Migration of existing user `.mcp.json` configs

Estimate in the PRD: ~8 working days.

## Status

| Date | Status | Author |
|------|--------|--------|
| 2026-05-04 | Spike completed | claude |
| 2026-05-04 | User decision: defer F2 to v4.0 entirely (no F2-lite in v3.2.0) | lukasz.krzemien |
| 2026-05-04 | Spike archived to `kb/history/completed/` alongside the parent plan | claude |

---

## kb/history/completed/offline-slm-profile-plan-20260411.md

---
title: "Plan: Offline-First SLM Profile — Lightweight Mode for Local Models"
category: planning
service: ai-toolkit
tags:
  - offline
  - slm
  - small-language-models
  - ollama
  - lm-studio
  - profile
  - context-optimization
  - privacy
doc_type: plan
status: completed
created: "2026-04-10"
last_updated: "2026-04-11"
completion: "100%"
completed: "2026-04-11"
description: "Lightweight profile for ai-toolkit optimized for Small Language Models (SLMs) running locally via Ollama, LM Studio, or similar. Compiles a minimal instruction set that fits within 4K-8K system prompt budgets while preserving critical safety guardrails. Targets air-gapped, privacy-first, and cost-sensitive development workflows."
---

# Plan: Offline-First SLM Profile — Lightweight Mode for Local Models

**Status:** Completed
**Completion:** 100%
**Completed:** 2026-04-11
**Created:** 2026-04-10
**Origin:** Enterprise IP security requirements (air-gapped environments), cost-sensitive solo developers, and the growing adoption of local models (Ollama, LM Studio, llamafile). Current toolkit emits 20K+ token system prompts that exceed SLM context windows and degrade small model performance.
**Estimated Effort:** 4-5 weeks (1 person)

---

## 1. Objective

Create a `--profile offline-slm` install profile and a `scripts/compile_slm.py` compiler that produces a minimal, high-signal instruction set optimized for Small Language Models (8B-32B parameters). The compiled output preserves critical safety guardrails while stripping agent orchestration, multi-agent coordination, and complex skill routing that SLMs cannot handle.

**Key design principles:**
- **Token budget** — compiled output fits within 4K tokens (system prompt), with optional 8K mode for larger SLMs
- **Safety-preserved** — Constitution Articles I-V always included (non-negotiable)
- **Single-agent focus** — no multi-agent orchestration, no /swarm, no /teams
- **Deterministic compilation** — same input → same output, no LLM involved in compilation
- **Model-aware** — detects model size from Ollama API or manual flag and adjusts verbosity
- **Platform-agnostic** — outputs plain markdown consumable by any local inference engine
- **Hooks stripped** — SLM providers don't support lifecycle hooks; rules compile into system prompt

---

## 1a. Functional Requirements

| ID | Requirement | Priority | Success Metric |
|----|-------------|----------|----------------|
| FR1 | Token counter (stdlib-only, ±10% accuracy target) | Must | Conservative estimate, no external deps |
| FR2 | Component parser + scorer with safety-priority ranking | Must | Constitution=1.0, all components scored |
| FR3 | Compression engine with 4 levels (ultra-light, light, standard, extended) | Must | Each level strips progressively less |
| FR4 | Budget packer (greedy knapsack by score/size ratio) | Must | Output ≤ budget × 0.95 in all cases |
| FR5 | Markdown emitter with safety-first structure | Must | Constitution always first in output |
| FR6 | `--profile offline-slm` install integration | Must | `install.py` + `manifest.json` updated |
| FR7 | `compile-slm` CLI command with flags | Must | `--budget`, `--model-size`, `--persona`, `--lang`, `--output`, `--format`, `--dry-run` |
| FR8 | Constitution always included (non-negotiable) | Must | Compilation fails if constitution exceeds budget alone |
| FR9 | Model size detection from Ollama API | Should | Auto-detect with graceful fallback to `14b` |
| FR10 | Persona-aware compilation (boost relevant skills) | Should | Persona skills ranked higher |
| FR11 | Language-aware compilation (include matching rules only) | Should | Non-matching language rules excluded |
| FR12 | Integration guides for 4 platforms (Ollama, LM Studio, Aider, Continue.dev) | Should | Step-by-step setup per platform |
| FR13 | Compile quality validator (post-compilation checks) | Should | FAIL on missing constitution, budget exceeded |
| FR14 | 4 output formats (raw markdown, Ollama Modelfile, JSON string, Aider-compatible) | Should | Each format usable by target tool |
| FR15 | `--dry-run` output showing included components + token counts | Should | Table: component, score, tokens, included? |

---

## 2. Architecture Overview

```
ai-toolkit install --profile offline-slm [--model-size 8b|14b|32b|70b]
ai-toolkit compile-slm [--budget 4096] [--persona backend-lead] [--lang typescript]

  ┌──────────────────────────────────────────────────────────┐
  │              offline-slm Profile                          │
  │                                                          │
  │  Compiler: scripts/compile_slm.py                        │
  │  Input:  full toolkit (agents, skills, rules, constitution)│
  │  Output: single compiled .md file for system prompt       │
  │                                                          │
  │  Token Budget Tiers:                                     │
  │    ultra-light (2K) — safety + persona only              │
  │    light (4K) — safety + persona + top skills + rules    │
  │    standard (8K) — safety + persona + full skills + rules │
  │    extended (16K) — near-full toolkit (for 70B+ models)   │
  │                                                          │
  │  Output Files:                                           │
  │    ~/.softspark/ai-toolkit/compiled/slm-system-prompt.md           │
  │    ~/.softspark/ai-toolkit/compiled/slm-skills-reference.md        │
  │    CLAUDE.md (or equivalent) — auto-generated            │
  │                                                          │
  │  Integration Targets:                                    │
  │    Ollama (modelfile SYSTEM directive)                    │
  │    LM Studio (system prompt field)                       │
  │    llamafile (--system-prompt flag)                       │
  │    Open WebUI (system prompt setting)                    │
  │    Aider (--system-prompt-file flag)                     │
  │    Continue.dev (system prompt in config)                │
  └──────────────────────────────────────────────────────────┘
```

### Compilation Pipeline

```
Full Toolkit (20K+ tokens)
         │
         ▼
  ┌─────────────────┐
  │ 1. Parse Phase   │  Read all agents, skills, rules, constitution
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ 2. Rank Phase    │  Score components by: safety criticality × usage frequency × persona relevance
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ 3. Compress Phase│  Strip: examples, rationalization tables, related skills, verbose headers
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ 4. Budget Phase  │  Pack highest-scoring components until token budget reached
  └────────┬────────┘
           ▼
  ┌─────────────────┐
  │ 5. Emit Phase    │  Write compiled .md + integration instructions
  └─────────────────┘
```

---

## 3. Progress Tracking

| # | Feature | Priority | Status | Est. Time | Notes |
|---|---------|----------|--------|-----------|-------|
| 1.1 | Token counter (tiktoken-free, word-based estimator) | P0 | Proposed | 0.5d | ~0.75 words/token (about 1.33 tokens/word) heuristic (stdlib only) |
| 1.2 | Component parser + scorer | P0 | Proposed | 2d | Parse frontmatter, score by criticality/frequency/persona |
| 1.3 | Compression engine | P0 | Proposed | 2d | Strip examples, rationalization tables, headers |
| 1.4 | Budget packer | P0 | Proposed | 1d | Greedy knapsack by score/size ratio |
| 1.5 | Emitter (markdown output) | P0 | Proposed | 1d | Clean compiled .md file |
| 2.1 | Profile integration (`--profile offline-slm`) | P0 | Proposed | 1.5d | Install.py + manifest.json + state.json |
| 2.2 | CLI command (`ai-toolkit compile-slm`) | P0 | Proposed | 1d | Standalone compilation with flags |
| 2.3 | Model size detection (Ollama API) | P1 | Proposed | 1d | Auto-detect model params from `ollama list` |
| 2.4 | Persona-aware compilation | P1 | Proposed | 1.5d | Boost persona-relevant skills in ranking |
| 2.5 | Language-aware compilation | P1 | Proposed | 1d | Include only matching language rules |
| 3.1 | Integration guides (Ollama, LM Studio, Aider, Continue) | P1 | Proposed | 1.5d | Step-by-step per platform |
| 3.2 | Compile quality validator | P1 | Proposed | 1d | Verify output covers constitution, fits budget |
| 3.3 | Tests | P1 | Proposed | 3d | Unit: compilation determinism, budget compliance, 4 compression levels × 4 output formats, constitution guard. Integration: `compile-slm --model-size 8b`, verify output fits 2048 tokens + constitution present end-to-end. Target: 40+ tests |
| 3.4 | Documentation | P1 | Proposed | 2.5d | All 9 docs per CLAUDE.md: README, CLAUDE.md, ARCHITECTURE.md, package.json, llms.txt, llms-full.txt, AGENTS.md, skills-catalog.md, architecture-overview.md + integration guide |

**Phasing:**
- **Phase 1 (week 1-2):** Compiler — parser, scorer, compressor, packer, emitter
- **Phase 2 (week 2-3):** Integration — profile, CLI, model detection, persona/language awareness
- **Phase 3 (week 3-4):** Polish — integration guides, validator, tests, documentation

> **Demand validation gate:** Ship Phase 1 + basic Phase 2 (compiler + profile + CLI with `--budget` and `--model-size` flags) as MVP. Test with 3 real models (8B, 14B, 32B). Only build persona/language-aware compilation and platform-specific integration guides if MVP validation confirms output quality.

---

## 4. Dependency Graph

```
                     Phase 1: Compiler (week 1-2)
                     ============================
Token counter (1.1) ────┐
                        ├──► Compression engine (1.3) ──► Budget packer (1.4) ──► Emitter (1.5)
Component parser (1.2) ──┘

                     Phase 2: Integration (week 2-3)
                     ================================
Profile integration (2.1) ──┐
                            ├──► CLI command (2.2)
Model detection (2.3) ──────┤
Persona-aware (2.4) ────────┤
Language-aware (2.5) ────────┘

                     Phase 3: Polish (week 3-4)
                     ===========================
                            ├──► Integration guides (3.1)
                            ├──► Compile validator (3.2)
                            └──► Tests + docs (3.3, 3.4)
```

---

## 5. Detailed Implementation

### Phase 1: Compiler Engine (week 1-2)

#### 1.1 Token Counter

**Stdlib-only token estimation** — no tiktoken, no external dependencies.

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count from text without external dependencies.

    Uses two heuristics and returns the higher (conservative) estimate:
    1. Word-based: ~0.75 words per token (about 1.33 tokens/word) for English prose
    2. Char-based: ~1 token per 4 chars (more accurate for code-heavy content)

    Accuracy target: ±10% vs tiktoken cl100k_base. To be validated on 50 toolkit files before shipping.
    """
    word_est = int(len(text.split()) / 0.75)
    char_est = len(text) // 4
    # Code has higher token density: add a flat 15-token penalty per fence marker
    # (opening and closing fences both count, so each fenced block adds 30)
    fence_count = text.count('```')
    code_penalty = fence_count * 15
    return max(word_est, char_est) + code_penalty
```

**Why not tiktoken:** tiktoken requires a compiled (Rust) extension and a network download of the BPE vocabulary file on first use. This violates the stdlib-only constraint and fails in air-gapped environments, which are the target audience for this feature.

**Accuracy target:** ±10% vs tiktoken cl100k_base. Using `max(word, char)` gives a conservative estimate. We pack to budget × 0.95 (5% safety margin) to absorb estimation error.
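The two numbers in this target can be made concrete with a small sketch. It assumes reference counts were produced separately (for example by running tiktoken on a connected machine); the helper names are hypothetical.

```python
def relative_error(estimated: int, reference: int) -> float:
    """Relative error of an estimate against a reference token count."""
    return abs(estimated - reference) / reference


def within_tolerance(estimated: int, reference: int, tolerance: float = 0.10) -> bool:
    """True when the estimate is inside the ±10% accuracy target."""
    return relative_error(estimated, reference) <= tolerance


def packing_limit(budget: int, margin_percent: int = 5) -> int:
    """Largest token total the packer may emit for a given budget."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return budget * (100 - margin_percent) // 100
```

For a 4096-token budget the packing limit comes out at 3891 tokens, leaving about 205 tokens of headroom for estimation error.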

---

#### 1.2 Component Parser + Scorer

**Parse all toolkit components into a unified scoring table:**

```python
@dataclass
class Component:
    name: str
    type: str          # 'constitution', 'agent', 'skill', 'rule', 'hook-equivalent'
    source_file: str
    full_text: str
    compressed_text: str  # after stripping (populated by compressor)
    tokens_full: int
    tokens_compressed: int
    score: float       # 0.0 - 1.0

    # Scoring factors
    safety_criticality: float   # 0.0-1.0 (constitution=1.0, guard hooks=0.9)
    usage_frequency: float      # 0.0-1.0 (from stats.json, normalized)
    persona_relevance: float    # 0.0-1.0 (match against active persona)
    language_relevance: float   # 0.0-1.0 (match against project language)
```

**Scoring formula:**
```python
score = (
    safety_criticality * 0.40 +    # Safety always dominates — non-negotiable content gets priority
    usage_frequency * 0.25 +       # Frequently used = valuable — from stats.json invocation counts
    persona_relevance * 0.20 +     # Persona-matched = valuable — e.g. backend-lead boosts API skills
    language_relevance * 0.15      # Language-matched = contextual — include only relevant rules
)
# Weight rationale: safety must dominate (0.40) to guarantee constitution + guard rules always fit.
# Usage + persona (0.45 combined) ensure the most practical content fills remaining budget.
# Language (0.15) is a tiebreaker — most projects use 1-2 languages.
# Weights are compile-time constants in v1. If empirical testing (5 standard tasks across
# 3 model sizes) shows suboptimal results, expose as --score-weights flag in v2.
```
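As a worked example of the weighted sum, the sketch below scores one hypothetical skill. The factor values are made up for illustration; only the weights come from the formula above.

```python
WEIGHTS = {"safety": 0.40, "usage": 0.25, "persona": 0.20, "language": 0.15}


def component_score(safety: float, usage: float, persona: float, language: float) -> float:
    """Weighted sum of the four scoring factors, each in the range 0.0-1.0."""
    return (
        safety * WEIGHTS["safety"]
        + usage * WEIGHTS["usage"]
        + persona * WEIGHTS["persona"]
        + language * WEIGHTS["language"]
    )


# Hypothetical skill: low safety criticality, heavily used,
# full persona match, partial language match.
example = component_score(safety=0.2, usage=0.8, persona=1.0, language=0.5)
# 0.2*0.40 + 0.8*0.25 + 1.0*0.20 + 0.5*0.15 = 0.08 + 0.20 + 0.20 + 0.075 = 0.555
```

Because the weights sum to 1.0, a component that scores 1.0 on every factor reaches the maximum score of 1.0.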

**Fixed-score components (always included):**

| Component | Score | Reason |
|-----------|-------|--------|
| Constitution (Articles I-V) | 1.0 | Non-negotiable safety |
| Guard hooks (destructive, path) | 0.95 | Core safety rules (compiled as text, not hooks) |
| Active persona definition | 0.90 | User-selected identity |
| Active language rules | 0.85 | Project-specific quality gates |

**Dynamic-score components:**

| Component | Base Score | Adjusted By |
|-----------|-----------|-------------|
| Individual skills | 0.3-0.7 | Usage frequency + persona fit |
| Agent definitions | 0.2-0.6 | Persona relevance (only 1 agent in SLM mode) |
| Knowledge skills | 0.2-0.5 | Language match + persona match |
| Iron Law rules | 0.7 | Always high (quality enforcement) |

---

#### 1.3 Compression Engine

**Strip low-signal content while preserving semantics:**

| Strip Target | Savings (est.) | Example |
|-------------|---------------|---------|
| `## Common Rationalizations` tables | 200-400 tokens/skill | 15 skills have these tables |
| `## Related Skills` sections | 50-100 tokens/skill | Routing not useful for SLMs |
| `## Verification Checklist` (keep 1-liner summary) | 100-200 tokens/agent | Compress to "Verify: tests pass, no placeholders" |
| Markdown headers (collapse hierarchy) | 20-50 tokens/file | `### 2.1.3 Sub-feature` → plain paragraph |
| Example code blocks (keep first, strip rest) | 100-500 tokens/skill | Keep 1 example max |
| Frontmatter (YAML) | 50-100 tokens/file | Strip entirely from compiled output |
| Agent `## Allowed CLI Commands` lists | 200-400 tokens/agent | Not needed when agent won't execute them |
| Multi-agent coordination instructions | 300-500 tokens | SLM = single agent, no /orchestrate |
| Effort-based budgeting rules | 100 tokens | SLM doesn't manage budgets |

**Compression levels:**

```python
COMPRESSION_LEVELS = {
    'ultra-light': {
        'strip_examples': True,
        'strip_rationalizations': True,
        'strip_related_skills': True,
        'strip_verification': True,
        'strip_agent_commands': True,
        'strip_multi_agent': True,
        'max_skills': 5,         # Only top 5 skills by score
        'max_agents': 0,         # No agent definitions (persona only)
        'include_rules': False,
    },
    'light': {
        'strip_examples': True,
        'strip_rationalizations': True,
        'strip_related_skills': True,
        'strip_verification': 'summary',  # 1-liner
        'strip_agent_commands': True,
        'strip_multi_agent': True,
        'max_skills': 10,
        'max_agents': 1,         # Persona agent only
        'include_rules': True,
    },
    'standard': {
        'strip_examples': 'first-only',  # Keep 1 example
        'strip_rationalizations': True,
        'strip_related_skills': True,
        'strip_verification': 'summary',
        'strip_agent_commands': True,
        'strip_multi_agent': True,
        'max_skills': 20,
        'max_agents': 3,
        'include_rules': True,
    },
    'extended': {
        'strip_examples': 'first-only',
        'strip_rationalizations': 'first-only',
        'strip_related_skills': False,
        'strip_verification': False,
        'strip_agent_commands': False,
        'strip_multi_agent': True,  # Still stripped for SLMs
        'max_skills': 40,
        'max_agents': 5,
        'include_rules': True,
    },
}
```
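Two of the strip targets can be sketched with regular expressions, driven by a level dict shaped like the entries above. The helper names are hypothetical, and the section matcher is deliberately simple: it assumes level-2 headings and does not account for `## ` lines inside fenced code.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block, if present."""
    return FRONTMATTER_RE.sub("", text, count=1)


def strip_section(text: str, heading: str) -> str:
    """Remove a '## heading' section up to the next '## ' heading or end of text."""
    pattern = re.compile(
        rf"^## {re.escape(heading)}[ \t]*\n.*?(?=^## |\Z)",
        re.DOTALL | re.MULTILINE,
    )
    return pattern.sub("", text)


def compress(text: str, level: dict) -> str:
    """Apply the strip rules that a compression level enables."""
    text = strip_frontmatter(text)
    if level.get("strip_related_skills"):
        text = strip_section(text, "Related Skills")
    # Only the boolean form is handled here; 'first-only' is left out of this sketch.
    if level.get("strip_rationalizations") is True:
        text = strip_section(text, "Common Rationalizations")
    return text
```

Passing `COMPRESSION_LEVELS['light']` as `level` would remove both sections, while a level with `strip_related_skills: False` leaves the routing section in place.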

---

#### 1.4 Budget Packer

**Greedy knapsack algorithm:** Sort components by `score / compressed_tokens` ratio (value density), pack until budget exhausted.

```python
def pack_components(components: list[Component], budget: int) -> list[Component]:
    """Pack highest-value components into token budget."""
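    # Pack against 95% of the budget to absorb token-estimation error
    effective_budget = int(budget * 0.95)
    # Fixed components always included (constitution, guard rules, persona, language rules)
    fixed = [c for c in components if c.score >= 0.85]
    remaining_budget = effective_budget - sum(c.tokens_compressed for c in fixed)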

    # Sort remaining by value density
    dynamic = sorted(
        [c for c in components if c.score < 0.85],
        key=lambda c: c.score / max(c.tokens_compressed, 1),
        reverse=True
    )

    packed = list(fixed)
    for comp in dynamic:
        if comp.tokens_compressed <= remaining_budget:
            packed.append(comp)
            remaining_budget -= comp.tokens_compressed

    return packed
```

**Budget validation:** After packing, verify total tokens ≤ budget × 0.95 (5% safety margin for tokenizer estimation error).

**Constitution budget guard:** Before packing dynamic components, verify that the fixed components (constitution, guard rules, persona, language rules) fit within the budget. If `sum(c.tokens_compressed for c in fixed) > budget`, fail with: `"Constitution + safety rules alone exceed {budget} token budget. Minimum safe budget: {required}. Use --budget {required} or higher."` This prevents silent omission of safety-critical content.
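A minimal sketch of that guard follows, assuming a trimmed-down `Component` with only the fields the check needs. `BudgetError` and `check_fixed_fit` are hypothetical names; the 0.85 threshold and the error text follow the plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    score: float
    tokens_compressed: int


class BudgetError(ValueError):
    """Raised when the always-included components cannot fit the budget."""


def check_fixed_fit(components: List[Component], budget: int,
                    fixed_threshold: float = 0.85) -> int:
    """Return the budget left for dynamic components, or raise BudgetError."""
    required = sum(
        c.tokens_compressed for c in components if c.score >= fixed_threshold
    )
    if required > budget:
        raise BudgetError(
            f"Constitution + safety rules alone exceed {budget} token budget. "
            f"Minimum safe budget: {required}. Use --budget {required} or higher."
        )
    return budget - required
```

Failing loudly here is the point of the design: the alternative would be a compiled prompt that silently lacks its safety rules.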

---

#### 1.5 Emitter

**Output:** Single markdown file structured for maximum SLM comprehension.

```markdown
# AI Coding Assistant — System Instructions

## Safety Rules (MANDATORY)
[Compiled constitution — always first, highest attention position]

## Your Identity
[Compiled persona — who you are, what you focus on]

## Coding Standards
[Compiled language rules — active language only]

## Key Skills
[Top N skill summaries — compressed, actionable]

## Quality Checklist
[Compiled from Iron Laws + verification — bullet points only]
```

**Why this structure:**
- Safety first = the opening of the prompt is the position small models tend to follow most reliably
- Identity second = establishes persona before task instructions
- Standards = project-specific rules that shape code output
- Skills and quality checklist last = reference material, lower attention needed
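The fixed section order could be enforced by a small emitter like the sketch below. The section keys (`constitution`, `persona`, and so on) and the `emit` signature are assumptions; the headings and their order are taken from the template above.

```python
from typing import Dict, List, Tuple

# (heading, key) pairs in the order used by the output template
SECTION_ORDER: List[Tuple[str, str]] = [
    ("Safety Rules (MANDATORY)", "constitution"),
    ("Your Identity", "persona"),
    ("Coding Standards", "rules"),
    ("Key Skills", "skills"),
    ("Quality Checklist", "checklist"),
]


def emit(title: str, sections: Dict[str, str]) -> str:
    """Assemble compiled sections into one markdown document, safety first."""
    if not sections.get("constitution", "").strip():
        raise ValueError("constitution content is required")
    parts = [f"# {title}"]
    for heading, key in SECTION_ORDER:
        body = sections.get(key, "").strip()
        if body:  # optional sections are skipped when empty
            parts.append(f"## {heading}\n{body}")
    return "\n\n".join(parts) + "\n"
```

Iterating over a fixed list rather than the input dict keeps the output deterministic regardless of the order in which components were packed.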

---

### Phase 2: Integration (week 2-3)

#### 2.1 Profile Integration

**manifest.json addition:**
```json
{
  "profiles": {
    "offline-slm": ["core"],
    "offline-slm-extended": ["core", "agents"]
  }
}
```

**Install behavior:**
```bash
ai-toolkit install --profile offline-slm

# What happens:
# 1. Standard install of core components
# 2. Runs compile_slm.py with auto-detected settings
# 3. Writes compiled output to ~/.softspark/ai-toolkit/compiled/
# 4. Generates integration instructions for detected local model tools
# 5. state.json records profile as "offline-slm"
```

**No hooks installed:** SLM providers (Ollama, LM Studio) don't support lifecycle hooks. The critical hook behavior (destructive command guard, path guard) is compiled into the system prompt text as rules.

---

#### 2.2 CLI Command

```bash
ai-toolkit compile-slm                              # auto-detect model, default budget
ai-toolkit compile-slm --budget 4096                 # explicit token budget
ai-toolkit compile-slm --budget 8192 --persona backend-lead  # persona + budget
ai-toolkit compile-slm --model-size 8b               # auto-select budget for 8B model
ai-toolkit compile-slm --model-size 32b              # auto-select budget for 32B model
ai-toolkit compile-slm --lang typescript,python       # include specific language rules
ai-toolkit compile-slm --output ./my-system-prompt.md # custom output path
ai-toolkit compile-slm --dry-run                     # show what would be included + token counts (table format below)

# --dry-run output format:
# Budget: 4096 tokens | Level: light | Persona: backend-lead
# ┌────────────────────────────┬──────────┬────────┬──────────┐
# │ Component                  │ Score    │ Tokens │ Included │
# ├────────────────────────────┼──────────┼────────┼──────────┤
# │ Constitution (Articles I-V)│ 1.00     │ 420    │ YES      │
# │ Persona: backend-lead      │ 0.90     │ 180    │ YES      │
# │ Rule: coding-style         │ 0.85     │ 310    │ YES      │
# │ Skill: /review             │ 0.68     │ 290    │ YES      │
# │ ...                        │ ...      │ ...    │ ...      │
# │ Skill: /deploy             │ 0.22     │ 350    │ NO (budget)│
# └────────────────────────────┴──────────┴────────┴──────────┘
# Total: 3,840 / 4,096 tokens (93.8% utilization)
ai-toolkit compile-slm --format ollama               # output as Ollama Modelfile SYSTEM block
ai-toolkit compile-slm --format json-string          # JSON-escaped string (for config files)
ai-toolkit compile-slm --format raw                  # plain markdown (default)
```

**Model size → budget mapping:**

Note: budget is about *effective instruction-following capacity*, not context window. A 128K-context 8B model can *hold* 16K system prompt tokens, but cannot *follow* them reliably. Empirically, SLMs degrade when the system prompt exceeds ~10-15% of their effective capacity.

```python
MODEL_BUDGETS = {
    '7b':  {'budget': 2048, 'level': 'ultra-light'},   # Mistral 7B
    '8b':  {'budget': 2048, 'level': 'ultra-light'},   # Llama 3.1 8B
    '14b': {'budget': 4096, 'level': 'light'},          # Qwen 2.5 14B, Phi-3 14B
    '32b': {'budget': 8192, 'level': 'standard'},       # Qwen 2.5 32B, Mixtral 8x7B
    '70b': {'budget': 16384, 'level': 'extended'},      # Llama 3.1 70B
}
```

---

#### 2.3 Model Size Detection

**Auto-detect from Ollama:**
```python
import json
import re
import urllib.error
import urllib.request


def detect_model_size() -> str | None:
    """Detect model size from the first model listed by the local Ollama API."""
    try:
        # Equivalent to: curl http://localhost:11434/api/tags
        with urllib.request.urlopen('http://localhost:11434/api/tags', timeout=2) as resp:
            data = json.loads(resp.read())
        models = data.get('models', [])
        if models:
            # Extract parameter count from model name: "llama3.1:8b" → "8b"
            first = models[0].get('name', '')
            # Allow fractional sizes so "1.5b" is not misread as "5b"
            match = re.search(r'(\d+(?:\.\d+)?)[bB]\b', first)
            if match:
                return match.group(0).lower()
    except (OSError, json.JSONDecodeError):
        # Ollama not running, unreachable, timed out, or returned malformed JSON
        pass
    return None
```

**Fallback:** If no model detected, use `14b` defaults (4K budget, light compression). User can override with `--model-size`.
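The precedence described here (explicit flag, then detection, then the `14b` default) can be sketched as a small lookup. `resolve_budget` is a hypothetical helper, and treating unknown sizes as `14b` is an assumption the plan does not spell out.

```python
from typing import Optional

MODEL_BUDGETS = {
    '7b':  {'budget': 2048, 'level': 'ultra-light'},
    '8b':  {'budget': 2048, 'level': 'ultra-light'},
    '14b': {'budget': 4096, 'level': 'light'},
    '32b': {'budget': 8192, 'level': 'standard'},
    '70b': {'budget': 16384, 'level': 'extended'},
}
DEFAULT_MODEL_SIZE = '14b'


def resolve_budget(detected: Optional[str], override: Optional[str] = None) -> dict:
    """Pick budget settings: --model-size override, then detection, then default."""
    size = (override or detected or DEFAULT_MODEL_SIZE).lower()
    # Assumption: sizes missing from the table fall back to the default entry.
    return MODEL_BUDGETS.get(size, MODEL_BUDGETS[DEFAULT_MODEL_SIZE])
```

With this ordering a failed Ollama probe never blocks compilation, which matches the offline requirement.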

---

### Phase 3: Polish (week 3-4)

#### 3.1 Integration Guides

Per-platform setup instructions generated by the compiler.

**Ollama:**
```bash
# 1. Compile system prompt
ai-toolkit compile-slm --format ollama --model-size 8b > Modelfile.ai-toolkit

# 2. Create custom model
ollama create my-coder -f Modelfile.ai-toolkit

# 3. Use
ollama run my-coder "implement the payment API"
```

**LM Studio:**
```
1. ai-toolkit compile-slm --model-size 14b
2. Open LM Studio → Chat → System Prompt
3. Paste contents of ~/.softspark/ai-toolkit/compiled/slm-system-prompt.md
```

**Aider:**
```bash
ai-toolkit compile-slm --output .aider.system-prompt.md --model-size 32b
aider --model ollama/qwen2.5-coder:32b --system-prompt-file .aider.system-prompt.md
```

**Continue.dev:**
```bash
# 1. Compile to a local file
ai-toolkit compile-slm --model-size 14b

# 2. In .continue/config.json, paste the compiled content into systemMessage
#    (Continue.dev does not support file references — content must be inline)
#    Use: ai-toolkit compile-slm --format json-string to get escaped output
```
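Two of the output formats used in these guides reduce to short string transformations. The sketch below is one way they might be implemented; the function names and the `base_model` parameter are hypothetical, and the Modelfile variant assumes the prompt is wrapped in a triple-quoted `SYSTEM` block after a `FROM` line.

```python
import json


def format_json_string(prompt: str) -> str:
    """Escape the prompt so it can be pasted as a JSON string value."""
    return json.dumps(prompt)


def format_ollama_modelfile(prompt: str, base_model: str) -> str:
    """Wrap the prompt in a Modelfile with a multi-line SYSTEM block."""
    if '"""' in prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError('prompt must not contain """')
    return f'FROM {base_model}\nSYSTEM """\n{prompt}\n"""\n'
```

The JSON variant returns the surrounding quotes as well, so the result can replace the whole value of a config field such as `systemMessage`.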

---

#### 3.2 Compile Quality Validator

Post-compilation checks:

| Check | Severity | Description |
|-------|----------|-------------|
| Constitution present | FAIL | Articles I-V must be in output |
| Budget exceeded | FAIL | Token count > budget |
| Persona missing (when specified) | WARN | Persona definition not included |
| No language rules included | WARN | Project language not detected |
| Less than 3 skills included | WARN | Very minimal — may be too sparse |
| Output empty | FAIL | Compilation produced no content |
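The checks in this table map naturally onto a function that returns a list of findings. In the sketch below, detecting the constitution by the `## Safety Rules` heading is an assumption based on the emitter template, and all names are hypothetical.

```python
from typing import List, Tuple

# Assumption: the constitution is identifiable by the emitter's heading.
CONSTITUTION_MARKER = "## Safety Rules"


def validate_output(
    output: str,
    token_count: int,
    budget: int,
    skills_included: int,
    persona_requested: bool = False,
    persona_included: bool = False,
    language_rules_included: bool = True,
) -> List[Tuple[str, str]]:
    """Return (severity, message) findings; an empty list means all checks passed."""
    findings: List[Tuple[str, str]] = []
    if not output.strip():
        # Nothing else is meaningful to check on empty output.
        return [("FAIL", "Output empty")]
    if CONSTITUTION_MARKER not in output:
        findings.append(("FAIL", "Constitution missing"))
    if token_count > budget:
        findings.append(("FAIL", f"Budget exceeded: {token_count} > {budget}"))
    if persona_requested and not persona_included:
        findings.append(("WARN", "Persona missing"))
    if not language_rules_included:
        findings.append(("WARN", "No language rules included"))
    if skills_included < 3:
        findings.append(("WARN", "Less than 3 skills included"))
    return findings


def has_failures(findings: List[Tuple[str, str]]) -> bool:
    """True when any finding should fail the compilation."""
    return any(severity == "FAIL" for severity, _ in findings)
```

Returning every finding instead of stopping at the first lets the CLI print all warnings in one pass while still exiting non-zero on any `FAIL`.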

---

## 6. File Summary

| File | Action | LOC (est.) | Description |
|------|--------|------------|-------------|
| `scripts/compile_slm.py` | CREATE | ~500 | Main compiler — orchestrates pipeline: parse → score → compress → pack → emit. Contains `Component` dataclass, scorer, and budget packer |
| `scripts/slm_token_counter.py` | CREATE | ~50 | Token estimation (stdlib only) — `estimate_tokens()` function used by compiler and validator |
| `scripts/slm_compression.py` | CREATE | ~300 | Compression engine — strip/summarize functions per content type, compression level configs (`COMPRESSION_LEVELS` dict) |
| `scripts/slm_integration.py` | CREATE | ~150 | Platform-specific output formatters — Ollama Modelfile, JSON-escaped string, raw markdown, Aider-compatible |
| `bin/ai-toolkit.js` | EDIT | +10 | Register `compile-slm` command |
| `scripts/install.py` | EDIT | +30 | Handle `--profile offline-slm` |
| `manifest.json` | EDIT | +5 | Add offline-slm profile |
| `kb/reference/offline-slm-guide.md` | CREATE | ~200 | Integration guides for all platforms |
| `tests/test_compile_slm.bats` | CREATE | ~150 | Compilation tests |
| `tests/test_slm_budgets.bats` | CREATE | ~80 | Budget compliance tests |
| **Total** | | **~1475** | |

---

## 6a. Non-Functional Requirements

| Category | Requirement |
|----------|-------------|
| **Performance** | Compilation < 2 seconds. No network calls during compilation (all data local). |
| **Accuracy** | Token estimation ±10% vs tiktoken cl100k_base. Budget compliance: output ≤ budget × 0.95. |
| **Determinism** | Same input (agents, skills, rules, persona, language, budget) → identical output. No randomness. |
| **Security** | Constitution Articles I-V always present in output — compilation fails if they exceed budget alone. |
| **Offline** | Zero network dependencies. Ollama auto-detection gracefully fails to manual fallback. |
| **Portability** | Output is plain markdown — consumable by any tool accepting a system prompt string/file. |
| **Quality gates** | `ruff check scripts/compile_slm.py scripts/slm_*.py` (0 errors), `mypy --strict scripts/compile_slm.py scripts/slm_*.py` (0 errors). Run before every commit. |
| **Type safety** | 100% public API type hints (all function signatures). `Component` dataclass fully typed. Scoring functions use typed parameters, not bare `dict`. |

---

## 6b. Cache Invalidation & Recompile Triggers

Compiled output (`~/.softspark/ai-toolkit/compiled/slm-system-prompt.md`) is a **derived artifact** — it must be recompiled when inputs change:

| Trigger | Action |
|---------|--------|
| `ai-toolkit update` | Auto-recompile if profile is `offline-slm` |
| `ai-toolkit install --profile offline-slm` | Always compile |
| Agent/skill/rule files changed (detected via mtime) | Warn: "Compiled SLM prompt may be stale. Run `ai-toolkit compile-slm`" |
| Manual `ai-toolkit compile-slm` | Always recompile |

Compiled output includes a header comment: `<!-- Compiled: 2026-04-10T10:30:00Z | Budget: 4096 | Level: light | Persona: backend-lead -->` for staleness detection.
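
A minimal sketch of how the header comment could drive staleness detection, assuming the header always follows the exact field order shown above. `parse_header` and `is_stale` are hypothetical helper names, not part of the planned API.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional

HEADER_RE = re.compile(
    r"<!--\s*Compiled:\s*(?P<compiled>\S+)\s*\|\s*Budget:\s*(?P<budget>\d+)"
    r"\s*\|\s*Level:\s*(?P<level>\S+)\s*\|\s*Persona:\s*(?P<persona>\S+)\s*-->"
)


def parse_header(first_line: str) -> Optional[dict]:
    """Extract compile metadata from the header comment, or None if absent."""
    m = HEADER_RE.search(first_line)
    if m is None:
        return None
    compiled = datetime.strptime(m["compiled"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "compiled": compiled.replace(tzinfo=timezone.utc),
        "budget": int(m["budget"]),
        "level": m["level"],
        "persona": m["persona"],
    }


def is_stale(compiled: datetime, inputs: Iterable[Path]) -> bool:
    """True when any input file was modified after the compile timestamp."""
    cutoff = compiled.timestamp()
    return any(p.stat().st_mtime > cutoff for p in inputs)
```

A caller would read the first line of the compiled file, pass the agent, skill, and rule paths as `inputs`, and print the staleness warning when `is_stale` returns true.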

---

## 6b-bis. Rollback & Removal

The offline-slm feature is purely additive — removing it is trivial:
1. Delete `scripts/compile_slm.py`, `scripts/slm_token_counter.py`, `scripts/slm_compression.py`, `scripts/slm_integration.py`
2. Remove `"offline-slm"` and `"offline-slm-extended"` from `manifest.json` profiles
3. Remove `compile-slm` from `SCRIPT_COMMANDS` in `bin/ai-toolkit.js`
4. Delete `~/.softspark/ai-toolkit/compiled/` directory (user-side)
5. No hooks, no state files, no config entries to clean up

---

## 6c. Quality Gate Degradation Notice

**Important:** The `offline-slm` profile strips lifecycle hooks because SLM providers don't support them. This means:

- No pre-commit quality check (ruff/tsc/mypy)
- No destructive command interception (guard hooks)
- No session context preservation

Guard hook behavior is **compiled into the system prompt as text rules** — the SLM is *instructed* not to run destructive commands, but unlike hook-based enforcement, this is advisory, not blocking.

Documentation must clearly state: **"SLM mode trades enforcement for guidance. Safety rules are present but not machine-enforced."**

For teams needing enforcement, recommend `--profile offline-slm` combined with a Git pre-commit hook (`.git/hooks/pre-commit`) that runs lint/type-check independently of the AI tool.

---

## 7. Success Criteria (Overall)

| Metric | Target |
|--------|--------|
| Budget tiers | 4 (ultra-light 2K, light 4K, standard 8K, extended 16K) |
| Token budget compliance | 100% (output ≤ budget in all cases) |
| Constitution inclusion | 100% (always present, all 5 articles) |
| Compilation time | < 2 seconds |
| Model size auto-detection | Ollama API (with graceful fallback) |
| Output formats | 4 (raw markdown, Ollama Modelfile, JSON-escaped string, Aider-compatible) |
| Integration guides | 4 platforms (Ollama, LM Studio, Aider, Continue.dev) |
| Persona support | 4 personas (backend-lead, frontend-lead, devops-eng, junior-dev) |
| Language rule support | 13 languages (all existing rules) |
| External dependencies | 0 (stdlib Python only) |
| Tests | 40+ |

---

## 8. Risks and Mitigation

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Token estimation too inaccurate | Medium | Medium | Validate against tiktoken on 50 files, target ±10% |
| Compiled prompt too compressed — loses meaning | Medium | High | Validate with 8B model on 5 standard tasks; if quality drops, increase minimum budget |
| Ollama API changes | Low | Low | Graceful fallback to manual `--model-size` |
| User expects full toolkit features with SLM | Medium | Medium | Clear documentation: "SLM mode = safety + coding standards, not multi-agent orchestration" |
| Air-gapped environment can't run `ai-toolkit compile-slm` | Low | Medium | Pre-compile during `install` when network is available; compiled output is self-contained |
| Constitution text changes break compiled cache | Low | Low | Recompile on every `ai-toolkit update` |

---

## 9. Pre-Mortem

1. **"Compiled prompt is too generic"** — without the full agent definitions, the SLM may produce generic code that doesn't match project conventions. Mitigation: Language rules and persona have highest priority after constitution — they provide project-specific context.
2. **"Users expect /review to work with 8B model"** — Skill invocations won't be available via SLM providers that lack hook support. Mitigation: Compiled output includes skill *knowledge* as rules, not invocable commands. Clear docs: "Skills are compiled as coding standards, not slash commands."
3. **"Model detects wrong size"** — Ollama model naming is inconsistent (`llama3.1:8b` vs `codellama:7b-instruct`). Mitigation: regex extracts any `\d+[bB]` pattern; fallback to `14b` if ambiguous.
4. **"Other local inference tools emerge"** — Jan.ai, GPT4All, Tabby, etc. Mitigation: Raw markdown output works with any tool that accepts a system prompt file.
5. **"Nobody uses the feature"** — SLM adoption among professional developers may be niche. Mitigation: Low effort (3-4 weeks), high signaling value ("we support air-gapped environments"), enterprise sales enabler.
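
The size-detection mitigation in item 3 can be sketched as follows. Treating "ambiguous" as "no match, or more than one distinct match" is an assumption, and `detect_model_size` is a hypothetical name.

```python
import re

SIZE_RE = re.compile(r"\d+[bB]")
FALLBACK_SIZE = "14b"


def detect_model_size(model_name: str) -> str:
    """Return the parameter-size tag from an Ollama model name.

    Falls back to FALLBACK_SIZE when the name has no size tag or
    carries several different ones.
    """
    sizes = {match.lower() for match in SIZE_RE.findall(model_name)}
    if len(sizes) == 1:
        return sizes.pop()
    return FALLBACK_SIZE
```

For example, `detect_model_size("codellama:7b-instruct")` yields `"7b"`, while a name without a size tag such as `"phi3:mini"` yields the fallback.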

---

## 10. Market Positioning

**Target users:**
1. **Enterprise (air-gapped)** — financial services, defense, healthcare firms that cannot send code to cloud LLM APIs
2. **Privacy-conscious solo devs** — developers who don't want code leaving their machine
3. **Cost-sensitive teams** — startups that can't afford Anthropic/OpenAI API costs at scale
4. **Offline-first** — developers working on trains, planes, or in regions with poor connectivity

**Competitive advantage:** No existing AI coding toolkit provides a compilation pipeline that adapts its instruction set to model capacity. This is a first-mover feature.

---

## 11. Next Actions

1. [ ] Approve plan
2. [ ] Implement token counter (1.1)
3. [ ] Implement component parser + scorer (1.2)
4. [ ] Implement compression engine (1.3)
5. [ ] Implement budget packer + emitter (1.4, 1.5)
6. [ ] Integrate profile into install.py + manifest.json (2.1)
7. [ ] Create CLI command `compile-slm` (2.2)
8. [ ] Add Ollama model detection (2.3)
9. [ ] Add persona + language aware compilation (2.4, 2.5)
10. [ ] Write integration guides for 4 platforms (3.1)
11. [ ] Compile quality validator (3.2)
12. [ ] Tests + documentation (3.3, 3.4)

---

## 12. Future

| Feature | Rationale |
|---------|-----------|
| Automatic recompile on file changes (file watcher) | v1 uses manual recompile + staleness warning |
| Quality benchmarks (run 5 standard tasks, measure output quality per budget tier) | Validate compilation quality empirically before shipping |
| Plugin-aware compilation (include memory-pack prompts if installed) | Depends on plugin system maturity |
| Model-specific prompt templates (different SLM families prefer different instruction styles) | Needs empirical testing across model families |
| `--profile offline-slm --enforce-git-hooks` (install git pre-commit hook for quality gates) | Compensates for stripped lifecycle hooks |

---

## 13. Cross-Plan Dependencies

This plan shares modification targets with the Enterprise Config plan:

| Shared File | This Plan | Enterprise Config Plan |
|-------------|-----------|----------------------|
| `scripts/install.py` | +30 LOC (offline-slm profile) | +80 LOC (extends resolution) |
| `manifest.json` | +5 LOC (offline-slm profile) | +10 LOC (schema refs) |
| `bin/ai-toolkit.js` | +10 LOC (compile-slm command) | +40 LOC (config subcommands) |

**If implementing in parallel:** this plan has the smallest changes to shared files — merge first to minimize conflicts.

**Enterprise Config interaction:** If Enterprise Config ships, `compile-slm` should respect the `extends` chain — compile the merged config, not just local. Add a `--ignore-extends` flag for air-gapped environments without access to the base config.

---

**Last Updated:** 2026-04-10

---

## kb/history/completed/output-token-discipline-plan-20260504.md

---
title: "Plan: Output & Token Discipline — Concise Modes, MCP Trim, Token Receipts"
category: planning
service: ai-toolkit
tags:
  - brand-voice
  - output-style
  - mcp
  - statusline
  - token-tracking
  - hooks
  - briefing
doc_type: plan
status: completed
created: "2026-05-04"
last_updated: "2026-05-04"
completed: "2026-05-04"
completion: "100% of v3.2.0 scope (F2 deferred to v4.0 per spike conclusion)"
shipped_in: "v3.2.0"
description: "Three coordinated extensions to ai-toolkit that reduce token usage and surface real cost data: (1) brand-voice output modes for concise/strict Claude responses, (2) MCP description trimmer to compact tool listings before they reach the model, (3) token receipts in statusline reading session JSONL directly. Native extensions, no third-party skill names imported."
---

# Plan: Output & Token Discipline

**Status:** Completed (shipped in v3.2.0 on 2026-05-04)
**Author:** lukasz.krzemien
**Source of inspiration:** external Claude Code plugin observed 2026-05-04 (mechanism only, not naming or branding)
**Spike companion:** [`kb/history/completed/f2-mcp-trim-spike-20260504.md`](f2-mcp-trim-spike-20260504.md)

## Goal

Reduce real token consumption in Claude Code sessions and give the user real-time visibility into that consumption. Three mechanisms work together, each as a native extension of an existing ai-toolkit component (`brand-voice`, `briefing`, `track-usage.sh`). No third-party names are imported; the ideas are adapted as the toolkit's own.

## Context

ai-toolkit currently has:

- `brand-voice` skill: enforces written style (docs, README, content)
- `track-usage.sh` hook: counts `/skill` invocations into `~/.softspark/ai-toolkit/stats.json`
- `compile_slm.py`: compresses the whole toolkit for small models (different scope)
- 113 skills, a full hook system, doctor, eject

What is missing:

- A concise mode for Claude's *responses* (brand-voice only covers written content)
- Compression of MCP tool descriptions, which consume hundreds of tokens on every call
- Real per-session token measurement (there is only an invocation counter, not a token counter)

## Scope

Three features, planned in order of increasing risk.

---

## Feature 1 — `brand-voice` output modes

### Goal
Extend `brand-voice` with conciseness modes that apply to Claude's conversational responses, not only to generated documents.

### Naming decisions
- **Chosen:** keep `brand-voice`, with internal modes (`default`, `concise`, `strict`)
- **Rejected:**
  - `concise` as a separate skill: duplicates brand-voice, an unnecessary split
  - `terse`: ambiguous, reads as "rude"
  - `output-discipline`: too bureaucratic

### Files

| Path | Action | Purpose |
|------|--------|---------|
| `app/skills/brand-voice/SKILL.md` | edit | Add an `## Output Modes` section describing the three modes and how to activate them |
| `app/skills/brand-voice/modes/concise.md` | new | Rules: max 3 sentences per answer to a closed question, no preamble, no "I'll now..." |
| `app/skills/brand-voice/modes/strict.md` | new | Rules: facts only, zero filler adjectives, max 1 sentence per fact, lists instead of prose |
| `app/skills/brand-voice/scripts/measure.py` | new | Before/after eval on fixtures, token savings report |
| `tests/fixtures/output-modes/` | new | 10 baseline/expected pairs for different task types (debug, review, plan, exploration) |
| `tests/skills_brand_voice.bats` | edit | Add assertions for modes (regex for banned filler, max-line-length) |

### Activation

Three mechanisms:

1. Project frontmatter: `output-mode: concise` in `CLAUDE.md` or `.claude/settings.json`
2. Slash command: `/brand-voice concise` switches the mode for the current session (`track-usage.sh` writes it to session state)
3. Auto-trigger: the skill also loads during long generation sessions (>30 min, heuristic)

### Success criteria

- On the set of 10 fixtures, `concise` reduces output by >40% without losing key facts
- Assertion test: all file names and symbols from the baseline are preserved (regex match)
- `validate.py --strict` passes
- `audit_skills.py --ci` reports zero HIGH
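
The first two criteria can be checked mechanically. The sketch below is an assumption about how such a check might look, not the actual `measure.py`: it uses word counts as a cheap stand-in for tokens and treats backticked spans as the "file names and symbols" that must survive. Both function names are hypothetical.

```python
import re

IDENT_RE = re.compile(r"`([^`\n]+)`")


def output_ratio(baseline: str, candidate: str) -> float:
    """Candidate length as a fraction of baseline length, counted in words."""
    base_words = len(baseline.split())
    if base_words == 0:
        return 1.0
    return len(candidate.split()) / base_words


def missing_identifiers(baseline: str, candidate: str) -> set[str]:
    """Backticked names present in the baseline but absent from the candidate."""
    return {name for name in IDENT_RE.findall(baseline) if name not in candidate}
```

A fixture would pass when `output_ratio` is at most 0.6 (a reduction of more than 40%) and `missing_identifiers` returns an empty set.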

### Estimate
4–6h

---

## Feature 2 — MCP context trim — DEFERRED TO v4.0

**Status:** Deferred. Spike conducted before implementation, conclusion in [`f2-mcp-trim-spike-20260504.md`](f2-mcp-trim-spike-20260504.md).

**Reason:** Claude Code hooks do not expose the MCP `tools/list` response or the system-prompt tool catalog. Hook events (`PreToolUse`, `PermissionRequest`, `Elicitation`) operate on individual tool calls only. Modifying tool descriptions before they reach the model requires a local MCP proxy server — multi-day scope, single-bug-breaks-all-MCP failure mode, out of scope for v3.2.0.

**Resolution:** Full proxy-server approach moved to its own active planning doc: [`kb/planning/mcp-context-trim-v4-prd.md`](../../planning/mcp-context-trim-v4-prd.md). The mid-spike "F2-lite observability tool" alternative was also dropped per user decision (2026-05-04) — v3.2.0 ships F1+F3 only; v4.0 picks up the proxy-server work in full scope.

The compression heuristics, file plan, and risk register from the original Feature 2 design were migrated into the v4.0 PRD. They are no longer duplicated in this archived doc.

---

## Feature 3 — Token receipts in statusline

### Goal
Show real (not estimated) per-session token usage in the Claude Code statusline. Data is read from the session JSONL, not from heuristics.

### Naming decisions
- **Chosen:** extend the existing `briefing` skill + new hook `statusline-tokens.sh` + new script `session_token_stats.py`
- **Rejected:**
  - A new `stats` / `receipts` skill: functionally duplicates `briefing`
  - Modifying the existing `track-usage.sh` as the single integration point: too much responsibility for one file

### Files

| Path | Action | Purpose |
|------|--------|---------|
| `scripts/session_token_stats.py` | new | Stdlib-only JSONL parser. Functions: `read_session()`, `aggregate_by_skill()`, `compare_baseline_vs_concise()` |
| `app/hooks/statusline-tokens.sh` | new | Type `statusLine` in settings.json. Sums `usage.input_tokens` + `usage.output_tokens` from the current session JSONL |
| `app/hooks/track-usage.sh` | edit | After detecting `/skill`, also record `prompt_tokens` if `transcript_path` is available |
| `app/skills/briefing/SKILL.md` | edit | New "Token receipts" section, commands `/briefing --tokens --since 7d`, `/briefing --tokens --share` |
| `scripts/merge-hooks.py` | edit | Statusline injection do `settings.json`, preserve user-customized entries (delivered as F3.5) |
| `app/hooks/ai-toolkit-statusline.sh` | new | Comprehensive statusline (cwd + git + ctx + tokens + cost + model), default install (delivered as F3.5) |
| `tests/session_token_stats.bats` | new | Fixture JSONL with 3 messages, assertions on the sum and breakdown |
| `tests/statusline_tokens.bats` | new | Mock JSONL, verify hook output (max 80 characters, no NaN, fallback when there is no session) |

### Statusline format (proposed)

```
[ai-toolkit] /concise · session: 24.7k · trend: ↓18%
```

Short mode is the default; `--verbose` adds a per-skill breakdown.
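
A minimal formatter for the proposed line, written in Python for illustration (the shipped hook is a shell script). The function names are hypothetical; the 80-character cap and the "no NaN" rule come from the test description in the files table.

```python
import math
from typing import Optional


def format_tokens(count: int) -> str:
    """Compact token count, e.g. 950 -> '950', 24700 -> '24.7k'."""
    if count < 1000:
        return str(count)
    return f"{count / 1000:.1f}k"


def format_statusline(
    mode: str, tokens: int, trend_pct: Optional[float] = None, width: int = 80
) -> str:
    """Build the short statusline; the trend segment is dropped if unknown."""
    parts = [f"[ai-toolkit] /{mode}", f"session: {format_tokens(tokens)}"]
    if trend_pct is not None and not math.isnan(trend_pct):
        arrow = "↓" if trend_pct < 0 else "↑"
        parts.append(f"trend: {arrow}{abs(trend_pct):.0f}%")
    return " · ".join(parts)[:width]
```

Dropping the trend segment when the value is missing or NaN keeps the line valid for a brand-new session with no history.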

### Session read path

Claude Code writes JSONL to `~/.claude/projects/<sanitized-cwd>/<session-id>.jsonl`. Each line is a message with a `usage` field (input_tokens, output_tokens, cache_*). The script:

1. Identifies the current session from the `CLAUDE_SESSION_ID` env var (if set), otherwise picks the newest file
2. Parses lines, ignoring those without `usage`
3. Sums and aggregates per skill (if `track-usage.sh` wrote a skill mapping to a sidecar file)
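
Steps 1 and 2 can be sketched as below. This is not the shipped `session_token_stats.py`; `newest_session` and `sum_usage` are hypothetical names, and the sketch assumes `usage` sits at the top level of each record as described above (if real transcripts nest it deeper, the lookup would need adjusting).

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def newest_session(project_dir: Path) -> Optional[Path]:
    """Fallback when no session id is known: the most recently modified JSONL."""
    files = sorted(project_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def sum_usage(lines: Iterable[str]) -> dict:
    """Sum input/output tokens, skipping malformed lines and lines without usage."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: ignore rather than crash
        if not isinstance(record, dict):
            continue
        usage = record.get("usage")
        if not isinstance(usage, dict):
            continue
        for key in totals:
            value = usage.get(key, 0)
            if isinstance(value, int):
                totals[key] += value
    return totals
```

Skipping bad lines instead of raising matches the "no crash on empty or malformed JSONL" success criteria.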

### Success criteria

- Statusline shows the token count read from JSONL within ±2% of the Anthropic API report (if available)
- No crash when the session is still empty
- No crash when the JSONL is malformed
- `validate.py --strict` passes

### Estimate
6–8h

---

## Out of scope

| Idea | Reason for rejection |
|------|----------------------|
| Memory/file compressor (`/compress <file>`) | `compile_slm.py` already compresses the toolkit; no concrete use case for per-file compression |
| Compact `/commit`, `/review` modes | Will follow automatically from Feature 1 (these skills will use the `concise` mode rules) |
| "Caveman speak" / classical Chinese mode | Does not fit the workmanlike tone, conflicts with brand-voice |
| Single curl-installer | `ai-toolkit install` with profiles already exists |

---

## Implementation order and dependencies

```
Feature 1 (brand-voice modes)
    ↓ provides conciseness rules
Feature 3 (token receipts)
    ↓ provides before/after measurement for F1
Feature 2 (mcp-trim)  ← spike research first, independent of F1/F3
```

**F1 and F3 can run in parallel once the F1 mode files are finished.**
**F2 has a separate go/no-go decision after the spike.**

## Overall estimate

| Feature | Min | Max |
|---------|-----|-----|
| F1 | 4h | 6h |
| F3 | 6h | 8h |
| F2 spike | 1h | 1h |
| F2 implementation | 0h | 10h |
| **Total** | **11h** | **25h** |

## Doc & test sweep (mandatory after each feature)

1. `python3 scripts/validate.py --strict`
2. `python3 scripts/audit_skills.py --ci`
3. Regen `AGENTS.md`: `python3 scripts/generate_agents_md.py > AGENTS.md`
4. Regen `llms.txt`: `python3 scripts/generate_llms_txt.py > llms.txt`
5. Bump version in `package.json` + `plugin.json`
6. Update `skills-catalog.md` with new/changed skills
7. Update `README.md`, `CLAUDE.md`, `ARCHITECTURE.md`, `architecture-overview.md` if behavior changes
8. Conventional commits: `feat(brand-voice): add output modes`, `feat(briefing): add token receipts`, `feat(mcp-trim): add description trimmer`

## Open questions — resolved

1. **Can `PreToolUse` modify a tool's declaration in the MCP listing?** No. The spike confirmed that no hook event exposes `tools/list`. F2 requires an MCP proxy. Deferred to v4.0.
2. **Does Claude Code expose `CLAUDE_SESSION_ID` to hooks?** Partially. `scripts/session_token_stats.py` uses a "newest JSONL in the project directory" fallback, optionally with cwd → sanitize → match. It works reliably on real sessions (96.6k tokens parsed correctly in the smoke test).
3. **Should `concise` mode be the default?** It stays opt-in. `brand-voice` with modes auto-loads only when the project sets `output-mode: concise` in `CLAUDE.md` or the user types `/brand-voice concise`. Measurements from F3 will inform a later decision.

## Final delivery (v3.2.0)

### Shipped

| Feature | Outcome | Files |
|---------|---------|-------|
| **F1 — brand-voice output modes** | Done. Aggregate ratio on 3 fixtures: concise **21%**, strict **14%** (target ≤60% / ≤40%). | `app/skills/brand-voice/SKILL.md`, `app/skills/brand-voice/modes/{concise,strict}.md`, `app/skills/brand-voice/scripts/measure.py`, `tests/fixtures/output-modes/{debug-explanation,plan-question,review-summary}/`, `tests/test_brand_voice.bats` (14 tests) |
| **F3 — token receipts** | Done. Smoke test on a real session: 96.6k tokens parsed correctly. | `scripts/session_token_stats.py`, `tests/fixtures/session-jsonl/{three-messages,malformed,empty}.jsonl`, `tests/test_session_token_stats.bats` (15 tests) |
| **F3.5 — comprehensive default statusline** | Done. Full segment: cwd + git + ctx% + tokens + trend + model-aware cost + model. Installed by default via `merge-hooks.py`, user-custom statusLine preserved untouched. | `app/hooks/ai-toolkit-statusline.sh`, `app/hooks.json` (`statusLine` entry), `scripts/merge-hooks.py` (statusLine inject/strip), `tests/test_statusline_hook.bats` (14 tests), `tests/test_merge_hooks_statusline.bats` (8 tests) |
| **briefing skill extension** | `/briefing --tokens` + wire-up docs + opt-out env vars. | `app/skills/briefing/SKILL.md` |

### Deviations from plan

| Plan said | Shipped | Reason |
|-----------|---------|--------|
| 10 fixtures in F1 | 3 fixtures + `must_contain.txt` mechanism | A smaller set plus an extensible convention is enough to validate budgets; quality over quantity |
| F3 hook integration as a follow-up (manual settings.json edit) | F3.5 delivered a full default install via `merge-hooks.py` | User feedback during the work: "let ai-toolkit install it by default from the new version" |
| F2 implementation after the spike | F2 deferred to v4.0 | The spike showed that Claude Code hooks do not expose `tools/list` → requires a proxy server, multi-day scope |

### Quality gates passed

- `validate.py --strict`: 0 errors / 0 warnings
- `audit_skills.py --ci`: 0 HIGH / 0 WARN / 13 INFO (pre-existing)
- `audit_skills.py --sarif`: SARIF 2.1.0 valid, 5 rules
- `npm test`: 1032 / 1032 passing (was 981 in v3.1.1)
- Registry drift: clean
- Provenance + checksum-pin: verified
- Ecosystem doctor: 9 cosmetic drifts (class A) refreshed

### Skill classification change

`brand-voice` changed from `user-invocable: false` to `true` (knowledge → hybrid):

- **Hybrid**: 31 → 32
- **Knowledge**: 49 → 48
- **Task**: 32 (no change)

Updated: `README.md`, `kb/reference/architecture-overview.md`, `kb/reference/skills-catalog.md`.

## Status & revisions

| Date | Change | Author |
|------|--------|-------|
| 2026-05-04 | Initial draft | lukasz.krzemien |
| 2026-05-04 | F1 implementation done — brand-voice modes, measure.py, 3 fixtures, 14 bats tests. Aggregate ratio: concise 21%, strict 14%. | claude |
| 2026-05-04 | F3 implementation done — `session_token_stats.py`, `statusline-tokens.sh`, briefing skill extension. 22 new bats tests. Smoke-tested on real session. | claude |
| 2026-05-04 | F3.5 follow-up done — replaced focused tokens hook with comprehensive `ai-toolkit-statusline.sh`. Extended `merge-hooks.py` for safe statusLine injection. Version bumped 3.1.1 → 3.2.0. 22 new bats tests. CHANGELOG + README updated. | claude |
| 2026-05-04 | F2 spike completed — see [`f2-mcp-trim-spike-20260504.md`](f2-mcp-trim-spike-20260504.md). Hooks cannot modify `tools/list`. F2 deferred to v4.0 with own PRD. | claude |
| 2026-05-04 | Plan archived to `kb/history/completed/`. Shipped in v3.2.0. | claude |

---

## kb/howto/README.md

---
title: "How-To Guides"
service: ai-toolkit
category: howto
tags: [howto, guides]
last_updated: "2026-03-25"
---

# How-To Guides

Step-by-step guides for common tasks. Guides will be added here as they are created.

---

## kb/planning/cloud-security-pack-plan.md

---
title: "Plan: Cloud Security Pack — Multi-Cloud Audit (GCP/AWS/Azure)"
category: planning
service: ai-toolkit
tags:
  - cloud-security
  - gcp
  - aws
  - azure
  - firebase
  - plugin-pack
  - security-audit
  - credentials
doc_type: plan
status: proposed
created: "2026-04-10"
last_updated: "2026-04-10"
completion: "0%"
council_review: "2026-04-10 — conditional FOR, scope reduction recommended"
description: "Plugin pack for deterministic, read-only security auditing of GCP (Firebase/Cloud Functions), AWS (S3/Lambda/IAM), and Azure (NSG/Functions/CosmosDB). Includes CLI credential management, static+live modes, false positive resolution, SARIF output, incremental scanning, and CI integration."
---

# Plan: Cloud Security Pack — Multi-Cloud Audit

**Status:** Proposed
**Completion:** 0%
**Created:** 2026-04-10
**Origin:** Firebase RTDB/Firestore rules audit, Cloud Functions public exposure, false positive resolution for App Check/Gateway patterns
**Estimated Effort:** 5-6 weeks (council-revised from original 3-4 weeks)

---

## 1. Objective

Create `cloud-security-pack` plugin pack that provides deterministic, read-only security auditing for three major cloud providers. All scripts are stdlib-only Python with zero external dependencies. The pack includes a dedicated agent, multiple scan scripts, CLI credential management, and CI pipeline integration.

**Key design principles:**
- **Read-only** — never modifies cloud resources, only reads state
- **Deterministic** — reproducible results, no LLM-driven regex (same pattern as `hipaa_scan.py`)
- **False positive aware** — context graph resolves "public endpoint behind gateway/App Check/WAF"
- **CI-ready** — `--output json` + `--output sarif` (SARIF v2.1.0 for GitHub Advanced Security), exit code 1 on HIGH, 0 otherwise
- **Credential isolation** — keys stored in `~/.softspark/ai-toolkit/credentials/`, accessible only by this pack's scripts
- **Static-first** — static mode (no credentials) is the default, live mode is opt-in upgrade
- **Incremental** — `--changed` flag scans only files modified since last commit (PR workflow)
- **IaC via `terraform show -json`** — wraps Terraform's own JSON output instead of parsing HCL directly
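
The CI-ready principle (SARIF output, exit code 1 on HIGH) can be illustrated with a short sketch. The `Finding` shape, the severity-to-level mapping, and the function names are assumptions for illustration; only the top-level SARIF 2.1.0 fields (`version`, `runs`, `tool.driver`, `results`) follow the standard.

```python
from dataclasses import dataclass

# Assumed mapping from pack severities to SARIF result levels.
LEVELS = {"HIGH": "error", "MEDIUM": "warning", "LOW": "note"}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    message: str
    path: str
    line: int


def to_sarif(findings: list[Finding], tool_name: str = "cloud-security-pack") -> dict:
    """Build a minimal SARIF 2.1.0 log with one run."""
    rule_ids = sorted({f.rule_id for f in findings})
    results = [
        {
            "ruleId": f.rule_id,
            "level": LEVELS.get(f.severity, "note"),
            "message": {"text": f.message},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": f.path},
                "region": {"startLine": f.line},
            }}],
        }
        for f in findings
    ]
    driver = {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": driver}, "results": results}],
    }


def exit_code(findings: list[Finding]) -> int:
    """1 when any HIGH finding exists, 0 otherwise."""
    return 1 if any(f.severity == "HIGH" for f in findings) else 0
```

Sorting the rule ids keeps the output deterministic, which matters for the reproducibility principle listed above.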

---

## 2. Architecture Overview

```
ai-toolkit credentials add gcp --file ~/sa-viewer.json
ai-toolkit credentials add aws --profile my-audit-profile
ai-toolkit credentials add azure --subscription abc-123

  ┌──────────────────────────────────────────────────────┐
  │              cloud-security-pack                      │
  │                                                      │
  │  Agent: cloud-security-auditor                       │
  │  Tools: Read, Grep, Glob, Bash (read-only commands)  │
  │                                                      │
  │  Skills:                                             │
  │    /cloud-security-audit          (orchestrator)     │
  │    /firebase-rules-audit          (GCP: rules)       │
  │    /cloud-functions-audit         (GCP: CF + IAM)    │
  │    /aws-security-audit            (AWS: S3/Lambda)   │
  │    /azure-security-audit          (Azure: NSG/Fn)    │
  │                                                      │
  │  Scripts (stdlib Python, zero deps):                 │
  │    gcp_auth.py                    (credential helper)│
  │    firebase_rules_scan.py         (static parser)    │
  │    cloud_functions_audit.py       (CF IAM + context) │
  │    aws_security_scan.py           (S3/Lambda/IAM)    │
  │    azure_security_scan.py         (NSG/Fn/RBAC)      │
  │    false_positive_resolver.py     (context graph)    │
  │    sarif_formatter.py             (SARIF v2.1.0)     │
  │    incremental.py                 (git diff filter)  │
  │                                                      │
  │  Modes:                                              │
  │    --static    (no credentials, parse IaC/source)    │
  │    --live      (credentials, deployed state)         │
  │    --output json|sarif  (CI pipeline)                │
  │    --changed <ref>  (incremental, static only)       │
  │    --explain <id>   (remediation lookup)             │
  └──────────────────────────────────────────────────────┘
```

### YAML Parsing Constraint (BLOCKER)

Python stdlib has NO YAML parser. This affects AWS static mode:
- CloudFormation templates (`.yaml`) — YAML
- `serverless.yml` — YAML
- SAM templates (`template.yaml`) — YAML

**Decision: JSON-only for static IaC parsing.** Rationale:
1. CloudFormation supports both JSON and YAML — JSON variant parseable with `json` module
2. `terraform show -json` outputs JSON — the primary Terraform path
3. `serverless.yml` → recommend users run `sls print --format json` to convert
4. Adding `pyyaml` breaks the stdlib-only constraint for the entire toolkit

**Practical impact:** ~30% of CloudFormation users use JSON, ~70% YAML. For YAML users, live mode (`--mode live`) still works (queries APIs directly, no file parsing needed). The `--explain` output will suggest `sls print --format json` conversion.

**v2 option:** Ship a vendored minimal YAML subset parser (~150 LOC, handles flat key-value and simple nested maps — enough for security-relevant fields like `Principal`, `Effect`, `authLevel`).
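
As an illustration of the JSON-only path, a CloudFormation template in JSON form can be inspected with the `json` module alone. The sketch below is hypothetical (it is not `aws_security_scan.py`) and assumes that "public" means an `Allow` statement whose `Principal` is `"*"` or `{"AWS": "*"}`.

```python
import json
from typing import Any, Iterator


def iter_statements(node: Any) -> Iterator[dict]:
    """Yield every dict that looks like a policy statement (has an Effect key)."""
    if isinstance(node, dict):
        if "Effect" in node:
            yield node
        for value in node.values():
            yield from iter_statements(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_statements(item)


def public_allow_statements(template_json: str) -> list[dict]:
    """Return Allow statements whose Principal is a wildcard."""
    template = json.loads(template_json)
    return [
        s for s in iter_statements(template)
        if s.get("Effect") == "Allow" and s.get("Principal") in ("*", {"AWS": "*"})
    ]
```

Walking the whole document rather than fixed resource paths keeps the sketch independent of resource type, at the cost of possibly matching `Effect` keys outside policy documents.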

---

## 3. Progress Tracking

| # | Feature | Priority | Status | Est. Time | Notes |
|---|---------|----------|--------|-----------|-------|
| 1.1 | CLI `credentials` command (add/list/remove/test) | P0 | Proposed | 2d | 0600 perms, allowlist wrapper |
| 1.1b | `credentials init` interactive wizard | P1 | Proposed | 1.5d | **Deferred to Milestone 2** (orchestration-review: saves 1.5d in M1 critical path) |
| 1.2 | `cloud-security-auditor` agent | P0 | Proposed | 1d | Agent definition |
| 1.3 | SARIF + incremental scan infrastructure | P0 | Proposed | 2d | `--output sarif`, `--changed` flag |
| 2.1 | `firebase-rules-audit` skill + script | P0 | Proposed | 4-5d | Recursive descent parser (orchestration-review: +1d vs regex) |
| 2.2 | `cloud-functions-audit` skill + script | P0 | Proposed | 3-4d | CF IAM + App Check context |
| 2.3 | False positive resolver (GCP context) | P0 | Proposed | 3-4d | Context graph engine — GCP only (~8 code paths). **+2d per provider** in later milestones (orchestration-review) |
| 3.1 | `aws-security-audit` skill + script | P1 | Proposed | 4-5d | S3/Lambda/IAM/SG, `terraform show -json` |
| 3.2 | `/cloud-security-audit` orchestrator + plugin.json | P1 | Proposed | 3d | Multi-provider orchestration + pack |
| 4.1 | `azure-security-audit` skill + script | P1 | Proposed | 4-5d | NSG/Functions/RBAC |
| 5.1 | Tests + CI integration docs | P1 | Proposed | 3d | Tests + SARIF + pipeline examples |
| 5.2 | Documentation (kb/) | P2 | Proposed | 1d | Checklists, patterns, KB |

**Phasing (full delivery, all 3 providers):**
- **Phase 1 (week 1-2):** Foundation + GCP — credentials CLI, agent, SARIF/incremental infra, firebase-rules, cloud-functions, false positive resolver (GCP context)
- **Phase 2 (week 3-4):** AWS + orchestrator — AWS security audit, `terraform show -json`, orchestrator skill, plugin pack, `credentials init`
- **Phase 3 (week 5-6):** Azure + polish — Azure security audit, false positive resolver (Azure context), full test suite, documentation

---

## 4. Dependency Graph

```
                         Phase 1: Foundation + GCP (week 1-2)
                         ====================================
credentials CLI (1.1) ─────┐
                            ├──► firebase-rules-audit (2.1)
SARIF + incremental (1.3) ──┤
                            ├──► cloud-functions-audit (2.2) ──► false-positive-resolver (2.3)
agent definition (1.2) ─────┘

                         Phase 2: AWS + Orchestrator (week 3-4)
                         ======================================
credentials init (1.1b) ───┐
                            ├──► aws-security-audit (3.1) ─────► false-positive-resolver (+AWS context)
                            └──► orchestrator skill + plugin.json (3.2)

                         Phase 3: Azure + Polish (week 5-6)
                         ==================================
                            ├──► azure-security-audit (4.1) ───► false-positive-resolver (+Azure context)
                            └──► tests + docs (5.1, 5.2)
```

**All 3 providers ship.** No conditional gates — full delivery in 6 weeks.

---

## 5. Detailed Implementation

### Phase 1: Foundation (week 1)

#### 1.1 CLI `credentials` Command

**Purpose:** Secure credential storage for cloud provider API access. Credentials live outside any project directory and are only accessible by this pack's scripts.

**CLI interface:**
```bash
# GCP — Service Account JSON
ai-toolkit credentials add gcp --file ~/sa-viewer.json
ai-toolkit credentials add gcp --file ~/sa-viewer.json --project my-project-id

# GCP — use existing gcloud session (no file needed)
ai-toolkit credentials add gcp --gcloud --project my-project-id

# AWS — named profile (reads from ~/.aws/credentials)
ai-toolkit credentials add aws --profile audit-readonly
ai-toolkit credentials add aws --profile audit-readonly --region eu-west-1

# AWS — explicit keys (interactive, never on CLI args)
ai-toolkit credentials add aws --interactive

# Azure — subscription
ai-toolkit credentials add azure --subscription abc-123-def

# Azure — use existing az login session
ai-toolkit credentials add azure --az-cli

# Interactive guided setup (reduces onboarding from 4 steps to 1)
ai-toolkit credentials init           # auto-detect provider, interactive wizard
ai-toolkit credentials init --provider gcp  # skip auto-detect, go straight to GCP setup

# Management
ai-toolkit credentials list
ai-toolkit credentials remove gcp
ai-toolkit credentials remove aws
ai-toolkit credentials remove azure
ai-toolkit credentials test gcp       # verify read-only access works
ai-toolkit credentials test aws
```

**Storage structure:**
```
~/.softspark/ai-toolkit/
  credentials/
    gcp.json              # SA key file (copied, chmod 0600)
    gcp.meta.json         # { project_id, added_at, method: "file"|"gcloud" }
    aws.json              # { profile, region, method: "profile"|"keys" }
    azure.json            # { subscription_id, method: "subscription"|"az-cli" }
```

**`credentials init` interactive flow:**
1. Auto-detect providers from project files (`firebase.json` → GCP, `*.tf` with `provider "aws"` → AWS, etc.)
2. If multiple detected → ask user: "Found GCP and AWS markers. Which provider to configure first? [gcp/aws/both]"
3. Per provider:
   - GCP: "Do you have a Service Account JSON file? [y/n]" → if yes: ask path → if no: "Run `gcloud auth application-default login` and we'll use that"
   - AWS: "Do you have a named profile in ~/.aws/credentials? [y/n]" → if yes: ask profile name → if no: "Run `aws configure` first"
   - Azure: "Do you have an active `az login` session? [y/n]" → if yes: ask subscription ID → if no: "Run `az login` first"
4. Run `credentials test` automatically after setup
5. Generate `.cloud-security.json` scaffold with detected context

**Security requirements:**
- All credential files: `chmod 0600` (owner read/write only)
- Never log credential contents to stdout/stderr
- `credentials test` validates:
  - Connection works (GCP: `gcloud auth list`, AWS: `aws sts get-caller-identity`, Azure: `az account show`)
  - SA/role has **only read permissions** — **REFUSE to store** if write access detected (orchestration-review: warn-only is ignored by users). Override: `--force` flag with explicit acknowledgment
  - Project/subscription exists
- `.gitignore`-proof — lives in `~/.softspark/ai-toolkit/`, never in project directory
- Scripts access credentials only through the per-provider auth helpers (`gcp_auth.py`, `aws_auth.py`, `azure_auth.py`): one entry point per provider, no direct file reads
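
The `chmod 0600` requirement above can be sketched as follows. This is a minimal illustration, not the planned `credentials_cli.py`: the function name `store_credential` and its parameters are hypothetical, and it assumes a POSIX filesystem.

```python
import os
import shutil
from pathlib import Path


def store_credential(src: Path, cred_dir: Path, name: str) -> Path:
    """Copy src into cred_dir/name so it is never readable by other users."""
    cred_dir.mkdir(parents=True, exist_ok=True)
    dest = cred_dir / name
    # Open with mode 0600 at creation time; chmod-after-copy would leave a
    # short window where the key is readable under a permissive umask.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    os.chmod(dest, 0o600)  # also tighten a file that already existed
    return dest
```

Creating the file with the restrictive mode, rather than copying first and tightening afterwards, keeps the key unreadable to other users at every point in time.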

**Files to create/modify:**

| File | Action | Description |
|------|--------|-------------|
| `scripts/credentials_cli.py` | CREATE | CLI: add/list/remove/test credentials |
| `bin/ai-toolkit.js` | EDIT | Register `credentials` subcommand |
| `tests/test_credentials.bats` | CREATE | Tests: add, remove, permissions, test |

**Success Criteria:**
- [ ] `credentials add gcp --file` copies and secures SA key
- [ ] `credentials add aws --profile` stores profile reference
- [ ] `credentials add azure --subscription` stores subscription
- [ ] `credentials test` validates read access and refuses to store credentials with write permissions (unless `--force`)
- [ ] `credentials list` shows providers without exposing secrets
- [ ] `credentials remove` cleans up securely
- [ ] All files created with 0600 permissions
- [ ] Tests: >= 8

---

#### 1.3 SARIF Output + Incremental Scanning (Council addition)

**SARIF v2.1.0 output** — industry standard consumed by GitHub Advanced Security (inline PR annotations), VS Code SARIF Viewer, Azure DevOps, and SonarQube.

```bash
/cloud-security-audit --output sarif > results.sarif
```

```yaml
# GitHub Actions: upload SARIF for inline PR annotations
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

**SARIF structure (per finding):**
```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": { "driver": {
      "name": "cloud-security-audit", "version": "1.0.0",
      "rules": [{ "id": "GCP-CF-001", "shortDescription": { "text": "Public Cloud Function invoker" }, "helpUri": "https://cloud.google.com/functions/docs/securing" }]
    } },
    "results": [{
      "ruleId": "GCP-CF-001",
      "level": "error",
      "message": { "text": "Cloud Function 'adminEndpoint' has allUsers invoker with no protection layer" },
      "locations": [{ "physicalLocation": { "artifactLocation": { "uri": "functions/src/admin.ts" }, "region": { "startLine": 42 } } }],
      "properties": { "resolved_severity": "HIGH", "context_chain": [], "provider": "gcp" }
    }]
  }]
}
```

**Incremental scan mode** — scan only changed files since last commit:

```bash
/cloud-security-audit --changed              # files changed vs HEAD~1
/cloud-security-audit --changed HEAD~5       # files changed in last 5 commits
/cloud-security-audit --changed main         # files changed vs main branch (PR workflow)
```

Implementation: use `git diff --name-only <ref>` to get changed files, filter to relevant extensions (.rules, .tf, .json, .ts, .py, .bicep), scan only those. Falls back to full scan if no git repo detected.

**`--changed` applies to static mode ONLY.** Live mode always scans all deployed resources (it queries cloud APIs, not files). If user passes `--changed` with `--mode live`, emit warning: "Incremental scan applies to static analysis only. Live mode will scan all resources." and proceed with full live scan.
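
A minimal sketch of the incremental file selection described above. The names `filter_relevant` and `changed_files` are hypothetical, and two behaviors are assumptions beyond the text: an unknown ref also falls back to a full scan, and refs that look like options are rejected.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List, Optional

RELEVANT_EXTENSIONS = {".rules", ".tf", ".json", ".ts", ".py", ".bicep"}


def filter_relevant(paths: List[str]) -> List[str]:
    """Keep only files whose extension the static scanners understand."""
    return [p for p in paths if PurePosixPath(p).suffix in RELEVANT_EXTENSIONS]


def changed_files(ref: str = "HEAD~1", cwd: str = ".") -> Optional[List[str]]:
    """Return relevant files changed vs ref, or None to request a full scan."""
    if ref.startswith("-"):
        # Assumption: refuse refs that git would parse as command-line options.
        raise ValueError(f"invalid git ref: {ref!r}")
    try:
        proc = subprocess.run(
            ["git", "diff", "--name-only", ref],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, not a repository, or unknown ref
    return filter_relevant(proc.stdout.splitlines())
```

Returning `None` rather than an empty list lets the caller distinguish "nothing changed" from "could not determine changes, scan everything".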

**`--explain` flag** — detailed remediation for a specific finding:

```bash
/cloud-security-audit --explain GCP-CF-001
# Output: what the finding means, why it matters, exact fix steps,
# links to GCP documentation, CIS Benchmark reference
```

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/cloud-security-audit/scripts/sarif_formatter.py` | CREATE | SARIF v2.1.0 output |
| `app/skills/cloud-security-audit/scripts/incremental.py` | CREATE | Git diff + file filter |
| `app/skills/cloud-security-audit/reference/rule-explanations.json` | CREATE | Per-rule remediation guides |

**Success Criteria:**
- [ ] `--output sarif` produces valid SARIF v2.1.0 JSON with `driver.rules[]` array (orchestration-review: GitHub silently drops annotations without rule metadata)
- [ ] SARIF upload to GitHub Advanced Security works (inline PR annotations)
- [ ] SARIF `level` mapping: HIGH→`error`, WARN→`warning`, INFO→`note`
- [ ] `--changed main` scans only PR-changed files
- [ ] `--explain <rule-id>` shows detailed remediation
- [ ] Tests: >= 6
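
The level mapping and `driver.rules[]` requirement above could look roughly like this. It is a sketch only: the finding keys (`rule_id`, `title`, `severity`, `message`, `file`, `line`, `provider`) are a hypothetical internal shape, not a format defined by this plan.

```python
import json
from typing import Any, Dict, List

LEVELS = {"HIGH": "error", "WARN": "warning", "INFO": "note"}
SCHEMA = ("https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/"
          "sarif-2.1/schema/sarif-schema-2.1.0.json")


def to_sarif(findings: List[Dict[str, Any]], tool_version: str = "1.0.0") -> Dict[str, Any]:
    """Build a SARIF v2.1.0 document with one rule entry per distinct rule id."""
    rules: Dict[str, Dict[str, Any]] = {}
    results = []
    for f in findings:
        # Rule metadata is emitted once per rule id, results once per finding.
        rules.setdefault(f["rule_id"], {
            "id": f["rule_id"],
            "shortDescription": {"text": f["title"]},
        })
        results.append({
            "ruleId": f["rule_id"],
            "level": LEVELS[f["severity"]],
            "message": {"text": f["message"]},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": f["file"]},
                "region": {"startLine": f["line"]},
            }}],
            "properties": {"provider": f["provider"]},
        })
    driver = {"name": "cloud-security-audit", "version": tool_version,
              "rules": list(rules.values())}
    return {"$schema": SCHEMA, "version": "2.1.0",
            "runs": [{"tool": {"driver": driver}, "results": results}]}
```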

---

#### 1.2 Agent Definition: `cloud-security-auditor`

**File:** `app/agents/cloud-security-auditor.md`

```markdown
---
name: cloud-security-auditor
description: "Multi-cloud security auditor (GCP/AWS/Azure). Read-only deterministic scans
  for IAM, network, storage, serverless, and compliance. False positive resolution
  via security context graph."
model: opus
color: red
tools: Read, Grep, Glob, Bash
skills: security-patterns, cloud-security-audit
---

# Cloud Security Auditor Agent

You are the **Cloud Security Auditor**. You perform read-only security assessments
across GCP, AWS, and Azure. You never modify cloud resources.

## Core Philosophy
**"Read everything, change nothing. Context before verdict."**

## Mandatory Protocol
Before any audit:
1. Check credentials: `ai-toolkit credentials test <provider>`
2. Determine mode: `--mode static` (IaC/source only) or `--mode live` (deployed state)
3. Run deterministic scripts first, then interpret results

## Responsibilities

### 1. Static Analysis (no credentials needed)
- Parse IaC: Terraform (.tf), CloudFormation (.yaml/.json), Bicep (.bicep)
- Parse Firebase rules: firestore.rules, database.rules.json
- Parse source: Cloud Functions (onCall vs onRequest), Lambda handlers, Azure Functions
- Parse configs: firebase.json, serverless.yml, sam-template.yaml

### 2. Live Analysis (credentials required, READ-ONLY)
- GCP: `gcloud` CLI commands (list, describe, get-iam-policy)
- AWS: `aws` CLI commands (s3api, lambda, iam, ec2 — get/list/describe only)
- Azure: `az` CLI commands (network nsg, functionapp, cosmosdb — list/show only)

### 3. False Positive Resolution
Build security context graph before rendering verdict:
- Public endpoint → check: API Gateway? WAF? App Check? CDN?
- Open port → check: behind Load Balancer? VPN? private subnet?
- Broad IAM role → check: scoped to specific resource? temporary?

## Allowed CLI Commands (WHITELIST — read-only only)

### GCP
- gcloud functions list / describe / get-iam-policy
- gcloud projects get-iam-policy
- gcloud app-check services list
- gcloud firestore indexes list
- gcloud compute firewall-rules list
- gcloud run services list / describe
- firebase apps:list

### AWS
- aws s3api get-bucket-policy / get-bucket-acl / get-public-access-block
- aws lambda get-policy / get-function-configuration / list-functions
- aws iam list-roles / list-policies / get-role / get-policy-version
- aws ec2 describe-security-groups / describe-network-acls
- aws apigateway get-rest-apis / get-resources
- aws elbv2 describe-load-balancers / describe-listeners

### Azure
- az network nsg list / show / rule list
- az functionapp list / show / config show
- az cosmosdb list / show / keys list
- az role assignment list
- az webapp show / config show
- az network application-gateway list

### NEVER ALLOWED
- Any create/update/delete/put/set/deploy/push command
- Any command that modifies state
- `gcloud auth activate-service-account` (credential pivot)
- `aws sts assume-role` (lateral movement)
- `az login` (session hijack)
- `terraform apply` / `terraform destroy`

## Output Format
(see skill SKILL.md for detailed format)
```

**SECURITY: Programmatic Bash Allowlist (orchestration-review P0 BLOCKER)**

The read-only whitelist above lives in the agent's system prompt. An LLM can be prompt-injected via malicious IaC files or source code comments. **The prompt-only approach is insufficient.**

**Required:** A shell wrapper script (`scripts/cloud_security_allowlist.sh`) that validates every CLI invocation against a compiled allowlist regex BEFORE execution, independent of the LLM:

```bash
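#!/bin/bash
# cloud_security_allowlist.sh: wraps Bash calls from cloud-security-auditor
# Rejects any command not matching read-only patterns
# Usage: cloud_security_allowlist.sh <command> [args...]

# Patterns are matched against the space-joined argv. Command groups are
# spelled out so nested groups (e.g. `gcloud compute firewall-rules list`)
# match, and a mutating verb cannot hide in a wildcard position.
ALLOWED_PATTERNS=(
  '^gcloud functions (list|describe|get-iam-policy)( |$)'
  '^gcloud projects get-iam-policy( |$)'
  '^gcloud (app-check services|firestore indexes|compute firewall-rules) list( |$)'
  '^gcloud run services (list|describe)( |$)'
  '^gcloud auth list$'
  '^firebase apps:list( |$)'
  # sts is excluded here so only get-caller-identity (below) is allowed
  '^aws (s3api|lambda|iam|ec2|apigateway|elbv2|cloudfront) (get-|list-|describe-)'
  '^aws iam generate-credential-report$'
  '^aws sts get-caller-identity$'
  '^az network nsg (list|show|rule list)( |$)'
  '^az network application-gateway list( |$)'
  '^az (functionapp|webapp) (list|show|config show)( |$)'
  '^az cosmosdb (list|show)( |$)'
  '^az role assignment list( |$)'
  '^az storage account show( |$)'
  '^az account show$'
  '^terraform show -json( |$)'
  '^git diff --name-only( |$)'
  # The script path must be the only argument, so `python3 -c ...` is rejected
  '^python3 [^ -][^ ]*/scripts/[^ ]+\.py$'
)

CMD="$*"
for pattern in "${ALLOWED_PATTERNS[@]}"; do
  if [[ "$CMD" =~ $pattern ]]; then
    # Run the original argv untouched: no re-splitting, no shell evaluation
    exec "$@"
  fi
done

echo "BLOCKED: Command not in read-only allowlist: $CMD" >&2
exit 1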
```

The agent's `allowed-tools` in SKILL.md references this wrapper instead of raw Bash. All cloud CLI calls go through the allowlist.

**`terraform plan` risk (orchestration-review):** `terraform plan -json` executes provider plugins and evaluates data sources. A malicious `.tf` file can run arbitrary code during plan, for example through an `external` data source (`local-exec` provisioners only run at apply time). **Decision: use `terraform show -json` ONLY (reads existing state), NOT `terraform plan -json`.** Document this prominently.

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/cloud-security-audit/scripts/cloud_security_allowlist.sh` | CREATE | Bash allowlist wrapper |

**Success Criteria:**
- [ ] Agent file created in `app/agents/`
- [ ] Read-only command whitelist documented
- [ ] NEVER ALLOWED section explicit
- [ ] Programmatic Bash allowlist enforced (not just prompt)
- [ ] `terraform plan` excluded — only `terraform show -json` allowed
- [ ] Allowlist tested: blocked commands return exit 1

---

### Stage 2: GCP / Firebase (Phase 1, weeks 1-2)

#### 2.1 Skill: `firebase-rules-audit`

**Purpose:** Static analysis of Firestore rules and RTDB rules. No credentials needed.

**What it scans:**

| Check | Severity | Description |
|-------|----------|-------------|
| `allow read, write: if true` | HIGH | World-readable/writable collection |
| `allow read: if true` without `write` guard | WARN | Public read — may be intentional |
| `allow write: if request.auth != null` without field validation | WARN | Authenticated but no field-level validation |
| Missing `request.resource.data` validation on writes | WARN | No schema enforcement |
| Wildcard collection `/{document=**}` with broad permissions | HIGH | Recursive wildcard + open access |
| RTDB `.read: true` or `.write: true` at root | HIGH | Entire database public |
| RTDB `.read: "auth != null"` without path scoping | WARN | All authenticated users can read everything |
| Timestamp/TTL rules missing for sensitive collections | WARN | No data lifecycle enforcement |
| `get()` / `exists()` cross-collection reads without auth check | WARN | Privilege escalation via rule chaining |
| Rules file > 256KB (approaching Firebase limit) | WARN | May hit deployment limit |

**Script:** `scripts/firebase_rules_scan.py`
- Parses `firestore.rules` via **recursive descent parser** (not regex — orchestration-review P1)
  - Grammar: ~8 production rules (service, match, allow, function, condition)
  - Tracks: brace depth, current match path, accumulated allow blocks
  - Handles: nested `match` blocks, multi-line conditions with `&&`/`||`, custom `function` declarations
  - Estimated: 550-650 LOC for parser + check logic
  - Unsupported (documented): CEL ternary expressions, complex `get()`/`exists()` chains with computed paths
- Parses `database.rules.json` (JSON rules — stdlib `json` module)
- Outputs findings as JSON or text
- Exit code 1 on HIGH, 0 otherwise
- Supports suppressions via the `ignore` section of `.cloud-security.json` (see section 6)
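
As an illustration of the RTDB half of the scanner, here is a minimal sketch of the two root-level checks from the table above. The function name `scan_rtdb_root` and the finding shape are hypothetical, and it assumes `database.rules.json` is strict JSON (no comments) so the stdlib `json` module can parse it.

```python
import json
from typing import Dict, List


def scan_rtdb_root(rules_text: str) -> List[Dict[str, str]]:
    """Flag root-level .read/.write rules that expose the whole database."""
    root = json.loads(rules_text).get("rules", {})
    findings = []
    for key in (".read", ".write"):
        value = root.get(key)
        # Rules may be JSON booleans or expression strings such as "true".
        expr = str(value).lower().replace(" ", "") if value is not None else ""
        if expr == "true":
            findings.append({"severity": "HIGH", "rule": key,
                             "detail": "entire database public"})
        elif key == ".read" and expr == "auth!=null":
            findings.append({"severity": "WARN", "rule": key,
                             "detail": "all authenticated users can read everything"})
    return findings
```

Only the root node is inspected here; nested paths with their own `.read`/`.write` rules would need a recursive walk.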

**Reference file:** `reference/firebase-rules-patterns.md` — safe/unsafe patterns with examples

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/firebase-rules-audit/SKILL.md` | CREATE | Skill definition |
| `app/skills/firebase-rules-audit/scripts/firebase_rules_scan.py` | CREATE | Deterministic scanner |
| `app/skills/firebase-rules-audit/reference/firebase-rules-patterns.md` | CREATE | Safe/unsafe patterns |

**Success Criteria:**
- [ ] Parses `firestore.rules` — detects 10+ check patterns
- [ ] Parses `database.rules.json` — detects root-level open access
- [ ] `--output json` for CI
- [ ] Suppression support via `.cloud-security.json` `ignore` entries
- [ ] Tests: >= 10 (one per check pattern + edge cases)

---

#### 2.2 Skill: `cloud-functions-audit`

**Purpose:** Audit Cloud Functions permissions and detect false positives.

**Static mode (no credentials):**

| Check | Severity | Description |
|-------|----------|-------------|
| `onRequest` handler without auth middleware | WARN | Potentially public — needs context |
| `onCall` handler (inherently authenticated) | INFO | Informational — callable is auth'd |
| Hardcoded API keys / secrets in source | HIGH | Secrets in code |
| CORS `origin: '*'` in CF handler | WARN | Unrestricted CORS |
| Missing rate limiting patterns | WARN | No throttling on public endpoint |
| `functions.https.onRequest` + no `validateFirebaseIdToken` | WARN | HTTP function without Firebase Auth check |
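
The `onRequest` vs `onCall` distinction in the table above can be approximated with a line scan. This is a rough sketch with a hypothetical function name; a plain regex also matches handler names inside comments and misses generic forms such as `onCall<T>(`, so a real scanner would need more context.

```python
import re
from typing import List, Tuple

HANDLER_RE = re.compile(r"\b(onCall|onRequest)\s*\(")
SEVERITY = {"onRequest": "WARN", "onCall": "INFO"}


def classify_handlers(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, handler_kind, severity) for each handler found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HANDLER_RE.finditer(line):
            kind = match.group(1)
            hits.append((lineno, kind, SEVERITY[kind]))
    return hits
```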

**Live mode (credentials required):**

| Check | Severity | CLI Command | Description |
|-------|----------|-------------|-------------|
| `allUsers` invoker on CF | CONTEXT | `gcloud functions get-iam-policy` | Public — resolve with context graph |
| `allAuthenticatedUsers` invoker | WARN | `gcloud functions get-iam-policy` | Any Google account can invoke |
| App Check enforcement status | CONTEXT | `gcloud app-check services list` | Feeds into false positive resolution |
| Deployed rules vs local rules diff | WARN | `gcloud firestore indexes` + local | Rules drift detection |
| Cloud Run public ingress | CONTEXT | `gcloud run services describe` | Public — resolve with context graph |
| Overly broad IAM roles on SA | HIGH | `gcloud projects get-iam-policy` | CF service account with editor/owner |
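
The two invoker checks above reduce to inspecting the `bindings` array of an IAM policy. The sketch below works on policy JSON that has already been fetched (for example with `gcloud functions get-iam-policy --format=json`); the function name and finding shape are hypothetical.

```python
import json
from typing import Dict, List

# Severities follow the live-mode table: allUsers needs the context graph.
PUBLIC_MEMBERS = {"allUsers": "CONTEXT", "allAuthenticatedUsers": "WARN"}


def public_bindings(policy_json: str, function_name: str) -> List[Dict[str, str]]:
    """List IAM bindings that grant a role to a public principal."""
    policy = json.loads(policy_json)
    findings = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                findings.append({
                    "resource": function_name,
                    "member": member,
                    "role": binding.get("role", ""),
                    "severity": PUBLIC_MEMBERS[member],
                })
    return findings
```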

**Script:** `scripts/cloud_functions_audit.py`

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/cloud-functions-audit/SKILL.md` | CREATE | Skill definition |
| `app/skills/cloud-functions-audit/scripts/cloud_functions_audit.py` | CREATE | Scanner |
| `app/skills/cloud-functions-audit/scripts/gcp_auth.py` | CREATE | Credential loader |
| `app/skills/cloud-functions-audit/reference/false-positives-gcp.md` | CREATE | False positive patterns |

**Success Criteria:**
- [ ] Static: parses CF source for auth patterns
- [ ] Live: checks IAM bindings via `gcloud`
- [ ] False positive resolution for App Check + Gateway patterns
- [ ] Tests: >= 8

---

#### 2.3 False Positive Resolver

**Purpose:** Central engine that resolves "is this actually a problem?" by building a security context graph.

**How it works:**
```
Input: Finding { resource, severity, type }

Step 1: Gather context
  ├── Check API Gateway routes (firebase.json rewrites, API Gateway configs)
  ├── Check WAF/CDN (Cloudflare, CloudFront, Azure Front Door)
  ├── Check App Check / Cloud Armor / Shield
  ├── Check callable vs HTTP function type
  ├── Check VPC / private subnet placement
  └── Check Load Balancer + auth middleware

Step 2: Apply resolution rules
  ├── Public CF + App Check ENFORCED → SUPPRESSED (protected)
  ├── Public CF + API Gateway route → SUPPRESSED (gateway handles auth)
  ├── Public CF + onCall() → SUPPRESSED (callable is auth'd by SDK)
  ├── Public S3 + CloudFront OAI → SUPPRESSED (not directly accessible)
  ├── Open SG port + ALB → SUPPRESSED (ALB handles TLS + auth)
  ├── Open NSG + Application Gateway → SUPPRESSED (WAF handles filtering)
  └── No context found → KEEP ORIGINAL SEVERITY

Step 3: Output
  ├── Original severity
  ├── Resolved severity (SUPPRESSED / DOWNGRADED / CONFIRMED)
  ├── Context chain (what protections were found)
  └── Confidence (high if multiple protections, low if single)
```

**Output example:**
```json
{
  "resource": "processPayment",
  "provider": "gcp",
  "type": "cloud-function-public-invoker",
  "original_severity": "HIGH",
  "resolved_severity": "SUPPRESSED",
  "confidence": "high",
  "context_chain": [
    { "layer": "app_check", "status": "ENFORCED", "source": "gcloud app-check services list" },
    { "layer": "function_type", "status": "onCall", "source": "source:index.ts:42" },
    { "layer": "api_gateway", "status": "ROUTED", "source": "firebase.json:rewrites" }
  ],
  "verdict": "3/3 protection layers active. Suppressing finding."
}
```

**Script:** `scripts/false_positive_resolver.py`

**Resolution rules stored in:** `reference/resolution-rules.json`
```json
{
  "rules": [
    {
      "id": "gcp-cf-appcheck",
      "finding_type": "cloud-function-public-invoker",
      "provider": "gcp",
      "context_required": ["app_check:ENFORCED"],
      "action": "SUPPRESS",
      "reason": "App Check enforced — only verified app instances can invoke"
    },
    {
      "id": "gcp-cf-callable",
      "finding_type": "cloud-function-public-invoker",
      "provider": "gcp",
      "context_required": ["function_type:onCall"],
      "action": "SUPPRESS",
      "reason": "onCall functions require Firebase Auth token from client SDK"
    },
    {
      "id": "aws-s3-cloudfront-oai",
      "finding_type": "s3-bucket-public-access",
      "provider": "aws",
      "context_required": ["cloudfront_oai:ACTIVE"],
      "action": "SUPPRESS",
      "reason": "Bucket accessed only via CloudFront Origin Access Identity"
    },
    {
      "id": "azure-func-apigw",
      "finding_type": "function-app-public",
      "provider": "azure",
      "context_required": ["application_gateway:ACTIVE"],
      "action": "SUPPRESS",
      "reason": "Function behind Application Gateway with WAF"
    }
  ]
}
```
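
A minimal sketch of how rules in this format might be applied to a finding. The `resolve` function and the `context` mapping (layer name to observed status) are hypothetical. It handles only `SUPPRESS` rules and leaves `confidence` unset when no protection is found, both of which are simplifying assumptions.

```python
from typing import Any, Dict, List


def resolve(finding: Dict[str, Any], context: Dict[str, str],
            rules: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply SUPPRESS rules whose required context is fully present."""
    layers: Dict[str, str] = {}
    for rule in rules:
        if rule.get("action") != "SUPPRESS":
            continue
        if (rule["finding_type"], rule["provider"]) != (finding["type"], finding["provider"]):
            continue
        required = [item.split(":", 1) for item in rule["context_required"]]
        if all(context.get(layer) == status for layer, status in required):
            layers.update(required)  # collect each satisfied protection layer once
    result = {
        "resource": finding["resource"],
        "original_severity": finding["severity"],
        # No matching context: keep the original severity.
        "resolved_severity": "SUPPRESSED" if layers else finding["severity"],
        "context_chain": [{"layer": k, "status": v} for k, v in layers.items()],
    }
    if layers:
        result["confidence"] = "high" if len(layers) > 1 else "low"
    return result
```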

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/cloud-security-audit/scripts/false_positive_resolver.py` | CREATE | Context graph engine |
| `app/skills/cloud-security-audit/reference/resolution-rules.json` | CREATE | Configurable rules |

**Success Criteria:**
- [ ] Resolves GCP: App Check, Gateway, callable patterns
- [ ] Resolves AWS: CloudFront OAI, ALB, WAF patterns
- [ ] Resolves Azure: App Gateway, Front Door, VNET patterns
- [ ] JSON output with context chain
- [ ] User can add custom rules to `.cloud-security.json` `context` section
- [ ] Tests: >= 12 (4 per provider)

---

### Stage 3: AWS (Phase 2, weeks 3-4)

#### 3.1 Skill: `aws-security-audit`

**Static mode (IaC parsing):**

| Check | Severity | Source | Description |
|-------|----------|--------|-------------|
| S3 bucket `"Effect": "Allow", "Principal": "*"` | HIGH | .tf / .json | Public bucket policy |
| S3 `BlockPublicAccess` all false | HIGH | .tf / .json | Public access not blocked |
| Lambda `resource-based policy` with `Principal: "*"` | HIGH | .tf / .json | Public Lambda |
| Security Group `0.0.0.0/0` ingress on non-80/443 | HIGH | .tf / .json | Open port to world |
| IAM policy with `Action: "*"` | HIGH | .tf / .json | God-mode IAM |
| IAM policy with `Resource: "*"` + sensitive actions | WARN | .tf / .json | Broad resource scope |
| Unencrypted RDS/DynamoDB | WARN | .tf / .json | Missing encryption at rest |
| CloudTrail disabled | HIGH | .tf / .json | No audit logging |
| Missing VPC Flow Logs | WARN | .tf / .json | No network monitoring |

**Live mode:**

| Check | CLI Command | Description |
|-------|-------------|-------------|
| S3 public buckets | `aws s3api get-public-access-block` | Per-bucket public access |
| Lambda public policies | `aws lambda get-policy` | Resource-based policies |
| Open Security Groups | `aws ec2 describe-security-groups` | Ingress from 0.0.0.0/0 |
| Overly permissive IAM | `aws iam list-roles` + `get-role` | Roles with admin/broad access |
| Unused IAM credentials | `aws iam generate-credential-report` | Stale access keys |
| API Gateway without auth | `aws apigateway get-rest-apis` | Endpoints without authorizer |

**Script:** `scripts/aws_security_scan.py`

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/aws-security-audit/SKILL.md` | CREATE | Skill definition |
| `app/skills/aws-security-audit/scripts/aws_security_scan.py` | CREATE | Scanner |
| `app/skills/aws-security-audit/scripts/aws_auth.py` | CREATE | Credential loader |
| `app/skills/aws-security-audit/reference/aws-security-checklist.md` | CREATE | CIS Benchmark mapping |
| `app/skills/aws-security-audit/reference/false-positives-aws.md` | CREATE | ALB/CloudFront/WAF patterns |

**Terraform approach (council revision):** Do NOT parse HCL directly (stdlib-only Python cannot handle heredocs, variable interpolation, modules, `for_each`). Instead wrap `terraform show -json`, which outputs clean JSON; `terraform plan -json` is excluded because plan can execute code from untrusted `.tf` files (see 1.2). Fall back to flat-resource regex for projects without the `terraform` CLI.
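
To illustrate why the JSON route is simpler than HCL parsing, here is a sketch of the open-port check over `terraform show -json` output. It assumes the state representation (`values.root_module` with nested `child_modules`) and the `aws_security_group` inline `ingress` attribute; the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List

WEB_PORTS = {80, 443}


def iter_resources(module: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield resources from a module and, recursively, its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def open_security_groups(show_json: str) -> List[Dict[str, Any]]:
    """Flag ingress rules open to 0.0.0.0/0 on anything but exactly 80 or 443."""
    state = json.loads(show_json)
    root = state.get("values", {}).get("root_module", {})
    findings = []
    for res in iter_resources(root):
        if res.get("type") != "aws_security_group":
            continue
        for rule in res.get("values", {}).get("ingress") or []:
            if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
                continue
            lo, hi = rule.get("from_port"), rule.get("to_port")
            if lo == hi and lo in WEB_PORTS:
                continue  # a single web port is expected to be public
            findings.append({"severity": "HIGH", "resource": res.get("address"),
                             "from_port": lo, "to_port": hi})
    return findings
```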

**Success Criteria:**
- [ ] Static: `terraform show -json` wrapper + CloudFormation/SAM JSON parsing
- [ ] Live: checks S3, Lambda, IAM, SG via `aws` CLI
- [ ] False positive resolution for CloudFront/ALB/WAF
- [ ] CIS Benchmark mapping in reference
- [ ] Tests: >= 10

---

### Stage 4: Azure (Phase 3, weeks 5-6)

#### 4.1 Skill: `azure-security-audit`

**Static mode (IaC parsing):**

| Check | Severity | Source | Description |
|-------|----------|--------|-------------|
| NSG rule `0.0.0.0/0` source on management ports | HIGH | .tf / .bicep | Open RDP/SSH to world |
| Function App `authLevel: "anonymous"` | WARN | source / .tf | Public Azure Function |
| Cosmos DB `publicNetworkAccess: enabled` | WARN | .tf / .bicep | Public database access |
| Storage Account `allowBlobPublicAccess: true` | HIGH | .tf / .bicep | Public blob storage |
| Missing Key Vault references (hardcoded secrets) | HIGH | source | Secrets not in Key Vault |
| Missing RBAC (classic co-admin model) | WARN | .tf | Legacy access model |

**Live mode:**

| Check | CLI Command | Description |
|-------|-------------|-------------|
| Open NSG rules | `az network nsg rule list` | Broad inbound rules |
| Function App auth | `az functionapp show` + `config` | Auth level and provider |
| Cosmos DB access | `az cosmosdb show` | Network access settings |
| RBAC assignments | `az role assignment list` | Owner/Contributor sprawl |
| Storage public access | `az storage account show` | Blob public access |
| App Service auth | `az webapp auth show` | Auth configuration |
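
A sketch of the open-NSG check applied to the JSON array returned by `az network nsg rule list`. The field names (`access`, `direction`, `sourceAddressPrefix`, `destinationPortRange`, `destinationPortRanges`) are assumed from the Azure CLI's usual output and should be verified against a real response; the function names are hypothetical. It reuses the management-port definition from the static table.

```python
import json
from typing import Any, Dict, List

MANAGEMENT_PORTS = (22, 3389)  # SSH, RDP
OPEN_SOURCES = {"*", "0.0.0.0/0", "Internet"}


def _covers(spec: str, port: int) -> bool:
    """True if a port spec ('*', '22', or '20-25') includes the given port."""
    if spec == "*":
        return True
    low, _, high = spec.partition("-")
    return int(low) <= port <= int(high or low)


def open_management_rules(rules_json: str) -> List[Dict[str, Any]]:
    """Flag inbound Allow rules that expose SSH/RDP to any source."""
    findings = []
    for rule in json.loads(rules_json):
        if rule.get("access") != "Allow" or rule.get("direction") != "Inbound":
            continue
        if rule.get("sourceAddressPrefix") not in OPEN_SOURCES:
            continue
        specs = list(rule.get("destinationPortRanges") or [])
        if rule.get("destinationPortRange"):
            specs.append(rule["destinationPortRange"])
        exposed = [p for p in MANAGEMENT_PORTS if any(_covers(s, p) for s in specs)]
        if exposed:
            findings.append({"severity": "HIGH", "rule": rule.get("name"), "ports": exposed})
    return findings
```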

**Script:** `scripts/azure_security_scan.py`

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/skills/azure-security-audit/SKILL.md` | CREATE | Skill definition |
| `app/skills/azure-security-audit/scripts/azure_security_scan.py` | CREATE | Scanner |
| `app/skills/azure-security-audit/scripts/azure_auth.py` | CREATE | Credential loader |
| `app/skills/azure-security-audit/reference/azure-security-checklist.md` | CREATE | CIS Benchmark mapping |
| `app/skills/azure-security-audit/reference/false-positives-azure.md` | CREATE | App Gateway/Front Door patterns |

**Success Criteria:**
- [ ] Static: parses Terraform, Bicep, ARM templates
- [ ] Live: checks NSG, Functions, CosmosDB, RBAC via `az` CLI
- [ ] False positive resolution for App Gateway/Front Door/VNET
- [ ] Tests: >= 10

---

### Stage 3 (cont.): Orchestration + Pack Integration (Phase 2, weeks 3-4)

#### 3.2 Orchestrator Skill: `/cloud-security-audit`

**Purpose:** Single entry point that runs all provider audits detected in the project.

**Behavior:**
1. Auto-detect providers from project files:
   - `firebase.json` / `firestore.rules` / `.firebaserc` → GCP
   - `serverless.yml` / `template.yaml` / `*.tf` with `provider "aws"` → AWS
   - `*.bicep` / `*.tf` with `provider "azurerm"` / `azure-pipelines.yml` → Azure
2. Check available credentials: `ai-toolkit credentials list`
3. Run detected provider scans in parallel
4. Merge results through false positive resolver
5. Output unified report
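
Step 1 above (provider auto-detection) could be sketched as follows. The function name is hypothetical, and one detail is an assumption: named marker files are checked at the project root only, while `*.tf` and `*.bicep` files are searched recursively.

```python
import re
from pathlib import Path
from typing import Set

GCP_MARKERS = ("firebase.json", "firestore.rules", ".firebaserc")
AWS_MARKERS = ("serverless.yml", "template.yaml")
AZURE_MARKERS = ("azure-pipelines.yml",)
TF_PROVIDERS = {"aws": "aws", "azurerm": "azure"}
PROVIDER_RE = re.compile(r'provider\s+"(\w+)"')


def detect_providers(root: Path) -> Set[str]:
    """Infer cloud providers from marker files and Terraform provider blocks."""
    found: Set[str] = set()
    if any((root / name).exists() for name in GCP_MARKERS):
        found.add("gcp")
    if any((root / name).exists() for name in AWS_MARKERS):
        found.add("aws")
    if any((root / name).exists() for name in AZURE_MARKERS):
        found.add("azure")
    if next(root.rglob("*.bicep"), None) is not None:
        found.add("azure")
    for tf_file in root.rglob("*.tf"):
        for name in PROVIDER_RE.findall(tf_file.read_text(errors="ignore")):
            if name in TF_PROVIDERS:
                found.add(TF_PROVIDERS[name])
    return found
```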

**Skill frontmatter:**
```yaml
---
name: cloud-security-audit
description: "Multi-cloud security audit — auto-detects GCP/AWS/Azure and runs
  deterministic scans with false positive resolution"
user-invocable: true
effort: high
disable-model-invocation: true
context: fork
agent: cloud-security-auditor
argument-hint: "[path] [--provider gcp|aws|azure|auto] [--mode static|live] [--severity high|warn] [--output text|json|sarif] [--changed [ref]] [--explain rule-id]"
allowed-tools: Read, Grep, Glob, Bash
---
```

**CLI usage:**
```bash
/cloud-security-audit                            # auto-detect providers, static mode (default)
/cloud-security-audit --provider gcp             # GCP only
/cloud-security-audit --provider aws,azure       # AWS + Azure
/cloud-security-audit --mode static              # no credentials, IaC/source only (DEFAULT)
/cloud-security-audit --mode live                # deployed state (requires credentials)
/cloud-security-audit --severity high            # HIGH findings only
/cloud-security-audit --output json              # CI pipeline output
/cloud-security-audit --output sarif             # SARIF v2.1.0 for GitHub Advanced Security
/cloud-security-audit --changed main             # incremental: only files changed vs main
/cloud-security-audit --explain GCP-CF-001       # detailed remediation for specific rule
/cloud-security-audit src/functions/             # scan specific path
```

**Unified report format:**
```markdown
## Cloud Security Audit Report

### Summary
| Metric | GCP | AWS | Azure | Total |
|--------|-----|-----|-------|-------|
| Mode | live | static | n/a | — |
| Resources scanned | 12 | 8 | 0 | 20 |
| HIGH | 2 | 1 | 0 | 3 |
| WARN | 4 | 3 | 0 | 7 |
| SUPPRESSED (false positive) | 3 | 1 | 0 | 4 |

### Findings (sorted by severity)

#### [HIGH] GCP: Cloud Function "adminEndpoint" — public invoker, no protection
...

#### [SUPPRESSED] GCP: Cloud Function "processPayment" — public invoker
Context: App Check ENFORCED + onCall + API Gateway routed (3/3 layers)
...
```

---

#### 3.2.1 Plugin Pack Manifest

**File:** `app/plugins/cloud-security-pack/plugin.json`

```json
{
  "name": "cloud-security-pack",
  "description": "Multi-cloud security auditing for GCP, AWS, and Azure",
  "version": "1.0.0",
  "domain": "cloud-security",
  "type": "plugin-pack",
  "status": "experimental",
  "requires": [],
  "includes": {
    "agents": ["cloud-security-auditor"],
    "skills": [
      "cloud-security-audit",
      "firebase-rules-audit",
      "cloud-functions-audit",
      "aws-security-audit",
      "azure-security-audit"
    ],
    "rules": [],
    "hooks": []
  },
  "credentials": {
    "supported_providers": ["gcp", "aws", "azure"],
    "setup_command": "ai-toolkit credentials add <provider>"
  }
}
```

**Directory structure:**
```
app/plugins/cloud-security-pack/
├── plugin.json
├── README.md
└── (skills and agent live in core app/ dirs, referenced by name)
```

---

### Stage 5: Tests + Documentation (ongoing, ships with each milestone)

#### 5.1 Tests

| Test file | Count | Description |
|-----------|-------|-------------|
| `tests/test_credentials.bats` | 8+ | CLI credentials management |
| `tests/test_firebase_rules_scan.py` | 10+ | Firestore/RTDB rules patterns |
| `tests/test_cloud_functions_audit.py` | 8+ | CF static + live checks |
| `tests/test_aws_security_scan.py` | 10+ | S3/Lambda/IAM/SG checks |
| `tests/test_azure_security_scan.py` | 10+ | NSG/Functions/CosmosDB checks |
| `tests/test_false_positive_resolver.py` | 12+ | Context graph resolution |
| `tests/test_sarif_formatter.py` | 4+ | SARIF v2.1.0 output validation |
| `tests/test_incremental.py` | 4+ | Git diff filtering, fallback |
| `tests/test_credentials_init.py` | 4+ | Interactive wizard, auto-detect |
| **Total** | **70+** | |

#### 5.2 Documentation

| File | Description |
|------|-------------|
| `kb/planning/cloud-security-pack-plan.md` | This document |
| `kb/reference/cloud-security-checklist.md` | Unified multi-cloud checklist (created when Milestone 1 ships) |
| Skills `reference/` dirs | Per-provider patterns and false positives |

---

## 6. Configuration & Suppression (Unified)

**Council revision:** merged `.cloud-security-ignore` + `.cloud-security-config` into a single `.cloud-security.json`. Two config files is one too many — developers expect one file.

Scaffold interactively: `ai-toolkit credentials init` generates this file.

**Suppression Governance (orchestration-review P0):**
- **Wildcard ignores** (e.g., `GCP-CF-*`) REQUIRE a `justification` field — scanner refuses to suppress without one
- **All ignore entries** require `justification` — enforced by schema validation
- **CI diff detection:** When `.cloud-security.json` is modified in a PR, scanner emits a `SUPPRESSION_CHANGED` finding (severity: WARN) listing added/removed/modified ignore rules. This prevents silent suppression of vulnerabilities via committed config changes.
- **In live mode:** context claims (e.g., `app_check_enforced: true`) are verified against actual cloud state. If claim doesn't match reality, emit `CONTEXT_MISMATCH` finding (severity: HIGH)

```json
{
  "$schema": "https://softspark.github.io/ai-toolkit/schemas/cloud-security.json",

  "ignore": [
    { "rule": "GCP-CF-001:processPayment", "justification": "Behind App Check + API Gateway, verified 2026-04-10" },
    { "rule": "AWS-S3-001:static-assets", "justification": "Intentionally public static bucket, CloudFront OAI active" },
    { "rule": "GCP-CF-*", "justification": "REQUIRED for wildcard suppression — reviewed by @jacek in PR #142" }
  ],

  "context": {
    "gcp": {
      "app_check_enforced": true,
      "api_gateway": "projects/my-proj/locations/us-central1/gateways/main",
      "known_public_functions": ["healthCheck", "webhookReceiver"]
    },
    "aws": {
      "waf_enabled": true,
      "cloudfront_distributions": ["E1234567890"],
      "known_public_buckets": ["static-assets-prod"]
    },
    "azure": {
      "front_door_enabled": true,
      "application_gateway": "my-app-gw",
      "known_public_functions": ["webhookHandler"]
    }
  }
}
```
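
The justification requirement and wildcard matching could be enforced along these lines. This is a sketch with hypothetical function names; it assumes findings are keyed as `<rule-id>:<resource>` and that ignore patterns use shell-style wildcards, which matches the examples above but is not specified by the plan.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def validate_ignores(config: Dict[str, Any]) -> List[str]:
    """Return one error per ignore entry that lacks a justification."""
    errors = []
    for i, entry in enumerate(config.get("ignore", [])):
        if not str(entry.get("justification", "")).strip():
            errors.append(f"ignore[{i}] ({entry.get('rule', '?')}): missing justification")
    return errors


def is_suppressed(rule_id: str, resource: str, config: Dict[str, Any]) -> bool:
    """True if a justified ignore entry matches '<rule_id>:<resource>'."""
    key = f"{rule_id}:{resource}"
    return any(
        bool(str(entry.get("justification", "")).strip())
        and fnmatchcase(key, entry.get("rule", ""))
        for entry in config.get("ignore", [])
    )
```

Entries without a justification never suppress anything, so a missing field fails closed even if schema validation is skipped.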

---

## 7. CI Pipeline Integration

### Basic: JSON output + fail on HIGH

```yaml
- name: Cloud Security Audit
  run: |
    python3 scripts/cloud_security_audit.py \
      --mode static --output json --severity high \
      > security-report.json
    # Exit code 1 = HIGH findings → fail pipeline
```

### Recommended: SARIF + GitHub Advanced Security (inline PR annotations)

```yaml
- name: Cloud Security Audit
  run: |
    python3 scripts/cloud_security_audit.py \
      --mode static --output sarif --changed ${{ github.event.pull_request.base.sha }} \
      > results.sarif
  continue-on-error: true

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

### Live mode with credentials (hardened — orchestration-review P0)

```yaml
- name: Cloud Security Audit (Live)
  env:
    GCP_SA_KEY: ${{ secrets.GCP_SA_KEY }}
  run: |
    # mktemp creates the file with mode 0600 (owner read/write only) by default
    TMPFILE=$(mktemp)
    trap 'rm -f "$TMPFILE"' EXIT
    printf '%s' "$GCP_SA_KEY" > "$TMPFILE"
    ai-toolkit credentials add gcp --file "$TMPFILE"
    python3 scripts/cloud_security_audit.py \
      --mode live --output sarif --changed "${{ github.event.pull_request.base.sha }}" \
      > results.sarif

- name: Validate SARIF
  if: always()
  run: python3 -c "import json; d=json.load(open('results.sarif')); assert d.get('version')=='2.1.0', 'Invalid SARIF'"
  continue-on-error: false

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

**CI Security Notes (orchestration-review):**
- `mktemp` creates the file with owner-only permissions (mode 0600 by default), unlike a hand-written world-readable `/tmp/sa.json`; it has no `-m` flag, so pass no mode argument
- `trap 'rm -f' EXIT` ensures cleanup even on script failure
- SARIF schema validated before upload to prevent injected/corrupted annotations
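- `${{ github.event.pull_request.base.sha }}` is quoted so it reaches the script as a single argument; for values an attacker can influence (branch names, PR titles), pass them through `env:` rather than interpolating them into `run:`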
- Consider GitHub OIDC workload identity federation instead of long-lived SA keys for production
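
The inline `python3 -c` check in the workflow only asserts the `version` field. A slightly fuller validation could look like the sketch below. The function name is hypothetical; the checked fields (`version`, `runs`, `runs[].tool.driver.name`) are required by the SARIF 2.1.0 schema, but this is a sanity check under those assumptions, not full schema validation.

```python
import json


def validate_sarif(text: str) -> list:
    """Return a list of problems; an empty list means the document passed."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]

    problems = []
    if doc.get("version") != "2.1.0":
        problems.append("version must be '2.1.0'")

    runs = doc.get("runs")
    if not isinstance(runs, list):
        problems.append("'runs' must be an array")
        return problems

    for i, run in enumerate(runs):
        tool = run.get("tool") if isinstance(run, dict) else None
        driver = tool.get("driver") if isinstance(tool, dict) else None
        if not isinstance(driver, dict) or not driver.get("name"):
            problems.append(f"runs[{i}].tool.driver.name is missing")
    return problems
```

In CI, a non-empty result would fail the step before the upload runs.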

---

## 8. Success Criteria (Overall)

| Metric | Target |
|--------|--------|
| Providers supported | 3 (GCP, AWS, Azure) |
| Check patterns (total) | 40+ (GCP: 16, AWS: 15, Azure: 12) |
| False positive rules | 10+ |
| Output formats | 3 (text, JSON, SARIF v2.1.0) |
| CLI commands | 7 (add, list, remove, test, init per provider) |
| Scripts (stdlib Python) | 9 (auth + scanners + resolver + sarif + incremental) |
| Tests | 70+ |
| External dependencies | 0 (stdlib only, CLI tools: gcloud/aws/az/terraform) |
| CI exit codes | 0=clean, 1=HIGH findings, 2=credential error |
| Incremental scan | `--changed` flag works with git refs |
| GitHub integration | SARIF upload → inline PR annotations |

---

## 9. Fix Strategy

**Same approach as HIPAA scanner v1: No auto-fix. Agent interprets and suggests.**

The deterministic scripts produce findings. The `cloud-security-auditor` agent then:
1. **Reads** the flagged file/resource to understand actual context
2. **Suggests** a specific fix (not generic advice — concrete code/config change)
3. **Never** auto-applies changes — the user reviews and applies manually

Examples:
- Firestore rules finding → agent suggests the exact `allow read: if request.auth != null` rule change
- Public CF finding → agent suggests adding `validateFirebaseIdToken` middleware with code snippet
- Open S3 bucket → agent suggests the exact `PublicAccessBlock` configuration (S3 Block Public Access) or bucket policy change
- Broad IAM role → agent suggests the minimal policy document with only required permissions

**Why no auto-fix in v1:**
- Cloud security fixes require project-specific knowledge (which SA, which bucket, which auth flow)
- Wrong auto-fix on IAM can lock out real users
- Firestore rules changes can break client apps
- The agent's context-aware suggestion is more valuable than a blind auto-fix

**v2 option:** `--fix-mode suggest` generates a `.cloud-security-fixes.patch` file that users can review and apply with `git apply`.

---

## 10. Risks and Mitigation (updated)

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| CLI tools (gcloud/aws/az) not installed | Medium | Medium | Graceful fallback to static-only mode, clear error message |
| Cloud APIs change breaking audit commands | Low | Medium | Version-pin CLI output format parsing, test with CI |
| False positive rules too aggressive (suppress real issues) | Low | High | Default to WARN not SUPPRESS, require `.cloud-security.json` for suppression |
| Credential leakage in logs | Low | Critical | Never log credentials, 0600 perms, `/tmp` cleanup in CI |
| Scope creep (too many checks) | Medium | Medium | Start with top-10 per provider, expand based on feedback |
| YAML IaC not parseable (stdlib-only) | High | Medium | JSON-only for static IaC; live mode unaffected; v2: vendored YAML subset parser |
| `--changed` confused with live mode | Low | Low | Explicit warning: incremental applies to static only |
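
The first mitigation in the table (fall back to static-only mode when a provider CLI is missing) might be implemented along these lines. The function name and message wording are hypothetical; the CLI names come from the plan, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil

# Provider -> CLI binary, as listed in the plan (gcloud/aws/az).
PROVIDER_CLI = {"gcp": "gcloud", "aws": "aws", "azure": "az"}


def resolve_mode(requested: str, provider: str, which=shutil.which):
    """Return (mode, warning). Live mode degrades to static if the CLI is absent."""
    cli = PROVIDER_CLI[provider]
    if requested == "live" and which(cli) is None:
        return "static", f"{cli} not found on PATH; falling back to static mode"
    return requested, None
```

Returning the warning instead of printing it keeps the decision testable and leaves the output format to the caller.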

---

## 11. Pre-Mortem

1. **Firebase rules parser too simplistic** — Firestore rules use CEL-like syntax with nested `match` blocks. Parser needs proper recursive descent, not just regex. Mitigation: build minimal CEL parser or use line-by-line pattern matching with scope tracking.
2. **False positive resolver gives false confidence** — Users may trust SUPPRESSED status and miss real issues. Mitigation: always show context chain, require explicit `.cloud-security.json` for auto-suppression, default to WARN.
3. **AWS credential scope too broad** — User provides admin-level AWS profile. Mitigation: `credentials test` checks actual permissions, WARN if write access detected, suggest read-only IAM policy in docs.
4. **Three providers = 3x maintenance** — Each provider's CLI evolves independently. Mitigation: abstract provider interface, single test matrix, version tracking per provider.
5. **Terraform parsing incomplete** — HCL syntax is complex (modules, variables, conditionals). Mitigation: wrap `terraform show -json` instead of parsing HCL. Fallback to flat regex for projects without `terraform` CLI.
6. **SARIF adoption low** — Users may not know how to use SARIF with GitHub. Mitigation: provide copy-paste GitHub Actions workflow in docs and `--explain` for onboarding.
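
For item 5, wrapping `terraform show -json` rather than parsing HCL could look roughly like this. The helper names are hypothetical. The traversal relies on Terraform's documented JSON output, where state output nests resources under `values.root_module` and plan output under `planned_values.root_module`, each with optional `child_modules`.

```python
import json
import subprocess


def iter_resources(module: dict):
    """Yield every resource in a module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def resources_from_show_json(doc: dict) -> list:
    """Flatten the resources of a parsed `terraform show -json` document."""
    # State output uses "values"; plan output uses "planned_values".
    values = doc.get("values") or doc.get("planned_values") or {}
    return list(iter_resources(values.get("root_module", {})))


def load_show_json(path: str) -> dict:
    """Run `terraform show -json <path>` and parse its stdout (needs the CLI)."""
    result = subprocess.run(
        ["terraform", "show", "-json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Keeping the traversal separate from the subprocess call means the checks can be unit-tested against captured JSON without the `terraform` binary.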

---

## 12. Council Review Summary (2026-04-10)

**Verdict:** CONDITIONAL FOR — implement with scope reduction.
**Confidence:** MEDIUM (weighted score: FOR 3.1 vs AGAINST 2.9)

**Key insights applied to this plan:**
- [x] Timeline revised from 3-4 → 5-6 weeks
- [x] ~~Azure deferred to Milestone 3~~ → **reinstated: full 3-provider delivery**
- [x] SARIF v2.1.0 output added — essential for GitHub Advanced Security integration
- [x] Incremental scan mode added (`--changed`) — how developers actually use security tools
- [x] `terraform show -json` wrapper instead of HCL parsing — realistic path
- [x] Single config file `.cloud-security.json` (merged ignore + context)
- [x] `credentials init` interactive wizard — reduce onboarding friction
- [x] `--explain <rule-id>` for on-demand remediation guidance
- [x] Static mode as default — zero-setup first experience

**Deferred to v2:**
- Kubernetes/container security (separate pack candidate)
- Secret scanning with entropy detection
- Compliance framework mapping (SOC2, PCI-DSS, NIST 800-53)
- Visual security dashboard in browser
- GitHub PR comment integration beyond SARIF
- Vendored YAML subset parser for CloudFormation YAML static scanning

**Council strongest agreement:** False positive resolver is the killer feature and primary differentiator vs Checkov/Trivy/Prowler. No existing tool combines deterministic scanning with context-aware resolution.

---

## 13. Orchestration Review Summary (2026-04-10)

**Agents:** tech-lead, security-architect, product-manager, code-reviewer (4 parallel)

**Verdict:** Plan structurally complete (14/14 elements). Three P0 security blockers identified and resolved.

**Applied changes:**

| # | Action | Source | Priority | Applied? |
|---|--------|--------|----------|----------|
| 1 | Programmatic Bash allowlist | security-architect | P0 | Yes — allowlist.sh + agent integration |
| 2 | CI hardening (mktemp/trap/SARIF validation) | security-architect | P0 | Yes — CI examples rewritten |
| 3 | Suppression governance (justification + diff detection) | security-architect | P0 | Yes — schema + SUPPRESSION_CHANGED finding |
| 4 | Recursive descent parser for Firestore | code-reviewer | P1 | Yes — replaced regex approach, +1d estimate |
| 5 | SARIF `driver.rules[]` for GitHub annotations | code-reviewer | P1 | Yes — schema + success criteria |
| 6 | Task 4.1→3.2 numbering fix | tech-lead | P1 | Yes — renumbered |
| 7 | `terraform plan` execution risk documented | security-architect | P1 | Yes — only `show -json` allowed |
| 8 | `credentials init` deferred to M2 | code-reviewer | P2 | Yes — saves 1.5d in M1 |
| 9 | False positive resolver budgeted +2d/provider | code-reviewer | P2 | Yes — estimate updated |
| 10 | `SUPPRESSION_CHANGED` finding type | security-architect | P2 | Yes — in governance section |

**Market positioning (product-manager):**
- Not competing with Checkov on check count (40 vs 3000)
- Competing on: zero-noise (false positive resolver), zero-setup (static-first), IDE-native (10 platforms), AI interpretation
- Target: developers in ai-toolkit ecosystem, not enterprise security teams
- Value as ecosystem feature, not standalone product

**Timeline revision (code-reviewer):**
- 1 person: 6-7 weeks realistic (was 5-6)
- 2 people: 4-5 weeks (parallel GCP + AWS tracks)
- All 3 providers ship in 6 weeks — no conditional gates

---

## 14. Next Actions

1. [ ] Approve plan
2. [ ] Implement `credentials` CLI command (1.1) + Bash allowlist
3. [ ] Create `cloud-security-auditor` agent (1.2)
4. [ ] Implement SARIF formatter + incremental scan (1.3)
5. [ ] Implement `firebase-rules-audit` — recursive descent parser (2.1)
6. [ ] Implement `cloud-functions-audit` + false positive resolver GCP (2.2, 2.3)
7. [ ] Implement `aws-security-audit` + `terraform show -json` wrapper (3.1)
8. [ ] Implement orchestrator + plugin pack + `credentials init` (3.2, 1.1b)
9. [ ] Implement `azure-security-audit` + false positive resolver Azure (4.1)
10. [ ] Full test suite (70+) + documentation + release

---

**Last Updated:** 2026-04-10
**Council Reviewed:** 2026-04-10
**Orchestration Reviewed:** 2026-04-10 (4 agents: tech-lead, security-architect, product-manager, code-reviewer)

---

## kb/planning/mcp-context-trim-v4-prd.md

---
title: "PRD: MCP Context Trim v4.0 — Local Proxy with Description Compression"
category: planning
service: ai-toolkit
tags:
  - mcp
  - proxy
  - tool-descriptions
  - jsonrpc
  - tokens
  - v4
doc_type: plan
status: proposed
created: "2026-05-04"
last_updated: "2026-05-04"
completion: "0%"
target_milestone: "v4.0"
predecessor:
  - "kb/history/completed/output-token-discipline-plan-20260504.md"
  - "kb/history/completed/f2-mcp-trim-spike-20260504.md"
description: "Local MCP proxy server that compresses tool descriptions before they reach the model. Carved out of the v3.2.0 output-token-discipline plan (Feature 2), deferred after the 2026-05-04 spike showed Claude Code hooks cannot modify tools/list metadata. Targets ~8-15k token reduction per session for users with many MCP servers."
---

# PRD: MCP Context Trim v4.0

**Status:** Proposed
**Target milestone:** v4.0
**Carved out of:** [`output-token-discipline-plan-20260504.md`](../history/completed/output-token-discipline-plan-20260504.md) (was Feature 2)
**Spike basis:** [`f2-mcp-trim-spike-20260504.md`](../history/completed/f2-mcp-trim-spike-20260504.md)

## Problem

MCP server tool descriptions are injected into every model turn's system prompt. With ~100 tools across 7 typical servers, descriptions consume 8–15k tokens per turn — pure overhead, paid every message. Examples observed in users' configs:

- `dart-mcp-server` — ~30 tools with multi-paragraph descriptions
- `filesystem` — verbose paths and example sections
- `pencil` — "IMPORTANT" stanzas repeated across tools
- `jira-mcp` — long `Use this tool to…` boilerplate

The v3.2.0 output-discipline plan attempted to solve this with a hook-based trimmer. The spike conducted 2026-05-04 proved Claude Code hooks do not expose `tools/list` metadata or the system-prompt tool catalog. The only viable architecture is a local MCP proxy.

## Goal

Reduce MCP-description overhead by ≥40% per server, with **zero** loss of parameter schemas, required fields, or discrimination signals (`not`, `never`, `only`, `except`, `unless`).

## Non-goals

- Modifying tool **call** behavior (only descriptions)
- Compressing user-facing prompts or completions
- Replacing or rewriting upstream MCP servers
- Touching MCP servers we do not control

## Architecture

### Proxy topology

```
Claude Code  ──stdio──▶  ai-toolkit MCP proxy  ──stdio/SSE──▶  upstream MCP server
                                │
                                └─ rewrites tools/list response
                                   passes through tools/call unchanged
```

One proxy process per upstream server, supervised by `ai-toolkit mcp-trim daemon` (or equivalent). User's `~/.claude/.mcp.json` is rewritten by `ai-toolkit install` (opt-in) to point Claude Code at the proxy instead of upstream — proxy reads the original target from a sidecar config.
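
The rewrite step in the diagram can be sketched as a pure function over a parsed JSON-RPC response. This is illustrative only: the function name is hypothetical, and the trimmer is passed in as a callable so the sketch does not depend on any particular heuristic. It assumes the `tools/list` result carries a `tools` array whose items have `name`, `description`, and `inputSchema`.

```python
import copy


def rewrite_tools_list(response: dict, trim) -> dict:
    """Return a copy of a tools/list response with each description trimmed."""
    result = response.get("result")
    tools = result.get("tools") if isinstance(result, dict) else None
    if not isinstance(tools, list):
        return response  # error responses and other shapes pass through as-is

    out = copy.deepcopy(response)  # never mutate the upstream payload
    for tool in out["result"]["tools"]:
        if isinstance(tool, dict) and isinstance(tool.get("description"), str):
            tool["description"] = trim(tool["description"])
        # inputSchema and every other field are left untouched
    return out
```

Only the top-level tool `description` is rewritten, so parameter schemas stay identical to upstream.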

### Required components

| Component | Purpose |
|-----------|---------|
| `scripts/mcp_proxy_server.py` | JSON-RPC 2.0 proxy. Reads stdin, forwards to upstream over stdio or SSE, intercepts `tools/list` response, rewrites descriptions. Stdlib-only. |
| `scripts/mcp_description_trimmer.py` | Pure function library: `trim(description: str) → str`. Reused from heuristics below. Stdlib-only. |
| `scripts/mcp_proxy_config.py` | Reads `~/.softspark/ai-toolkit/mcp-proxy/servers.json`, validates upstream targets, generates supervisord/launchd config. |
| `app/hooks/mcp-proxy-health.sh` | SessionStart hook — verifies all configured proxies responsive; fall through (warn, do not block) if any down. |
| `app/skills/mcp-trim/SKILL.md` | Knowledge skill: how to enable, opt out, audit savings. |
| `bin/ai-toolkit-mcp-trim` | CLI: `enable`, `disable`, `status`, `audit` (per-server token savings report). |
| `tests/test_mcp_proxy.bats` | Integration tests with mock upstream MCP servers. |
| `tests/test_mcp_trimmer.bats` | Unit tests for description trim heuristics on captured fixtures. |

### Compression heuristics (from spike)

Applied to each tool description in `tools/list` response:

- Drop example sections >40 chars
- Collapse `Use this server to…` / `Use this tool to…` boilerplate to minimum form preserving intent
- Drop duplicate occurrences of tool name in its own description
- **Preserve bytewise:** `inputSchema.properties[*].description`, `required`, `enum` values, URL/path identifiers
- **Never strip:** the words `not`, `never`, `only`, `except`, `unless` — these carry "when NOT to use" signals
- Target: ≥40% length reduction, 0% schema loss
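
As a rough illustration of two of these heuristics (collapsing the `Use this tool to…` boilerplate and guarding the protected words), a trimmer might start like the sketch below. The regular expression and the guard strategy are assumptions, not the spike's actual implementation, and the example-section and duplicate-name rules are not covered.

```python
import re

# Words the heuristics above say must never be stripped.
PROTECTED = ("not", "never", "only", "except", "unless")

# Leading boilerplate named in the heuristics; the exact pattern is an assumption.
BOILERPLATE = re.compile(r"^\s*Use this (?:tool|server) to\s+", re.IGNORECASE)


def protected_counts(text: str) -> dict:
    """Count each protected word as a whole word, case-insensitively."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w: words.count(w) for w in PROTECTED}


def trim(description: str) -> str:
    """Drop the leading boilerplate; return the input unchanged if the guard fails."""
    out = BOILERPLATE.sub("", description).strip()
    out = out[:1].upper() + out[1:]  # keep the sentence capitalised
    if protected_counts(out) != protected_counts(description):
        return description  # a protected word was lost, so refuse to trim
    return out
```

The guard compares word counts before and after, so any future heuristic that accidentally removes a protected word degrades to a no-op instead of shipping a misleading description.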

### Failure modes & rollback

| Scenario | Behavior |
|----------|----------|
| Proxy crashes mid-session | `mcp-proxy-health.sh` detects on next SessionStart, prints warning, suggests `ai-toolkit mcp-trim disable <server>` |
| Upstream MCP server changes its tool catalog | Proxy passes through unchanged tools (no cached schema), warns once if a tool's description was previously trimmed |
| Trimmer produces malformed JSON | Proxy falls through to upstream response unchanged, logs to `~/.softspark/ai-toolkit/mcp-proxy/error.log` |
| User wants to bypass | `AI_TOOLKIT_MCP_TRIM_DISABLE=1` env var → proxies pass everything through unchanged |
| User wants to fully uninstall | `ai-toolkit mcp-trim disable` reverts `~/.claude/.mcp.json` to original upstream targets |
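
The bypass and fall-through rows could share one wrapper around the rewrite step. The sketch below is hypothetical in its function name and in catching every exception; the environment variable name comes from the table above.

```python
import json
import os


def safe_rewrite(raw: str, rewrite, env=os.environ) -> str:
    """Apply `rewrite` to a JSON payload, returning the input on any failure."""
    if env.get("AI_TOOLKIT_MCP_TRIM_DISABLE") == "1":
        return raw  # user asked for full passthrough
    try:
        out = json.dumps(rewrite(json.loads(raw)))
        json.loads(out)  # confirm the result still parses before sending it on
        return out
    except Exception:
        # Deliberately broad: any trimmer bug must degrade to passthrough.
        return raw
```

A production proxy would also write the failure to the error log named in the table; that is omitted here to keep the sketch free of side effects.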

### Migration of existing user `.mcp.json`

`ai-toolkit mcp-trim enable` does:

1. Backup `~/.claude/.mcp.json` → `~/.softspark/ai-toolkit/mcp-proxy/.mcp.json.bak.<timestamp>`
2. Read each server entry, store in `~/.softspark/ai-toolkit/mcp-proxy/servers.json`
3. Rewrite each entry to point at the local proxy (with sidecar `target` field)
4. Spawn supervisor (per-OS: launchd on macOS, systemd on Linux, scheduled task on Windows)
5. Verify each upstream reachable via proxy, abort + restore backup on any failure
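
Steps 2 and 3 can be sketched on parsed config dictionaries. Everything named here is an assumption for illustration: the `mcpServers` top-level key, the proxy command, and its `--server` argument are hypothetical, and the backup, supervisor, and verification steps are left out.

```python
import copy


def rewrite_for_proxy(config: dict, proxy_command: str):
    """Return (rewritten_config, sidecar) without mutating the input config."""
    rewritten = copy.deepcopy(config)
    sidecar = {}
    for name, entry in list(rewritten.get("mcpServers", {}).items()):
        sidecar[name] = copy.deepcopy(entry)  # original upstream target
        rewritten["mcpServers"][name] = {
            "command": proxy_command,       # hypothetical proxy binary
            "args": ["--server", name],     # hypothetical flag
        }
    return rewritten, sidecar


def restore(rewritten: dict, sidecar: dict) -> dict:
    """Put the original server entries back from the sidecar."""
    restored = copy.deepcopy(rewritten)
    restored.setdefault("mcpServers", {}).update(copy.deepcopy(sidecar))
    return restored
```

This restores the structure only. Byte-identical rollback, as required by the success criteria, would rely on the timestamped backup file from step 1 rather than on re-serialising the dictionary.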

## Out-of-scope decisions (rejected mid-spike)

| Option | Why rejected |
|--------|--------------|
| Pre-install rewrite of `.mcp.json` only | MCP spec sources descriptions from server runtime, not config — wouldn't take effect |
| Source-side forks of MCP servers | Doesn't help users with custom servers; high maintenance |
| F2-lite observability tool | User decision 2026-05-04: tracking token waste without trimming is half-value; do the full thing in v4.0 |
| Hook-based interception | Spike proved hooks cannot reach `tools/list` |

## Success criteria

- ≥40% description-length reduction per server on the captured fixture set (jira, filesystem, dart, pencil)
- Deep-equal `inputSchema` between trimmed and upstream — zero schema regression
- Proxy adds <50ms per `tools/list` call (one-time per session)
- Proxy adds <5ms per `tools/call` (passthrough overhead)
- Round-trip correctness: every tool callable via proxy returns byte-identical result vs direct call
- Zero MCP-skill regressions in `npm test` after enabling proxy in CI
- Rollback (`ai-toolkit mcp-trim disable`) restores byte-identical original `.mcp.json`
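
The first two criteria lend themselves to an automated check over captured fixtures. The helpers below are a hypothetical sketch that measures reduction in characters (a proxy for tokens) and compares `inputSchema` by deep equality of the parsed JSON.

```python
def description_reduction(upstream: list, trimmed: list) -> float:
    """Fraction of description characters removed across a server's tools."""
    before = sum(len(t.get("description", "")) for t in upstream)
    after = sum(len(t.get("description", "")) for t in trimmed)
    return 0.0 if before == 0 else 1 - after / before


def schema_regressions(upstream: list, trimmed: list) -> list:
    """Names of tools that are missing or whose inputSchema differs after trimming."""
    by_name = {t["name"]: t for t in trimmed}
    return [
        t["name"]
        for t in upstream
        if t["name"] not in by_name
        or by_name[t["name"]].get("inputSchema") != t.get("inputSchema")
    ]
```

A fixture passes when `description_reduction(...) >= 0.40` and `schema_regressions(...)` is empty.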

## Open questions

1. Process supervision per-OS — launchd / systemd / scheduled-task wrappers, or a built-in `ai-toolkit-mcp-trimd` daemon binary?
2. SSE-mode upstreams (e.g., rag-mcp at `http://localhost:8081/mcp/sse`) — proxy listens on SSE locally too, or stdio-only with internal SSE client?
3. Description rewrites — static dictionary of "boilerplate phrases to drop" (faster, deterministic) vs LLM-based summarizer (more aggressive, less predictable)? Recommend static for v4.0, LLM as v4.1 stretch.
4. Config path — `~/.softspark/ai-toolkit/mcp-proxy/` (matches existing convention) or `~/.claude/mcp-proxy/` (closer to MCP config)? Recommend the former.
5. Telemetry — does this become an opt-in metric in `/briefing --tokens` ("MCP descriptions: 12.3k → 7.2k, saved 5.1k per turn")? Recommend yes.

## Pre-mortem (failure scenarios to design against)

1. **Proxy gets out of sync with upstream** — upstream adds a new tool the proxy has no rules for → pass that tool's description through unchanged and log a warning
2. **Compression breaks tool discriminability** — model picks the wrong tool because the trimmed description lost a "use only when X" qualifier → the protected-word list (`not`, `never`, `only`, `except`, `unless`) must be exhaustive; add per-server allowlists for cases where the default list misfires
3. **Multi-process race on `.mcp.json` rewrite** — two `ai-toolkit install` invocations clobber each other → file lock during enable/disable
4. **Proxy supervisor fails to start on user's machine** — different distro / no systemd → ai-toolkit doctor must detect and report; degrade to "MCP proxy unavailable, falling through" with no functionality loss
5. **User has custom MCP server we don't recognize** — must work without per-server schema; default heuristics must be safe enough for arbitrary servers

## Estimate

- Architecture spike + working proxy prototype: 2 days
- Production proxy + supervisor + config + CLI: 3 days
- Test suite + fixtures + CI integration: 2 days
- Documentation + migration guide + release notes: 1 day

**Total: ~8 working days** (1.5–2 weeks calendar time at typical pace)

## Status

| Date | Status | Author |
|------|--------|--------|
| 2026-05-04 | PRD drafted from spike conclusions, carved out of v3.2.0 plan | claude |

---

## kb/procedures/ecosystem-sync-sop.md

---
title: "SOP: Ecosystem Sync"
category: procedures
service: ai-toolkit
tags: [sop, ecosystem, editors, generators, drift-detection, sync]
version: "1.0.0"
created: "2026-04-23"
last_updated: "2026-04-23"
description: "Quarterly (or event-triggered) sync procedure that detects documentation and capability drift in supported tools (Claude Code + 11 editors), analyses our generators and skills for missing features, and walks through the migration + generator-update workflow."
---

# SOP: Ecosystem Sync

Keeps ai-toolkit aligned with the tools it integrates with. When an editor adds a new hook lifecycle, makes a feature globally available, changes a config path, or deprecates a flag, this SOP surfaces it before it surprises users.

**When to run:**
- **Every quarter** as a baseline health check (calendar reminder)
- **Before every minor release** of ai-toolkit (Phase 0 of release prep)
- **Whenever an editor ships a major version** (subscribe to their changelogs)
- **On demand** if a user reports "feature X exists but toolkit doesn't support it"

**Time:** 30 minutes for drift review + variable for any generator updates

---

## Quick Reference

```bash
# Full check (all 12 tools, online)
python3 scripts/ecosystem_doctor.py --format text

# Single tool
python3 scripts/ecosystem_doctor.py --tool cursor --format text

# First-ever run — baseline the snapshot
python3 scripts/ecosystem_doctor.py --update > /dev/null

# CI / gating mode
python3 scripts/ecosystem_doctor.py --check

# Offline (no network) — validates our side only
python3 scripts/ecosystem_doctor.py --offline --format text
```

---

## Inputs

| File | Purpose |
|------|---------|
| `scripts/ecosystem_tools.json` | Authoritative registry: 12 tools with doc URLs, config paths, our generators, capability markers |
| `benchmarks/ecosystem-doctor-snapshot.json` | Last-seen state (headings, content hash, markers, version) — updated via `--update` |
| `scripts/ecosystem_doctor.py` | Drift detector |
| `kb/reference/supported-tools-registry.md` | Human-readable view of the registry |

---

## Phase 1: Run the Doctor

```bash
python3 scripts/ecosystem_doctor.py --format text > /tmp/eco-report.txt
cat /tmp/eco-report.txt
```

The report classifies every tool into:

- **Clean** — doc page, headings, and markers match the last snapshot; no action
- **Drift** — something changed upstream. Each drift entry has a `kind`:
  - `headings_added` — the doc grew new sections (new features? reorg?)
  - `headings_removed` — a section disappeared (deprecation? renaming?)
  - `marker_flips` — an expected capability marker appeared (`+`) or vanished (`-`)
  - `content_changed_no_heading_delta` — prose edits, reorder, minor rewrites, OR HTML churn (timestamps, ads, CSRF nonces). Reported but **not** treated as drift by `--check` — too noisy on dynamic pages.
  - `version_changed` — the CLI version bumped (for tools that expose `--version`)
- **Errored** — couldn't fetch docs (timeout, 404, auth wall). Doctor does not overwrite
  the snapshot for errored tools; the last-known-good state persists.
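
The drift kinds above can be illustrated by diffing two snapshot entries. The field names (`headings`, `markers`, `version`, `content_hash`) are assumed from the snapshot description in the Inputs table; the real `ecosystem_doctor.py` may structure this differently.

```python
# Kinds that `--check` treats as drift; content-only changes are excluded.
STRUCTURAL = {"headings_added", "headings_removed", "marker_flips", "version_changed"}


def diff_snapshot(old: dict, new: dict) -> list:
    """Return (kind, details) tuples describing how `new` differs from `old`."""
    drifts = []
    old_h, new_h = set(old.get("headings", [])), set(new.get("headings", []))
    added, removed = sorted(new_h - old_h), sorted(old_h - new_h)
    if added:
        drifts.append(("headings_added", added))
    if removed:
        drifts.append(("headings_removed", removed))

    old_m, new_m = set(old.get("markers", [])), set(new.get("markers", []))
    flips = [f"+{m}" for m in sorted(new_m - old_m)]
    flips += [f"-{m}" for m in sorted(old_m - new_m)]
    if flips:
        drifts.append(("marker_flips", flips))

    if old.get("version") != new.get("version"):
        drifts.append(("version_changed", [f"{old.get('version')} -> {new.get('version')}"]))

    # Only reported when the hash moved but the heading set did not.
    if not added and not removed and old.get("content_hash") != new.get("content_hash"):
        drifts.append(("content_changed_no_heading_delta", []))
    return drifts


def check_exit_code(drifts: list, fetch_errors: int = 0) -> int:
    """Exit 1 on structural drift or fetch errors, 0 otherwise."""
    structural = any(kind in STRUCTURAL for kind, _ in drifts)
    return 1 if structural or fetch_errors else 0
```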

---

## Phase 2: Classify Each Drift

For every drifting tool, read its docs URL and classify the change into exactly one bucket:

| Drift class | What it means | Action owner |
|-------------|---------------|--------------|
| **A. Cosmetic reword** | Prose edited, same feature set | Update snapshot (`--update`), no code change |
| **B. New feature — we should integrate** | New hook event, new config key, new CLI flag, new rule surface | Update the relevant generator in `scripts/generate_<tool>_*.py`; extend `app/skills/*` or `app/agents/*` if the feature maps onto our skills; document in `kb/reference/supported-tools-registry.md` |
| **C. New feature — not our concern** | Enterprise SSO, billing, proprietary UI-only features | Note in registry `capability_markers` as "not adopted"; update snapshot |
| **D. Deprecation** | Flag or path removed / renamed | Open migration issue; coordinate with `ai-toolkit install` and generator output; add deprecation warning to CLAUDE.md rules if user-facing |
| **E. Feature promoted to default** | Was behind a flag, now global | Remove the flag from generator output; simplify our installer |
| **F. Global availability** | Was editor-only, now also available via CLI / hooks / settings.json | Map new config surface; may require a new generator or extending an existing one |

Write one line per drift in `/tmp/eco-report.txt` with its class. Example:

```
cursor: headings_added [AGENTS.md support] -> class B (integrate: extend generate_cursor_mdc.py)
aider:  version_changed 0.70 -> 0.72          -> class A (cosmetic, --update)
windsurf: marker_flips +Cascade               -> class B (already supported, verify snapshot)
```

---

## Phase 3: Execute Changes

### For class B (new feature — integrate)

1. Read the tool's docs section that introduced the feature. Note the exact config key / hook name / file path.
2. Open the relevant generator (`scripts/generate_<tool>_*.py`) and add output for the new surface.
3. If the feature is a **hook event**, also update:
   - `app/hooks.json` (if Claude-Code-native)
   - `app/skills/hook-creator/SKILL.md` — add the event to the Supported Hook Events table
   - `scripts/inject_hook_cli.py` — if the hook target path differs
4. If the feature is a **skill/agent schema extension**:
   - Update `app/skills/skill-creator/SKILL.md` and `app/skills/command-creator/SKILL.md` templates
   - Update `scripts/validate.py` field allowlists
   - Update `kb/reference/agent-skills-spec.md` (if the change is an upstream spec change)
5. Add a bats test under `tests/test_<tool>.bats` covering the new output.
6. Regenerate artifacts: `npm run generate:all`.

### For class D (deprecation)

1. Open a migration issue in GitHub with "class: deprecation" and a link to the upstream changelog.
2. In the generator, emit a comment next to the deprecated output: `# DEPRECATED: <link>, removed in <version>`.
3. If deprecation affects `ai-toolkit install --local --editors <tool>`, add a doctor check that warns when a user's repo still contains the deprecated file.

### For class E / F (feature promotion)

1. Simplify the generator to emit the new-default form; keep a fallback comment for users on older tool versions.
2. Update `kb/reference/supported-tools-registry.md` config-paths column.

### For class A / C (no code change)

1. Run `python3 scripts/ecosystem_doctor.py --update --tool <id>` to refresh that tool's snapshot.

---

## Phase 4: Update the Registry

If new capability markers, config paths, or doc URLs emerged during Phase 3, edit `scripts/ecosystem_tools.json`:

```bash
${EDITOR:-nvim} scripts/ecosystem_tools.json
```

Fields to consider updating:
- `urls.docs` — if the vendor moved their docs
- `urls.release_notes` — if the changelog location changed
- `config_paths` — if new files now ship in our install output
- `our_generators` — if a new generator was added
- `capability_markers` — if a new feature was adopted
- `version_probe.command` — if the CLI binary was renamed

After editing, update `last_updated` in the registry and save the snapshot:

```bash
python3 scripts/ecosystem_doctor.py --update
```

---

## Phase 5: Validate

```bash
python3 scripts/validate.py --strict
python3 scripts/audit_skills.py --ci
python3 scripts/ecosystem_doctor.py --check              # exits 0 after --update
npm test
```

All four must pass before committing generator / registry changes.

---

## Phase 6: Commit

Use a structured commit per change class:

```bash
git add scripts/ecosystem_tools.json benchmarks/ecosystem-doctor-snapshot.json
git add scripts/generate_<tool>_*.py                  # if class B/D/E/F
git add app/skills/<skill>/SKILL.md                   # if templates touched
git add kb/reference/supported-tools-registry.md
git commit -m "chore(ecosystem): sync <tool> — <brief summary>"
```

Recommended commit messages by class:

| Class | Template |
|-------|----------|
| A | `chore(ecosystem): refresh <tool> snapshot (cosmetic docs update)` |
| B | `feat(<tool>): add support for <feature>` |
| C | `chore(ecosystem): note <tool> <feature> as not-adopted` |
| D | `feat(<tool>): deprecation warning for <old-path>` |
| E | `refactor(<tool>): remove flag for <feature> (now default)` |
| F | `feat(<tool>): add <new-surface> generator` |

---

## Gotchas

- **First run has no baseline.** On a machine where `benchmarks/ecosystem-doctor-snapshot.json` does not exist, every tool shows as clean (no prior state to diff against). Run `--update` once to seed, then run again to see real drift.
- **Documentation sites use client-side rendering.** Aider, opencode, and Antigravity serve most content via JavaScript. `urllib` fetches the bare HTML skeleton — the doctor only sees a few headings. Combine the automated check with a manual visit to the docs for these tools.
- **Release notes pages change structure more often than docs.** Cursor and Windsurf refactor their changelog layouts periodically; a heading delta from a changelog page is often a presentation change, not a feature change. Classify as A when in doubt.
- **Version probes require the CLI to be installed locally.** `gemini --version`, `aider --version`, etc. are skipped silently when the binary isn't on `$PATH`. The snapshot therefore omits version drift for tools you haven't installed — that is intentional, not a bug.
- **GitHub release pages rate-limit scrapers.** `github.com/<org>/<repo>/releases` can be fetched with `urllib`, but GitHub rate-limits aggressively. If the doctor errors on repeated runs, wait 10 minutes or review the release page manually.
- **Marker list is intentionally small.** Capability markers are a "did we adopt this?" checklist, not a feature coverage map. Adding every sub-feature bloats the JSON and produces noisy flips — keep markers at the top-level-capability tier.
- **`--check` only gates on structural drift.** Heading/marker/version changes and fetch errors exit `1`. Pure content-hash differences (`content_changed_no_heading_delta`) exit `0` — otherwise dynamic pages with timestamps or rotating ads would page you every run. If you want the strictest possible gate, grep for `Content changed` in the text report instead.
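
The client-side rendering gotcha is easier to see with a small heading extractor. This sketch uses only the standard library; the class and function names and the `min_headings` threshold are hypothetical, not taken from `ecosystem_doctor.py`.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}


class HeadingParser(HTMLParser):
    """Collect the text of h1-h3 elements from server-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buf = None  # list of text chunks while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._buf is not None:
            text = " ".join("".join(self._buf).split())  # normalise whitespace
            if text:
                self.headings.append(text)
            self._buf = None


def extract_headings(html: str) -> list:
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings


def looks_client_rendered(html: str, min_headings: int = 3) -> bool:
    """Heuristic: very few headings suggests a JavaScript-rendered shell."""
    return len(extract_headings(html)) < min_headings
```

A page that trips `looks_client_rendered` is a candidate for the manual review recommended above.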

---

## Scheduling

Recommended cadence:

| Trigger | Action |
|---------|--------|
| Every Monday morning | `python3 scripts/ecosystem_doctor.py --format text` — scan during coffee |
| Before a minor release | Full sync + clean snapshot before tagging |
| After any drift report | Act within 1 week or record explicit "ignore, low priority" in the commit message |
| New tool added to the registry | Baseline with `--update --tool <id>` |
| Tool removed from support | Delete its entry from the registry AND from the snapshot JSON |

An optional GitHub Action can run `--check` weekly and fail the job on drift; opening an issue automatically would need an additional step that this template omits. Template:

```yaml
# .github/workflows/ecosystem-doctor.yml  (proposed, not yet committed)
on:
  schedule:
    - cron: '0 9 * * 1'        # Mondays 09:00 UTC
  workflow_dispatch: {}
jobs:
  doctor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - run: python3 scripts/ecosystem_doctor.py --format text | tee /tmp/doctor.txt
      - run: python3 scripts/ecosystem_doctor.py --check
```

---

## When NOT to Use

- For **runtime** user support (user hit a bug with an editor) — use `/debug` or `/triage-issue`
- For **picking** an editor to add — that is a product decision, not a sync; use `/architecture-decision`
- For **one-off** testing of a specific tool's install flow — use the release-verification SOP
- For **scaling up** the supported-tools list — add the new tool to the registry, then run the SOP to baseline it

---

## Related Documentation

- [Supported Tools Registry](../reference/supported-tools-registry.md) — human-readable per-tool breakdown
- [MCP Editor Compatibility](../reference/mcp-editor-compatibility.md) — MCP-specific adapter table
- [Maintenance SOP](maintenance-sop.md) — general toolkit upkeep
- [Release Preparation SOP](release-preparation-sop.md) — run the doctor before tagging

---

## kb/procedures/maintenance-sop.md

---
title: "SOP: Claude Toolkit Maintenance"
category: procedures
service: ai-toolkit
tags: [sop, maintenance, agents, skills, install]
version: "3.0.1"
created: "2026-03-23"
last_updated: "2026-05-25"
description: "Standard operating procedures for installing, maintaining, and evolving the ai-toolkit."
---

# SOP: Claude Toolkit Maintenance

## Init Repository (New Project)

Use this when starting a new project that should use the toolkit.

**Prerequisites:** toolkit installed globally (`ai-toolkit install` already done once).

```bash
cd /path/to/new-project
ai-toolkit install --local
```

By default, `--local` installs Claude Code configs only:
- `CLAUDE.md` — project-specific rules template (only if missing)
- `.claude/settings.local.json` — MCP servers, env vars, permissions (only if missing, initialized with MCP defaults)
- `.claude/constitution.md` — toolkit constitution **injected** via markers (preserves user content)

To also install editor configs, use `--editors`:

```bash
ai-toolkit install --local --editors all                  # all supported editors
ai-toolkit install --local --editors cursor,aider         # specific editors only
```

Supported editors: `cursor`, `windsurf`, `cline`, `roo`, `aider`, `augment`, `copilot`, `antigravity`, `codex`, `gemini`, `opencode`.

To restrict which language rules are injected, use `--lang`:

```bash
ai-toolkit install --local --lang python,typescript
ai-toolkit install --local --lang python --editors all  # language rules propagated to all editors
```

When `--editors` is combined with `--lang` (or auto-detected languages), language rules are propagated to all configured editors as `ai-toolkit-lang-<lang>` files — not just Claude's `CLAUDE.md`. Similarly, registered custom rules (`~/.softspark/ai-toolkit/rules/`) are propagated to directory-based editor configs as `ai-toolkit-custom-<name>` files.

**Note:** Hooks are global-only — merged into `~/.claude/settings.json` by `ai-toolkit install`. Project-local `--local` does not install hooks; any legacy `.claude/hooks.json` is removed automatically.

**Input validation (v1.4.2):** `--only`, `--skip`, `--editors`, and `--lang` are validated on input; an invalid value exits with a clear error before any changes are made.
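
The validate-before-change behavior can be pictured with a minimal sketch. It checks a comma-separated `--editors` value against the supported list above. The function name `parse_editors` and the error wording are hypothetical and are not taken from the toolkit's source.

```python
# Editor names copied from the "Supported editors" list above.
SUPPORTED_EDITORS = (
    "cursor", "windsurf", "cline", "roo", "aider", "augment",
    "copilot", "antigravity", "codex", "gemini", "opencode",
)


def parse_editors(value: str) -> list[str]:
    """Return the editors named in a comma-separated value, or raise ValueError.

    Hypothetical helper: rejects the whole value before any file is touched.
    """
    if value.strip() == "all":
        return list(SUPPORTED_EDITORS)
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in SUPPORTED_EDITORS]
    if not names or unknown:
        raise ValueError(f"invalid --editors value: {unknown or value!r}")
    return names
```

Raising before returning anything mirrors the documented contract: an invalid value fails fast, so a partially applied install cannot happen.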

Then edit `CLAUDE.md`:
```markdown
# My Project

## Overview
What this project does.

## Tech Stack
- Language: TypeScript
- Framework: Next.js
- Database: PostgreSQL

## Commands
- Dev: `npm run dev`
- Test: `npm test`
- Build: `npm run build`
```

---

## Install Toolkit Globally

Run once per machine. Installs into `~/.claude/` — available in all projects.

```bash
npm install -g @softspark/ai-toolkit   # once per machine
ai-toolkit install                      # sets up ~/.claude/
```

What `install` and `update` do (merge-friendly — user content never overwritten):

| Component | Strategy | User collision |
|-----------|----------|---------------|
| `agents/*.md` | Per-file symlinks into `~/.claude/agents/` | User file with same name preserved (toolkit skipped) |
| `skills/*/` | Per-directory symlinks into `~/.claude/skills/` | User dir with same name preserved |
| `settings.json` hooks | JSON merge via `merge-hooks.py` | User hooks + settings preserved, toolkit entries tagged `_source: ai-toolkit` |
| `constitution.md` | Marker injection via `inject_section_cli.py` | User content outside `<!-- TOOLKIT:* -->` markers untouched |
| `ARCHITECTURE.md` | Marker injection via `inject_section_cli.py` | Same as above |
| `CLAUDE.md` | Marker injection of `app/rules/*.md` via `inject_rule_cli.py` | User content outside markers untouched |

Re-running updates only toolkit content. Old whole-directory symlinks are auto-upgraded to per-file on next run.
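
The marker-injection strategy in the table can be sketched as follows. This is an illustration under assumptions, not the code in `inject_section_cli.py`: the `START`/`END` marker spelling and the function name `inject_section` are hypothetical, since the table only shows the `<!-- TOOLKIT:* -->` pattern.

```python
import re


def inject_section(text: str, name: str, body: str) -> str:
    """Replace the named toolkit block in text, or append it if absent.

    Everything outside the marker pair is returned unchanged.
    """
    start = f"<!-- TOOLKIT:{name} START -->"  # assumed marker spelling
    end = f"<!-- TOOLKIT:{name} END -->"      # assumed marker spelling
    block = f"{start}\n{body.strip()}\n{end}"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if pattern.search(text):
        # A function replacement keeps backslashes in the body literal.
        return pattern.sub(lambda _match: block, text, count=1)
    if not text or text.endswith("\n\n"):
        separator = ""
    elif text.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"
    return f"{text}{separator}{block}\n"
```

Running the function twice with the same body yields the same text, which is the property that makes re-running `install` or `update` safe for user content.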

### Install Profiles (v3.0.0)

| Profile | Claude Code core | Editor rules | Gemini hooks | Copilot dir layout | Per-editor hooks / sub-agents / commands | Git hooks |
|---------|:---------------:|:------------:|:------------:|:------------------:|:---------------------------------------:|:---------:|
| `minimal` | yes | pointer only | no | no | no | no |
| `standard` (default) | yes | yes | **yes** (new in v3) | **yes** (new in v3) | no | no |
| `strict` | yes | yes | yes | yes | no | yes |
| `full` | yes | yes | yes | yes | **yes, all editors** | optional |

`--codex-skills` is orthogonal to `--profile` and materializes the full skill catalog under `.agents/skills/` for Codex. See `kb/reference/global-install-model.md` for the full semantic breakdown.

---

## Update Toolkit

After a new npm release:

```bash
npm install -g @softspark/ai-toolkit@latest
ai-toolkit update
```

`update` is a semantic alias for `install` — use it for all re-apply flows. Supports the same flags:

```bash
ai-toolkit update --only agents,hooks                  # re-apply only specific components
ai-toolkit update --local                              # refresh project-local Claude Code configs; auto-detects editors from existing project files (no --editors needed)
ai-toolkit update --local --editors cursor,windsurf   # override auto-detection and target specific editors
ai-toolkit update --list                               # dry-run: show what would change
```

When running `update --local`, the CLI inspects existing config files (e.g. `.cursor/rules`, `.aider.conf.yml`) to determine which editors are present and refreshes only those — no flags required.
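
A minimal sketch of that detection step, assuming it amounts to a lookup of known marker paths. Only the two paths named above are included; the `EDITOR_MARKERS` table and the `detect_editors` name are hypothetical, and the real CLI presumably covers every supported editor.

```python
from pathlib import Path

# Only these two marker paths appear in the text above; the rest of the
# mapping is unknown here and deliberately left out.
EDITOR_MARKERS = {
    "cursor": ".cursor/rules",
    "aider": ".aider.conf.yml",
}


def detect_editors(project_root: str) -> list[str]:
    """Return the editors whose marker path exists under project_root."""
    root = Path(project_root)
    return sorted(
        name for name, marker in EDITOR_MARKERS.items() if (root / marker).exists()
    )
```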

---

## Register a Rule from Another Repo

Third-party repos (jira-mcp, rag-mcp, etc.) can register their own rules globally:

```bash
ai-toolkit add-rule ./my-project-rules.md
# → copies to ~/.softspark/ai-toolkit/rules/my-project-rules.md

ai-toolkit update
# → injects the rule into ~/.claude/CLAUDE.md and all global editor configs

ai-toolkit update --local
# → also propagates as ai-toolkit-custom-<name> to directory-based editors (Cursor, Windsurf, Cline, Roo, Augment, Antigravity)
```

To unregister (removes from registry **and** strips the block from CLAUDE.md):

```bash
ai-toolkit remove-rule my-project-rules
```

Rule names derive from the filename (`my-project-rules.md` → marker `TOOLKIT:my-project-rules`).
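
The naming rule can be expressed in a few lines. The helper names below are hypothetical; the registry location and the `TOOLKIT:<name>` marker form come from the examples above.

```python
from pathlib import Path


def rule_marker(rule_path: str) -> str:
    """Derive the marker name from the rule file's name without extension."""
    return f"TOOLKIT:{Path(rule_path).stem}"


def registry_path(rule_path: str, home: str) -> Path:
    """Where add-rule is described as copying the file (home is passed in)."""
    return Path(home) / ".softspark" / "ai-toolkit" / "rules" / Path(rule_path).name
```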

---

## Adding a New Agent

1. Create `app/agents/<agent-name>.md` with YAML frontmatter:
   ```yaml
   ---
   name: agent-name
   description: "When to use this agent. Triggers: keyword1, keyword2."
   tools: Read, Write, Edit, Bash
   model: opus
   skills: skill-1, skill-2
   ---
   ```
2. Write agent instructions below frontmatter
3. Update `kb/reference/agents-catalog.md`
4. Update `app/ARCHITECTURE.md` counts
5. Run `scripts/validate.py`
6. Regenerate: `scripts/generate_agents_md.py > AGENTS.md`

## Adding a New Skill

1. Create `app/skills/<skill-name>/SKILL.md` with frontmatter:
   ```yaml
   ---
   name: skill-name
   description: "Third-person description. Max 1024 chars."
   effort: medium
   disable-model-invocation: true   # set for task skills only
   user-invocable: false            # set for knowledge skills only
   ---
   ```
2. Update `kb/reference/skills-catalog.md` and `app/ARCHITECTURE.md`
3. Run `scripts/validate.py`

## Adding a New Hook

Preferred path:

```bash
/hook-creator [event or hook description]
```

Manual path:

1. Create `app/hooks/<hook-name>.sh`
2. Register the hook under `app/hooks.json`
3. Run `scripts/validate.py`
4. Run `scripts/doctor.py`
5. Update `kb/reference/hooks-catalog.md`, `README.md`, and any affected architecture docs

Use `PreToolUse` for blocking validations, `PostToolUse` for non-blocking feedback, `UserPromptSubmit` for prompt governance, and `PreCompact` / `SessionEnd` for context preservation and handoff.

## Troubleshooting Rule Enforcement in Claude Code

Use this when Claude appears to ignore `CLAUDE.md`, `.claude/rules/*.md`, output styles, or search-first rules.

1. **Check current Claude docs first.** Confirm the live contract for memory, settings, output styles, and hooks:
   - `https://code.claude.com/docs/en/memory`
   - `https://code.claude.com/docs/en/settings`
   - `https://code.claude.com/docs/en/output-styles`
   - `https://code.claude.com/docs/en/hooks`
2. **Verify instruction loading.** Run `/memory` in Claude Code and confirm the expected `CLAUDE.md`, `CLAUDE.local.md`, and `.claude/rules/*.md` files are listed. Remember that Claude Code reads `CLAUDE.md`, not `AGENTS.md`, unless `CLAUDE.md` imports it.
3. **Verify the active output style.** Check `.claude/settings.local.json` or `/config`. Output style changes apply after `/clear` or a new session.
4. **Inspect installed hooks.** Ensure `~/.claude/settings.json` contains the ai-toolkit `UserPromptSubmit` and `Stop` entries. The governance hook must run with `AI_TOOLKIT_HOOK_QUIET=1 AI_TOOLKIT_HOOK_FORMAT=json` so it injects `additionalContext` without noisy transcript output.
5. **Reproduce the hook path directly.**
   ```bash
   printf '{"session_id":"debug","prompt":"debug this technical rule issue"}' \
     | AI_TOOLKIT_SEARCH_FIRST=strict AI_TOOLKIT_HOOK_QUIET=1 AI_TOOLKIT_HOOK_FORMAT=json \
       ~/.softspark/ai-toolkit/hooks/user-prompt-submit.sh
   ```
   The output must be valid JSON with `hookSpecificOutput.additionalContext`.
6. **Check corrective enforcement.** If the assistant still skips required research, `stop-search-check.sh` should block Stop with the search-first message. If it does not, inspect `~/.softspark/ai-toolkit/state/search-required-*.flag` and the Codex/Claude transcript logs.
7. **Repair drift.** Run:
   ```bash
   ai-toolkit update --only hooks
   python3 scripts/ecosystem_doctor.py --tool claude-code --format text
   scripts/validate.py
   ```
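
The check in step 5 (valid JSON carrying `hookSpecificOutput.additionalContext`) can be automated with a small helper. This is a sketch; `additional_context` is a hypothetical name and it checks only the one key that step 5 requires.

```python
import json


def additional_context(hook_stdout: str) -> str:
    """Return hookSpecificOutput.additionalContext or raise ValueError."""
    payload = json.loads(hook_stdout)  # JSONDecodeError is a ValueError
    specific = payload.get("hookSpecificOutput") if isinstance(payload, dict) else None
    context = specific.get("additionalContext") if isinstance(specific, dict) else None
    if not isinstance(context, str) or not context:
        raise ValueError("hook output lacks hookSpecificOutput.additionalContext")
    return context
```

Capture the output of the step 5 command and pass it to this function; an exception means the hook path is broken before Claude Code is even involved.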

## Verification

After changing rule-enforcement behavior, run at minimum:

```bash
bats tests/test_hooks.bats tests/test_search_first_flow.bats
bats tests/test_install.bats tests/test_codex.bats
python3 scripts/validate.py --strict
```

## Managing Plugins

```bash
ai-toolkit plugin list                             # show available packs
ai-toolkit plugin install --editor claude <name>  # install for Claude global target
ai-toolkit plugin install --editor codex <name>   # install for Codex global target
ai-toolkit plugin install --editor all --all      # install all 11 packs for both runtimes
ai-toolkit plugin update --editor all --all       # re-apply all installed packs after toolkit updates
ai-toolkit plugin clean <name>                    # prune data older than 90 days
ai-toolkit plugin clean <name> --days 30  # custom retention
ai-toolkit plugin remove --editor codex <name>    # remove from one runtime only
ai-toolkit plugin status --editor all             # show installed packs with runtime details
```

What each subcommand does:

- `install` copies hooks and scripts, verifies that agents and skills are linked, merges hooks into the selected runtime config, and runs init scripts. For Codex, the selected runtime is the global `HOME` layer (`~/AGENTS.md`, `~/.agents/`, `~/.codex/hooks.json`).
- `update` removes and reinstalls from the current source while preserving plugin data.
- `clean` prunes old plugin data.
- `remove` reverses install for the selected runtime but leaves plugin data intact.

Core agents/skills are never removed.

Memory-pack auto-prunes observations older than 90 days on every session end (configurable via `MEMORY_RETENTION_DAYS`).

State is tracked per runtime in `~/.softspark/ai-toolkit/plugins.json`. After every `ai-toolkit update`, also run `ai-toolkit plugin update --editor all --all` if plugin packs are installed.

## Adding a KB Document

Follow the `documentation-standards` knowledge skill (`app/skills/documentation-standards/SKILL.md`) for full spec. Quick checklist:

1. **Choose category directory:** `kb/reference/`, `kb/howto/`, `kb/procedures/`, `kb/troubleshooting/`, `kb/best-practices/`, or `kb/planning/`
2. **Create file:** kebab-case name, no dates in filename
3. **Add frontmatter** with all 7 required fields: `title`, `category`, `service`, `tags`, `created`, `last_updated`, `description`
4. **Write in English**
5. **Validate:** `scripts/validate.py` (checks all `kb/**/*.md` frontmatter)

**Documents without valid frontmatter will fail `validate.py` and block CI.**
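
To pre-check a new document before running `validate.py`, a stdlib-only sketch like the one below lists the required fields that are missing. It is a naive top-level key scan, not a YAML parser and not the logic `validate.py` uses; `missing_frontmatter_fields` is a hypothetical name.

```python
REQUIRED_FIELDS = (
    "title", "category", "service", "tags",
    "created", "last_updated", "description",
)


def missing_frontmatter_fields(markdown: str) -> list[str]:
    """Return required fields absent from the leading --- frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_FIELDS)
    present = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Count only non-indented keys that carry a value.
        if sep and value.strip() and not line.startswith((" ", "\t")):
            present.add(key.strip())
    else:
        # The closing delimiter was never found, so treat it as no frontmatter.
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in present]
```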

## Adding Scripts to Skills

1. Create `app/skills/<skill-name>/scripts/<script>.py` (stdlib only, JSON output)
2. `chmod +x` the script
3. Reference: `` python3 ${CLAUDE_SKILL_DIR}/scripts/script.py . ``

## Cross-Editor Verification (Mandatory)

**Every addition — skill, hook, MCP template, agent, rule — MUST be verified against all supported editors before merge.**

This toolkit targets every editor listed under "Editor documentation sources" below. Each has its own config format, file path conventions, and runtime capabilities. A feature that works in Claude Code may silently break in Cursor, Codex, or Copilot if the editor's official spec diverges.

### Verification checklist

When adding or modifying any toolkit component:

1. **Check official docs** — before implementing, fetch the editor's current documentation (web search or Context7) to confirm the config format, file path, and feature support haven't changed
2. **Validate output format** — ensure the generated file matches what the editor expects (JSON schema, TOML structure, MDC frontmatter, directory naming)
3. **Test scope rules** — verify project-local vs global behavior matches the editor's own scope model
4. **Confirm feature parity** — if the feature relies on runtime primitives (hooks, MCP, agent delegation), check whether the target editor supports them; document gaps in `kb/reference/` if not

### Editor documentation sources

| Editor | Where to verify |
|--------|----------------|
| Claude Code | `docs.anthropic.com/claude-code` |
| Cursor | `docs.cursor.com` |
| Windsurf | `docs.codeium.com/windsurf` |
| GitHub Copilot | `docs.github.com/copilot` |
| Gemini CLI | `github.com/google-gemini/gemini-cli` |
| Cline | `github.com/cline/cline` |
| Roo Code | `github.com/RooVetGit/Roo-Code` |
| Aider | `aider.chat` |
| Augment | `docs.augmentcode.com` |
| Codex CLI | `github.com/openai/codex` |
| Google Antigravity | `developers.google.com/project-idx` |

### When to do this

- Adding a new skill → verify it renders correctly for Codex `.agents/skills/` and all directory-based editors
- Adding a new hook → verify event name is valid in Claude and check `.codex/hooks.json` compatibility
- Adding a new MCP template → verify it installs correctly for all 8 native adapters (`mcp_editors.py`)
- Modifying generator output → check that every editor-specific generator still produces valid output
- Adding a new editor → verify ALL existing features render correctly for the new target

### Anti-pattern

Do NOT assume an editor's format based on memory or past behavior. Editors ship breaking changes to their config surfaces. Always verify against current official docs before implementation.

---

## Quality Checks

```bash
scripts/validate.py           # agents, skills, hooks, core files, metadata counts
scripts/doctor.py             # install health, hooks, benchmark freshness, artifact drift diagnostics
scripts/benchmark_ecosystem.py --offline   # ecosystem benchmark snapshot
scripts/benchmark_ecosystem.py --dashboard-json --out benchmarks/ecosystem-dashboard.json
scripts/harvest_ecosystem.py --offline     # refresh machine-readable ecosystem harvest
scripts/evaluate_skills.py    # skill classification report
npm test                      # bats test suite (all workstreams)
```

Or via CLI:

```bash
ai-toolkit validate           # integrity check
ai-toolkit doctor             # install health diagnostics
ai-toolkit benchmark-ecosystem --offline   # benchmark snapshot
```

## Modifying Components

Changes propagate instantly via symlinks to every project on the machine. After any change:

```bash
npm run generate:all          # FIRST: regenerate AGENTS.md, Codex rules, llms.txt, and platform configs
scripts/validate.py           # then validate — must pass before commit
npm test                      # then test — must pass before commit
```

Run `generate:all` before validate and test so that generated artifacts are current when
the metadata contract tests run. Directory-based rule generators now use ownership-aware
cleanup: repo regeneration manages only standard generated files, while `install/update`
manages standard, language, and custom overlays together. That keeps regeneration safe
without leaving stale standard artifacts behind. Committing without regenerating first
causes artifact drift and fails CI.

## Release Checklist

Follow this sequence before every `npm publish` / `git tag`:

### 1. Bump version

```bash
# Edit package.json version field (semver: X.Y.Z)
# Sync package-lock.json: npm install --package-lock-only
# Add entry to CHANGELOG.md
```

### 2. Regenerate all artifacts

```bash
npm run generate:all
```

### 3. Validate and test

```bash
npm run validate    # scripts/validate.py — agents, skills, counts
npm test            # full bats suite including metadata contracts and CLI tests
```

### 4. Verify counts are in sync

The metadata contract tests (`tests/test_metadata_contracts.bats`) catch drift
automatically. If they fail, fix the stale numbers before continuing.

### 5. Check for artifact drift

```bash
git diff --stat
```

Review the diff to confirm that all generated files (`AGENTS.md`, `llms.txt`, platform
configs) reflect the current state. If `generate:all` produced unexpected changes,
investigate before staging.

### 6. Commit and tag

```bash
git add -A
git commit -m "chore: release vX.Y.Z"
git tag vX.Y.Z
git push origin main --tags
```

The publish workflow (`.github/workflows/publish.yml`) picks up the tag, runs full
validation + tests, regenerates AGENTS.md + llms.txt, and publishes to npm.

## Model Tiers

| Agent Type | Model | Examples |
|-----------|-------|---------|
| Complex reasoning | opus | orchestrator, backend-specialist, security-auditor |
| Pattern-following | sonnet | documenter, explorer-agent, data-analyst |

## Uninstall

```bash
ai-toolkit uninstall    # strips toolkit components from ~/.claude/
```

What `uninstall` does:
- Removes per-file agent symlinks (user agents preserved)
- Removes per-directory skill symlinks (user skills preserved)
- Strips toolkit hook entries from `settings.json` (user hooks + settings preserved)
- Strips toolkit markers from `constitution.md` and `ARCHITECTURE.md` (user content preserved; empty files removed)
- Preserves `~/.claude/CLAUDE.md` (it contains your custom rules plus toolkit rule markers)
- Cleans up empty `agents/` and `skills/` directories
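
The hook-stripping step can be sketched as a filter on the `_source: ai-toolkit` tag that install adds. The sketch assumes the tag sits on each entry in a per-event list under a top-level `hooks` key; the exact placement in `settings.json` and the name `strip_toolkit_hooks` are assumptions.

```python
import json


def strip_toolkit_hooks(settings: dict) -> dict:
    """Return a copy of settings without entries tagged _source: ai-toolkit."""
    result = json.loads(json.dumps(settings))  # deep copy of JSON-compatible data
    hooks = result.get("hooks", {})
    for event in list(hooks):
        kept = [
            entry for entry in hooks[event]
            if not (isinstance(entry, dict) and entry.get("_source") == "ai-toolkit")
        ]
        if kept:
            hooks[event] = kept
        else:
            # Dropping emptied events is a choice made for this sketch.
            del hooks[event]
    return result
```

Filtering on the tag rather than on command paths is what lets user hooks with similar commands survive an uninstall.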

---

## kb/procedures/release-preparation-sop.md

---
title: "SOP: Release Preparation"
category: procedures
service: ai-toolkit
tags: [sop, release, version, publish, changelog, semver, provenance, sarif, ecosystem]
version: "1.10.1"
created: "2026-04-10"
last_updated: "2026-04-28"
description: "Step-by-step checklist for preparing a new ai-toolkit release — ecosystem-sync drift check, version sync, changelog, artifact regeneration, validation, and tagging. Run BEFORE every git tag. Includes mandatory Provenance, SARIF, and checksum-pin checks added in v2.8.0, the single-run npm test discipline added in v1.8.0, the ecosystem-sync gate added in v1.9.0, and the registry-vs-generators drift gate added in v1.10.0."
---

# SOP: Release Preparation

Complete checklist for preparing a new `@softspark/ai-toolkit` release.
Run this **before** tagging. After tagging and publishing, run the
[Release Verification SOP](release-verification-sop.md) to smoke-test.

**Pipeline:**
```
Ecosystem Sync SOP (drift check + generator updates)
      ↓
Release Preparation (this SOP)
      ↓
git tag → CI publish → Release Verification SOP
```

**Time:** 10-20 minutes (includes ecosystem sync review)

---

## Quick Checklist (TL;DR)

```bash
# 0. Ecosystem sync (mandatory for minor/major releases; optional for patch)
#    Full procedure: kb/procedures/ecosystem-sync-sop.md
python3 scripts/ecosystem_doctor.py --format text > /tmp/eco-report.txt
cat /tmp/eco-report.txt
# If drift detected: stop here, follow ecosystem-sync-sop.md Phase 2-4 to
# classify each drift (A-F), update generators as needed, refresh snapshot,
# THEN resume this SOP.
python3 scripts/ecosystem_doctor.py --update    # after all drift resolved

# 1. Decide version bump
#    patch (1.4.2 → 1.4.3): bugfix, typo, doc fix
#    minor (1.4.2 → 1.5.0): new feature, new skill, new flag, any ecosystem-class-B/F change
#    major (1.4.2 → 2.0.0): breaking change, any ecosystem-class-D removed path

# 2. Sync version across all files
python3 scripts/sync_version.py X.Y.Z          # if script exists, else manual

# 3. Write CHANGELOG.md entry
# 4. Regenerate artifacts
python3 scripts/generate_agents_md.py > AGENTS.md
python3 scripts/generate_codex_rules.py .
python3 scripts/generate_llms_txt.py > llms.txt
python3 scripts/generate_llms_txt.py --full > llms-full.txt

# 5. Validate + audit + SARIF + test + ecosystem check
python3 scripts/validate.py --strict && python3 scripts/audit_skills.py --ci && python3 scripts/audit_skills.py --sarif > /tmp/audit.sarif && npm test

# 5a. Supply-chain standard (v2.8.0+) — non-negotiable
grep -q -- '--provenance' .github/workflows/publish.yml || { echo "MISSING --provenance"; exit 1; }
grep -q 'id-token: write'   .github/workflows/publish.yml || { echo "MISSING id-token: write"; exit 1; }
python3 scripts/audit_skills.py --permissions   # review Bash/Write/Edit footprint

# 5b. Ecosystem gate — snapshot must be current before tag
python3 scripts/ecosystem_doctor.py --offline --check || { echo "STALE ecosystem snapshot — re-run doctor"; exit 1; }

# 6. Commit + tag + push
git add -A && git commit -m "chore: release vX.Y.Z"
git tag vX.Y.Z
git push origin main --tags
```

---

## Phase 0: Ecosystem Sync (MANDATORY for minor/major)

Before touching version numbers, confirm the toolkit is aligned with the current state of every editor / platform it integrates with. Skipping this phase risks shipping a release whose generators lag behind a month-old CLI refactor, a rename of `.cursorrules` to `.cursor/rules/`, or a new hook event we do not yet emit.

**When this phase is mandatory:**
- Minor release (X.Y.0) — always
- Major release (X.0.0) — always
- Patch release (X.Y.Z) — only if the patch touches a generator or install flow

**When to skip:** pure doc-only patches, SOP edits, internal refactors that do not touch `scripts/generate_*` or `app/skills/*/SKILL.md`.

### 0.1 Run the doctor

```bash
python3 scripts/ecosystem_doctor.py --format text | tee /tmp/eco-report.txt
```

Output classifies every registered tool as **Clean**, **Drift**, or **Errored**.

### 0.2 Act on drift

For each drifting tool, follow [ecosystem-sync-sop.md](ecosystem-sync-sop.md) Phase 2-4:

| Drift class | Release impact |
|-------------|----------------|
| A (cosmetic reword) | No version impact — refresh snapshot, continue |
| B (new feature — integrate) | **Minor** version bump at minimum; new generator or extended generator |
| C (new feature — not adopted) | No impact — note in registry |
| D (deprecation) | **Minor** or **major** depending on user impact; add migration warning |
| E (feature promoted to default) | **Minor**; simplify generator, keep fallback comment |
| F (feature newly globally available) | **Minor**; may require new generator or new config path |

If any B/D/E/F changes land in this preparation pass, mention them explicitly in the CHANGELOG entry (Phase 3) under an `Ecosystem` subsection.
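
The drift table above reduces to "take the largest bump any class demands". The sketch below encodes that reading. The names are hypothetical, and class D is treated as minor unless the caller flags the deprecation as breaking, which is one interpretation of "depending on user impact".

```python
BUMP_ORDER = ("none", "patch", "minor", "major")

# Minimum release impact per drift class, transcribed from the table above.
DRIFT_IMPACT = {
    "A": "none", "B": "minor", "C": "none",
    "D": "minor", "E": "minor", "F": "minor",
}


def minimum_bump(drift_classes, breaking_deprecation: bool = False) -> str:
    """Return the smallest version bump that covers every drift class seen."""
    bump = "none"
    for drift_class in drift_classes:
        impact = DRIFT_IMPACT[drift_class]
        if drift_class == "D" and breaking_deprecation:
            impact = "major"
        if BUMP_ORDER.index(impact) > BUMP_ORDER.index(bump):
            bump = impact
    return bump
```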

### 0.3 Refresh snapshot

Once every drift is resolved (either by code change or by re-classifying as acceptable):

```bash
python3 scripts/ecosystem_doctor.py --update
```

This writes the new baseline to `benchmarks/ecosystem-doctor-snapshot.json`. Commit it as part of the release commit.

### 0.4 Gate

```bash
python3 scripts/ecosystem_doctor.py --offline --check
```

Must exit `0`. If it exits `1`, the snapshot is stale — rerun Phase 0.3 or review the remaining drift.

---

## Phase 1: Determine Version Bump

Follow [Semantic Versioning](https://semver.org/):

| Change Type | Bump | Examples |
|-------------|------|---------|
| Bugfix, typo, doc-only | **patch** | Fix install flag, correct description |
| New feature, skill, agent, flag | **minor** | Add `/hipaa-validate`, add `--output json` |
| Breaking CLI change, removed skill, config format change | **major** | Rename `install` to `setup`, remove skill |

**Rule:** When in doubt, bump minor.
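
For reference, the arithmetic behind each bump type is shown below as a short sketch. `bump_version` is a hypothetical helper, not part of `scripts/sync_version.py`, and it assumes a plain `X.Y.Z` version without pre-release suffixes.

```python
def bump_version(version: str, kind: str) -> str:
    """Apply a semver bump to a plain X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {kind!r}")
```

Lower components reset to zero on a minor or major bump, matching the `1.4.2` examples in the Quick Checklist.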

---

## Phase 2: Sync Version in All Files

The canonical version lives in `package.json`. These files **must** match:

### Mandatory sync (every release)

| File | Field | How to update |
|------|-------|---------------|
| `package.json` | `"version": "X.Y.Z"` | Edit directly |
| `manifest.json` | `"version": "X.Y.Z"` | Edit directly |
| `app/.claude-plugin/plugin.json` | `"version": "X.Y.Z"` | Edit directly |

### Auto-synced (no manual action)

| File | Mechanism |
|------|-----------|
| `package-lock.json` | Regenerated by `npm install --package-lock-only` |

### Conditional sync (only if the doc was modified in this release)

| File | Field | When to update |
|------|-------|---------------|
| `kb/procedures/maintenance-sop.md` | frontmatter `version:` | If SOP content changed |
| `kb/reference/skills-catalog.md` | frontmatter `version:` | If skills added/removed |
| `kb/reference/agents-catalog.md` | frontmatter `version:` | If agents added/removed |
| `kb/reference/hooks-catalog.md` | frontmatter `version:` | If hooks changed |
| `kb/reference/architecture-overview.md` | frontmatter `version:` | If architecture changed |
| `kb/reference/distribution-model.md` | frontmatter `version:` | If install model changed |
| `kb/reference/global-install-model.md` | frontmatter `version:` | If install model changed |

> **Note:** KB `version:` fields track the **document version**, not the toolkit version.
> Only bump them when the document content actually changes in this release.

### Count sync (if skills/agents/hooks changed)

| File | What to check |
|------|---------------|
| `package.json` | `"description"` — skill/agent count |
| `README.md` | Badge counts, "What You Get" table |
| `app/ARCHITECTURE.md` | Section headings with counts |

> **Tip:** `validate.py --strict` catches count drift AND version mismatches
> (package.json vs manifest.json vs plugin.json) automatically.
> If validation passes, counts and versions are correct.

### Verification command

After syncing, verify all mandatory files match:

```bash
VERSION=$(python3 -c "import json; print(json.load(open('package.json'))['version'])")
echo "Target: $VERSION"
echo "manifest.json:     $(python3 -c "import json; print(json.load(open('manifest.json'))['version'])")"
echo "plugin.json:       $(python3 -c "import json; print(json.load(open('app/.claude-plugin/plugin.json'))['version'])")"
echo "package-lock.json: $(python3 -c "import json; print(json.load(open('package-lock.json'))['version'])")"
```

All four must print the same version. If not, fix before proceeding.
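
The same comparison can be scripted so that it fails loudly instead of relying on a visual check. This is a sketch under the assumption that each file exposes a top-level `version` key, as the commands above imply; `version_mismatches` is a hypothetical name.

```python
import json
from pathlib import Path

# The four files compared by the verification command above.
VERSION_FILES = (
    "package.json",
    "manifest.json",
    "app/.claude-plugin/plugin.json",
    "package-lock.json",
)


def version_mismatches(repo_root: str) -> dict[str, str]:
    """Map each file whose version differs from package.json to its version."""
    root = Path(repo_root)
    versions = {
        name: json.loads((root / name).read_text(encoding="utf-8"))["version"]
        for name in VERSION_FILES
    }
    target = versions["package.json"]
    return {name: found for name, found in versions.items() if found != target}
```

An empty result means the four files agree; anything else names the file to fix.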

---

## Phase 3: Write CHANGELOG Entry

Add entry at the top of `CHANGELOG.md` (after the header, before previous release):

```markdown
## vX.Y.Z — Short Title (YYYY-MM-DD)

### Added
- **Feature name** — description

### Changed
- **What changed** — old behavior → new behavior

### Fixed
- **Bug description** — what was broken and how it's fixed

### Removed
- **What was removed** — migration path if any
```

**Rules:**
- Use **bold** for feature names
- Start descriptions with a verb (Added, Changed, Fixed, Removed)
- Reference skill names with backticks and slash: `/hipaa-validate`
- Include script names: `scripts/hipaa_scan.py`
- Include count changes: `Skill count: 91 → 92`
- Date format: `YYYY-MM-DD`
- Title: short, descriptive, no version number repetition

### Update README "What's New" section

**MANDATORY on every release.** Update the `## What's New in vX.Y.Z` section in `README.md`:

1. Change the heading version: `## What's New in vX.Y.Z`
2. Replace bullet points with 3-5 highlights from this release
3. **Keep only the latest version block.** Delete the previous `## What's New in vA.B.C` section(s). README is the shop window, not the archive — users see the current release, full history lives in `CHANGELOG.md`.
4. Keep the `See [CHANGELOG.md](CHANGELOG.md) for full history.` link directly below the bullet list.

> **Warning:** This section is the first thing users see after the badges.
> A stale version here (e.g., "What's New in v2.1.3" when shipping v2.3.0)
> signals an unmaintained project. Do NOT skip this step.

> **Single-version rule:** README.md must contain **exactly one** `## What's New in vX.Y.Z` heading at any time. If you find multiple stacked headings (e.g. v2.6.1 + v2.6.0 + v2.5.0), that is SOP drift; collapse to the latest on the next release commit.
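
The single-version rule is easy to check mechanically. The sketch below assumes the heading uses a straight apostrophe and a plain `X.Y.Z` version; the function names are hypothetical.

```python
import re

# Matches headings such as "## What's New in v2.3.0" at the start of a line.
WHATS_NEW = re.compile(r"^## What's New in v(\d+\.\d+\.\d+)[ \t]*$", re.MULTILINE)


def whats_new_versions(readme: str) -> list[str]:
    """Return every version named in a What's New heading, in document order."""
    return WHATS_NEW.findall(readme)


def readme_is_current(readme: str, release: str) -> bool:
    """True when the README has exactly one What's New heading for release."""
    return whats_new_versions(readme) == [release]
```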

---

## Phase 4: Regenerate Artifacts

```bash
python3 scripts/generate_agents_md.py > AGENTS.md
python3 scripts/generate_codex_rules.py .
python3 scripts/generate_llms_txt.py > llms.txt
python3 scripts/generate_llms_txt.py --full > llms-full.txt
```

Check if anything actually changed:

```bash
git diff --stat AGENTS.md llms.txt llms-full.txt
```

If no diff, the artifacts are already current. If there is a diff, stage them.

---

## Phase 5: Validate, Audit, Test

Run the full quality gate:

```bash
python3 scripts/validate.py --strict
python3 scripts/audit_skills.py --ci
python3 scripts/audit_skills.py --sarif > audit.sarif       # MANDATORY — GHAS ingest
python3 scripts/audit_skills.py --permissions               # review Bash/Write/Edit footprint

# Registry / generator drift (added in 1.10.0). Meta-generators excluded.
META="generate_agents_md.py|generate_llms_txt.py|generate_language_rules_skills.py"
diff \
  <(grep -oE 'scripts/generate_[a-z_]+\.py' kb/reference/supported-tools-registry.md | sort -u) \
  <(ls scripts/generate_*.py | grep -vE "$META" | sort -u) \
  && echo "OK: registry matches filesystem" \
  || { echo "DRIFT: update supported-tools-registry.md before tagging"; exit 1; }

# Run npm test ONCE, cache output, parse from file. The suite is 900+ bats
# cases — rerunning it per check wastes minutes. Do not pipe npm test into
# tail/grep multiple times in the same session.
npm test > /tmp/npm-test.log 2>&1
tail -3 /tmp/npm-test.log
echo "ok: $(grep -c '^ok ' /tmp/npm-test.log) | not ok: $(grep -c '^not ok' /tmp/npm-test.log)"
```

**Expected results:**
- `validate.py`: `Errors: 0 | Warnings: 0 | VALIDATION PASSED`
- `audit_skills.py --ci`: `HIGH: 0 | WARN: 0` (INFO is acceptable)
- `audit_skills.py --sarif`: valid JSON, non-empty `runs[0].tool.driver.rules`
- `audit_skills.py --permissions`: review `Skills with Bash + Write + Edit` list — any newly-added skill with broad access MUST be justified in the CHANGELOG entry
- Registry drift: `OK: registry matches filesystem`. If `DRIFT:` appears, add the missing `scripts/generate_*.py` rows to `kb/reference/supported-tools-registry.md` before tagging.
- `npm test`: `1..N` with zero `not ok` (read from the cached `/tmp/npm-test.log`, do not rerun)

**One-liner:**
```bash
python3 scripts/validate.py --strict && python3 scripts/audit_skills.py --ci && python3 scripts/audit_skills.py --sarif > audit.sarif && diff <(grep -oE 'scripts/generate_[a-z_]+\.py' kb/reference/supported-tools-registry.md | sort -u) <(ls scripts/generate_*.py | grep -vE 'generate_agents_md\.py|generate_llms_txt\.py|generate_language_rules_skills\.py' | sort -u) && npm test
```

**If tests fail:** Fix the issue, do NOT skip. Common failures:
- Stale counts → re-run `generate:all` or fix README/ARCHITECTURE
- Missing frontmatter → add to new KB docs
- Broken symlink → `ai-toolkit doctor --fix`

### Phase 5a: Supply-Chain Hardening Verification (v2.8.0+)

These checks enforce the security standard introduced in v2.8.0. Do NOT tag a release until all pass.

**1. Publish workflow emits provenance:**

```bash
grep -E '\-\-provenance|id-token: write' .github/workflows/publish.yml
```

- [ ] Both markers present (`--provenance` flag + `id-token: write` permission)
- [ ] Any PR that changes `publish.yml` REQUIRES an approved security review

**2. URL-sourced rules and hooks are checksum-pinned:**

```bash
# On a machine that has consumed URL rules/hooks at least once:
jq 'to_entries | map(select(.value.url != null and (.value.sha256 // "" | length) == 0))' ~/.softspark/ai-toolkit/rules/sources.json
jq 'to_entries | map(select(.value.url != null and (.value.sha256 // "" | length) == 0))' ~/.softspark/ai-toolkit/hooks/external/sources.json
```

- [ ] Both queries return empty arrays (every URL entry has a `sha256`)
- [ ] If not, run `ai-toolkit update` to backfill missing hashes before tagging
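
Where `jq` is not available, the same filter can be written in Python. This sketch mirrors the two queries above and assumes `sources.json` is an object keyed by source name with optional `url` and `sha256` fields, which is what the `jq` expression implies; `unpinned_sources` is a hypothetical name.

```python
def unpinned_sources(sources: dict) -> list[str]:
    """Return names of entries that have a url but no non-empty sha256."""
    return sorted(
        name
        for name, entry in sources.items()
        if entry.get("url") is not None and not (entry.get("sha256") or "")
    )
```

Load each `sources.json` with `json.load` and pass the result in; an empty list corresponds to the empty array the checklist expects.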

**3. Audit SARIF output is well-formed:**

```bash
python3 scripts/audit_skills.py --sarif | python3 -c "import json, sys; d=json.load(sys.stdin); assert d['version']=='2.1.0' and d['runs'][0]['tool']['driver']['name']; print('SARIF OK')"
```

- [ ] Prints `SARIF OK`
- [ ] If the script ever grows new rule classes, extend the SARIF `rules[]` coverage before releasing

**4. Strict-pin mode passes on CI** (optional, recommended for stable branches):

```bash
AI_TOOLKIT_STRICT_PIN=1 ai-toolkit update --dry-run
```

- [ ] Exit 0, no `CHECKSUM CHANGED` line
- [ ] Any unexpected upstream change blocks the release until explicitly approved

---

## Phase 6: Commit

Stage all release files:

```bash
git add package.json manifest.json app/.claude-plugin/plugin.json
git add package-lock.json
git add CHANGELOG.md
git add AGENTS.md llms.txt llms-full.txt
git add -p  # review and stage any other changes
```

Commit:

```bash
git commit -m "chore: release vX.Y.Z"
```

---

## Phase 7: Tag and Push

```bash
git tag vX.Y.Z
git push origin main --tags
```

This triggers `.github/workflows/publish.yml` which:
1. Runs `validate.py --strict`
2. Runs `npm test`
3. Publishes to npm as `@softspark/ai-toolkit@X.Y.Z` with `--provenance` (SLSA v1 build attestation)

**Provenance is non-negotiable.** If `id-token: write` permission or the `--provenance` flag is missing from `publish.yml`, fix it BEFORE tagging — an unsigned release is a regression against the v2.8.0 standard.

**After CI completes:** Run the [Release Verification SOP](release-verification-sop.md)
to smoke-test the published package AND verify the provenance attestation landed on npm.

---

## Rollback

If a bad release was published:

```bash
# Unpublish from npm (within 72h)
npm unpublish @softspark/ai-toolkit@X.Y.Z

# Or deprecate (preferred — doesn't break existing installs)
npm deprecate @softspark/ai-toolkit@X.Y.Z "Known issue: <description>. Use vA.B.C instead."

# Delete tag
git tag -d vX.Y.Z
git push origin --delete vX.Y.Z
```

---

## Checklist Summary

| # | Step | Command / Action | Pass Criteria |
|---|------|-----------------|---------------|
| 0a | Ecosystem drift check | `ecosystem_doctor.py --format text` | All tools Clean, or drift classified and resolved |
| 0b | Ecosystem snapshot refresh | `ecosystem_doctor.py --update` | `benchmarks/ecosystem-doctor-snapshot.json` updated |
| 0c | Ecosystem gate | `ecosystem_doctor.py --offline --check` | Exit 0 |
| 1 | Version bump type | Decide patch/minor/major | — |
| 2 | `package.json` version | Edit `"version"` | Matches target |
| 3 | `manifest.json` version | Edit `"version"` | Matches target |
| 4 | `plugin.json` version | Edit `"version"` | Matches target |
| 5 | `package-lock.json` | `npm install --package-lock-only` | Matches target |
| 6 | Count sync | Check `package.json` description, README | `validate.py` passes |
| 7 | CHANGELOG.md | Add release entry (incl. `Ecosystem` subsection if any B/D/E/F drift) | Entry exists for vX.Y.Z |
| 8 | Regenerate artifacts | `generate_agents_md.py`, `generate_codex_rules.py`, `generate_llms_txt.py` | No unexpected diff |
| 9 | Validate | `validate.py --strict` | 0 errors, 0 warnings |
| 10 | Security audit (CI mode) | `audit_skills.py --ci` | 0 HIGH |
| 11 | Security audit (SARIF) | `audit_skills.py --sarif` | Valid SARIF 2.1.0 JSON |
| 12 | Per-skill permissions | `audit_skills.py --permissions` | New broad-access skills justified in CHANGELOG |
| 13 | Provenance flag check | `grep -- '--provenance' .github/workflows/publish.yml` | Present |
| 14 | Checksum-pin backfill | `sources.json` entries all have `sha256` | No unpinned URL sources |
| 15 | Tests | `npm test` | All pass |
| 16 | Commit | `git commit` | Clean working tree |
| 17 | Tag | `git tag vX.Y.Z` | Tag exists |
| 18 | Push | `git push origin main --tags` | CI triggered with `id-token: write` |

---

## kb/procedures/release-verification-sop.md

---
title: "SOP: Release Verification"
category: procedures
service: ai-toolkit
tags: [sop, verification, release, smoke-test, install, update, qa, provenance, sarif]
version: "1.4.3"
created: "2026-04-08"
last_updated: "2026-05-19"
description: "End-to-end smoke test after installing or updating @softspark/ai-toolkit — verifies CLI, install, doctor, validation, tests, eject, npm provenance attestation, SARIF audit, and per-skill permissions. Reflects the v2.8.0 supply-chain standard. v1.3.0 added the single-run npm test discipline; v1.4.0 adds v3.0.0 deep-coverage checks (--profile full, --codex-skills, breaking-change surfaces, idempotence, registry drift, live-JSON parse) and refreshes stale thresholds. v1.4.2 makes the Phase 9.4 idempotence check deterministic by sorting file paths before hashing. v1.4.3 tightens the Phase 8.4 URL pin check so the success-message count includes only entries with a `url:` field, not local `path:` entries, and documents the `sources.json` envelope shape."
---

# SOP: Release Verification

End-to-end smoke test after installing or updating `@softspark/ai-toolkit`.
Verifies all critical paths from the user's perspective.

**Use this SOP:**
- After `npm install -g @softspark/ai-toolkit@latest`
- After `ai-toolkit update`
- Before tagging a new version (`git tag`)
- Before publishing to npm (`npm publish`)
- As a smoke test in CI/CD

**Prerequisites:**
- Node.js >= 18, Python 3, `bats`, git
- `@softspark/ai-toolkit` installed globally

**Time:** 10-15 minutes (full), 2 minutes (quick checklist)

---

## Quick Checklist (TL;DR)

14 commands. If all pass, the release is ready:

```bash
# Pre-commit (Phase 0)
python3 scripts/generate_agents_md.py > AGENTS.md           # 1. Regenerate AGENTS.md
python3 scripts/generate_codex_rules.py .                   # 2. Refresh standard Codex rules
python3 scripts/generate_llms_txt.py > llms.txt             # 3. Regenerate llms.txt
python3 scripts/validate.py --strict                        # 4. Validation passed?
npm test > /tmp/npm-test.log 2>&1 && grep -c '^ok ' /tmp/npm-test.log && ! grep -q '^not ok' /tmp/npm-test.log  # 5. All tests passed? (single run, cached)

# Post-install verification (Phases 1-7)
ai-toolkit --version                                        # 6. Version OK?
ai-toolkit status                                           # 7. Status OK?
ai-toolkit doctor                                           # 8. Health check passed?
ai-toolkit install --dry-run                                # 9. Global install OK?
python3 scripts/audit_skills.py --ci                        # 10. Security audit clean?

# Supply-chain verification (Phase 8, v2.8.0+)
python3 scripts/audit_skills.py --sarif | python3 -c "import json,sys; assert json.load(sys.stdin)['version']=='2.1.0'; print('SARIF OK')"   # 11. SARIF 2.1.0 well-formed?
python3 scripts/audit_skills.py --permissions | head -30    # 12. Broad-access skills reviewed?
npm view @softspark/ai-toolkit@X.Y.Z --json | python3 -c "import json,sys; d=json.load(sys.stdin); assert d['dist']['attestations']['provenance']['predicateType']=='https://slsa.dev/provenance/v1'; print('PROVENANCE OK')"   # 13. Provenance attested on npm?

# Deep-coverage verification (Phase 9, v3.0.0+)
META="generate_agents_md.py|generate_llms_txt.py|generate_language_rules_skills.py"
diff <(grep -oE 'scripts/generate_[a-z_]+\.py' kb/reference/supported-tools-registry.md | sort -u) <(ls scripts/generate_*.py | grep -vE "$META" | sort -u) && echo "OK: registry matches"   # 14. Registry <-> generators drift?
```

---

## Phase 0: Pre-Commit & Pre-Push (2 min)

Run these commands **before every commit and push to main**. CI validates
counts but does NOT auto-regenerate — you must do it locally.

```bash
# 1. Regenerate generated artifacts
python3 scripts/generate_agents_md.py > AGENTS.md
python3 scripts/generate_codex_rules.py .
python3 scripts/generate_llms_txt.py > llms.txt
python3 scripts/generate_llms_txt.py --full > llms-full.txt

# 2. Validate everything (catches stale counts, missing assets)
python3 scripts/validate.py --strict

# 3. Security audit
python3 scripts/audit_skills.py --ci

# 4. Run tests
npm test

# 5. Stage and commit
git add AGENTS.md .agents/rules/ai-toolkit-*.md llms.txt llms-full.txt
git add -p  # stage your other changes
git commit -m "feat: your change description"
```

**Why local?** Branch protection on `main` requires PRs and status checks.
CI cannot push directly to `main`, so generated artifacts must be committed
by the developer as part of their PR.

**One-liner (copy-paste):**
```bash
python3 scripts/generate_agents_md.py > AGENTS.md && python3 scripts/generate_codex_rules.py . && python3 scripts/generate_llms_txt.py > llms.txt && python3 scripts/generate_llms_txt.py --full > llms-full.txt && python3 scripts/validate.py --strict && python3 scripts/audit_skills.py --ci && npm test
```

---

## Phase 1: CLI & Version (1 min)

```bash
ai-toolkit --version
ai-toolkit --help
which ai-toolkit
```

**Verify:**
- [ ] `--version` returns correct semver (e.g., `1.4.0`)
- [ ] `--help` displays full command list without errors
- [ ] `which` points to global npm bin path

---

## Phase 2: Global Install & Status (2 min)

```bash
ai-toolkit install --dry-run
ai-toolkit status
```

**Verify `--dry-run`:**
- [ ] Agents >= 44
- [ ] Skills >= 99
- [ ] Hooks merged into settings.json
- [ ] "Other AI Tools" section lists documented global targets only: aider, augment, cline, codex, gemini, opencode, roo, windsurf (Cursor, Copilot, Antigravity via --local for rules)

**Verify `status`:**
- [ ] Version matches expected
- [ ] Profile: minimal/standard/strict
- [ ] Modules: list of installed modules
- [ ] Latest: up to date / update available

---

## Phase 3: Doctor Health Check (1 min)

```bash
ai-toolkit doctor
```

**Expected sections (all OK):**
- Environment: node, bash, python3, bats
- Global Install: .claude exists, agents/skills symlinks (0 broken), settings.json hooks
- Hook Scripts: all present and executable
- Hook Configuration: 14 events registered
- Generated Artifacts: AGENTS.md, llms.txt, llms-full.txt
- Planned Assets: plugin.json, benchmarks, plugin packs
- Benchmark Freshness: < 30 days
- Stale Rules: all healthy

**Verify:**
- [ ] `Errors: 0 | Warnings: 0`
- [ ] `HEALTH CHECK PASSED`

If doctor detects problems, run `ai-toolkit doctor --fix`. It auto-repairs
broken symlinks, non-executable hooks, missing scripts, and a missing `llms-full.txt`.

---

## Phase 4: Local Install (2 min)

```bash
mkdir -p /tmp/ai-toolkit-verify && cd /tmp/ai-toolkit-verify
git init -q
ai-toolkit install --local --editors all --dry-run
cd - && rm -rf /tmp/ai-toolkit-verify
```

**Verify "Project-local" section:**
- [ ] Would create: CLAUDE.md
- [ ] Would create: .claude/settings.local.json
- [ ] Would inject: .claude/constitution.md
- [ ] Editors: all 11 listed (copilot, cursor, windsurf, cline, roo, aider, augment, antigravity, codex, gemini, opencode)
- [ ] Would generate configs for each editor (legacy + directory-based)
- [ ] Would install: .git/hooks/pre-commit
- [ ] Would inject language rules (auto-detected)

**Also test auto-detect (no --editors flag):**
```bash
ai-toolkit install --local --dry-run
# → Editors: none (empty project has no existing configs)
```

---

## Phase 5: Validation & Security Audit (3 min)

```bash
python3 scripts/validate.py --strict
python3 scripts/audit_skills.py --ci
```

**Verify validate.py:**
- [ ] Agents >= 44, Skills >= 99, Tests >= 900
- [ ] Hook events: 14, Hook scripts: >= 30
- [ ] Plugin packs >= 10, KB documents >= 20
- [ ] `Errors: 0 | Warnings: 0` → `VALIDATION PASSED`

**Verify audit_skills.py:**
- [ ] `HIGH: 0` (MUST be zero — CI fails otherwise)
- [ ] `WARN: 0`
- [ ] `INFO: N` (acceptable — broad-access skills: orchestrate, swarm, workflow)

---

## Phase 6: Tests (3-5 min)

```bash
# Run ONCE, capture to file, then parse. The full suite is 900+ bats cases;
# re-running it per check (tail / grep ok / grep not ok piped separately)
# wastes minutes every release. Always cache the output.
npm test > /tmp/npm-test.log 2>&1
exit=$?
tail -3 /tmp/npm-test.log
echo "ok:     $(grep -c '^ok '    /tmp/npm-test.log)"
echo "not ok: $(grep -c '^not ok' /tmp/npm-test.log)"
echo "exit:   $exit"
```

**Verify:**
- [ ] `exit == 0`
- [ ] `ok == expected test count` (e.g., 945 on v3.0.0)
- [ ] `not ok == 0`
- [ ] Bats runs tests in parallel (4 jobs)
- [ ] Groups: agents, autodetect, cli, generators, guards, hooks, inject,
      install, kb, mcp, readme, profiles, uninstall, validate
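
The three Verify conditions (exit code, `ok` count, `not ok` count) can also be checked in one pass over the cached log. A minimal Python sketch, assuming the log contains TAP-style `ok` / `not ok` lines as the `grep` calls above imply; `summarize_tap` and `suite_passed` are hypothetical names.

```python
import re


def summarize_tap(log_text: str) -> dict[str, int]:
    """Count TAP result lines the same way the grep -c calls above do."""
    ok = len(re.findall(r"^ok ", log_text, flags=re.MULTILINE))
    not_ok = len(re.findall(r"^not ok", log_text, flags=re.MULTILINE))
    return {"ok": ok, "not_ok": not_ok}


def suite_passed(log_text: str, exit_code: int, expected_ok: int) -> bool:
    """Apply the three Verify conditions: exit 0, expected ok count, no failures."""
    counts = summarize_tap(log_text)
    return exit_code == 0 and counts["ok"] == expected_ok and counts["not_ok"] == 0


# Made-up three-test log for illustration only.
sample_log = (
    "1..3\n"
    "ok 1 install is idempotent\n"
    "ok 2 guards block rm -rf\n"
    "not ok 3 eject copies files\n"
)
print(summarize_tap(sample_log))  # {'ok': 2, 'not_ok': 1}
```

Like the shell version, this reads `/tmp/npm-test.log` once instead of re-running the suite per check.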

**Anti-pattern — do NOT do this:**
```bash
# Runs the full suite THREE times. Adds 1-3 min and pressures CI capacity.
npm test 2>&1 | tail -3
npm test 2>&1 | grep -c '^ok '
npm test 2>&1 | grep -c '^not ok'
```

**Key test areas:**
- Guards: rm -rf, DROP TABLE, git push --force blocked
- Install: idempotent, profiles, --only/--skip, orphan cleanup
- Eject: real files (not symlinks), inlined rules
- Uninstall: removes toolkit, preserves user content

---

## Phase 7: Eject (1 min)

```bash
mkdir -p /tmp/ai-toolkit-eject-test
cd /tmp/ai-toolkit-eject-test
ai-toolkit eject
cd - && rm -rf /tmp/ai-toolkit-eject-test
```

**Verify:**
- [ ] Agents copied as real files (not symlinks)
- [ ] Skills copied as real directories
- [ ] Rules inlined into CLAUDE.md
- [ ] constitution.md and ARCHITECTURE.md copied
- [ ] `output-styles/` directory present (v2.7.1+)

---

## Phase 8: Supply-Chain Verification (2 min, v2.8.0+)

These checks enforce the v2.8.0 security standard on a freshly-published release. Run AFTER the `publish.yml` workflow completes on the tag.

### 8.1 Provenance attestation on npm

```bash
VERSION="X.Y.Z"                                           # the tag just published
npm view "@softspark/ai-toolkit@${VERSION}" --json \
  | python3 -c "import json, sys; d=json.load(sys.stdin); att=d['dist'].get('attestations', {}); assert att.get('provenance', {}).get('predicateType') == 'https://slsa.dev/provenance/v1', f'NO PROVENANCE for {d[\"version\"]}'; print(f'PROVENANCE OK: {att[\"url\"]}')"
```

**Verify:**
- [ ] Exit 0 and prints `PROVENANCE OK: https://registry.npmjs.org/...`
- [ ] `https://www.npmjs.com/package/@softspark/ai-toolkit/v/${VERSION}` shows the green "Provenance" badge

**If provenance is missing:** `publish.yml` ran without `id-token: write` or `--provenance`. Restore both and cut a patch release; a silently unsigned publish is a regression against the v2.8.0 standard.
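
The inline one-liner in 8.1 is dense, so the same check is spelled out here as a small function. This is a sketch, assuming the `dist.attestations` shape that the one-liner already relies on; `provenance_url` is a hypothetical name and the sample document is hand-written with placeholder values, not real registry output.

```python
import json

SLSA_V1 = "https://slsa.dev/provenance/v1"


def provenance_url(npm_view_json: str) -> str:
    """Return the attestation URL, or raise ValueError when provenance is absent."""
    doc = json.loads(npm_view_json)
    attestations = doc.get("dist", {}).get("attestations") or {}
    predicate = (attestations.get("provenance") or {}).get("predicateType")
    if predicate != SLSA_V1:
        raise ValueError(f"NO PROVENANCE for {doc.get('version')}")
    return attestations.get("url", "")


# Hand-written stand-in for `npm view ... --json` output; values are placeholders.
signed = json.dumps({
    "version": "0.0.0",
    "dist": {"attestations": {
        "url": "https://registry.npmjs.org/-/npm/v1/attestations/example",
        "provenance": {"predicateType": SLSA_V1},
    }},
})
print("PROVENANCE OK:", provenance_url(signed))
```

Raising instead of asserting keeps the failure visible even when Python runs with `-O`, which strips `assert` statements.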

### 8.2 Audit SARIF output (for GHAS ingest)

```bash
python3 scripts/audit_skills.py --sarif > /tmp/audit.sarif
python3 -c "import json; d=json.load(open('/tmp/audit.sarif')); assert d['version']=='2.1.0' and d['runs'][0]['tool']['driver']['name']=='ai-toolkit-audit-skills'; print(f'SARIF OK: {len(d[\"runs\"][0][\"results\"])} results across {len(d[\"runs\"][0][\"tool\"][\"driver\"][\"rules\"])} rules')"
```

**Verify:**
- [ ] Valid SARIF 2.1.0
- [ ] In the publishing repo, the CI job uploads `audit.sarif` via `github/codeql-action/upload-sarif@v3` so findings appear in the Security tab
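
For readers who prefer a function over the inline assertion, the structural check in 8.2 can be sketched as follows. The driver name comes from the command above; `sarif_summary` is a hypothetical helper and the sample document (including the rule id `EXAMPLE001`) is invented for illustration.

```python
import json


def sarif_summary(sarif_text: str, expected_driver: str = "ai-toolkit-audit-skills") -> tuple[int, int]:
    """Return (result_count, rule_count) after checking version and driver name."""
    doc = json.loads(sarif_text)
    if doc.get("version") != "2.1.0":
        raise ValueError(f"unexpected SARIF version: {doc.get('version')!r}")
    run = doc["runs"][0]
    driver = run["tool"]["driver"]
    if driver.get("name") != expected_driver:
        raise ValueError(f"unexpected driver: {driver.get('name')!r}")
    return len(run.get("results", [])), len(driver.get("rules", []))


# Minimal hand-written SARIF document; not real audit output.
sample_sarif = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ai-toolkit-audit-skills", "rules": [{"id": "EXAMPLE001"}]}},
        "results": [],
    }],
})
results, rules = sarif_summary(sample_sarif)
print(f"SARIF OK: {results} results across {rules} rules")
```

This checks only the fields the SOP names; it is not a full SARIF schema validation.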

### 8.3 Per-skill permissions report

```bash
python3 scripts/audit_skills.py --permissions | head -40
```

**Verify:**
- [ ] Bash skill count has NOT jumped unexpectedly since the previous release
- [ ] Any newly added entry under `Skills with Bash + Write + Edit` matches a CHANGELOG bullet that justifies the broad scope
- [ ] JSON form (`--permissions --json`) is available for automated drift dashboards

### 8.4 URL-sourced rules/hooks are checksum-pinned

`sources.json` is an envelope of the form `{"schema_version": 1, "rules"|"hooks": {...}}`, so the jq filter must pick the nested map before piping into the pin assertion. The URL-count in the success message ignores local `path:`-only entries — only entries with a `url:` field are pinned and counted.

```bash
jq '.rules // .hooks // {}' ~/.softspark/ai-toolkit/rules/sources.json 2>/dev/null \
  | python3 -c "import json, sys; d=json.load(sys.stdin) or {}; bad=[n for n,v in d.items() if v.get('url') and not v.get('sha256')]; assert not bad, f'UNPINNED: {bad}'; url_n=sum(1 for v in d.values() if v.get('url')); print(f'RULE PIN OK: {url_n} URL rules, all with sha256')"
jq '.hooks // {}' ~/.softspark/ai-toolkit/hooks/external/sources.json 2>/dev/null \
  | python3 -c "import json, sys; d=json.load(sys.stdin) or {}; bad=[n for n,v in d.items() if v.get('url') and not v.get('sha256')]; assert not bad, f'UNPINNED: {bad}'; url_n=sum(1 for v in d.values() if v.get('url')); print(f'HOOK PIN OK: {url_n} URL hooks, all with sha256')"
```

**Verify:**
- [ ] Both commands print `... PIN OK`
- [ ] If any entry is unpinned, the `register_url_source()` call missed passing `content=` — fix the call-site and retag
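
The envelope handling described in 8.4 can be sketched in plain Python. The helper names and the sample entries (`team-style`, `local-notes`, `draft`) are hypothetical; only the envelope shape and the `url` / `path` / `sha256` fields come from the text above.

```python
def unpinned_sources(envelope: dict, key: str) -> list[str]:
    """Return names of URL entries under envelope[key] that lack a sha256."""
    entries = envelope.get(key) or {}
    return sorted(
        name for name, meta in entries.items()
        if meta.get("url") and not meta.get("sha256")
    )


def url_source_count(envelope: dict, key: str) -> int:
    """Count only entries that carry a url; local path-only entries are ignored."""
    return sum(1 for meta in (envelope.get(key) or {}).values() if meta.get("url"))


# Invented envelope: one pinned URL rule, one local rule, one unpinned URL rule.
sample = {
    "schema_version": 1,
    "rules": {
        "team-style": {"url": "https://example.com/team-style.md", "sha256": "ab" * 32},
        "local-notes": {"path": "./notes.md"},
        "draft": {"url": "https://example.com/draft.md"},
    },
}
print(unpinned_sources(sample, "rules"))  # ['draft']
print(url_source_count(sample, "rules"))  # 2
```

Selecting the nested map by key first is what keeps `schema_version` from being treated as a source entry.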

### 8.5 Strict-pin smoke test (optional but recommended)

```bash
AI_TOOLKIT_STRICT_PIN=1 ai-toolkit update --dry-run
```

**Verify:**
- [ ] Exit 0, no `CHECKSUM CHANGED` line
- [ ] If a checksum change was intentional (e.g. upstream rule update), document it in the CHANGELOG entry before tagging

---

## Phase 9: Deep-Coverage Checks (v3.0.0+)

These verify the native-surface generators shipped in v3.0.0 actually emit the right files for the right profiles, and that the tool registry stays in sync with shipped generators.

> **Safety warning — HOME-scoped writes:** Running `--profile full` with `augment` in the editor list writes to `$HOME/.augment/settings.json` (Augment stores hooks under HOME, not per-project). Use `--dry-run` for verification unless you intend to carry ai-toolkit hook entries on this machine. The generator is marker-safe (only rewrites its own `_source: ai-toolkit` entries) but is still a side-effect.

### 9.1 `--profile full` emits every native surface

```bash
D=/tmp/aitk-profile-full-${RANDOM} && mkdir -p "$D" && cd "$D" && git init -q
ai-toolkit install --local --editors cursor,windsurf,gemini,augment,codex \
  --profile full --codex-skills --dry-run 2>&1 \
  | grep -E "\\.cursor/(hooks\\.json|agents)|\\.windsurf/hooks\\.json|\\.gemini/(settings\\.json|commands)|\\.augment/(agents|commands|settings\\.json)|\\.agents/skills"
```

**Verify** — at least the following lines appear:
- [ ] `.cursor/hooks.json` and `.cursor/agents/`
- [ ] `.windsurf/hooks.json`
- [ ] `.gemini/settings.json` hooks AND `.gemini/commands/`
- [ ] `.augment/agents/` + `.augment/commands/` + `$HOME/.augment/settings.json`
- [ ] `.agents/skills/` (Codex native discovery path; refreshed by `--codex-skills`)

### 9.2 `--codex-skills` is orthogonal to `--profile`

```bash
D=/tmp/aitk-codex-skills-${RANDOM} && mkdir -p "$D" && cd "$D" && git init -q
ai-toolkit install --local --editors codex --profile standard --codex-skills --dry-run 2>&1 \
  | grep -q "Would refresh: .agents/skills" && echo "OK: --codex-skills refreshes .agents/skills without --profile full"
ai-toolkit install --local --editors codex --profile full --dry-run 2>&1 \
  | grep -q "Would generate: .agents/skills" && echo "OK: Codex skills use .agents/skills at profile full"
```

**Verify:**
- [ ] `--codex-skills` refreshes `.agents/skills/` at any profile
- [ ] `--profile full` never emits `.codex/skills/`; Codex skills use `.agents/skills/`

### 9.3 Breaking-change surfaces land on `--profile standard`

v3.0.0 moved two surfaces from opt-in to default:
- Copilot directory layout (`.github/instructions/`, `.github/prompts/`)
- Gemini hooks (`.gemini/settings.json`)

```bash
D=/tmp/aitk-breaking-${RANDOM} && mkdir -p "$D" && cd "$D" && git init -q
ai-toolkit install --local --editors copilot,gemini --profile standard --dry-run 2>&1 \
  | tee /tmp/aitk-breaking.log
grep -q "\\.github/instructions/" /tmp/aitk-breaking.log && echo "OK: Copilot dir layout at standard"
grep -q "\\.gemini/settings\\.json hooks" /tmp/aitk-breaking.log && echo "OK: Gemini hooks at standard"
```

**Verify both lines print `OK:`**. If either is missing, a regression has unwound the v3.0.0 breaking change.

### 9.4 Install is idempotent

```bash
D=/tmp/aitk-idem-${RANDOM} && mkdir -p "$D" && cd "$D" && git init -q
# Sort file paths before hashing — find traversal order follows inode order,
# which can shift between runs even when content is byte-identical, producing
# false FAIL signals.
ai-toolkit install --local --editors cursor,gemini --profile full >/dev/null 2>&1
SHA1=$(find .cursor .gemini -type f -print0 | LC_ALL=C sort -z | xargs -0 shasum | shasum | awk '{print $1}')
ai-toolkit install --local --editors cursor,gemini --profile full >/dev/null 2>&1
SHA2=$(find .cursor .gemini -type f -print0 | LC_ALL=C sort -z | xargs -0 shasum | shasum | awk '{print $1}')
[ "$SHA1" = "$SHA2" ] && echo "OK: idempotent" || echo "FAIL: install is not idempotent"
```

**Verify:** prints `OK: idempotent`. A second run must produce byte-identical files in every managed path.
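
The sort-then-hash idea can be sketched in Python as well. This is an illustration under stated assumptions, not the toolkit's own check: `tree_digest` is a hypothetical helper, it uses SHA-256 where the shell pipeline uses `shasum`, and the temporary directory stands in for a real install target.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path, managed: list[str]) -> str:
    """Hash relative path + bytes of every file under the managed dirs, in sorted order."""
    files = sorted(
        (
            path
            for name in managed
            if (root / name).is_dir()
            for path in (root / name).rglob("*")
            if path.is_file()
        ),
        key=lambda path: path.relative_to(root).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8") + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


# Stand-in for two consecutive installs that leave the same files behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".cursor").mkdir()
    (root / ".cursor" / "hooks.json").write_text("{}", encoding="utf-8")
    first = tree_digest(root, [".cursor", ".gemini"])
    second = tree_digest(root, [".cursor", ".gemini"])
    print("OK: idempotent" if first == second else "FAIL: install is not idempotent")
```

Sorting by relative path makes the digest independent of directory traversal order, which is the property the shell version gets from `LC_ALL=C sort -z`.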

### 9.5 Live-install JSON outputs parse

The bats suite validates JSON shape at generation time. This re-checks that what actually landed on disk after a live install parses without errors.

```bash
D=/tmp/aitk-json-${RANDOM} && mkdir -p "$D" && cd "$D" && git init -q
ai-toolkit install --local --editors cursor,windsurf,gemini,augment --profile full >/dev/null 2>&1
for f in .cursor/hooks.json .windsurf/hooks.json .gemini/settings.json "$HOME/.augment/settings.json"; do
  [ -f "$f" ] && python3 -c "import json, sys; json.load(open(sys.argv[1]))" "$f" && echo "OK: $f"
done
```

**Verify:** each emitted file prints `OK: <path>`. Any `json.decoder.JSONDecodeError` means the merge logic corrupted the output.

### 9.6 Registry / generator drift check

`kb/reference/supported-tools-registry.md` should enumerate every per-editor `scripts/generate_*.py` we ship. Meta-generators (`generate_agents_md.py`, `generate_llms_txt.py`, `generate_language_rules_skills.py`) are excluded because they produce docs or other artifacts, not editor configs.

```bash
META="generate_agents_md.py|generate_llms_txt.py|generate_language_rules_skills.py"
REG=$(grep -oE 'scripts/generate_[a-z_]+\.py' kb/reference/supported-tools-registry.md | sort -u)
FS=$(ls scripts/generate_*.py | grep -vE "$META" | sort -u)
diff <(echo "$REG") <(echo "$FS") && echo "OK: registry matches filesystem" || echo "DRIFT: update supported-tools-registry.md"
```

**Verify:** prints `OK: registry matches filesystem`. If not, add the missing rows to the registry before tagging the next release.
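
When the `diff` output is hard to read, the same comparison can be expressed as two set differences. A minimal sketch: `registry_drift` is a hypothetical helper, `generate_example_tool.py` is an invented file name, and exclusion here is by exact file name, whereas the shell `grep -vE` matches substrings.

```python
import re

META = {"generate_agents_md.py", "generate_llms_txt.py", "generate_language_rules_skills.py"}
GENERATOR_REF = re.compile(r"scripts/(generate_[a-z_]+\.py)")


def registry_drift(registry_text: str, script_names: list[str]) -> tuple[list[str], list[str]]:
    """Return (shipped but unlisted, listed but not shipped), meta-generators excluded."""
    listed = set(GENERATOR_REF.findall(registry_text))
    shipped = {
        name for name in script_names
        if name.startswith("generate_") and name.endswith(".py")
    } - META
    return sorted(shipped - listed), sorted(listed - shipped)


# Invented registry row and directory listing for illustration.
registry_text = "| Codex | scripts/generate_codex_rules.py |"
script_names = ["generate_codex_rules.py", "generate_agents_md.py", "generate_example_tool.py"]
print(registry_drift(registry_text, script_names))  # (['generate_example_tool.py'], [])
```

The first list names generators that need a registry row; the second names registry rows whose generator no longer exists.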

---

## Troubleshooting

### `ai-toolkit: command not found`

```bash
npm install -g @softspark/ai-toolkit
# or check PATH:
export PATH="$(npm config get prefix)/bin:$PATH"
```

### Doctor: broken symlinks

```bash
ai-toolkit doctor --fix    # auto-repair
ai-toolkit update          # or full re-install
```

### Tests fail: missing bats

```bash
brew install bats-core     # macOS
npm install -g bats        # cross-platform
```

### validate.py: stale counts

README badges don't match the current agents/skills/tests counts.
Update README.md and re-run.

### Eject: missing skills

```bash
ai-toolkit update          # re-link missing symlinks
ai-toolkit eject /tmp/test # retry
```

---

## Success Criteria

| Area | Criterion |
|------|-----------|
| CLI | `--version` correct, `--help` full list, `status` current |
| Health | `doctor`: 0 errors, 0 warnings, PASSED |
| Install | `--dry-run` correct counts, `--local` all configs |
| Quality | `validate.py --strict`: PASSED |
| Security (baseline) | `audit_skills.py --ci`: 0 HIGH |
| Security (SARIF) | `audit_skills.py --sarif`: valid SARIF 2.1.0 with non-empty rules array |
| Security (permissions) | `audit_skills.py --permissions`: broad-access skills unchanged or justified in CHANGELOG |
| Supply chain | `dist.attestations.provenance.predicateType == https://slsa.dev/provenance/v1` on npm |
| Supply chain | All `sources.json` URL entries carry a `sha256`; `AI_TOOLKIT_STRICT_PIN=1 ai-toolkit update --dry-run` passes |
| Tests | `npm test`: N/N passed, 0 failures |
| Eject | Standalone `.claude/` with real files AND `output-styles/` directory |
| Guards | Destructive commands blocked |
| Deep coverage | `--profile full` emits all 9 v3.0.0 native surfaces; `--codex-skills` works orthogonally |
| Breaking changes | Copilot directory layout + Gemini hooks emit at `--profile standard` (v3.0.0 contract) |
| Idempotence | Second `install` run produces byte-identical output in every managed path |
| Live JSON | Every generated `.json` file on disk parses as valid JSON |
| Registry | `supported-tools-registry.md` enumerates every per-editor `scripts/generate_*.py` we ship (meta-generators excluded) |

---

## kb/reference/agents-catalog.md

---
title: "AI Toolkit - Agents Catalog"
category: reference
service: ai-toolkit
tags: [agents, catalog, roles, ai-development]
version: "1.4.2"
created: "2026-03-23"
last_updated: "2026-04-09"
description: "Complete catalog of specialized agents with roles, models, and use cases."
---

# Agents Catalog

## By Category

### Orchestration & Planning (4)

| Agent | Model | Use Case |
|-------|-------|----------|
| **orchestrator** | opus | Multi-agent coordination, 3+ agents per task |
| **project-planner** | opus | Task breakdown, dependency graphs, file structure |
| **product-manager** | opus | Requirements, user stories, acceptance criteria, backlog prioritization |
| **tech-lead** | opus | Code quality authority, architecture patterns |

### Development (5)

| Agent | Model | Use Case |
|-------|-------|----------|
| **backend-specialist** | opus | Node.js, Python, PHP, FastAPI, APIs |
| **frontend-specialist** | opus | React, Next.js, Vue, Nuxt, Tailwind |
| **mobile-developer** | opus | React Native, Flutter, native iOS/Android |
| **game-developer** | opus | Unity, Godot, Unreal, Phaser, Three.js |
| **database-architect** | opus | Schema design, migrations, query optimization, operations |

### AI/ML (6)

| Agent | Model | Use Case |
|-------|-------|----------|
| **ai-engineer** | opus | LLM integration, vector databases, RAG pipelines, agent orchestration |
| **ml-engineer** | opus | Model training, MLOps, TensorFlow, PyTorch |
| **nlp-engineer** | opus | NLP pipelines, NER, text classification, transformers |
| **data-scientist** | opus | Statistics, visualization, EDA, hypothesis testing |
| **data-analyst** | sonnet | SQL, analytics, reporting, dashboards |
| **prompt-engineer** | opus | Prompt design, chain-of-thought, few-shot, optimization |

### Quality & Security (6)

| Agent | Model | Use Case |
|-------|-------|----------|
| **code-reviewer** | opus | Code review, standards, quality audit |
| **test-engineer** | opus | Test strategy, TDD, unit/integration/E2E tests |
| **qa-automation-engineer** | opus | Playwright, Cypress, API testing, performance testing |
| **security-auditor** | opus | OWASP, CVE analysis, pen testing, vulnerability assessment |
| **security-architect** | opus | Threat modeling, secure design, AuthN/AuthZ |
| **system-governor** | opus | Constitution guardian, validates changes, VETO power |

### Infrastructure & DevOps (6)

| Agent | Model | Use Case |
|-------|-------|----------|
| **devops-implementer** | opus | Terraform, Ansible, Docker, Kubernetes, CI/CD |
| **infrastructure-architect** | opus | System design, architecture notes, trade-off analysis |
| **infrastructure-validator** | sonnet | Deployment verification, health checks, rollback |
| **incident-responder** | sonnet | P1-P4 incidents, emergency fixes, postmortem |
| **performance-optimizer** | opus | Profiling, bottleneck analysis, latency, scaling |
| **llm-ops-engineer** | opus | LLM caching, fallback, cost optimization, observability |

### Research & Documentation (5)

| Agent | Model | Use Case |
|-------|-------|----------|
| **explorer-agent** | sonnet | Codebase discovery (READ-ONLY, never writes) |
| **technical-researcher** | opus | Deep technical investigation, research synthesis |
| **search-specialist** | sonnet | Search optimization, relevance ranking |
| **fact-checker** | sonnet | Claim verification, source validation |
| **documenter** | sonnet | Documentation, KB management, SOPs, API docs, tutorials |

### MCP (2)

| Agent | Model | Use Case |
|-------|-------|----------|
| **mcp-specialist** | opus | MCP server design, client config, troubleshooting |
| **mcp-testing-engineer** | sonnet | MCP protocol compliance, transport testing |

### Management & Evolution (4)

| Agent | Model | Use Case |
|-------|-------|----------|
| **chief-of-staff** | sonnet | Executive summaries, daily briefings, noise reduction |
| **meta-architect** | opus | Self-optimization, agent definition updates |
| **predictive-analyst** | sonnet | Impact prediction, regression forecasting |
| **business-intelligence** | sonnet | Opportunity discovery, KPI gaps, value creation |

### Autonomous (2)

| Agent | Model | Use Case |
|-------|-------|----------|
| **night-watchman** | sonnet | Autonomous maintenance: dependency updates, dead code |
| **chaos-monkey** | opus | Resilience testing: fault injection, failure verification |

### Specialist (4)

| Agent | Model | Use Case |
|-------|-------|----------|
| **debugger** | opus | Root cause analysis, stack traces, intermittent failures |
| **code-archaeologist** | sonnet | Legacy code investigation, technical debt |
| **command-expert** | sonnet | CLI commands, bash scripting, build scripts |
| **seo-specialist** | sonnet | SEO optimization, meta tags, Core Web Vitals |

## Agent Selection Matrix

| Task Type | Primary | Supporting | Validation |
|-----------|---------|------------|------------|
| New Feature | backend/frontend-specialist | test-engineer | code-reviewer |
| Bug Fix | debugger | backend/frontend | test-engineer |
| Performance | performance-optimizer | database-architect | infrastructure-validator |
| Security | security-auditor | security-architect | code-reviewer |
| Architecture | infrastructure-architect | devops-implementer | security-auditor |
| Documentation | documenter | explorer-agent | tech-lead |
| AI/ML | ai-engineer | ml-engineer | data-scientist |
| Research | technical-researcher | search-specialist | fact-checker |

---

## kb/reference/anti-pattern-registry-format.md

---
title: "Anti-Pattern Registry Format"
category: reference
service: ai-toolkit
description: "Structured JSON format for anti-patterns with severity, auto-fixability, and conflict rules. Used by domain skills with reasoning engines."
tags: [anti-patterns, skills, reasoning-engine, format]
created: 2026-04-01
last_updated: 2026-04-01
---

# Anti-Pattern Registry Format

## Overview

The anti-pattern registry is a structured JSON format used by domain skills that
employ reasoning engines. It provides a machine-readable catalog of known
anti-patterns with severity levels, auto-fix capabilities, and conflict rules.

## When to Use

Use structured JSON registries (this format) when:

- The skill catalogs **more than 50 items** across **more than 3 compatibility
  dimensions** (e.g., domain, severity, language, framework).
- Items have relationships (conflicts, prerequisites, alternatives) that must be
  queryable at runtime.
- The reasoning engine (`search.py`) needs to filter, score, and exclude
  conflicting entries programmatically.

Use Markdown tables when:

- 50 or fewer items with 3 or fewer dimensions.
- No inter-item relationships.
- Human readers are the only consumers.

## JSON Schema

Each entry in the registry follows this schema:

```json
{
  "id": "string (required)",
  "name": "string (required)",
  "domain": "string (required)",
  "description": "string (required)",
  "pattern": "string (optional)",
  "severity": "string (required)",
  "auto_fixable": "boolean (required)",
  "conflicts_with": ["string (optional)"],
  "remediation": "string (required)",
  "tags": ["string (optional)"]
}
```
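
A registry entry can be checked against this schema with a few lines of Python. This is a sketch, not the toolkit's validator: `validate_entry` is a hypothetical name, the kebab-case regex is one reasonable reading of "kebab-case" (lowercase alphanumerics joined by single hyphens), and the severity values are the three listed under Field Definitions.

```python
import re

REQUIRED_FIELDS = ("id", "name", "domain", "description", "severity", "auto_fixable", "remediation")
SEVERITIES = ("critical", "important", "suggestion")
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # assumed interpretation


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry fits the schema."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "id" in entry and not KEBAB_CASE.fullmatch(str(entry["id"])):
        problems.append("id must be kebab-case")
    if "severity" in entry and entry["severity"] not in SEVERITIES:
        problems.append("severity must be critical, important, or suggestion")
    if "auto_fixable" in entry and not isinstance(entry["auto_fixable"], bool):
        problems.append("auto_fixable must be a boolean")
    for name in ("conflicts_with", "tags"):
        value = entry.get(name, [])
        if not (isinstance(value, list) and all(isinstance(item, str) for item in value)):
            problems.append(f"{name} must be a list of strings")
    return problems


# Entry assembled from the field examples in this document.
entry = {
    "id": "n-plus-one-query",
    "name": "N+1 Query Problem",
    "domain": "database",
    "description": "Executing one query per item in a loop instead of a single batch query.",
    "severity": "important",
    "auto_fixable": True,
    "conflicts_with": ["eager-load-everything"],
    "remediation": "Replace loop queries with select_related() or prefetch_related().",
    "tags": ["orm", "django"],
}
print(validate_entry(entry))  # []
```

Uniqueness of `id` across registry files needs a second pass over all entries and is not covered here.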

### Field Definitions

#### `id` (required)

Unique identifier in kebab-case. Must be globally unique across all registry
files within the same assets directory.

```
"id": "n-plus-one-query"
```

#### `name` (required)

Human-readable display name. Used in reports and dashboards.

```
"name": "N+1 Query Problem"
```

#### `domain` (required)

The skill domain this anti-pattern belongs to. Used for filtering when a
reasoning engine serves multiple domains.

Valid domains include: `security`, `database`, `api`, `architecture`,
`performance`, `testing`, `general`. Skills may define additional domains as
needed.

```
"domain": "database"
```

#### `description` (required)

Clear explanation of what this anti-pattern is and why it is problematic. Should
be actionable -- a developer reading this should understand the risk.

```
"description": "Executing one query per item in a loop instead of a single batch query. Causes O(n) database round-trips where O(1) is possible."
```

#### `pattern` (optional)

A regex pattern for automated detection in source code. When present, tooling
can scan codebases for occurrences. Omit if the anti-pattern is architectural
or cannot be detected via regex.

```
"pattern": "for\\s+.*\\sin\\s+.*:\\s*\\n\\s+.*\\.objects\\.get"
```
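As a rough illustration of how tooling might apply a `pattern` value, the sketch below runs the example regex against two small Python snippets. The `matches_pattern` helper and both sample sources are hypothetical; the toolkit's actual scanner may work differently.

```python
import re

# The registry stores the regex as a JSON string, so backslashes are doubled
# there. After JSON decoding, the pattern is the raw regex below.
N_PLUS_ONE = r"for\s+.*\sin\s+.*:\s*\n\s+.*\.objects\.get"


def matches_pattern(pattern: str, source: str) -> bool:
    """Return True if the regex occurs anywhere in the source text."""
    return re.search(pattern, source) is not None


# Hypothetical sample: one query per loop iteration (expected to match).
looped = "for pk in ids:\n    user = User.objects.get(pk=pk)\n"
# Hypothetical sample: a single batched query (expected not to match).
batched = "users = User.objects.filter(pk__in=ids)\n"

print(matches_pattern(N_PLUS_ONE, looped))
print(matches_pattern(N_PLUS_ONE, batched))
```

Because the pattern spans a newline, a scanner built this way has to search whole file contents rather than one line at a time.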

#### `severity` (required)

Impact level. Must be one of:

| Value | Meaning |
|-------|---------|
| `critical` | Causes security vulnerabilities, data loss, or production outages. Must fix before merge. |
| `important` | Degrades performance, maintainability, or reliability significantly. Should fix in current sprint. |
| `suggestion` | Improvement opportunity. Fix when convenient or during refactoring. |

```
"severity": "important"
```

#### `auto_fixable` (required)

Boolean indicating whether tooling can automatically remediate this
anti-pattern. When `true`, the reasoning engine or a companion script can
generate a fix.

```
"auto_fixable": true
```

#### `conflicts_with` (optional)

List of anti-pattern IDs that conflict with this entry. The reasoning engine
uses this for mutual exclusion -- if one pattern is selected/detected, the
conflicting ones are filtered out of results.

This prevents contradictory advice (e.g., "use eager loading" and "use lazy
loading" simultaneously).

```
"conflicts_with": ["eager-load-everything"]
```

#### `remediation` (required)

Concrete instructions for fixing the anti-pattern. Should include a code
example or reference to a known-good pattern when possible.

```
"remediation": "Replace loop queries with select_related() or prefetch_related() for Django, or use JOIN/eager loading in your ORM."
```

#### `tags` (optional)

Freeform tags for cross-cutting search. Useful for filtering by technology,
language, or concern that does not map to a single domain.

```
"tags": ["orm", "django", "sqlalchemy", "performance"]
```

## Complete Example

```json
[
  {
    "id": "n-plus-one-query",
    "name": "N+1 Query Problem",
    "domain": "database",
    "description": "Executing one query per item in a loop instead of a single batch query. Causes O(n) database round-trips where O(1) is possible.",
    "pattern": "for\\s+.*\\sin\\s+.*:\\s*\\n\\s+.*\\.objects\\.get",
    "severity": "important",
    "auto_fixable": false,
    "conflicts_with": [],
    "remediation": "Replace loop queries with select_related() or prefetch_related() for Django, or use JOIN/eager loading in your ORM.",
    "tags": ["orm", "django", "sqlalchemy", "performance"]
  },
  {
    "id": "hardcoded-secrets",
    "name": "Hardcoded Secrets",
    "domain": "security",
    "description": "API keys, passwords, or tokens embedded directly in source code. Exposed in version control history even after removal.",
    "pattern": "(api_key|secret|password|token)\\s*=\\s*[\"'][^\"']+[\"']",
    "severity": "critical",
    "auto_fixable": true,
    "conflicts_with": [],
    "remediation": "Move secrets to environment variables or a secrets manager (AWS SSM, Vault, dotenv for local). Reference via os.environ or settings module.",
    "tags": ["secrets", "env", "vault", "ci"]
  }
]
```
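The field rules above can be checked mechanically. The following is a minimal validation sketch, assuming only the constraints stated in this document (required fields, kebab-case `id`, the three severity values, boolean `auto_fixable`, string lists, and a compilable `pattern`). The `validate_entry` function is hypothetical and is not part of `search.py`.

```python
import re

REQUIRED = ("id", "name", "domain", "description", "severity",
            "auto_fixable", "remediation")
SEVERITIES = {"critical", "important", "suggestion"}
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing required field: {f}" for f in REQUIRED if f not in entry]
    if "id" in entry and not KEBAB_CASE.match(str(entry["id"])):
        problems.append("id must be kebab-case")
    if "severity" in entry and entry["severity"] not in SEVERITIES:
        problems.append("severity must be critical, important, or suggestion")
    if "auto_fixable" in entry and not isinstance(entry["auto_fixable"], bool):
        problems.append("auto_fixable must be a boolean")
    for key in ("conflicts_with", "tags"):
        value = entry.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    if "pattern" in entry:
        try:
            re.compile(entry["pattern"])
        except (re.error, TypeError) as exc:
            problems.append(f"pattern is not a valid regex: {exc}")
    return problems


good = {
    "id": "n-plus-one-query",
    "name": "N+1 Query Problem",
    "domain": "database",
    "description": "Executing one query per item in a loop.",
    "severity": "important",
    "auto_fixable": False,
    "remediation": "Batch the queries.",
}
bad = {"id": "Bad_ID", "severity": "major", "pattern": "("}

print(validate_entry(good))
print(validate_entry(bad))
```

Returning a list of problems instead of raising on the first one lets a CI job report every issue in a registry file in a single pass.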

## File Organization

Registry files live in the `assets/` directory alongside the reasoning engine:

```
templates/reasoning-engine/
  search.py           # Reasoning engine
  assets/
    example.json      # Template/example entries
    security.json     # Security anti-patterns
    database.json     # Database anti-patterns
    api.json          # API anti-patterns
```

Each file is a JSON array. The reasoning engine loads and merges all `*.json`
files from `assets/` at startup. Keep files organized by domain to avoid merge
conflicts and improve discoverability.
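The load-and-merge behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `search.py`: the `load_registry` name is hypothetical, and raising on a duplicate `id` is one possible way to enforce the uniqueness rule.

```python
import json
from pathlib import Path


def load_registry(assets_dir: str) -> list[dict]:
    """Load every *.json array under assets_dir into one flat list."""
    merged: list[dict] = []
    seen: dict[str, str] = {}  # entry id -> file that first defined it
    for path in sorted(Path(assets_dir).glob("*.json")):
        entries = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(entries, list):
            raise ValueError(f"{path.name}: expected a JSON array")
        for entry in entries:
            entry_id = entry["id"]
            if entry_id in seen:
                raise ValueError(
                    f"duplicate id {entry_id!r} in {path.name} "
                    f"(first defined in {seen[entry_id]})"
                )
            seen[entry_id] = path.name
            merged.append(entry)
    return merged
```

Sorting the file list keeps the merged order deterministic across platforms, which makes duplicate-id errors reproducible.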

## Integration with Reasoning Engine

The `search.py` reasoning engine uses registry entries as follows:

1. **Load**: All JSON files in `assets/` are loaded and merged into a flat list.
2. **Match**: User query is tokenized and scored against all fields.
3. **Filter**: `conflicts_with` entries are excluded based on already-selected
   items via `filter_anti_patterns()`.
4. **Return**: Top results are returned as JSON to stdout.

Skills that use this pattern should document the `--domain` flag to scope
searches to their specific domain.
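The match, filter, and return steps above can be sketched as follows. This is a simplified stand-in, not the real `search.py` or its `filter_anti_patterns()` function: the `tokenize`, `score`, and `search` helpers, the token-overlap scoring, and the `domain` and `limit` parameters are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(entry: dict, query_tokens: set[str]) -> int:
    """Count query tokens that appear in any string or list field."""
    haystack: set[str] = set()
    for value in entry.values():
        if isinstance(value, str):
            haystack |= tokenize(value)
        elif isinstance(value, list):
            haystack |= tokenize(" ".join(map(str, value)))
    return len(query_tokens & haystack)


def search(entries: list[dict], query: str, domain=None, limit: int = 5) -> list[dict]:
    """Rank entries by score, skipping any that conflict with earlier picks."""
    tokens = tokenize(query)
    candidates = [e for e in entries if domain is None or e.get("domain") == domain]
    ranked = sorted(candidates, key=lambda e: score(e, tokens), reverse=True)
    selected: list[dict] = []
    selected_ids: set[str] = set()
    excluded: set[str] = set()
    for entry in ranked:
        conflicts = set(entry.get("conflicts_with", []))
        if score(entry, tokens) == 0:
            continue  # no overlap with the query
        if entry["id"] in excluded or conflicts & selected_ids:
            continue  # mutual exclusion in either direction
        selected.append(entry)
        selected_ids.add(entry["id"])
        excluded |= conflicts
    return selected[:limit]
```

Checking conflicts in both directions means an entry is dropped even when only one side of the pair declares the conflict.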

---

## kb/reference/architecture-overview.md

---
title: "AI Toolkit - Architecture Overview"
category: reference
service: ai-toolkit
tags: [architecture, overview, design, structure]
version: "1.4.5"
created: "2026-03-23"
last_updated: "2026-05-12"
description: "Architecture of ai-toolkit: directory layout, global install model, editor-aware MCP install, Codex translation layer, skill tiers, and integration with projects."
---

# AI Toolkit Architecture

## Purpose

Shared, project-agnostic AI development toolkit for Claude Code (and compatible assistants like Cursor, Windsurf, Copilot, Gemini, Cline, Roo Code, Aider, Augment, and Google Antigravity). Provides specialized agents, skills (slash commands + knowledge), expanded lifecycle hooks, persona presets, and experimental opt-in plugin packs that teams can adopt separately from the default global install.

## Design Principles

1. **Global install** — one `~/.claude/` install works for all projects; no per-project setup beyond `init`
2. **Merge-friendly** — per-file symlinks, JSON merge, marker injection; user content never overwritten
3. **Composable** — agents reference skills; skills invoke agents; hooks validate all work
4. **Multi-language** — hooks and skills support Python, TypeScript, PHP, Dart, Go
5. **Cost-optimized** — simpler agents run on `sonnet`, complex reasoning on `opus`

## Directory Structure

```
ai-toolkit/
  bin/
    ai-toolkit.js        # CLI entry point (install, init, add-rule, ...)
  app/                       # All toolkit components
    agents/                  # Agent definitions (.md + YAML frontmatter)
    skills/                  # skills: task, hybrid, knowledge
    rules/                   # Rules auto-injected into ~/.claude/CLAUDE.md
    hooks/                   # Hook scripts (copied to ~/.softspark/ai-toolkit/hooks/)
    hooks.json               # Hook definitions (merged into ~/.claude/settings.json)
    constitution.md          # Immutable safety rules, 6 articles (marker-injected)
    ARCHITECTURE.md          # System architecture reference (marker-injected)
    CLAUDE.md.template       # Template for project CLAUDE.md (used by init)
    settings.local.json.template
    .claude-plugin/
      plugin.json            # Official plugin manifest
    plugins/                 # Experimental opt-in plugin packs + optional modules
  scripts/                   # All scripts
    install.py               # Global installer → ~/.claude/ (--local for project-local setup)
    uninstall.py             # Removes toolkit components from ~/.claude/
    inject_rule_cli.py       # Injects a rule into CLAUDE.md (delegates to inject_section_cli.py)
    inject_section_cli.py    # Marker-based content injection (canonical implementation)
    _common.py               # Shared helper for generators (frontmatter, agents/skills emission)
    merge-hooks.py           # JSON merge for hooks into settings.json (inject/strip modes)
    validate.py              # Toolkit integrity check
    evaluate_skills.py       # Skill quality report
    generate_agents_md.py    # Regenerates AGENTS.md
    generate_cursor_rules.py # Generates .cursorrules (sources _common.py)
    generate_windsurf.py     # Generates .windsurfrules (sources _common.py)
    generate_copilot.py      # Generates .github/copilot-instructions.md (sources _common.py)
    generate_gemini.py       # Generates GEMINI.md (sources _common.py)
    generate_cline.py        # Generates .clinerules (sources _common.py)
    generate_roo_modes.py    # Generates .roomodes
    generate_aider_conf.py   # Generates .aider.conf.yml
    generate_llms_txt.py     # Generates llms.txt
    install_git_hooks.py     # Installs fallback pre-commit hook
    plugin.py                # Plugin pack management (install, remove, list, status)
    benchmark_ecosystem.py   # Generates ecosystem benchmark snapshot
    harvest_ecosystem.py     # Writes machine-readable ecosystem harvest JSON
    compile_slm.py           # Compiles toolkit into minimal SLM system prompt (2K-16K tokens)
  tests/                     # Bats test suite
  benchmarks/                # Benchmark tasks + results
  kb/                        # Knowledge base
    reference/               # Catalogs, architecture, operating models, usage guides
    procedures/              # SOPs (install, maintenance)
```

## Install Model

All components use merge-friendly strategies — user content is never overwritten.

```
Machine (global)                              Project (local)
──────────────────────────────────────────    ──────────────────────────────────────
~/.claude/                                    ~/.softspark/ai-toolkit/
  agents/*.md    → per-file symlinks             rules/     ← registered rules
  skills/*/      → per-dir symlinks              hooks/     ← hook scripts (copied)
  settings.json  ← hooks merged here
  constitution.md ← marker injection            my-project/
  ARCHITECTURE.md ← marker injection              CLAUDE.md            ← project rules
  CLAUDE.md       ← marker injection (rules)      .claude/
                                                    settings.local.json  ← MCP, perms
                                                    constitution.md     ← marker injection
```
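The per-file symlink strategy in the diagram can be sketched as below. This is a minimal illustration, assuming a "user file wins" rule for name collisions; the `link_agents` function is hypothetical and does not reproduce `install.py`.

```python
from pathlib import Path


def link_agents(toolkit_agents: Path, user_agents: Path) -> list[str]:
    """Symlink each toolkit agent file into user_agents; skip name collisions."""
    user_agents.mkdir(parents=True, exist_ok=True)
    skipped: list[str] = []
    for src in sorted(toolkit_agents.glob("*.md")):
        dest = user_agents / src.name
        if dest.is_symlink() and dest.resolve() == src.resolve():
            continue  # already linked by a previous run
        if dest.exists() or dest.is_symlink():
            skipped.append(src.name)  # user file (or foreign link) wins
            continue
        dest.symlink_to(src.resolve())
    return skipped
```

Linking file by file, rather than linking the whole directory, is what lets user-authored agents sit next to toolkit-provided ones in the same folder.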

| Component | Strategy | Collision handling |
|-----------|----------|-------------------|
| `agents/*.md` | Per-file symlinks | User file with same name wins (toolkit skipped) |
| `skills/*/` | Per-directory symlinks | User dir with same name wins (toolkit skipped) |
| `settings.json` hooks | JSON merge via `merge-hooks.py` | User hooks + settings preserved, toolkit entries tagged with `_source` |
| `constitution.md` | Marker injection via `inject_section_cli.py` | User content outside `<!-- TOOLKIT:* -->` markers untouched |
| `ARCHITECTURE.md` | Marker injection via `inject_section_cli.py` | Same as above |
| `CLAUDE.md` | Marker injection via `inject_rule_cli.py` | Same as above |
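Marker injection, as listed in the table, can be illustrated with the sketch below. The exact marker syntax is an assumption: the source only shows `<!-- TOOLKIT:* -->`, so the `START`/`END` suffixes and the `inject_section` function are hypothetical and may differ from `inject_section_cli.py`.

```python
import re


def inject_section(document: str, name: str, content: str) -> str:
    """Insert or replace the block between the markers for `name`."""
    start = f"<!-- TOOLKIT:{name} START -->"  # assumed marker format
    end = f"<!-- TOOLKIT:{name} END -->"      # assumed marker format
    block = f"{start}\n{content.strip()}\n{end}"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if pattern.search(document):
        # A function replacement keeps backslashes in the content literal.
        return pattern.sub(lambda _match: block, document, count=1)
    if not document:
        return block + "\n"
    return document.rstrip("\n") + "\n\n" + block + "\n"


doc = "# My rules\n\nKeep this.\n"
once = inject_section(doc, "rules", "toolkit rules v1")
twice = inject_section(once, "rules", "toolkit rules v2")
print(twice)
```

Because only the text between the markers is replaced, re-running the injection is idempotent and everything outside the markers is left as the user wrote it.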

**`ai-toolkit install`** — run once per machine, merges toolkit into `~/.claude/`. Auto-upgrades old whole-directory symlinks.

**`ai-toolkit update`** — re-apply after `npm install -g @softspark/ai-toolkit@latest` or after `add-rule` / `remove-rule`. Same as `install` but semantically correct for update flows.

**`ai-toolkit install --local`** — run per project. Always installs Claude Code configs (CLAUDE.md, settings.local.json, constitution.md, language rules). Editor configs are opt-in via `--editors`:
- `--editors all` — install all 11 editors (Cursor, Windsurf, Cline, Roo, Aider, Augment, Copilot, Antigravity, Codex, Gemini, opencode)
- `--editors cursor,aider` — install only selected editors
- (no flag) — auto-detect from existing project files; `update --local` picks up whatever editors already have configs

Each editor gets directory-based format (`.cursor/rules/*.mdc`, `.windsurf/rules/*.md`, `.clinerules/*.md`, `.roo/rules/*.md`, `.augment/rules/ai-toolkit-*.md`, `.agent/rules/*.md`, `CONVENTIONS.md`). Full-profile installs also emit native skill pointer catalogues for Cursor, Windsurf, and Cline. Codex local install additionally generates `AGENTS.md`, `.agents/rules/*.md`, `.agents/skills/*`, and `.codex/hooks.json`. Hooks are global-only — not merged into project settings except for editor-native local hook files such as Codex `.codex/hooks.json`. Experimental plugin packs can also layer a global Codex target in `HOME` (`~/AGENTS.md`, `~/.agents/`, `~/.codex/hooks.json`) when installed with `ai-toolkit plugin install --editor codex`.

If a project already has `.mcp.json`, local install mirrors its `mcpServers` entries into `.claude/settings.local.json` plus any selected editors with project-scoped native MCP files (`.cursor/mcp.json`, `.github/mcp.json`).
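A rough sketch of the mirroring step follows. It assumes that the target file also stores servers under an `mcpServers` key and that entries from `.mcp.json` replace same-named entries in the target; neither detail is confirmed by this document, and the `mirror_mcp_servers` function is hypothetical.

```python
import json
from pathlib import Path


def mirror_mcp_servers(project_dir: str) -> dict:
    """Copy mcpServers from .mcp.json into .claude/settings.local.json."""
    root = Path(project_dir)
    source = json.loads((root / ".mcp.json").read_text(encoding="utf-8"))
    target_path = root / ".claude" / "settings.local.json"
    settings: dict = {}
    if target_path.exists():
        settings = json.loads(target_path.read_text(encoding="utf-8"))
    # Assumption: the target uses the same "mcpServers" key as .mcp.json.
    servers = settings.setdefault("mcpServers", {})
    servers.update(source.get("mcpServers", {}))
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Other keys in the target file, such as permissions, pass through unchanged because only the `mcpServers` mapping is updated.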

## CLI Commands

| Command | Target | What it does |
|---------|--------|-------------|
| `install` | `~/.claude/` | First-time: per-file symlinks + JSON merge + marker injection + rules |
| `install --local` | `./` | Claude Code configs + editors via `--editors` (auto-detect or explicit) |
| `update` | `~/.claude/` | Re-apply after npm update or after add-rule/remove-rule |
| `update --local` | `./` | Re-apply + refresh project-local configs |
| `uninstall` | `~/.claude/` | Strips toolkit components (preserves user content) |
| `add-rule <file>` | `~/.softspark/ai-toolkit/rules/` | Register rule — auto-applied on every `update` |
| `remove-rule <name>` | `~/.softspark/ai-toolkit/rules/` + `~/.claude/CLAUDE.md` | Unregister rule and remove its block |
| `mcp add <name...>` | `./.mcp.json` | Merge canonical MCP template(s) into project config |
| `mcp install --editor <name...>` | native editor config | Render MCP template(s) into editor-native config files |
| `validate` | toolkit | Integrity check |
| `doctor` | toolkit | Install health, hooks, benchmark freshness, and artifact drift diagnostics |
| `benchmark-ecosystem` | toolkit | Benchmark snapshot for official Claude Code and external ecosystem repos |
| `evaluate` | toolkit | Skill quality report |
| `cursor-rules` | `./` | Generates `.cursorrules` (legacy) |
| `cursor-mdc` | `./` | Generates `.cursor/rules/*.mdc` (recommended) |
| `windsurf-rules` | `./` | Generates `.windsurfrules` (legacy) |
| `windsurf-dir-rules` | `./` | Generates `.windsurf/rules/*.md` |
| `copilot-instructions` | `./` | Generates `.github/copilot-instructions.md` |
| `gemini-md` | `./` | Generates `GEMINI.md` |
| `cline-rules` | `./` | Generates `.clinerules` (legacy) |
| `cline-dir-rules` | `./` | Generates `.clinerules/*.md` |
| `roo-modes` | `./` | Generates `.roomodes` |
| `roo-dir-rules` | `./` | Generates `.roo/rules/*.md` |
| `aider-conf` | `./` | Generates `.aider.conf.yml` |
| `conventions-md` | `./` | Generates `CONVENTIONS.md` (Aider auto-loaded) |
| `augment-dir-rules` | `./` | Generates `.augment/rules/ai-toolkit-*.md` |
| `antigravity-rules` | `./` | Generates `.agent/rules/` + `.agent/workflows/` |
| `codex-md` | `./` | Generates Codex-facing `AGENTS.md` |
| `codex-rules` | `./` | Generates `.agents/rules/*.md` |
| `codex-hooks` | `./` | Generates `.codex/hooks.json` |
| `agents-md` | toolkit | Regenerates `AGENTS.md` |
| `llms-txt` | `./` | Generates `llms.txt` |
| `generate-all` | `./` | Generates all platform configs at once |

## Skill Tiers

Three tiers determine how to approach a task:

| Tier | Skills | When to use |
|------|--------|-------------|
| **1 — Quick single-agent** | `/debug`, `/review`, `/refactor`, `/analyze`, `/docs`, `/plan`, `/explain` | One concern, one file area, fast |
| **2 — Multi-agent workflow** | `/workflow <type>` | Cross-cutting task with a known pattern |
| **3 — Custom parallelism** | `/orchestrate`, `/swarm` | No predefined workflow matches |

### `/workflow` types (15)

| Type | Use case |
|------|----------|
| `feature-development` | New feature, full stack |
| `backend-feature` | Backend only: API + logic + tests |
| `frontend-feature` | UI component + state + tests |
| `api-design` | New API endpoint design → implement → document |
| `database-evolution` | Schema change + migration + code update |
| `test-coverage` | Boost test coverage for a module |
| `security-audit` | Multi-vector security assessment |
| `codebase-onboarding` | Understand unfamiliar codebase (read-only) |
| `spike` | Time-boxed technical research → architecture note |
| `debugging` | Bug spanning multiple layers |
| `incident-response` | Production down |
| `performance-optimization` | Degradation >50% |
| `infrastructure-change` | Docker, CI/CD, infra |
| `application-deploy` | Deploy |
| `proactive-troubleshooting` | Warning / trend |

## Skill Classification

| Type | Field | Invocation | Count |
|------|-------|-----------|-------|
| Task | `disable-model-invocation: true` | User via `/skill` only | 32 |
| Hybrid | (neither) | User via `/skill` + agent knowledge | 30 |
| Knowledge | `user-invocable: false` | Claude auto-loads | 45 |
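The classification in the table reduces to two frontmatter checks. The sketch below assumes the frontmatter has already been parsed into a dictionary; the `classify_skill` function is hypothetical, and treating a skill with both fields set as a task skill is an assumption, not documented behaviour.

```python
def classify_skill(frontmatter: dict) -> str:
    """Map parsed skill frontmatter to 'task', 'knowledge', or 'hybrid'."""
    if frontmatter.get("disable-model-invocation") is True:
        return "task"       # user via /skill only
    if frontmatter.get("user-invocable") is False:
        return "knowledge"  # auto-loaded, never invoked by the user
    return "hybrid"         # neither field set


print(classify_skill({"name": "commit", "disable-model-invocation": True}))
print(classify_skill({"name": "clean-code", "user-invocable": False}))
print(classify_skill({"name": "review"}))
```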

## Multi-Agent Execution

Skills that spawn real parallel agents use:
- `agent: <name>` — delegates to a specialized agent persona
- `context: fork` — runs in isolated forked context
- `Agent` tool — spawns subagents in parallel within the agent's response

`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` must be set for Agent Teams (tmux-based) support.

### Codex Translation Layer

Codex does not expose Claude's `Agent`, `Team*`, and `Task*` primitives with the
same runtime semantics. To keep the skill catalog aligned, local Codex install
uses a translation layer:

- native Codex-compatible skills are linked directly
- Claude-only orchestration skills are emitted as generated wrappers
- wrapper guidance maps delegation to `spawn_agent`, `send_input`, `wait_agent`, `close_agent`, and `update_plan`

Codex therefore receives the full skill catalog, but not the full Claude hook
surface or tmux-backed Agent Teams lifecycle. Plugin packs reuse the same
translation and hook-compatibility model when targeting the global Codex layer.

See `kb/reference/codex-cli-compatibility.md` for the detailed mapping.

## MCP Rendering Layer

`.mcp.json` is the canonical project-level template format. ai-toolkit can render that configuration into editor-native MCP files through `scripts/mcp_editors.py`.

Current native adapters:
- Claude Code: `.claude/settings.local.json` and `~/.claude/settings.json`
- Cursor: `.cursor/mcp.json` and `~/.cursor/mcp.json`
- GitHub Copilot: `.github/mcp.json` and `~/.copilot/mcp-config.json`
- Gemini CLI: `.gemini/settings.json` and `~/.gemini/settings.json`
- Windsurf: `~/.codeium/windsurf/mcp_config.json`
- Cline: `~/.cline/data/settings/cline_mcp_settings.json`
- Augment: `~/.augment/settings.json`
- Codex CLI: `~/.codex/config.toml`

See `kb/reference/mcp-editor-compatibility.md` for the support matrix and scope rules.

## Quality Guardrails

### Anti-Rationalization Tables
15 core skills include `## Common Rationalizations` tables — domain-specific excuses with rebuttals that prevent agent drift. Skills: `/review`, `/debug`, `/refactor`, `/tdd`, `/plan`, `/docs`, `/analyze`, `security-patterns`, `testing-patterns`, `api-patterns`, `ci-cd-patterns`, `clean-code`, `performance-profiling`, `git-mastery`, `database-patterns`.

### Confidence Scoring & LLM-as-Judge (`/review`)
Review findings include per-issue confidence scores (1-10) and severity tiers (critical/major/minor/nit). A self-evaluation pass after the review checks for anchoring bias, separates assumptions from verified facts, and recalibrates confidence.

### Agent Verification Checklists
10 agents have `## Verification Checklist` — domain-specific exit criteria: `code-reviewer`, `test-engineer`, `security-auditor`, `debugger`, `backend-specialist`, `frontend-specialist`, `database-architect`, `performance-optimizer`, `devops-implementer`, `documenter`.

### Skill Reference Routing
7 core skills include `## Related Skills` suggesting follow-up skills: `/review`, `/debug`, `/plan`, `/refactor`, `/tdd`, `/docs`, `/analyze`.

### Intent Capture Interview (`/onboard`)
Step 0 interview — 5 questions to capture undocumented project intent before setup.

## Component Relationships

```
Skills (/review, /deploy, /debug, ...)
    │
    ▼
Agents (code-reviewer, debugger, devops-implementer, ...)
    │
    ├── load: knowledge skills (clean-code, typescript-patterns, ...)
    │
    ├── validated by: hooks in settings.json (SessionStart, PreToolUse, UserPromptSubmit, PostToolUse, Stop, TaskCompleted, TeammateIdle, SubagentStart, SubagentStop, PreCompact, SessionEnd)
    │
    └── constrained by: constitution.md (6 safety articles)
```

## Quality Hooks

28 entries across 14 lifecycle events. See [hooks-catalog.md](hooks-catalog.md) for full details.

| Hook | Trigger | Script | Action |
|------|---------|--------|--------|
| SessionStart | Session start + compact | `session-start.sh` | MANDATORY rules reminder + session context + instincts |
| SessionStart | Session start | `mcp-health.sh` | Check MCP runtime availability |
| SessionStart | Session start | `session-context.sh` | Capture environment snapshot |
| Notification | Claude waiting for input | *(inline)* | macOS desktop notification |
| PreToolUse | Before Bash | `guard-destructive.sh` | Block destructive commands |
| PreToolUse | Before file ops (Bash, Read, Edit, Write, MultiEdit, Glob, Grep, NotebookEdit, mcp\_filesystem) | `guard-path.sh` | Block wrong-user path hallucination |
| PreToolUse | Before Edit/Write/MultiEdit | `guard-config.sh` | Block config file edits without explicit acknowledgment |
| PreToolUse | Before Bash (git commit) | `commit-quality.sh` | Advisory Conventional Commits format check |
| UserPromptSubmit | Before user prompt execution | `user-prompt-submit.sh` | Prompt governance reminder |
| UserPromptSubmit | Before user prompt execution | `track-usage.sh` | Record skill invocations to stats.json |
| PostToolUse | After edit/write tools | `post-tool-use.sh` | Lightweight validation reminders |
| PostToolUse | After any tool | `governance-capture.sh` | Log security-sensitive operations |
| Stop | After response | `quality-check.sh` | Multi-language lint |
| Stop | After response | `save-session.sh` | Persist session context |
| Stop | Before final stop | `quality-gate.sh` | Block final response on lint/type errors |
| TaskCompleted | Agent Teams: task done | `quality-gate.sh` | Block completion on errors |
| TeammateIdle | Agent Teams: idle | *(inline)* | Completeness reminder |
| SubagentStart | Subagent spawn | `subagent-start.sh` | Scope reminder for subagents |
| SubagentStop | Subagent completion | `subagent-stop.sh` | Handoff checklist for subagents |
| PreCompact | Before compaction | `pre-compact.sh` | Save prioritized context: instincts > tasks > git state > decisions |
| PreCompact | Before compaction | `pre-compact-save.sh` | Timestamped context snapshot to audit trail |
| SessionEnd | Session end | `session-end.sh` | Persist handoff note for the next session |

Hook scripts are copied to `~/.softspark/ai-toolkit/hooks/`.

## Constitution (6 Articles)

| Article | Key Rule |
|---------|----------|
| I Safety First | No data loss, no blind execution, max 5 loop iterations |
| II Hierarchy of Truth | KB is source of truth, research protocol mandatory |
| III Operational Integrity | Green tests = Done, logs are evidence |
| IV Self-Preservation | Constitution is read-only, kill switch via system-governor |
| V Resource Governance | No destructive commands without confirmation |
| VI Repair Discipline | No dead code, fix every found bug, tests and docs follow behavior, verify before done |

## Persona Presets

Optional engineering personas injected via `ai-toolkit install --persona <name>`. Each persona adds role-specific communication style, preferred skills, and code review priorities to CLAUDE.md.

| Persona | Focus |
|---------|-------|
| `backend-lead` | System design, scalability, data integrity, API stability |
| `frontend-lead` | Component architecture, a11y, state management, Core Web Vitals |
| `devops-eng` | Infrastructure as code, CI/CD, rollback safety, observability |
| `junior-dev` | Step-by-step explanations, learning resources, small PRs |

Persona files live in `app/personas/*.md` and use the same `inject_rule` mechanism as registered rules.

## Skill Security Auditing

`/skill-audit` scans `app/skills/` and `app/agents/` for security risks:
- **Frontmatter**: overly permissive `allowed-tools`, knowledge skills with Bash
- **Scripts**: `eval()`, `exec()`, `os.system()`, `subprocess(shell=True)`, `pickle.loads`
- **Secrets**: AWS keys, GitHub PATs, private keys, hardcoded passwords
- **Bash**: `curl | bash`, unquoted variables, `chmod 777`

Severity levels: HIGH (blocks deployment), WARN (should fix), INFO (best practice). CI-ready with non-zero exit on HIGH findings.
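A few of the checks listed above can be approximated with line-based regex rules. The sketch below is illustrative only: the rule set, the severity assigned to each rule, and the `audit_text` and `exit_code` helpers are assumptions, not the implementation behind `/skill-audit`.

```python
import re

# (severity, label, regex). Severities here are illustrative assumptions.
RULES = [
    ("HIGH", "eval/exec call", re.compile(r"\b(eval|exec)\s*\(")),
    ("HIGH", "os.system call", re.compile(r"\bos\.system\s*\(")),
    ("HIGH", "shell=True", re.compile(r"shell\s*=\s*True")),
    ("HIGH", "pickle.loads", re.compile(r"\bpickle\.loads\s*\(")),
    ("HIGH", "curl piped to shell", re.compile(r"curl\b[^|\n]*\|\s*(ba)?sh\b")),
    ("WARN", "chmod 777", re.compile(r"\bchmod\s+777\b")),
]


def audit_text(text: str) -> list[tuple[str, str, int]]:
    """Return (severity, label, line number) for every rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for severity, label, regex in RULES:
            if regex.search(line):
                findings.append((severity, label, lineno))
    return findings


def exit_code(findings: list[tuple[str, str, int]]) -> int:
    """Non-zero when any HIGH finding is present, for use in CI."""
    return 1 if any(severity == "HIGH" for severity, _, _ in findings) else 0


sample = "#!/bin/bash\ncurl https://example.com/install.sh | bash\nchmod 777 /tmp/x\n"
for finding in audit_text(sample):
    print(finding)
```

Regex rules like these produce false positives (for example, a match inside a comment), so a real audit would likely pair them with frontmatter parsing and allow-lists.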

## Agent Model Tiers

| Model | Purpose | Count |
|-------|---------|-------|
| opus | Complex reasoning, code generation, security | 32 |
| sonnet | Documentation, analysis, pattern-following | 15 |

## Extension Points

### MCP Templates
`app/mcp-templates/` contains 26 ready-to-use MCP server config templates. Opt-in via `ai-toolkit install --modules mcp-templates` or activated automatically with `--profile strict|full`.

### Language Rules
`app/rules/` provides language-specific rule files for 12 languages (TypeScript, Python, Go, Rust, Java, Kotlin, Swift, Dart, C#, PHP, C++, Ruby) plus a shared `common` rule set. Auto-detected from project files via `--auto-detect` or selectable with `--modules rules-<lang>`. See README.md for current count.

### Extension API (`inject-hook`, `inject-rule`)
`inject_section_cli.py` provides a stable marker-based API for injecting content into `CLAUDE.md`, `constitution.md`, or `ARCHITECTURE.md` without overwriting user content. `inject_hook_cli.py` injects hooks into `settings.json` with `_source` tags — supports both local files and HTTPS URLs (cached in `~/.softspark/ai-toolkit/hooks/external/`, auto-refreshed on `update`).
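The `_source` tagging idea can be sketched as a strip-then-append merge. This is an illustration under assumptions: the tag value `"ai-toolkit"`, the helper names, and the exact shape of the merged entries are hypothetical and may not match `merge-hooks.py` or `inject_hook_cli.py`.

```python
import copy

SOURCE_TAG = "ai-toolkit"  # assumed value; the docs only name the `_source` key


def strip_toolkit_hooks(settings: dict) -> dict:
    """Return a copy of settings without entries tagged by the toolkit."""
    result = copy.deepcopy(settings)
    hooks = result.get("hooks", {})
    for event in list(hooks):
        hooks[event] = [e for e in hooks[event] if e.get("_source") != SOURCE_TAG]
        if not hooks[event]:
            del hooks[event]
    return result


def inject_toolkit_hooks(settings: dict, toolkit_hooks: dict) -> dict:
    """Strip old toolkit entries, then append fresh tagged ones."""
    result = strip_toolkit_hooks(settings)
    hooks = result.setdefault("hooks", {})
    for event, entries in toolkit_hooks.items():
        for entry in entries:
            hooks.setdefault(event, []).append({**entry, "_source": SOURCE_TAG})
    return result


user = {
    "model": "opus",
    "hooks": {"Stop": [{"hooks": [{"type": "command", "command": "echo mine"}]}]},
}
toolkit = {"Stop": [{"hooks": [{"type": "command", "command": "quality-check.sh"}]}]}
merged = inject_toolkit_hooks(user, toolkit)
print(len(merged["hooks"]["Stop"]))
```

Tagging each toolkit entry is what makes both re-install and uninstall safe: entries without the tag are never touched.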

### Manifest Install (`--modules`, `--auto-detect`)
`manifest.json` defines all installable components as named modules. Install individual modules with `ai-toolkit install --modules <name>` or enable auto-detection to select language rules based on files found in the project.

---

## kb/reference/benchmark-config.md

---
title: "AI Toolkit - Config Benchmark"
category: reference
service: ai-toolkit
tags: [benchmark, config, comparison, coverage]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-03-29"
description: "Compare your installed ai-toolkit config vs toolkit defaults vs ecosystem competition."
---

# Config Benchmark

## Usage

```bash
ai-toolkit benchmark --my-config
```

## What It Shows

1. **Your Configuration** — counts of installed agents, skills, hooks in `~/.claude/`
2. **Toolkit Totals** — counts of available assets in the toolkit package
3. **Coverage** — percentage of toolkit assets you have installed
4. **Missing Components** — up to 10 agents and skills not yet installed
5. **Ecosystem Comparison** — your config vs public Claude Code ecosystem repos
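The coverage and missing-component figures can be derived from two sets of asset names. The sketch below is a hedged illustration: the `coverage` function is hypothetical, and rounding to a whole percent is an assumption about how the report formats its numbers.

```python
def coverage(installed: set[str], available: set[str], max_missing: int = 10):
    """Return (percent installed, sorted missing names) for one asset type."""
    have = installed & available  # ignore user-only assets
    percent = 100 if not available else round(100 * len(have) / len(available))
    missing = sorted(available - installed)[:max_missing]
    return percent, missing


percent, missing = coverage({"debugger", "my-agent"}, {"debugger", "documenter"})
print(percent, missing)
```

Intersecting with the toolkit's own asset names keeps user-authored agents or skills from inflating the percentage above 100.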

## Output Example

```
AI Toolkit Config Benchmark
========================

## Your Configuration (~/.claude/)
  Agents:  44
  Skills:  80
  Hooks:   12

## Toolkit Totals
  Agents:  44
  Skills:  80
  Hooks:   12

## Coverage
  Agents:  100%  (44 / 44)
  Skills:  100%  (80 / 80)
  Hooks:   100%  (12 / 12)

## Ecosystem Comparison
Repo                                     Agents  Skills  Hooks
--------------------------------------------------------------
Your config                                  44      80     12
--------------------------------------------------------------
anthropics/claude-code                       15      10      5
affaan-m/everything-claude-code             152     397      2
```

## Data Sources

- User config: `~/.claude/agents/`, `~/.claude/skills/`, `~/.softspark/ai-toolkit/hooks/`
- Toolkit: `app/agents/`, `app/skills/`, `app/hooks/`
- Ecosystem: `benchmarks/ecosystem-dashboard.json`

---

## kb/reference/ci-integration.md

---
title: "AI Toolkit - CI Integration"
category: reference
service: ai-toolkit
tags: [ci, github-actions, automation, validation]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-03-29"
description: "Reusable GitHub Action for ai-toolkit validation in CI pipelines."
---

# CI Integration

## GitHub Action

Validate your toolkit setup in CI using the reusable composite action.

### Basic Usage

```yaml
# .github/workflows/validate-toolkit.yml
name: Validate AI Toolkit
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: softspark/ai-toolkit@v1
        with:
          command: validate
```

### Inputs

| Input | Default | Description |
|-------|---------|-------------|
| `toolkit-version` | `latest` | npm version of @softspark/ai-toolkit |
| `node-version` | `20` | Node.js version |
| `command` | `validate` | Command to run (`validate` or `doctor`) |

### Outputs

| Output | Description |
|--------|-------------|
| `status` | `pass` or `fail` |

## Alternative: npx

For simpler setups without the action:

```yaml
      - uses: actions/setup-node@v6
        with:
          node-version: 20
      - run: npx @softspark/ai-toolkit validate
```

## What Gets Validated

- Agent frontmatter (name, description, tools, model)
- Skill frontmatter (name, description, format, references)
- Hook event names against whitelist
- Plugin pack manifests (JSON validity, asset references)
- Metadata contracts (README badges vs actual counts)
- Core file presence (LICENSE, CHANGELOG, SECURITY)

---

## kb/reference/claude-ecosystem-benchmark-snapshot.md

---
title: "Claude Ecosystem Benchmark Snapshot"
category: reference
service: ai-toolkit
tags: [benchmark, ecosystem, claude-code, competitive-analysis, roadmap]
version: "1.0.0"
created: "2026-03-28"
last_updated: "2026-04-01"
description: "Repeatable benchmark snapshot of official Claude Code and selected external repositories used to guide ai-toolkit expansion decisions."
---

# Claude Ecosystem Benchmark Snapshot

## Purpose

This document records the external repositories that `ai-toolkit` uses as benchmark inputs for ecosystem expansion work. It complements the planning documents by turning the benchmark set into a stable, repeatable reference.

## Source Set

- `anthropics/claude-code`
- `affaan-m/everything-claude-code`
- `ChrisWiles/claude-code-showcase`
- `disler/claude-code-hooks-mastery`
- `codeaholicguy/ai-devkit`
- `alirezarezvani/claude-code-skill-factory`

## Snapshot (2026-03-28)

| Repository | Category | Why it matters |
|------------|----------|----------------|
| `anthropics/claude-code` | official | Canonical plugin layout, command development, hook development, feature workflows |
| `affaan-m/everything-claude-code` | ecosystem-scale | Scale benchmark for commands, agents, and packaging patterns |
| `ChrisWiles/claude-code-showcase` | practical-showcase | Strong examples of edit-time automation and branch safety hooks |
| `disler/claude-code-hooks-mastery` | hooks-reference | Strong reference for lifecycle breadth and operational hook patterns |
| `codeaholicguy/ai-devkit` | cross-tool | Cross-tool toolkit positioning benchmark |
| `alirezarezvani/claude-code-skill-factory` | meta-tooling | Creator workflow and factory-style inspiration |

## Operational Use

Use the benchmark script for a repeatable snapshot:

```bash
python3 scripts/benchmark_ecosystem.py --offline
python3 scripts/benchmark_ecosystem.py --format json
python3 scripts/benchmark_ecosystem.py --dashboard-json
python3 scripts/harvest_ecosystem.py --offline
python3 scripts/benchmark_ecosystem.py --out /tmp/claude-ecosystem-benchmark.md
```

The script prefers live GitHub metadata when available and falls back to the embedded snapshot when offline.

Machine-readable artifacts:

- `benchmarks/ecosystem-dashboard.json` — curated dashboard summary with freshness and comparison matrix
- `benchmarks/ecosystem-harvest.json` — latest harvested benchmark JSON for roadmap / changelog reuse

## Adoption Matrix

| Pattern | Current ai-toolkit State | Benchmark Signal | Priority |
|---------|--------------------------|------------------|----------|
| Plugin manifest | Present | Strong in official Claude Code | High |
| Hook creator workflow | Present | Reinforced by official plugin-dev assets | High |
| Command creator workflow | Present | Reinforced by command-development patterns | High |
| Agent creator workflow | Present | Reinforced by agent-development patterns | High |
| Lifecycle breadth (`PreCompact`) | Present | Validated by hooks-focused repos | High |
| Lifecycle breadth (`PostToolUse`) | Present | Strong benchmark signal | High |
| Lifecycle breadth (`UserPromptSubmit`) | Present | Prompt-governance benchmark signal | High |
| Lifecycle breadth (`SubagentStart` / `SubagentStop`) | Present | Strong subagent instrumentation signal | Medium |
| Lifecycle breadth (`SessionEnd`) | Present | Needed for handoff / cleanup patterns | Medium |
| Ecosystem benchmark script | Present | Needed for repeatable comparison | High |
| Harvesting script + dashboard JSON | Present | Repeatable evidence for roadmap and release notes | High |
| Domain plugin packs | Present (experimental) | Validates modular packaging direction | Medium |
| Policy packs | Not yet implemented | Strong but still optional | Later |

## Notes

- This snapshot is intentionally small and curated.
- The goal is decision quality, not ecosystem collection for its own sake.
- Large benchmark repositories are references, not implementation blueprints.


---

## kb/reference/claude-ecosystem-expansion-foundations.md

---
title: "Claude Ecosystem Expansion Foundations"
category: reference
service: ai-toolkit
tags: [benchmark, claude-code, ecosystem, hooks, plugins, architecture]
version: "1.0.0"
created: "2026-03-27"
last_updated: "2026-04-13"
description: "Reference summary of the ecosystem signals and implementation foundations adopted in ai-toolkit, including runtime-aware plugin packaging."
---

# Claude Ecosystem Expansion Foundations

## Purpose

This document captures the architectural foundations adopted in `ai-toolkit` after reviewing:

1. the current toolkit repository,
2. official Claude Code patterns,
3. selected external benchmark repositories.

The result is a more modular, Claude-first toolkit, backed by benchmarks, with stronger lifecycle automation and extension tooling.

## Implemented Foundations

### 1. Plugin-oriented structure

`ai-toolkit` now treats plugin packaging as a first-class capability, with runtime-aware install surfaces for Claude and optional global Codex layering.

Implemented artifacts:
- `app/.claude-plugin/plugin.json`
- `app/plugins/`
- `app/skills/plugin-creator/SKILL.md`
- `kb/reference/plugin-pack-conventions.md`

### 2. Broader lifecycle coverage

The toolkit now covers prompt, edit, subagent, compaction, and session-end phases.

Implemented events:
- `SessionStart`
- `Notification`
- `PreToolUse`
- `UserPromptSubmit`
- `PostToolUse`
- `Stop`
- `TaskCompleted`
- `TeammateIdle`
- `SubagentStart`
- `SubagentStop`
- `PreCompact`
- `SessionEnd`

### 3. Creator workflows

The toolkit now includes first-class creator workflows for extension work:
- `hook-creator`
- `command-creator`
- `agent-creator`
- `plugin-creator`

### 4. Benchmark-backed maintenance

External ecosystem research is operationalized through repeatable scripts and machine-readable artifacts.

Implemented artifacts:
- `scripts/benchmark_ecosystem.py`
- `scripts/harvest_ecosystem.py`
- `benchmarks/ecosystem-dashboard.json`
- `benchmarks/ecosystem-harvest.json`
- `kb/reference/claude-ecosystem-benchmark-snapshot.md`

## Benchmark Inputs

The reference benchmark set is intentionally curated:
- `anthropics/claude-code`
- `affaan-m/everything-claude-code`
- `ChrisWiles/claude-code-showcase`
- `disler/claude-code-hooks-mastery`
- `codeaholicguy/ai-devkit`
- `alirezarezvani/claude-code-skill-factory`

## Adopted Outcomes

| Area | Adopted in ai-toolkit |
|------|------------------------|
| Plugin manifests | Yes |
| Domain plugin packs | Yes (experimental) |
| Hook creator workflow | Yes |
| Command creator workflow | Yes |
| Agent creator workflow | Yes |
| Plugin creator workflow | Yes |
| Post-edit feedback hooks | Yes |
| Prompt governance hook | Yes |
| Subagent lifecycle hooks | Yes |
| Session-end handoff | Yes |
| Benchmark dashboard JSON | Yes |
| Harvesting workflow | Yes |

## Current Position

`ai-toolkit` is now documented and implemented as a complete, production-ready toolkit baseline rather than a staged roadmap. Future changes should be treated as normal product evolution, not backlog catch-up.

---

## kb/reference/cli-reference.md

---
title: "CLI Reference"
category: reference
service: ai-toolkit
tags: [cli, commands, reference, install, update, plugin, mcp, telemetry]
created: "2026-04-13"
last_updated: "2026-04-24"
description: "Complete CLI reference for all ai-toolkit commands, options, and flags."
---

# CLI Reference

```
Usage: ai-toolkit <command> [options]
```

## Core Commands

| Command | Description |
|---------|-------------|
| `install` | First-time global install into `~/.claude/` + Cursor, Windsurf, Gemini |
| `install --local` | Claude Code configs only; add `--editors all` or `--editors cursor,aider` for other tools |
| `update` | Re-apply toolkit after `npm install -g @softspark/ai-toolkit@latest` |
| `update --local` | Re-apply + auto-detect editors from existing project files |
| `reset --local` | Wipe all project-local configs and recreate from scratch (clean slate) |
| `status` | Show installed modules and version |
| `uninstall` | Remove toolkit from `~/.claude/` |
| `validate` | Verify toolkit integrity (`--strict` for CI-grade, warnings = errors) |
| `doctor` | Diagnose install health, hooks, quick-win assets, and artifact drift |
| `doctor --fix` | Auto-repair broken symlinks, missing hooks, stale artifacts |
| `eject [dir]` | Export standalone config (no symlinks, no toolkit dependency) |

## Rule & Hook Injection

| Command | Description |
|---------|-------------|
| `add-rule <rule.md\|url> [name]` | Register rule in `~/.softspark/ai-toolkit/rules/` — auto-applied on every `update` |
| `remove-rule <name> [dir]` | Unregister rule and remove its block from `CLAUDE.md` |
| `inject-hook <file.json\|url> [name]` | Inject external hooks (file or URL) into settings.json (idempotent, `_source` tagged, URL hooks auto-refresh on update) |
| `remove-hook <name>` | Remove injected hooks by source name (also unregisters URL source if present) |

## MCP Management

| Command | Description |
|---------|-------------|
| `mcp list` | List available MCP server templates (26 templates) |
| `mcp editors` | List editors with native MCP config adapters and scopes |
| `mcp add <name> [names...]` | Add MCP server template(s) to `.mcp.json` |
| `mcp install --editor <name[,..]> [names...]` | Install templates into native editor MCP config |
| `mcp show <name>` | Show MCP template config details |
| `mcp remove <name>` | Remove MCP server from `.mcp.json` or editor MCP config |

## Plugin Management

| Command | Description |
|---------|-------------|
| `plugin list` | Show available plugin packs with install status |
| `plugin install <name> [--editor claude\|codex\|all]` | Install a plugin pack for selected runtime(s) |
| `plugin install --all [--editor claude\|codex\|all]` | Install all 11 plugin packs |
| `plugin update <name> [--editor claude\|codex\|all]` | Update a plugin pack (remove + reinstall, preserves data) |
| `plugin update --all [--editor claude\|codex\|all]` | Update all installed plugin packs |
| `plugin clean <name> [--days N]` | Prune old plugin data (default: 90 days) |
| `plugin remove <name> [--editor claude\|codex\|all]` | Remove a plugin pack |
| `plugin status [--editor claude\|codex\|all]` | Show installed plugins with runtime-specific details |

## Config Inheritance

| Command | Description |
|---------|-------------|
| `config validate [path]` | Validate `.softspark-toolkit.json` schema + extends + enforcement |
| `config diff [path]` | Show project vs base config differences |
| `config init [flags]` | Create `.softspark-toolkit.json` (`--extends`, `--profile`, `--no-extends`) |
| `config create-base <name>` | Scaffold base config npm package |
| `config check [path]` | CI enforcement gate (exit 0=pass, 1=fail, 2=no config; `--json`) |
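
As an illustration of the `config check` exit codes listed above, a CI wrapper might classify the result as follows. This is a sketch under assumptions: `interpret_exit` and `run_config_check` are hypothetical helper names, and whether "no config" should fail a pipeline is a policy choice left to the caller.

```python
import subprocess

# Exit-code meanings as documented in the table above.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "no-config"}


def interpret_exit(code: int) -> str:
    """Map a `config check` exit code to a label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_config_check(path: str = ".") -> str:
    """Run `ai-toolkit config check <path>` and classify the result."""
    proc = subprocess.run(["ai-toolkit", "config", "check", path], check=False)
    return interpret_exit(proc.returncode)
```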

## Project Registry

| Command | Description |
|---------|-------------|
| `projects` | List registered projects |
| `projects --prune` | Remove stale (deleted) entries |
| `projects remove /path` | Unregister specific project |

## Generator Commands

| Command | Description |
|---------|-------------|
| `generate-all` | Generate all platform configs at once |
| `agents-md` | Regenerate `AGENTS.md` from agent definitions |
| `codex-md` | Generate `AGENTS.md` with marker injection for Codex CLI |
| `codex-rules` | Generate `.agents/rules/*.md` for Codex CLI |
| `codex-hooks` | Generate `.codex/hooks.json` for Codex CLI |
| `cursor-rules` | Generate `.cursorrules` (legacy single file) |
| `cursor-mdc` | Generate `.cursor/rules/*.mdc` (recommended) |
| `windsurf-rules` | Generate `.windsurfrules` (legacy) |
| `windsurf-dir-rules` | Generate `.windsurf/rules/*.md` (recommended) |
| `copilot-instructions` | Generate `.github/copilot-instructions.md` |
| `gemini-md` | Generate `GEMINI.md` for Gemini CLI |
| `cline-rules` | Generate `.clinerules` (legacy) |
| `cline-dir-rules` | Generate `.clinerules/*.md` (recommended) |
| `roo-modes` | Generate `.roomodes` |
| `roo-dir-rules` | Generate `.roo/rules/*.md` |
| `aider-conf` | Generate `.aider.conf.yml` |
| `conventions-md` | Generate `CONVENTIONS.md` for Aider |
| `augment-rules` | Generate `.augment/rules/ai-toolkit.md` (legacy) |
| `augment-dir-rules` | Generate `.augment/rules/ai-toolkit-*.md` (recommended) |
| `antigravity-rules` | Generate `.agent/rules/` and `.agent/workflows/` |
| `llms-txt` | Generate `llms.txt` and `llms-full.txt` |

## Other Commands

| Command | Description |
|---------|-------------|
| `stats` | Show skill usage statistics (`--summary` for product telemetry, `--reset` to clear, `--json` for raw output) |
| `benchmark --my-config` | Compare your config vs defaults vs ecosystem |
| `benchmark-ecosystem` | Generate ecosystem benchmark snapshot |
| `create skill <name>` | Scaffold new skill from template (`--template=linter\|reviewer\|generator\|workflow\|knowledge`) |
| `sync` | Config portability via GitHub Gist (`--export`, `--push`, `--pull`, `--import`) |
| `compile-slm` | Compile toolkit into minimal SLM system prompt (`--budget`, `--model-size`, `--dry-run`) |
| `evaluate` | Run skill evaluation suite |

### `stats`

```bash
ai-toolkit stats                 # table of local skill usage
ai-toolkit stats --summary       # product telemetry summary
ai-toolkit stats --summary --json  # machine-readable telemetry
ai-toolkit stats --reset         # clear local stats
```

`--summary` reports total invocations, unique skills used, catalog coverage, unused catalog skills, active skills in the last 7 days, and top skills. Data stays local in `~/.softspark/ai-toolkit/stats.json`.

## Install / Update Options

```bash
ai-toolkit install --only agents,hooks          # apply only listed components
ai-toolkit install --skip hooks                 # skip listed components
ai-toolkit install --profile minimal            # minimal | standard | strict
ai-toolkit install --persona backend-lead       # backend-lead | frontend-lead | devops-eng | junior-dev
ai-toolkit install --local --editors all        # Claude Code + all editors
ai-toolkit install --local --editors cursor,aider  # + specific editors
ai-toolkit install --local --lang typescript    # explicit language rules
ai-toolkit install --modules core,agents,rules-typescript  # selective modules
ai-toolkit install --list                       # dry-run: show what would change
ai-toolkit update --local                       # auto-detects editors from existing files
```

---

## kb/reference/codex-cli-compatibility.md

---
title: "AI Toolkit - Codex CLI Compatibility"
category: reference
service: ai-toolkit
tags: [codex, compatibility, install, skills, hooks]
version: "1.0.3"
created: "2026-04-12"
last_updated: "2026-05-25"
description: "Reference for how ai-toolkit maps Claude-oriented skills, hooks, and plugin packs to Codex CLI."
---

# AI Toolkit - Codex CLI Compatibility

## Summary

Codex CLI now receives the full `ai-toolkit` skill catalog during local install.

Native Codex-compatible skills are linked directly into `.agents/skills/`. Skills
that depend on Claude-only orchestration primitives are generated as Codex
wrappers that preserve the original workflow intent while translating execution
to Codex subagents and plan tracking.

Experimental plugin packs can also target a global Codex surface with
`ai-toolkit plugin install --editor codex`, which layers plugin-specific skills,
rules, and hooks into `HOME` without changing the project-local core install
model.

## Local Install Outputs

`ai-toolkit install --local --editors codex` generates:

- `AGENTS.md`
- `.agents/rules/*.md`
- `.agents/skills/*`
- `.codex/hooks.json`

## Global Plugin Outputs

`ai-toolkit plugin install --editor codex <pack>` bootstraps or reuses:

- `~/AGENTS.md`
- `~/.agents/rules/*.md`
- `~/.agents/skills/*`
- `~/.codex/hooks.json`

Plugin packs only add their own runtime-specific layer on top of the generated
Codex base. Shared hook scripts and plugin scripts still live in
`~/.softspark/ai-toolkit/`.

## Skill Translation Model

Two delivery modes are used for Codex:

| Mode | How it is installed | Use case |
|------|----------------------|----------|
| Native | Symlink to `app/skills/<name>/` | Skills whose `allowed-tools` are already supported in Codex |
| Adapted | Generated wrapper directory in `.agents/skills/<name>/` | Skills that rely on Claude-only `Agent`, `Team*`, or `Task*` primitives |

Adapted skills keep the same support assets (`reference/`, `scripts/`, `assets/`)
via symlinks, but rewrite `SKILL.md` to Codex-native guidance.
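
The choice between the two modes can be approximated with a short sketch. It assumes that a skill is adapted whenever its `allowed-tools` list names a Claude-only primitive; the real generator may apply additional checks, and `delivery_mode` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns taken from the table above: Claude-only primitives.
CLAUDE_ONLY_PATTERNS = ("Agent", "Team*", "Task*")


def delivery_mode(allowed_tools: list[str]) -> str:
    """Return "adapted" if any tool is Claude-only, else "native"."""
    for tool in allowed_tools:
        if any(fnmatchcase(tool, pat) for pat in CLAUDE_ONLY_PATTERNS):
            return "adapted"
    return "native"
```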

## Claude-to-Codex Tool Mapping

The adapter rewrites Claude-specific delegation guidance to the closest Codex
runtime primitives:

| Claude-oriented primitive | Codex replacement |
|---------------------------|------------------|
| `Agent(...)` | `spawn_agent(..., fork_context=True, ...)` |
| `SendMessage` | `send_input` |
| `TaskCreate` / `TaskList` / `TaskUpdate` | `update_plan` or explicit checklist tracking |
| `TaskGet` / `TaskOutput` | `wait_agent` |
| `TaskStop` / `TeamDelete` | `close_agent` |
| Agent teams | Multiple spawned subagents with explicit file ownership |
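
The mapping above can be expressed as a lookup table with a naive token substitution. This is only a sketch: how the adapter actually rewrites `SKILL.md` is not shown in this document, and `CODEX_REPLACEMENTS` and `rewrite_guidance` are hypothetical names.

```python
import re

# Primitive names and replacements copied from the table above.
CODEX_REPLACEMENTS = {
    "Agent": "spawn_agent",
    "SendMessage": "send_input",
    "TaskCreate": "update_plan",
    "TaskList": "update_plan",
    "TaskUpdate": "update_plan",
    "TaskGet": "wait_agent",
    "TaskOutput": "wait_agent",
    "TaskStop": "close_agent",
    "TeamDelete": "close_agent",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, CODEX_REPLACEMENTS)) + r")\b"
)


def rewrite_guidance(text: str) -> str:
    """Replace Claude primitive names with their closest Codex counterpart."""
    return _PATTERN.sub(lambda m: CODEX_REPLACEMENTS[m.group(1)], text)
```

A plain substitution like this also rewrites the word "Agent" in ordinary prose, so a real adapter would likely need more context-aware rules.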

## Adapted Skill Classes

The main adapted group is multi-agent orchestration:

- `/orchestrate`
- `/workflow`
- `/swarm`
- `/subagent-development`

The adapter also covers skills that previously depended only on Claude's
`Agent` primitive, such as:

- `/tdd`
- `/write-a-prd`
- `/qa-session`
- `/triage-issue`
- `/architecture-audit`

## Hook Compatibility

Codex does not expose the full Claude hook event surface. The Codex hook
generator emits only the events supported by the Codex runtime integration:

- `SessionStart`
- `PreToolUse`
- `PostToolUse`
- `UserPromptSubmit`
- `Stop`

This means Claude-only events such as `TaskCompleted`, `TeammateIdle`,
`SubagentStart`, `SubagentStop`, `PreCompact`, `SessionEnd`, and
`Notification` are not available in `.codex/hooks.json`.

`inject-hook` automatically propagates Codex-compatible events to
`~/.codex/hooks.json` (global layer). Non-Codex events are silently skipped.
`remove-hook` cleans both Claude and Codex targets.
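
The event filtering described above can be sketched as a simple allow-list. This is an illustration, not the generator's code: `filter_for_codex` is a hypothetical name, and the input is assumed to follow the `{"hooks": {event: [entries]}}` layout used in `settings.json`.

```python
# Events listed above as supported by the Codex runtime integration.
CODEX_EVENTS = frozenset(
    {"SessionStart", "PreToolUse", "PostToolUse", "UserPromptSubmit", "Stop"}
)


def filter_for_codex(hooks_config: dict) -> dict:
    """Keep only Codex-compatible events; other events are skipped silently."""
    hooks = hooks_config.get("hooks", {})
    return {
        "hooks": {
            event: entries
            for event, entries in hooks.items()
            if event in CODEX_EVENTS
        }
    }
```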

Generated Codex hook commands include `AI_TOOLKIT_HOOK_QUIET=1`. The generated
`UserPromptSubmit` governance hook does not set `AI_TOOLKIT_HOOK_FORMAT=json`
by default because Codex currently renders `additionalContext` as visible hook
context in the TUI. This keeps prompt-submit output quiet while preserving hook
side effects and blocking decisions such as search-first Stop enforcement.

Codex `UserPromptSubmit` JSON output is event-specific. When emitting context,
the hook must include the event name alongside the context:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": "..."
  },
  "suppressOutput": true
}
```

The older `{"hookSpecificOutput":{"additionalContext":"..."}}` form is still
valid JSON but can fail event-output validation in newer Codex versions.
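
A hook written in Python could build this payload with a small helper. The function name `user_prompt_submit_output` is hypothetical; the field names come from the JSON example above.

```python
import json


def user_prompt_submit_output(context: str, suppress: bool = True) -> str:
    """Serialize UserPromptSubmit output with the required event name."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": context,
        },
        "suppressOutput": suppress,
    }
    return json.dumps(payload)
```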

Plain-text informational hook context is also silent by default in the shared
hook helper. Set `AI_TOOLKIT_HOOK_VERBOSE=1` only when debugging hook output
outside the Codex UI.

## Behavioral Limits

Codex wrappers preserve workflow intent, but not every Claude runtime behavior
has a perfect one-to-one equivalent.

Known limits:

- No native Codex equivalent of tmux-backed Agent Teams lifecycle events
- No separate task object model equivalent to Claude `Task*` APIs
- Hook event coverage is narrower than Claude Code
- MCP search tool calls may not fire the shared `PostToolUse` search tracker,
  so `stop-search-check.sh` also checks `~/.codex/log/codex-tui.log` for
  `smart_query`, `hybrid_search_kb`, `crag_search`, `multi_hop_search`, and
  `verify_answer` calls after the search-first flag timestamp before blocking.
  The scan is bounded to a recent log window, but sized to tolerate noisy Codex
  skill-loader output between the search call and the Stop hook.

These are runtime platform limits, not installation defects.
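
The bounded log scan mentioned in the limits above can be sketched as follows. This is not the logic of `stop-search-check.sh`: the window size, the helper names, and above all the assumption that each log line starts with an ISO-8601 timestamp are hypothetical.

```python
import os
from datetime import datetime

# Tool names listed in the limits above.
SEARCH_TOOLS = (
    "smart_query", "hybrid_search_kb", "crag_search",
    "multi_hop_search", "verify_answer",
)


def tail_lines(path: str, window_bytes: int = 256 * 1024) -> list[str]:
    """Return the lines in the last `window_bytes` of the file.

    The first returned line may be partial when the window starts mid-line.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - window_bytes))
        data = fh.read()
    return data.decode("utf-8", errors="replace").splitlines()


def search_seen_since(lines: list[str], flag_time: datetime) -> bool:
    """True if a search tool appears on a line stamped at or after flag_time.

    Assumes a leading ISO-8601 timestamp per line (hypothetical format);
    flag_time must match the timezone-awareness of the log timestamps.
    """
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if when >= flag_time and any(tool in rest for tool in SEARCH_TOOLS):
            return True
    return False
```

Reading only the tail keeps the check cheap on large logs, while a generous window tolerates noisy output between the search call and the Stop hook.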

## Verification

The Codex compatibility path is verified by:

1. Generator contract tests for `generate_codex.py`
2. Local install tests for `.agents/skills/` and `.codex/hooks.json`
3. Plugin install tests for global Codex rules, hooks, and cleanup paths
4. CLI tests for `codex-md` and `codex-hooks`

## Related

- `kb/reference/skills-catalog.md`
- `kb/reference/architecture-overview.md`
- `kb/reference/global-install-model.md`

---

## kb/reference/comparison.md

---
title: "Ecosystem Comparison"
category: reference
service: ai-toolkit
tags: [comparison, ecosystem, features, alternatives]
created: "2026-04-13"
last_updated: "2026-04-13"
description: "Feature comparison of ai-toolkit vs other Claude Code toolkits and agent frameworks."
---

# Ecosystem Comparison

| Feature | ai-toolkit | everything-claude-code | wshobson/agents | ruflo |
|---------|---------------|----------------------|-----------------|-------|
| Skills | 93 | 100+ | 146 | 20+ |
| Agents | 44 | 30+ | 112 | 20+ |
| Machine-enforced constitution | **Yes** | No (docs only) | No | No |
| Skill-scoped lifecycle hooks | **Yes** | No | No | No |
| Effort-based model budgeting | **Yes** | No | No | No |
| Test suite | Yes (bats) | Yes (997 tests) | No | Yes |
| npm/npx install | Yes | Yes | Yes | Yes |
| Cross-tool support | **Cursor, Windsurf, Copilot, Gemini, Cline, Roo, Aider, Augment, Antigravity, Codex** | 5+ tools | Smithery | Limited |
| Selective install | Yes | Yes | Yes (72 plugins) | No |
| Session persistence | Yes | Yes | No | No |
| Architecture notes | **Yes** | No | No | No |
| KB/RAG integration | **Yes** | No | No | Yes |
| License | MIT | MIT | MIT | MIT |

For live benchmark data, see the [ecosystem benchmark snapshot](claude-ecosystem-benchmark-snapshot.md).

---

## kb/reference/competitive-features-implementation.md

---
title: "Plan: Competitive Features Implementation — Learning System, Language Rules, Hook Matrix, MCP Templates"
category: reference
service: ai-toolkit
tags:
  - competitive-analysis
  - continuous-learning
  - language-rules
  - hook-matrix
  - mcp-templates
  - install-profiles
  - completed
doc_type: plan
status: completed
created: "2026-04-07"
last_updated: "2026-04-24"
completion: "100%"
description: "Implementation plan for features identified from competitive analysis of everything-claude-code and claude-mem. Focus on learning system, language rules, advanced hooks, MCP templates, and rag-mcp integration. COMPLETED: 8/9 features shipped (1 skipped). See kb/reference/ for permanent documentation."
---

# Plan: Competitive Features — ai-toolkit

**Status:** :white_check_mark: COMPLETED
**Completion:** 100% (8/9 features, 1 skipped)
**Started:** 2026-04-07
**Estimated Completion:** 2026-06-15
**Source:** Competitive analysis of `affaan-m/everything-claude-code` (ECC) + `thedotmack/claude-mem`

---

## 1. Objective

Strengthen ai-toolkit's competitive position by implementing 9 features from competitive analysis while maintaining our advantages (clean architecture, 11 editors, personas, safety constitution).

**Key design principle:** ai-toolkit is a **generic toolkit** — it does NOT know about rag-mcp or any specific consumer. Consumers (like rag-mcp) use ai-toolkit's public API (`inject-rule`, `inject-hook`, `merge-hooks`) to add their own rules and hooks.

**State before plan:** 88 skills, 47 agents, 14 hooks, 9 editor integrations
**State after plan:** See manifest.json and README.md for current counts. Target: skills, agents, hooks, language rules, MCP templates, extension API

---

## 2. Progress Tracking

| # | Feature | Priority | Status | Est. Time | Actual | Notes |
|---|---------|----------|--------|-----------|--------|-------|
| 1.1 | Language-Specific Rules (13 langs) | P0 | :white_check_mark: | 5-7d | 1d | 70 files (13 langs × 5 + 5 common) |
| 1.2 | Advanced Hook Matrix | P0 | :white_check_mark: | 5-7d | 1d | 6 new hooks + hooks.json |
| 1.3 | MCP Server Templates (25) | P0 | :white_check_mark: | 2-3d | 1d | 25 templates + mcp_manager.py + CLI |
| 2.1 | `inject-hook` CLI command | P1 | :white_check_mark: | 3-5d | 1d | inject_hook_cli.py + 17 tests + CLI |
| 2.2 | Manifest-Driven Install | P1 | :white_check_mark: | 7-10d | 1d | modules, state tracking, auto-detect |
| 3.1 | Council Skill | P2 | :white_check_mark: | 3-5d | 1d | /council (4-perspective orchestrator) |
| 3.2 | Brand Voice Skill | P2 | :white_check_mark: | 2-3d | 1d | knowledge skill + anti-trope list |
| 3.3 | Agent Introspection Skill | P2 | :white_check_mark: | 3-5d | 1d | /introspect (7 failure patterns) |
| 4.1 | Documentation Site (Starlight/Astro) | P3 | :no_entry: SKIPPED | — | — | Unnecessary — README/CLAUDE.md sufficient |

---

## 3. Dependency Graph

```
ALL FEATURES ARE INDEPENDENT — no external dependencies

MCP Templates (1.3) ← quick win, start here
Language Rules (1.1) ← independent
Hook Matrix (1.2) ← independent
inject-hook CLI (2.1) ← independent (extends existing inject-rule pattern)
Manifest Install (2.2) ← independent but complex
Council Skill (3.1) ← independent
Brand Voice (3.2) ← independent
Agent Introspection (3.3) ← independent
Documentation Site (4.1) ← independent
```

---

## 4. Detailed Implementation

### Phase 1: Quick Wins + Foundation (weeks 1-2)

#### 1.1 Language-Specific Rules System

**Source:** ECC — 13 language dirs × 5 files each = 65 rule files
**What we create:** Skill-based language rules that are injected into CLAUDE.md via `--local`

**Current state:** We have `app/skills/` with some language patterns (typescript-patterns, ruby-patterns, etc.)
**Gap:** No systematic coding-style + testing + security + hooks + patterns per language

**Files to create:**

```
app/rules/
├── common/
│   ├── coding-style.md          # KISS, DRY, YAGNI, immutability
│   ├── testing.md               # Testing standards
│   ├── git-workflow.md          # Commit conventions
│   ├── performance.md           # Performance guidelines
│   └── security.md              # OWASP, input validation
├── typescript/
│   ├── coding-style.md          # TS-specific (strict mode, no any, etc.)
│   ├── testing.md               # Jest/Vitest patterns
│   ├── patterns.md              # TS patterns (discriminated unions, etc.)
│   ├── hooks.md                 # React hooks, lifecycle
│   └── security.md              # XSS, sanitization
├── python/
│   ├── coding-style.md          # PEP 8, type hints, dataclasses
│   ├── testing.md               # pytest, fixtures, parametrize
│   ├── patterns.md              # Python patterns
│   ├── hooks.md                 # Django/FastAPI lifecycle
│   └── security.md              # SQL injection, SSTI
├── golang/                      # Same 5-file structure
├── rust/
├── java/
├── kotlin/
├── swift/
├── dart/
├── csharp/
├── php/
├── cpp/
└── ruby/
```

**Total: 13 languages × 5 files + 5 common = 70 files**

**Integration with install:**
```bash
# During ai-toolkit install --local
# Detect project language from package.json, Cargo.toml, go.mod, etc.
# Inject relevant language rules into CLAUDE.md
```

**Files to modify:**

| File | Action | Description |
|------|--------|-------------|
| `app/rules/` (70 files) | CREATE | Language-specific rules |
| `scripts/install_steps/detect_language.py` | CREATE | Auto-detect project language |
| `scripts/install_steps/inject_rules.py` | EDIT | Inject language rules into CLAUDE.md |
| `scripts/validate.py` | EDIT | Validate rules format |
| `tests/test_rules.py` | CREATE | Tests |

**Success Criteria:**
- [x] 13 languages × 5 rule files created (70 files: 13 dirs × 5 + 5 common)
- [x] `ai-toolkit install --local` auto-detects language and injects rules (two-phase: marker files + extension scan)
- [x] Manual override: `ai-toolkit install --local --lang typescript` (with aliases: go→golang, c++→cpp, cs→csharp)
- [x] validate.py checks rules format
- [x] Tests: dedicated `tests/test_rules.bats`
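
The two-phase detection and the `--lang` aliases listed above can be sketched like this. It is not the contents of `detect_language.py`: the marker and extension tables are hypothetical samples (for instance, mapping `package.json` to TypeScript is an assumption), and the function works on a list of relative file paths rather than the file system.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Aliases from the success criteria above.
ALIASES = {"go": "golang", "c++": "cpp", "cs": "csharp"}

# Hypothetical sample tables; the real detector may use different entries.
MARKERS = {"package.json": "typescript", "Cargo.toml": "rust", "go.mod": "golang"}
EXTENSIONS = {".ts": "typescript", ".py": "python", ".go": "golang", ".rs": "rust"}


def normalize(lang: str) -> str:
    """Resolve a user-supplied language name to its rules directory name."""
    key = lang.lower()
    return ALIASES.get(key, key)


def detect_language(filenames: Iterable[str],
                    override: Optional[str] = None) -> Optional[str]:
    """Manual override first, then marker files, then an extension scan."""
    if override:
        return normalize(override)
    paths = [PurePosixPath(name) for name in filenames]
    # Phase 1: marker files at the project root.
    top_level = {p.name for p in paths if len(p.parts) == 1}
    for marker, lang in MARKERS.items():
        if marker in top_level:
            return lang
    # Phase 2: the most common known source-file extension wins.
    counts = Counter(EXTENSIONS[p.suffix] for p in paths if p.suffix in EXTENSIONS)
    return counts.most_common(1)[0][0] if counts else None
```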

---

#### 1.2 Advanced Hook Matrix

**Source:** ECC — 11+ specific hooks with PreToolUse/PostToolUse matrix
**Current state:** 14 hooks in `app/hooks/`
**Gap:** Missing specific hooks for config protection, MCP health, governance, continuous learning

**New hooks to add:**

| Hook | Event | Script | Purpose |
|------|-------|--------|---------|
| `guard-config.sh` | PreToolUse (Edit/Write) | Bash | Block edits to .eslintrc, .prettierrc, tsconfig unless explicit |
| `mcp-health.sh` | SessionStart | Bash | Check MCP server health before session |
| `governance-capture.sh` | PostToolUse | Bash | Log governance events (security, policy) |
| `observe-session.sh` | PostToolUse | Bash | Send observations to rag-mcp (bridge) |
| `pre-compact-save.sh` | PreCompact | Bash | Save context state before compaction |
| `commit-quality.sh` | PreToolUse (Bash) | Bash | Check commit message quality |

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/hooks/guard-config.sh` | CREATE | Config file protection |
| `app/hooks/mcp-health.sh` | CREATE | MCP server health check |
| `app/hooks/governance-capture.sh` | CREATE | Governance event logging |
| `app/hooks/observe-session.sh` | CREATE | Send observations to rag-mcp |
| `app/hooks/pre-compact-save.sh` | CREATE | Context save before compact |
| `app/hooks/commit-quality.sh` | CREATE | Commit message quality |
| `scripts/install_steps/install_hooks.py` | EDIT | Register new hooks |
| `tests/test_hooks.py` | EDIT | Tests for new hooks |

**Success Criteria:**
- [x] 5/6 new hooks created and registerable (observe-session.sh not in app/hooks — lives in rag-mcp as consumer)
- [x] guard-config blocks config edits unless `--force`
- [x] mcp-health pings configured MCP servers on session start
- [x] All hooks optional (enable/disable in settings.json)
- [x] Tests: 89 hook tests in test_hooks.bats
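
The shipped hooks are Bash scripts, but the decision logic of a config guard can be sketched in Python. This is an illustration under assumptions: it relies on Claude Code passing the hook input as JSON on stdin (with `tool_name` and `tool_input.file_path`) and treating exit code 2 as a block; the `allow` parameter stands in for whatever explicit-override mechanism `guard-config.sh` actually uses.

```python
import json
import sys
from pathlib import PurePath

# File-name prefixes taken from the hook table above.
PROTECTED_PREFIXES = (".eslintrc", ".prettierrc", "tsconfig")


def should_block(event: dict, allow: bool = False) -> bool:
    """Block Edit/Write calls that target a protected config file."""
    if allow or event.get("tool_name") not in ("Edit", "Write"):
        return False
    path = event.get("tool_input", {}).get("file_path", "")
    return PurePath(path).name.startswith(PROTECTED_PREFIXES)


def main() -> int:
    """Read the hook event from stdin and return the process exit code."""
    event = json.load(sys.stdin)
    if should_block(event):
        print("guard-config: protected config file", file=sys.stderr)
        return 2  # exit code 2 signals a blocking decision
    return 0
```

A script entry point would call `sys.exit(main())`.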

---

#### 1.3 MCP Server Templates

**Source:** ECC — 25 preconfigured MCP servers
**What we create:** Template configs that users can copy

**Files to create:**
```
app/mcp-templates/
├── README.md                    # How to use templates
├── github.json                  # GitHub MCP server
├── jira.json                    # Jira MCP server
├── context7.json                # Context7 docs
├── filesystem.json              # Filesystem MCP server
├── sequential-thinking.json     # Sequential thinking
├── exa-search.json              # Exa web search
├── supabase.json                # Supabase
├── postgres.json                # PostgreSQL
├── redis.json                   # Redis
├── cloudflare.json              # Cloudflare
├── vercel.json                  # Vercel
├── railway.json                 # Railway
├── docker.json                  # Docker
├── browser-use.json             # Browser automation
├── fal-ai.json                  # fal.ai (image/video)
├── firecrawl.json               # Web scraping
├── sentry.json                  # Sentry error tracking
├── linear.json                  # Linear issue tracker
├── slack.json                   # Slack
├── notion.json                  # Notion
├── confluence.json              # Confluence
├── grafana.json                 # Grafana
├── datadog.json                 # Datadog
└── custom-template.json         # Template for custom MCP
```

**CLI command:**
```bash
ai-toolkit mcp add github          # Copy github.json to .mcp.json
ai-toolkit mcp add github jira     # Add multiple
ai-toolkit mcp list                 # List available templates
ai-toolkit mcp show github         # Show config details
```

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `app/mcp-templates/` (25 files) | CREATE | MCP configs |
| `bin/ai-toolkit` | EDIT | Add `mcp` subcommand |
| `scripts/mcp_manager.py` | CREATE | MCP template manager |
| `tests/test_mcp_templates.py` | CREATE | Validate JSON schemas |

**Success Criteria:**
- [x] 25 MCP template configs created
- [x] `ai-toolkit mcp add <name>` merges into .mcp.json
- [x] `ai-toolkit mcp list` shows all available
- [x] Tests: 15 in test_mcp_manager.bats
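
The merge performed by `mcp add` can be illustrated with a minimal sketch. It assumes that both `.mcp.json` and each template keep their servers under a top-level `mcpServers` key and that the template wins on a name clash; `merge_template` is a hypothetical name, not the `mcp_manager.py` API.

```python
import copy


def merge_template(mcp_config: dict, template: dict) -> dict:
    """Return a new config with the template's servers merged in."""
    merged = copy.deepcopy(mcp_config)
    servers = merged.setdefault("mcpServers", {})
    for name, spec in template.get("mcpServers", {}).items():
        # Assumption: the template overwrites an existing server of the same name.
        servers[name] = copy.deepcopy(spec)
    return merged
```

Because the merge is keyed by server name, applying the same template twice yields the same result.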

---

### Phase 2: Extension API + Install (weeks 3-5)

#### 2.1 `inject-hook` CLI Command (Generic Hook Injection)

**Purpose:** Allow ANY external tool to inject hooks into `~/.claude/settings.json` — the same way `inject-rule` works for `~/.claude/CLAUDE.md`. This is the missing piece that enables consumers (rag-mcp, custom tools, CI systems) to register hooks without knowing ai-toolkit internals.

**Current state:**
- `inject-rule ./my-rules.md` → injects rules into CLAUDE.md between `<!-- TOOLKIT:my-rules START/END -->` markers
- `merge-hooks.py inject <hooks.json> <settings.json>` → merges hooks but ONLY with `_source: "ai-toolkit"` tag
- **Gap:** No public CLI for external tools to inject hooks with their OWN `_source` tag

**Architecture (parallels inject-rule):**
```
inject-rule ./rag-mcp-rules.md          → CLAUDE.md      (markers: <!-- TOOLKIT:rag-mcp-rules -->)
inject-hook ./rag-mcp-hooks.json        → settings.json  (tag: "_source": "rag-mcp-hooks")
remove-rule rag-mcp-rules               → strips from CLAUDE.md
remove-hook rag-mcp-hooks               → strips from settings.json
```
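
The marker mechanism on the `inject-rule` side can be sketched as follows. It assumes the markers are written as `<!-- TOOLKIT:name START -->` and `<!-- TOOLKIT:name END -->`; the function names are hypothetical and do not mirror the toolkit's internals.

```python
import re


def _block_pattern(name: str) -> "re.Pattern[str]":
    """Match one marker-delimited block, including its trailing newline."""
    escaped = re.escape(name)
    return re.compile(
        rf"<!-- TOOLKIT:{escaped} START -->.*?<!-- TOOLKIT:{escaped} END -->\n?",
        re.DOTALL,
    )


def inject_rule(doc: str, name: str, body: str) -> str:
    """Insert or replace the named block; re-running gives the same result."""
    block = (
        f"<!-- TOOLKIT:{name} START -->\n"
        f"{body.strip()}\n"
        f"<!-- TOOLKIT:{name} END -->\n"
    )
    pattern = _block_pattern(name)
    if pattern.search(doc):
        # A callable replacement avoids backslash-escape processing in `block`.
        return pattern.sub(lambda _m: block, doc, count=1)
    separator = "" if not doc or doc.endswith("\n") else "\n"
    return doc + separator + block


def remove_rule(doc: str, name: str) -> str:
    """Strip the named block and leave the rest of the document untouched."""
    return _block_pattern(name).sub("", doc)
```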

**Example: rag-mcp consuming this API:**
```bash
# rag-mcp creates a hooks file:
cat > /tmp/rag-mcp-hooks.json << 'EOF'
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "echo 'apply KB-first research'" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "$HOME/.rag-mcp/hooks/observe-session.sh" }]
      }
    ],
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "$HOME/.rag-mcp/hooks/inject-instincts.sh" }]
      }
    ]
  }
}
EOF

# rag-mcp calls ai-toolkit to inject:
npx @softspark/ai-toolkit inject-hook /tmp/rag-mcp-hooks.json
# → all entries tagged with _source: "rag-mcp-hooks" in settings.json
# → re-running is idempotent (strips old rag-mcp-hooks entries, appends new)

# rag-mcp removes its hooks:
npx @softspark/ai-toolkit remove-hook rag-mcp-hooks
```

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `scripts/inject_hook_cli.py` | CREATE | CLI: inject-hook / remove-hook |
| `scripts/merge-hooks.py` | EDIT | Support custom `_source` tag (not just "ai-toolkit") |
| `bin/ai-toolkit.js` | EDIT | Register `inject-hook` + `remove-hook` subcommands |
| `tests/test_inject_hook.bats` | CREATE | Tests |
| `kb/howto/inject-hook-api.md` | CREATE | Documentation for consumers |

**merge-hooks.py changes:**
```python
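from pathlib import Path

# Current: always uses SOURCE_TAG = "ai-toolkit"
# New: accept --source parameter
SOURCE_TAG = "ai-toolkit"  # default


def derive_source_from_filename(path: str) -> str:
    """Return the filename stem, e.g. rag-mcp-hooks.json -> "rag-mcp-hooks"."""
    return Path(path).stem


def cmd_inject(toolkit_path: str, target_path: str, source: str = "") -> None:
    # An explicit --source wins; otherwise fall back to the filename stem.
    source_tag = source or derive_source_from_filename(toolkit_path)
    # ... tag all entries with _source: source_tag
    # ... strip old entries with same source_tag
    # ... append new entries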
```

**inject_hook_cli.py:**
```python
#!/usr/bin/env python3
"""Inject external hooks into ~/.claude/settings.json.

Usage:
  inject_hook_cli.py <hooks-file.json> [target-dir]
  inject_hook_cli.py --remove <hook-source-name> [target-dir]

The source name is derived from the filename stem (e.g., rag-mcp-hooks.json → "rag-mcp-hooks").
All entries are tagged with "_source": "<source-name>" for idempotent updates.
"""
```

**Consistency table — all ai-toolkit extension commands:**

| Command | Target File | Mechanism | Idempotent |
|---------|------------|-----------|------------|
| `inject-rule <file.md>` | `~/.claude/CLAUDE.md` | HTML markers (`<!-- TOOLKIT:name -->`) | Yes |
| `remove-rule <name>` | `~/.claude/CLAUDE.md` | Strip markers | Yes |
| `inject-hook <file.json>` | `~/.claude/settings.json` | JSON `_source` tag | Yes |
| `remove-hook <name>` | `~/.claude/settings.json` | Strip by `_source` | Yes |
| `add-rule <file.md>` | `~/.softspark/ai-toolkit/rules/` | File copy + re-inject all | Yes |

**Success Criteria:**
- [x] `inject-hook ./my-hooks.json` merges hooks with auto-derived `_source` tag
- [x] `remove-hook my-hooks` strips all entries with that `_source`
- [x] Re-running is idempotent (update, not duplicate)
- [x] Existing ai-toolkit hooks (`_source: "ai-toolkit"`) are never touched
- [x] Tests: 17 in test_inject_hook.bats

---

#### 2.2 Manifest-Driven Install System

**Source:** ECC — JSON manifests, state tracking, 5 profiles
**Current state:** Simple profile-based install (minimal/standard/strict)
**Gap:** No granular module selection, no state tracking, no incremental updates

**Files:**

| File | Action | Description |
|------|--------|-------------|
| `manifest.json` | EDIT | Full module manifest with dependencies |
| `scripts/install_steps/install_plan.py` | CREATE | Plan what to install |
| `scripts/install_steps/install_apply.py` | CREATE | Execute install plan |
| `scripts/install_steps/install_state.py` | CREATE | Track installed modules |
| `tests/test_manifest_install.py` | CREATE | Tests |

**manifest.json structure:**
```json
{
  "modules": {
    "core": {
      "description": "Core skills and hooks",
      "files": ["app/skills/commit/*", "app/skills/review/*", "app/hooks/*.sh"],
      "required": true
    },
    "agents": {
      "description": "Specialized agents",
      "files": ["app/agents/*.md"],
      "required": false,
      "default": true
    },
    "rules-common": {
      "description": "Common coding rules",
      "files": ["app/rules/common/*.md"],
      "required": false,
      "default": true
    },
    "rules-typescript": {
      "description": "TypeScript-specific rules",
      "files": ["app/rules/typescript/*.md"],
      "auto_detect": "package.json"
    },
    "rules-python": {
      "description": "Python-specific rules",
      "files": ["app/rules/python/*.md"],
      "auto_detect": "requirements.txt|pyproject.toml|setup.py"
    },
    "mcp-templates": {
      "description": "MCP server templates",
      "files": ["app/mcp-templates/*.json"],
      "required": false
    },
    "rag-mcp-bridge": {
      "description": "rag-mcp integration hooks",
      "files": ["app/hooks/observe-session.sh", "scripts/config/rag-mcp-bridge.yaml"],
      "required": false,
      "requires": ["core"]
    }
  },
  "profiles": {
    "minimal": ["core"],
    "standard": ["core", "agents", "rules-common"],
    "strict": ["core", "agents", "rules-common", "mcp-templates"],
    "full": ["*"]
  }
}
```
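A minimal sketch of how a profile or an explicit module list could be expanded into an install set from this manifest. The function name `resolve_modules` is hypothetical, and two behaviors are assumptions rather than documented rules: modules marked `"required": true` are always added, and `requires` dependencies are installed before their dependents.

```python
from __future__ import annotations


def resolve_modules(manifest: dict, profile: str | None = None,
                    modules: list[str] | None = None) -> list[str]:
    """Expand a profile or explicit module list into an ordered install set."""
    all_modules = manifest["modules"]
    requested = list(modules or manifest["profiles"][profile])
    if "*" in requested:  # the "full" profile selects every module
        requested = list(all_modules)

    # Assumption: required modules are installed whatever was requested.
    requested += [name for name, spec in all_modules.items() if spec.get("required")]

    resolved: list[str] = []

    def visit(name: str, stack: tuple[str, ...] = ()) -> None:
        if name not in all_modules:
            raise KeyError(f"unknown module: {name}")
        if name in stack:
            raise ValueError("dependency cycle: " + " -> ".join(stack + (name,)))
        if name in resolved:
            return
        # Dependencies listed under "requires" are placed before the module itself.
        for dep in all_modules[name].get("requires", []):
            visit(dep, stack + (name,))
        resolved.append(name)

    for name in requested:
        visit(name)
    return resolved
```

With the manifest above, `resolve_modules(manifest, modules=["rag-mcp-bridge"])` would place `core` ahead of `rag-mcp-bridge` because of the `requires` entry.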

**State tracking (~/.softspark/ai-toolkit/state.json):**
```json
{
  "installed_version": "1.2.1",
  "installed_modules": ["core", "agents", "rules-common", "rules-typescript"],
  "installed_at": "2026-04-07T10:00:00Z",
  "last_updated": "2026-04-07T10:00:00Z",
  "file_hashes": {
    "app/hooks/session-start.sh": "abc123..."
  }
}
```
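The `file_hashes` map is what makes an incremental update possible. The sketch below shows one way to compare current file content against the recorded hashes. It assumes SHA-256 and the hypothetical helper names `file_sha256` and `changed_files`; the plan only states that a content hash is used.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable


def file_sha256(path: Path) -> str:
    """Hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(source_root: Path, rel_paths: Iterable[str],
                  recorded: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: new_hash} for files that are new or modified."""
    changed = {}
    for rel in rel_paths:
        digest = file_sha256(source_root / rel)
        # A path missing from state.json counts as changed (not yet installed).
        if recorded.get(rel) != digest:
            changed[rel] = digest
    return changed
```

An updater built on this would copy only the returned paths and then write the new hashes back into `state.json`.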

**CLI:**
```bash
ai-toolkit install --profile standard          # Profile-based (existing)
ai-toolkit install --modules core,agents       # Module-based (new)
ai-toolkit install --auto-detect               # Detect language, install matching rules
ai-toolkit update                              # Incremental update (only changed files)
ai-toolkit status                              # Show installed modules
```

**Success Criteria:**
- [x] manifest.json defines all modules with dependencies
- [x] install --modules allows granular selection
- [x] install --auto-detect detects language from project files (two-phase: markers + extensions)
- [x] state.json tracks what's installed
- [x] update only changes modified files (content hash)
- [x] Backward compatible with existing install
- [x] Tests: 35 across test_install.bats, test_install_flags.bats, test_install_state.bats

---

### Phase 3: New Skills (weeks 6-7)

#### 3.1 Council Skill (/council)

**Source:** ECC — 4-voice decision workflow
**Type:** Hybrid skill (user-invocable: true)

**File:** `app/skills/council/SKILL.md`

**Skill definition:**
```yaml
---
name: council
description: "4-perspective decision evaluation for architecture choices"
user-invocable: true
agent: orchestrator
context: fork
---
```

**Behavior:**
1. User invokes `/council "Should we migrate from Redis to Valkey?"`
2. Spawn 4 sub-agents in parallel:
   - **Advocate:** Strongest case FOR
   - **Critic:** Strongest case AGAINST (devil's advocate)
   - **Pragmatist:** Trade-offs, costs, timeline, team capacity
   - **User-Proxy:** End-user/customer impact
3. Synthesize into structured output:
   - Pros (from Advocate)
   - Cons (from Critic)
   - Trade-offs (from Pragmatist)
   - User Impact (from User-Proxy)
   - **Recommendation** with confidence level

**Success Criteria:**
- [x] `/council` invocable
- [x] 4 perspectives generated
- [x] Structured output with recommendation
- [x] Tests: dedicated council skill contract

---

#### 3.2 Brand Voice Skill (/brand-voice)

**Source:** ECC — canonical voice system
**Type:** Knowledge skill (user-invocable: false, auto-loaded for writing tasks)

**File:** `app/skills/brand-voice/SKILL.md`

**Content:**
- Anti-trope list (banned LLM phrases: "dive into", "game-changer", "cutting-edge", etc.)
- Voice capture template (how to define a project's voice)
- Consistency checks (before outputting content, verify voice match)

**Success Criteria:**
- [x] Skill auto-loads when writing docs/content
- [x] Anti-trope list prevents generic LLM rhetoric
- [x] Tests: dedicated brand-voice skill contract

---

#### 3.3 Agent Introspection Skill (/introspect)

**Source:** ECC — agent-introspection-debugging
**Type:** Task skill (user-invocable: true)

**File:** `app/skills/introspect/SKILL.md`

**Behavior:**
1. Capture current failure/stuck state
2. Classify pattern (loop, wrong approach, missing context, etc.)
3. Suggest smallest recovery action
4. Emit structured introspection report
5. Optionally hand off to verification

**Success Criteria:**
- [x] `/introspect` invocable when agent is stuck
- [x] Classifies failure pattern
- [x] Suggests recovery action
- [x] Tests: dedicated introspect skill contract

---

### Phase 4: Documentation & Marketing (week 8)

#### 4.1 Documentation Site

**Source:** claude-mem (Mintlify, 27 languages)
**Options:**
1. **Starlight (Astro)** — free, static, fast (recommended)
2. **Mintlify** — paid, beautiful, hosted
3. **Docusaurus** — free, React-based

**Structure:**
```
docs/
├── astro.config.mjs
├── src/content/docs/
│   ├── getting-started/
│   │   ├── installation.md
│   │   ├── quick-start.md
│   │   └── first-skill.md
│   ├── skills/
│   │   ├── tier-1.md
│   │   ├── tier-2.md
│   │   └── tier-3.md
│   ├── agents/
│   │   └── catalog.md
│   ├── hooks/
│   │   └── lifecycle.md
│   ├── guides/
│   │   ├── create-skill.md
│   │   ├── create-agent.md
│   │   └── rag-mcp-integration.md
│   └── reference/
│       ├── cli.md
│       └── manifest.md
```

**Success Criteria:**
- :no_entry: SKIPPED — README/CLAUDE.md sufficient, no documentation site needed

---

## 5. Extension API Design (Generic — No Consumer Knowledge)

ai-toolkit provides a **generic extension API**. It does NOT know about any specific consumer. Consumers use the public CLI to register their rules and hooks.

```
┌──────────────────────────────────────────────────────┐
│                   ai-toolkit (generic)               │
│                                                      │
│  Public Extension API:                               │
│    inject-rule  <file.md>     → CLAUDE.md            │
│    remove-rule  <name>        → CLAUDE.md            │
│    inject-hook  <file.json>   → settings.json  [NEW] │
│    remove-hook  <name>        → settings.json  [NEW] │
│    add-rule     <file.md>     → rules/ registry      │
│    mcp add      <template>    → .mcp.json      [NEW] │
│                                                      │
│  Idempotent: markers (rules) / _source tags (hooks)  │
│  ai-toolkit NEVER calls external services            │
└──────────────────────────────────────────────────────┘
                        ▲
                        │ uses API
        ┌───────────────┼───────────────┐
        │               │               │
   rag-mcp          custom-tool     ci-system
   (consumer)       (consumer)      (consumer)
```

**Example: how rag-mcp would consume (handled in rag-mcp repo, NOT here):**
```bash
# rag-mcp install script calls:
npx @softspark/ai-toolkit inject-rule  ./rag-mcp-rules.md      # existing
npx @softspark/ai-toolkit inject-hook  ./rag-mcp-hooks.json    # NEW
# → rag-mcp's hooks + rules are registered, ai-toolkit doesn't care what they do
```

---

## 6. Success Criteria (Overall)

| Metric | Before | Target |
|--------|--------|--------|
| Skills | 88 | ~91 (+3 new skills) |
| Hooks | 14 | 21 (+7, observe-session in rag-mcp) |
| Language rules | ~8 (pattern skills) | 70 (13 langs × 5 + 5 common) |
| MCP templates | 0 | 25 |
| Install granularity | 3 profiles | 3 profiles + module-level |
| Extension API | inject-rule only | inject-rule + inject-hook + mcp add |
| Documentation | README only | Published site |

---

## 7. Risks and Mitigation

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Rule files = maintenance burden | Medium | Medium | Auto-generate from ECC (port script), validate.py checks |
| inject-hook misuse by consumers | Low | Medium | Validate JSON schema, reject malformed hooks |
| Manifest install breaks existing | Low | High | Backward compatible, existing CLI preserved |
| Documentation site drift | Medium | Medium | Generate from source (CLAUDE.md → site), CI check |
| Too many hooks slow session start | Low | Medium | Hooks run async, timeout 5s each |

---

## 8. Pre-Mortem

1. **Language rules become stale** — Mitigation: version in frontmatter, validate.py checks freshness
2. **inject-hook consumers conflict** — Mitigation: each consumer has unique `_source` tag, never collide
3. **Manifest install confuses users** — Mitigation: `--profile` still works, manifest is opt-in power feature
4. **Council skill too slow** — Mitigation: parallel sub-agents, timeout 60s per perspective
5. **Documentation out of sync** — Mitigation: CI job: generate docs → diff → fail if stale

---

## 9. Remaining Gaps

All major features shipped. The previously outstanding items below are now resolved:

1. [x] `validate.py` checks rules format (1.1)
2. [x] Dedicated `test_rules` test file exists (1.1)
3. [x] Dedicated tests exist for council, brand-voice, introspect skills (3.1-3.3)
4. [x] `observe-session.sh` lives in rag-mcp (consumer), not ai-toolkit — by design

---

## 10. Blockers

None — all features are independent of external systems.

---

**Last Updated:** 2026-04-09

---

## kb/reference/distribution-model.md

---
title: "Distribution Model"
category: reference
service: ai-toolkit
tags: [architecture, distribution, symlinks, npm, install]
version: "1.4.2"
created: "2026-03-23"
last_updated: "2026-04-09"
description: "Reference description of how ai-toolkit is delivered and propagated on a developer machine."
---

# Distribution Model

## Summary

`ai-toolkit` uses a split delivery model:

- **npm package** for delivery to the machine,
- **filesystem symlinks and merged files** for propagation into Claude Code directories.

```text
npm install -g @softspark/ai-toolkit   → delivers toolkit files
ai-toolkit install                     → links / merges into ~/.claude/
```

## Why this model exists

The toolkit must be reusable across many projects while remaining easy to update from one place.

This model gives:
- standard installation and versioning,
- instant propagation for symlinked assets,
- predictable update flow for merged / copied assets,
- one source of truth per machine.

## Adopted Strategies

| Layer | Mechanism | Result |
|------|-----------|--------|
| Delivery | npm package | standard install / update UX |
| Agents | per-file symlinks | zero-overhead propagation |
| Skills | per-directory symlinks | zero-overhead propagation |
| Hooks | copied scripts + merged JSON | safe runtime integration |
| Docs / rules | marker injection | user content preserved |

## Trade-offs

### Positive
- easy installation
- clear update path
- global reuse across projects
- low propagation overhead

### Negative
- symlink targets depend on a valid global install location
- merged / copied assets require `ai-toolkit update` after source changes
- all projects on a machine share the same installed toolkit version

## Related Documents

- `kb/reference/global-install-model.md`
- `kb/reference/merge-friendly-install-model.md`
- `kb/reference/architecture-overview.md`

---

## kb/reference/enterprise-config-guide.md

---
title: "Enterprise Config Inheritance Guide"
category: reference
service: ai-toolkit
tags:
  - enterprise
  - config-inheritance
  - extends
  - governance
  - multi-repo
doc_type: reference
created: "2026-04-11"
last_updated: "2026-04-11"
description: "Comprehensive guide for setting up and using ai-toolkit configuration inheritance. Covers base config creation, project setup, enforcement rules, CI integration, and troubleshooting."
---

# Enterprise Config Inheritance Guide

## Overview

Configuration inheritance lets an organization define a shared base config and publish it as an npm package, Git URL, or local path. Individual projects extend this base via an `extends` field in `.softspark-toolkit.json`. Changes to the base reach a project the next time it runs `ai-toolkit update --local`.

**Pattern:** Mirrors ESLint's `extends`, TypeScript's `extends`, and Prettier's shared configs.

---

## Quick Start

### 1. Create a base config (team lead)

```bash
ai-toolkit config create-base @mycompany/ai-toolkit-config
cd mycompany-ai-toolkit-config

# Edit ai-toolkit.config.json — add your org's rules, agents, enforcement
# Add rule files to rules/
# Add custom agent definitions to agents/

npm publish
```

### 2. Set up a project (developer)

```bash
cd my-project
ai-toolkit config init --extends @mycompany/ai-toolkit-config
ai-toolkit install --local
```

Or manually create `.softspark-toolkit.json`:

```json
{
  "extends": "@mycompany/ai-toolkit-config",
  "profile": "standard",
  "agents": {
    "enabled": ["frontend-specialist"]
  }
}
```

### 3. Verify

```bash
ai-toolkit config validate    # Schema + extends + enforcement
ai-toolkit config diff        # Show differences from base
ai-toolkit config check       # CI enforcement check
```

---

## Configuration Reference

### Project config (`.softspark-toolkit.json`)

| Field | Type | Description |
|-------|------|-------------|
| `extends` | string | Base config source (npm, git URL, local path) |
| `profile` | enum | `minimal`, `standard`, `strict`, `full`, `offline-slm` |
| `agents` | object | `enabled`, `disabled`, `custom` arrays |
| `rules` | object | `inject`, `remove` arrays |
| `constitution` | object | `amendments` array (article 7+ only) |
| `enforce` | object | Non-overridable constraints (base configs only) |
| `overrides` | object | Explicit overrides with justification |

### Base config (`ai-toolkit.config.json`)

Same fields as project config, plus:

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Package identity (required) |
| `version` | string | Semver version (required) |

### Extends sources

| Source | Syntax | Example |
|--------|--------|---------|
| npm package | `"@scope/pkg"` | `"@mycompany/ai-toolkit-config"` |
| npm + version | `"@scope/pkg@version"` | `"@mycompany/ai-toolkit-config@^2.0.0"` |
| Git URL | `"git+https://..."` | `"git+https://github.com/myco/config.git"` |
| Local path | `"./path"` or `"../path"` | `"../shared-config"` |

---

## Merge Semantics

When a project extends a base, configs are merged with these rules:

| Type | Rule |
|------|------|
| **Dicts** | Recursive deep merge |
| **Lists** | Union (base + project, deduplicated) |
| **Scalars** | Project wins |
| **Agents** | Union enabled, project can disable (unless required) |
| **Rules** | Union inject, project can remove |
| **Constitution** | Base articles immutable, project adds only (7+) |
| **Enforce** | Base wins (cannot weaken, only strengthen) |
| **Profile** | Project can change |

### Merge order (multi-level)

```
grandparent → parent → project
```

Deepest ancestor is resolved first. Max chain depth: 5 levels.
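The generic rows of the merge table (dicts, lists, scalars) and the multi-level order can be sketched as follows. This is an illustration, not the toolkit's implementation: `merge_config` and `merge_chain` are hypothetical names, and the special cases for agents, rules, constitution, and `enforce` are deliberately left out.

```python
from __future__ import annotations

from functools import reduce
from typing import Any


def merge_config(base: Any, project: Any) -> Any:
    """Merge two config values: deep-merge dicts, union lists, project wins otherwise."""
    if isinstance(base, dict) and isinstance(project, dict):
        merged = dict(base)
        for key, value in project.items():
            merged[key] = merge_config(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(project, list):
        union = []
        for item in [*base, *project]:  # base order first, then new project items
            if item not in union:
                union.append(item)
        return union
    return project  # scalars (and mismatched types): project wins


def merge_chain(configs: list[dict]) -> dict:
    """Merge configs ordered grandparent -> parent -> project."""
    return reduce(merge_config, configs, {})
```

Because later configs are folded over earlier ones, a scalar set by the project overrides the same scalar in any ancestor, while list values accumulate across the chain.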

---

## Enforcement

Base configs can define non-overridable constraints via the `enforce` block:

```json
{
  "enforce": {
    "minHookProfile": "standard",
    "requiredPlugins": ["security-pack"],
    "forbidOverride": ["constitution", "guard-destructive"],
    "requiredAgents": ["security-auditor"]
  }
}
```

| Constraint | Effect |
|------------|--------|
| `minHookProfile` | Projects cannot use a weaker hook profile |
| `requiredPlugins` | Must be installed in all projects |
| `forbidOverride` | These components cannot be overridden |
| `requiredAgents` | Must be enabled in all projects |

### Overrides

Projects can override base settings, but must declare intent:

```json
{
  "overrides": {
    "quality-check": {
      "override": true,
      "justification": "Company uses custom lint pipeline via Jenkins",
      "replacement": "skip"
    }
  }
}
```

Requirements:
- `override: true` must be explicit
- `justification` must be at least 20 characters
- Component must not be in `enforce.forbidOverride`
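
The three requirements above can be expressed as a small validator. This is a sketch under stated assumptions: `validate_overrides` is a hypothetical name, and the error strings are illustrative rather than the CLI's actual messages.

```python
from __future__ import annotations

MIN_JUSTIFICATION_LENGTH = 20


def validate_overrides(overrides: dict, forbid_override: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for component, spec in overrides.items():
        # "override" must be the literal boolean true, not merely truthy.
        if spec.get("override") is not True:
            errors.append(f"{component}: 'override' must be explicitly true")
        if len(spec.get("justification", "")) < MIN_JUSTIFICATION_LENGTH:
            errors.append(
                f"{component}: justification must be at least "
                f"{MIN_JUSTIFICATION_LENGTH} characters"
            )
        if component in forbid_override:
            errors.append(f"{component}: listed in enforce.forbidOverride")
    return errors
```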

---

## Constitution Immutability

- **Articles I-VI** (toolkit core) are absolutely immutable
- **Base config articles** are immutable — projects cannot modify them
- Projects can **only ADD** new articles (article 7+)

```json
{
  "constitution": {
    "amendments": [
      {"article": 8, "title": "API Standards", "text": "All APIs must be RESTful."}
    ]
  }
}
```

---

## CLI Commands

### `ai-toolkit config validate [path]`

Validates `.softspark-toolkit.json` schema, resolves extends, checks enforcement.

```bash
ai-toolkit config validate
# ✓ schema valid
# ✓ extends resolved: 1 base config(s)
# ✓ no forbidden overrides
# ✓ constitution articles intact
```

### `ai-toolkit config diff [path]`

Shows differences between project config and base.

```bash
ai-toolkit config diff
# Base: @mycompany/ai-toolkit-config@2.1.0
# Profile:     strict (base) → standard (project) ⚠ OVERRIDE
# Agents:
#   + frontend-specialist     (project adds)
#   = security-auditor        (base requires, cannot disable)
```

### `ai-toolkit config init [flags]`

Create `.softspark-toolkit.json` interactively or with flags.

```bash
ai-toolkit config init                                    # interactive
ai-toolkit config init --extends @mycompany/config        # with extends
ai-toolkit config init --no-extends --profile standard    # without extends
ai-toolkit config init --force                            # overwrite existing
```

### `ai-toolkit config create-base <name> [output-dir]`

Scaffold a base config npm package.

```bash
ai-toolkit config create-base @mycompany/ai-toolkit-config
# Creates: mycompany-ai-toolkit-config/
#   package.json, ai-toolkit.config.json, rules/, agents/, README.md
```

### `ai-toolkit config check [path] [--json]`

CI enforcement check. Exit codes: 0 (pass), 1 (fail), 2 (no config).

```bash
ai-toolkit config check --json
# {"status": "pass", "code": 0, "checks": [...]}
```

GitHub Actions example:

```yaml
- name: AI Toolkit Governance Check
  run: |
    npx @softspark/ai-toolkit config check --json
    npx @softspark/ai-toolkit config validate --strict
```

---

## Lock File

`.softspark-toolkit.lock.json` pins exact resolved versions for reproducible installs.

- `install --local` → creates/updates lock file
- `update --local` → re-resolves and updates lock file
- `update --local --refresh-base` → force re-fetch ignoring cache
- Commit `.softspark-toolkit.lock.json` to git for team synchronization

```json
{
  "lockfileVersion": 1,
  "resolved": {
    "@mycompany/ai-toolkit-config": {
      "version": "2.1.0",
      "integrity": "sha256:abc123...",
      "cached": "~/.softspark/ai-toolkit/config-cache/@mycompany/ai-toolkit-config/2.1.0/"
    }
  }
}
```

---

## Offline Support

When npm/git is unavailable:

1. Checks cache (`~/.softspark/ai-toolkit/config-cache/`)
2. If cached version found → uses with warning
3. If not cached → error with instructions

```bash
# Force refresh when back online:
ai-toolkit update --local --refresh-base
```

---

## Troubleshooting

### "Cannot resolve extends"

- Check network connectivity
- Verify npm package name is correct
- For private packages, ensure `.npmrc` has auth configured
- Run `ai-toolkit update --local --refresh-base` to force a re-fetch that ignores the cache

### "Cannot disable agent — required by base config"

The base config's `enforce.requiredAgents` prevents disabling this agent.
Contact your team lead to request an exemption.

### "Cannot modify Constitution Article X"

Base constitution articles are immutable. You can only ADD new articles with higher numbers.

### "Override requires justification"

All overrides need `"override": true` and a `"justification"` field (min 20 chars).

### "Circular extends detected"

Your extends chain has a loop. Check that base configs don't reference each other cyclically. Max depth is 5 levels.
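
For intuition, loop detection and the depth limit can be sketched as a walk along the `extends` chain. The names `resolve_chain` and `load` are hypothetical, and whether the project itself counts toward the five levels is an assumption made here for simplicity.

```python
from __future__ import annotations

from typing import Callable

MAX_DEPTH = 5


def resolve_chain(source: str, load: Callable[[str], dict]) -> list[dict]:
    """Return configs ordered from the deepest ancestor to the starting config."""
    chain: list[dict] = []
    seen: list[str] = []
    current: str | None = source
    while current is not None:
        if current in seen:
            raise ValueError("Circular extends detected: " + " -> ".join(seen + [current]))
        if len(seen) >= MAX_DEPTH:
            raise ValueError(f"extends chain exceeds {MAX_DEPTH} levels")
        seen.append(current)
        config = load(current)  # caller supplies npm / git / local-path loading
        chain.append(config)
        current = config.get("extends")
    return list(reversed(chain))
```

The error message lists every source visited, which makes it easier to see which base config closes the loop.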

### Lock file stale

Run `ai-toolkit update --local` to re-resolve and update the lock file.

---

## kb/reference/extension-api.md

---
title: "Extension API Reference"
category: reference
service: ai-toolkit
tags: [extension-api, inject-rule, inject-hook, inject-mcp, mcp-templates, integration, editors]
version: "1.5.0"
created: "2026-04-07"
last_updated: "2026-05-12"
description: "Reference for ai-toolkit's extension API: inject-rule, inject-hook, inject-mcp, remove-* variants, and editor-aware MCP template management."
---

# Extension API Reference

## Overview

ai-toolkit exposes a generic extension API that lets external tools register their own rules, hooks, and MCP servers alongside the toolkit's built-in components. The toolkit has no knowledge of any specific consumer; it only provides the injection mechanism. Consumers call the public CLI commands from their own install scripts.

This design is intentional: ai-toolkit is a generic toolkit. Consumers (MCP servers, CI systems, custom tools) use the public API to add their own rules, hooks, and MCP server templates without modifying toolkit internals.

## Commands

| Command | Target File | Mechanism | Idempotent |
|---------|-------------|-----------|------------|
| `inject-rule <file.md>` | `~/.claude/CLAUDE.md` | HTML comment markers (`<!-- TOOLKIT:name -->`) | Yes |
| `remove-rule <name>` | `~/.claude/CLAUDE.md` | Strip markers by block name | Yes |
| `inject-hook <file.json\|url> [name]` | `~/.claude/settings.json` | JSON `_source` tag per entry, URL cached + registered | Yes |
| `remove-hook <name>` | `~/.claude/settings.json` | Strip all entries with matching `_source`, unregister URL source | Yes |
| `inject-mcp <file.json\|url> [name] [--force]` | `~/.mcp.json` + every editor with `global_path` | JSON `_source` tag per server, URL cached + registered, full editor propagation | Yes |
| `remove-mcp <name>` | `~/.mcp.json` + every editor with `global_path` | Strip all servers with matching `_source`, clean editor configs, unregister URL | Yes |
| `add-rule <file.md\|url>` | `~/.softspark/ai-toolkit/rules/` | File copy + re-inject all rules on next `update` | Yes |
| `mcp add <name...>` | `.mcp.json` | Merge `mcpServers` block from built-in template | Yes |
| `mcp install --editor <name...>` | Native editor MCP config | Render canonical template into editor format | Yes |

## inject-rule

Injects a Markdown rules file into `~/.claude/CLAUDE.md` between named HTML comment markers.

```bash
npx @softspark/ai-toolkit inject-rule ./my-tool-rules.md
```

**Implementation:** `scripts/inject_rule_cli.py` (delegates to `inject_section_cli.py`).

**Markers written:**
```html
<!-- TOOLKIT:my-tool-rules START -->
... content of my-tool-rules.md ...
<!-- TOOLKIT:my-tool-rules END -->
```

The block name is derived from the file stem (`my-tool-rules.md` → `my-tool-rules`). Re-running replaces the existing block — no duplicates. Content outside these markers is never modified.
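
The replace-or-append behavior can be illustrated with a short sketch. It is not the code in `inject_section_cli.py`; the function names are hypothetical, and the whitespace handling around the block is an assumption.

```python
from __future__ import annotations

import re


def _markers(name: str) -> tuple[str, str]:
    return f"<!-- TOOLKIT:{name} START -->", f"<!-- TOOLKIT:{name} END -->"


def inject_block(document: str, name: str, content: str) -> str:
    """Replace the named block if present, otherwise append it."""
    start, end = _markers(name)
    block = f"{start}\n{content.strip()}\n{end}"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if pattern.search(document):
        # A function replacement avoids backslash escapes in content being interpreted.
        return pattern.sub(lambda _match: block, document, count=1)
    if not document:
        return block + "\n"
    separator = "\n" if document.endswith("\n") else "\n\n"
    return document + separator + block + "\n"


def remove_block(document: str, name: str) -> str:
    """Strip the named block; a document without the block is returned unchanged."""
    start, end = _markers(name)
    pattern = re.compile(
        r"\n?" + re.escape(start) + r".*?" + re.escape(end) + r"\n?", re.DOTALL
    )
    return pattern.sub("", document, count=1)
```

Running `inject_block` twice with the same name leaves exactly one block in the document, which is the property the command relies on for idempotency.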

## remove-rule

Strips a previously injected rule block from `~/.claude/CLAUDE.md`.

```bash
npx @softspark/ai-toolkit remove-rule my-tool-rules
```

The argument is the block name (file stem used during `inject-rule`). If the block is not present, the command exits 0 silently.

## inject-hook

Injects hook entries from a JSON file or HTTPS URL into `~/.claude/settings.json`. Every injected entry is tagged with `"_source": "<source-name>"` where the source name is derived from the filename stem or URL last segment.

```bash
# From local file
npx @softspark/ai-toolkit inject-hook ./my-tool-hooks.json

# From URL (HTTPS only) — cached locally, auto-refreshed on update
npx @softspark/ai-toolkit inject-hook https://example.com/my-tool-hooks.json

# With explicit source name
npx @softspark/ai-toolkit inject-hook https://example.com/hooks.json my-tool-hooks
```

**Implementation:** `scripts/inject_hook_cli.py`, `scripts/hook_sources.py`, `scripts/url_fetch.py`.

**Input format:**
```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "$HOME/.my-tool/hooks/on-start.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "$HOME/.my-tool/hooks/on-edit.sh" }]
      }
    ]
  }
}
```

**Source name derivation:** `my-tool-hooks.json` → source name `"my-tool-hooks"`. For URLs: `https://example.com/path/my-tool-hooks.json` → `"my-tool-hooks"`. All entries are tagged `"_source": "my-tool-hooks"` in settings.json.
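
A minimal sketch of this derivation, assuming POSIX-style local paths and a hypothetical function name `derive_source_name`. Rejecting non-HTTPS URLs mirrors the HTTPS-only rule stated for this command, but the error text is illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def derive_source_name(location: str) -> str:
    """Derive the `_source` tag from a local filename or an HTTPS URL."""
    if "://" in location:
        parsed = urlparse(location)
        if parsed.scheme != "https":
            raise ValueError("only HTTPS URLs are accepted")
        path = parsed.path  # drops the query string and fragment
    else:
        path = location
    return PurePosixPath(path).stem
```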

**URL support:** When an HTTPS URL is provided, the JSON is fetched, validated, cached in `~/.softspark/ai-toolkit/hooks/external/<name>.json`, and registered in `sources.json`. On every `ai-toolkit update`, URL-sourced hooks are re-fetched and re-injected automatically. If the fetch fails during update, the cached version is used.

**Idempotency:** Re-running strips all existing entries with the same source name, then appends the new ones. No duplicates accumulate.

**Safety:** Entries tagged `"_source": "ai-toolkit"` are never modified or removed by this command. External tools cannot affect the toolkit's own hooks. Only HTTPS URLs are accepted.
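
The strip-then-append behavior behind the idempotency and safety guarantees can be sketched on plain dictionaries. This is not the code in `inject_hook_cli.py`: the function names are hypothetical, and refusing `ai-toolkit` as a source name and dropping empty event lists are assumptions.

```python
from __future__ import annotations

import copy

PROTECTED_SOURCE = "ai-toolkit"


def inject_hooks(settings: dict, incoming: dict, source: str) -> dict:
    """Return new settings with `incoming` hooks tagged and merged under `source`."""
    if source == PROTECTED_SOURCE:
        raise ValueError(f"source name {PROTECTED_SOURCE!r} is reserved")
    result = remove_hooks(settings, source)  # strip old entries from the same source
    hooks = result.setdefault("hooks", {})
    for event, entries in incoming.get("hooks", {}).items():
        for entry in entries:
            tagged = {**copy.deepcopy(entry), "_source": source}
            hooks.setdefault(event, []).append(tagged)
    return result


def remove_hooks(settings: dict, source: str) -> dict:
    """Return new settings without entries tagged `source`; other entries are untouched."""
    result = copy.deepcopy(settings)
    kept = {
        event: [e for e in entries if e.get("_source") != source]
        for event, entries in result.get("hooks", {}).items()
    }
    result["hooks"] = {event: entries for event, entries in kept.items() if entries}
    return result
```

Because entries are filtered only by an exact `_source` match, hooks tagged `ai-toolkit` or belonging to other consumers pass through unchanged.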

**Codex propagation:** Codex-compatible events (`SessionStart`, `PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `Stop`) are automatically propagated to `~/.codex/hooks.json`. Non-Codex events are silently skipped. No extra flags needed.

## remove-hook

Strips all hook entries from `~/.claude/settings.json` that carry a given `_source` tag. If the hook was URL-sourced, also unregisters the URL from `sources.json` and removes the cached file.

```bash
npx @softspark/ai-toolkit remove-hook my-tool-hooks
```

The argument is the source name (file stem used during `inject-hook`). If no entries with that source are present, the command exits 0 silently.

## inject-mcp

Injects an external MCP server template into `~/.mcp.json` (toolkit source-of-truth) and propagates it to every editor that exposes a `global_path` in `EDITOR_SPECS`. Symmetric with `inject-hook` -- accepts both local file paths and HTTPS URLs, with cache + auto-refresh on `ai-toolkit update`.

```bash
# From local file
npx @softspark/ai-toolkit inject-mcp ./rag-mcp-template.json

# From URL (cached locally, auto-refreshed on update)
npx @softspark/ai-toolkit inject-mcp https://example.com/rag-mcp-template.json

# With explicit source name (preferred when filename stem is generic)
npx @softspark/ai-toolkit inject-mcp ./mcp/mcp-template.json --name rag-mcp

# With explicit target dir
npx @softspark/ai-toolkit inject-mcp ./template.json /custom/target --name my-rag

# Force overwrite of servers with a different _source (collision resolution)
npx @softspark/ai-toolkit inject-mcp ./conflict.json --force
```

**Flags:** `--name <name>` overrides the auto-derived source name (works for both local files and URLs). `--force` overwrites servers tagged with a different `_source`. Positional `template-name` is supported only for URL sources (legacy positional grammar inherited from `inject-hook`); for local files use `--name`.

**Implementation:** `scripts/inject_mcp_cli.py`, `scripts/mcp_sources.py`, `scripts/url_fetch.py`.

**Input format:** Same as built-in templates in `app/mcp-templates/`:
```json
{
  "name": "rag-mcp",
  "description": "Multi-tenant RAG over knowledge bases",
  "mcpServers": {
    "rag-mcp": {
      "type": "http",
      "url": "http://localhost:8081/mcp/sse?secret_key=${RAG_MCP_SECRET_KEY}"
    }
  }
}
```

**Source name derivation:** `rag-mcp-template.json` → `"rag-mcp-template"`. For URLs: `https://example.com/rag-mcp-template.json` → `"rag-mcp-template"`. Every server in the `mcpServers` block is tagged with `"_source": "<source-name>"` inside `~/.mcp.json` only; native editor configs receive the same servers **without** the `_source` field (some clients reject unknown keys).

**URL support:** When an HTTPS URL is provided, the JSON is fetched, validated, cached in `~/.softspark/ai-toolkit/mcp-templates/external/<name>.json`, and registered in `sources.json`. On every `ai-toolkit update`, URL-sourced templates are re-fetched and re-injected automatically. If the fetch fails during update, the cached version is used.

**Editor propagation:** Every editor with a `global_path` in `EDITOR_SPECS` is updated -- Claude (`~/.claude.json`), Cursor (`~/.cursor/mcp.json`), GitHub Copilot (`~/.copilot/mcp-config.json`), Gemini CLI (`~/.gemini/settings.json`), Windsurf (`~/.codeium/windsurf/mcp_config.json`), Cline (`~/.cline/data/settings/cline_mcp_settings.json`), Augment (`~/.augment/settings.json`), Codex CLI (`~/.codex/config.toml`). Per-editor failures are non-fatal -- the command reports a warning and continues.

**Idempotency:** Re-running with the same source overwrites entries for that source cleanly -- no duplicates accumulate.

**Collisions:** If a server name in `~/.mcp.json` already exists under a *different* `_source` tag, the command exits with code 3 unless `--force` is passed. Entries tagged `"_source": "ai-toolkit"` are protected even with `--force` -- the built-in template namespace cannot be hijacked.
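
The collision rules, together with the `_source` stripping applied to native editor configs, can be sketched as below. All names are hypothetical, and treating an untagged existing server as a collision is an assumption; the text above only covers servers carrying a different `_source` tag.

```python
from __future__ import annotations

RESERVED_SOURCE = "ai-toolkit"


class CollisionError(Exception):
    exit_code = 3  # a CLI wrapper would exit with this code


def merge_servers(existing: dict, template: dict, source: str, force: bool = False) -> dict:
    """Merge template servers into an `.mcp.json`-shaped dict, tagged with `source`."""
    if source == RESERVED_SOURCE:
        raise ValueError(f"source name {RESERVED_SOURCE!r} is reserved")
    # Drop this source's previous entries so a re-run replaces them cleanly.
    servers = {
        name: server
        for name, server in existing.get("mcpServers", {}).items()
        if server.get("_source") != source
    }
    for name, server in template["mcpServers"].items():
        if name in servers:
            owner = servers[name].get("_source")
            # Built-in entries stay protected even when force is set.
            if owner == RESERVED_SOURCE or not force:
                raise CollisionError(f"server {name!r} already owned by {owner!r}")
        servers[name] = {**server, "_source": source}
    return {**existing, "mcpServers": servers}


def for_editor(config: dict) -> dict:
    """Servers without `_source`, as written to native editor configs."""
    return {
        name: {key: value for key, value in server.items() if key != "_source"}
        for name, server in config["mcpServers"].items()
    }
```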

**Safety:** Only HTTPS URLs are accepted. The source name `ai-toolkit` is reserved.

## remove-mcp

Strips all server entries from `~/.mcp.json` that carry a given `_source` tag, cleans the same server names from every editor `global_path`, and (if URL-sourced) unregisters from `sources.json` and removes the cached file.

```bash
npx @softspark/ai-toolkit remove-mcp rag-mcp-template
```

The argument is the source name (file stem used during `inject-mcp`, or the value passed via `--name`). If no entries with that source are present, the command exits 0 silently. The `ai-toolkit` source is reserved and cannot be removed via this command.

## mcp add / install

Merges one or more MCP server templates from `app/mcp-templates/` into the project's `.mcp.json`.

```bash
ai-toolkit mcp add github                # add a single template
ai-toolkit mcp add github postgres slack  # add multiple at once
ai-toolkit mcp list                       # list all available templates
ai-toolkit mcp editors                    # list supported native adapters
ai-toolkit mcp show github                # print a template's JSON
ai-toolkit mcp install --editor cursor --scope project github --target .
ai-toolkit mcp install --editor codex context7
ai-toolkit mcp remove github             # remove an entry from .mcp.json
ai-toolkit mcp remove github --editor cursor --scope project --target .
```

**Implementation:** `scripts/mcp_manager.py`.

The `add` command merges the `mcpServers` block from the template into `.mcp.json`. If `.mcp.json` does not exist, it is created. If the server name already exists, the entry is overwritten.
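
A minimal sketch of that merge, operating on already-parsed JSON. The function name `add_templates` is hypothetical; an empty dictionary stands in for a `.mcp.json` that does not exist yet.

```python
def add_templates(project_config: dict, templates: list[dict]) -> dict:
    """Merge each template's mcpServers block into the project config."""
    servers = dict(project_config.get("mcpServers", {}))
    for template in templates:
        # Later templates and existing names are overwritten, not duplicated.
        servers.update(template.get("mcpServers", {}))
    return {**project_config, "mcpServers": servers}
```

For example, `add_templates({}, [github_template])` yields a fresh config containing only the `github` server, and calling it again with the same template returns an identical result.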

The `install` command renders the same canonical template into a native editor config format. Supported adapters currently cover:
- JSON clients with `mcpServers`: Claude Code, Cursor, Gemini CLI, Windsurf, Cline, Augment
- JSON clients with additional required metadata: GitHub Copilot
- TOML clients: Codex CLI

When `install` runs with `--scope project`, ai-toolkit also updates `.mcp.json` so the project-level config remains the source of truth for later sync and local install flows.

## Architecture

```
┌──────────────────────────────────────────────────────┐
│                   ai-toolkit (generic)               │
│                                                      │
│  Public Extension API:                               │
│    inject-rule  <file.md>     → CLAUDE.md            │
│    remove-rule  <name>        → CLAUDE.md            │
│    inject-hook  <file|url>    → settings.json        │
│    remove-hook  <name>        → settings.json        │
│    inject-mcp   <file|url>    → .mcp.json + editors  │
│    remove-mcp   <name>        → .mcp.json + editors  │
│    add-rule     <file|url>    → rules/ registry      │
│    mcp add      <template>    → .mcp.json            │
│    mcp install  <template>    → editor-native MCP    │
│                                                      │
│  Idempotent: markers (rules) / _source tags (hooks)  │
│  URL sources: cached + auto-refreshed on update      │
└──────────────────────────────────────────────────────┘
                        ▲
                        │ uses API
        ┌───────────────┼───────────────┐
        │               │               │
   rag-mcp          custom-tool     ci-system
   (consumer)       (consumer)      (consumer)
```

## Example: Registering Rules, Hooks, and MCP Servers from an External Tool

An external tool's install script would call:

```bash
# Register rules into CLAUDE.md
npx @softspark/ai-toolkit inject-rule ./rules/my-tool-rules.md

# Register hooks into settings.json (auto-propagates to Codex)
npx @softspark/ai-toolkit inject-hook ./hooks/my-tool-hooks.json

# Register MCP server template into .mcp.json + all editor MCP configs
npx @softspark/ai-toolkit inject-mcp ./mcp-template.json

# Alternative: pull MCP template from a URL (auto-refreshed on update)
npx @softspark/ai-toolkit inject-mcp https://example.com/mcp-template.json
```

To uninstall:

```bash
npx @softspark/ai-toolkit remove-rule my-tool-rules
npx @softspark/ai-toolkit remove-hook my-tool-hooks
npx @softspark/ai-toolkit remove-mcp mcp-template
```

All operations are idempotent — safe to run on every install or update.

## Related Documentation

- [PATH: kb/reference/hooks-catalog.md] — built-in hooks reference
- [PATH: kb/reference/mcp-templates.md] — available MCP server templates
- [PATH: kb/reference/mcp-editor-compatibility.md] — native editor MCP support matrix
- [PATH: kb/reference/architecture-overview.md] — overall install model

---

## kb/reference/global-install-model.md

---
title: "Global Install Model"
category: reference
service: ai-toolkit
tags: [install, global, claude, codex, plugins, local-setup]
version: "3.0.1"
created: "2026-03-26"
last_updated: "2026-04-28"
description: "Reference description of the global install target, project-local editor setup, global Codex plugin layering, and command responsibilities in ai-toolkit."
---

# Global Install Model

## Summary

`ai-toolkit` installs globally into `~/.claude/` by default.

That means one machine-level install provides agents, skills, hooks, and rules to every project without committing toolkit boilerplate into each repository.

Other editor targets are opt-in and only use documented file surfaces. Cursor
rules stay project-local because Cursor's global user rules are managed through
the settings UI, not a stable merge-safe file. Codex remains project-local for
the core toolkit install, but experimental plugin packs can layer a global
Codex target in `HOME` when explicitly installed with
`ai-toolkit plugin install --editor codex`.

## Command Responsibilities

| Command | Target | Purpose |
|---------|--------|---------|
| `ai-toolkit install` | `~/.claude/` | first-time machine setup |
| `ai-toolkit update` | `~/.claude/` | re-apply after package or rule changes |
| `ai-toolkit install --local` | current project | Claude Code configs only (CLAUDE.md, settings, constitution, language rules). Add `--editors all` for other tools, or `--editors cursor,aider` for specific ones. Auto-detects editors from existing project files when `--editors` is omitted. |
| `ai-toolkit install --local --lang <lang>` | current project | explicit language selection for rules (e.g. `--lang typescript`, `--lang go,python`); auto-detected when omitted |
| `ai-toolkit install --modules <list>` | `~/.claude/` | selective module install (e.g. `--modules core,agents,rules-typescript`) |
| `ai-toolkit update --local` | current project | refresh project configs; auto-detects editors from existing files |
| `ai-toolkit add-rule` | `~/.softspark/ai-toolkit/rules/` | register a global rule |
| `ai-toolkit remove-rule` | `~/.softspark/ai-toolkit/rules/` | unregister a global rule |
| `ai-toolkit mcp add <name...>` | current project | merge MCP templates into `.mcp.json` |
| `ai-toolkit mcp install --editor <name...>` | editor-native config | render MCP templates into editor-specific config files |
| `ai-toolkit plugin install --editor claude\|codex\|all <name>` | runtime-native config | install plugin pack for selected runtime(s) |
| `ai-toolkit plugin update --editor claude\|codex\|all <name>` | runtime-native config | re-apply plugin pack after toolkit updates |
| `ai-toolkit plugin remove --editor claude\|codex\|all <name>` | runtime-native config | remove plugin pack from selected runtime(s) |

## Install Profiles (v3.0.0)

The `--profile` flag controls how much of each editor's native surface is activated.

| Profile | What runs | Use when |
|---------|-----------|----------|
| `minimal` | Agents and skills only. No editor generators beyond pointer skills for editors that require them. | You want the smallest possible footprint, or you manage editor configs by hand. |
| `standard` (default) | Claude Code + editor rule files. Includes **Gemini hooks** and the **Copilot directory layout** (v3.0.0 change from prior `standard`). | Day-to-day installs. Most users. |
| `strict` | Everything in `standard` plus git-hook wiring for commit-time safety checks. | Solo dev or tight team with zero tolerance for drift. |
| `full` | Every native surface across every editor: hooks, sub-agents, custom commands, skill pointers for Cursor / Windsurf / Gemini / Augment / Antigravity. | You want maximum coverage and understand that each editor will carry generated files under its own layout. |

`--codex-skills` is an independent opt-in flag (not part of any profile) that materializes the full skill catalog under `.agents/skills/` for Codex. Other editors stay on compat-read or the per-editor pointer skill.

## Global Editor Targets

`ai-toolkit install --editors <name>` can write global files only for editors
with documented, file-based config surfaces:

- `windsurf`: `~/.codeium/windsurf/memories/global_rules.md` plus `~/.codeium/windsurf/skills/ai-toolkit-skill-catalogue/SKILL.md`
- `gemini`: `~/.gemini/GEMINI.md`
- `augment`: `~/.augment/rules/ai-toolkit.md`
- `cline`: `~/.cline/rules/ai-toolkit-*.md` plus `~/.cline/skills/ai-toolkit-skill-catalogue/SKILL.md`
- `roo`: `~/.roo/rules/ai-toolkit-*.md`
- `aider`: `~/.aider.conf.yml` plus `~/.aider-ai-toolkit-CONVENTIONS.md` when the YAML file does not already exist
- `codex`: `~/AGENTS.md`, `~/.agents/rules/*`, `~/.agents/skills/*`, `~/.codex/hooks.json`
- `opencode`: `~/.config/opencode/*`

Cursor, GitHub Copilot, and Google Antigravity rule installs stay project-local.
Their global MCP support, where available, is handled by `ai-toolkit mcp
install`, not by the rule installer.

## Why global install is the default

- less setup friction,
- no repeated per-project install step,
- easier machine-level upgrades,
- correct alignment with Claude Code user-level paths.

## What remains project-local

These files still stay local to a repository as part of the core install model:
- `CLAUDE.md`
- `.claude/settings.local.json`
- `.mcp.json`
- `.cursor/mcp.json`
- `.roo/mcp.json`
- `.github/mcp.json`
- `.claude/constitution.md`
- project `AGENTS.md`
- project `.agents/rules/*.md`
- project `.agents/skills/*`
- project `.codex/hooks.json`
- `.github/copilot-instructions.md`
- `.clinerules`
- `.roomodes`
- `.aider.conf.yml`
- `.augment/rules/ai-toolkit-*.md`
- `.agent/rules/*.md` and `.agent/workflows/*.md` (Google Antigravity)
- `.git/hooks/pre-commit` (fallback)
- project-specific documentation or safety overlays

Hooks do **not** live in project-local settings. They are merged only into global `~/.claude/settings.json`.

Codex is the exception in terms of file location, not hook ownership: its local
`.codex/hooks.json` points to hook scripts already installed globally in
`~/.softspark/ai-toolkit/hooks/`.

## Codex Local Install Behavior

`ai-toolkit install --local --editors codex` creates:

- `AGENTS.md`
- `.agents/rules/*.md`
- `.agents/skills/*`
- `.codex/hooks.json`

Native Codex-compatible skills are linked directly. Claude-oriented skills that
depend on `Agent`, `Team*`, or `Task*` primitives are translated into generated
Codex wrappers so the project still receives the full skill catalog.

## Codex Global Plugin Layer

`ai-toolkit plugin install --editor codex <pack>` additionally targets:

- `~/AGENTS.md`
- `~/.agents/rules/*.md`
- `~/.agents/skills/*`
- `~/.codex/hooks.json`

This is not the default core install path. It is an explicit, opt-in plugin
layer used only for plugin packs. Runtime state is tracked in
`~/.softspark/ai-toolkit/plugins.json` per target (`claude`, `codex`).

## MCP Local Sync Behavior

If `.mcp.json` exists in the current project, `ai-toolkit install --local` mirrors its `mcpServers` block into:
- `.claude/settings.local.json`
- `.cursor/mcp.json` when `--editors cursor` is selected
- `.github/mcp.json` when `--editors copilot` is selected
- `.roo/mcp.json` when `--editors roo` is selected

Global-only editor MCP configs are not written during `install --local`. Use `ai-toolkit mcp install --editor <name...>` for those targets.
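
The mirroring rules reduce to a small lookup. The sketch below only computes which files would be written; the names `EDITOR_TARGETS` and `local_mcp_targets` are hypothetical.

```python
ALWAYS_MIRRORED = [".claude/settings.local.json"]

# Editors whose project-local MCP file is written during `install --local`.
EDITOR_TARGETS = {
    "cursor": ".cursor/mcp.json",
    "copilot": ".github/mcp.json",
    "roo": ".roo/mcp.json",
}


def local_mcp_targets(editors: list[str]) -> list[str]:
    """Return the project-local files that receive the mcpServers block."""
    targets = list(ALWAYS_MIRRORED)
    targets += [EDITOR_TARGETS[e] for e in editors if e in EDITOR_TARGETS]
    return targets
```

Editors without a project-local MCP file, such as `codex`, are simply skipped here, matching the rule that their configs are handled by `ai-toolkit mcp install`.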

## Related Documents

- `kb/reference/distribution-model.md`
- `kb/reference/merge-friendly-install-model.md`
- `kb/reference/codex-cli-compatibility.md`
- `kb/reference/mcp-editor-compatibility.md`

---

## kb/reference/hierarchical-override-pattern.md

---
title: "Hierarchical Override Pattern"
category: reference
service: ai-toolkit
description: "Convention for SKILL.md + reference/*.md relationship with explicit override semantics."
tags: [skills, architecture, patterns, override]
created: 2026-04-01
last_updated: 2026-04-01
---

# Hierarchical Override Pattern

## Overview

Skills in ai-toolkit follow a two-level content hierarchy: a master `SKILL.md`
file that defines global defaults and the main instruction flow, and optional
`reference/*.md` files that extend and specialize without contradicting the
master.

This document defines the conventions, override semantics, and splitting
criteria for this pattern.

## Architecture

```
app/skills/<skill-name>/
  SKILL.md                    # Master: global defaults, main flow
  reference/
    domain-a.md               # Extension: adds detail for domain A
    domain-b.md               # Extension: adds detail for domain B
    visual-companion.md       # Extension: visual/UI-specific guidance
```

## Roles

### SKILL.md (Master)

The `SKILL.md` file is the single source of truth for a skill. It defines:

- **Purpose and scope** of the skill.
- **Global defaults** that apply unless overridden by context.
- **Main instruction flow** -- the step-by-step process the agent follows.
- **Cross-references** to reference files (explicit `see reference/X.md` links).
- **Invocation metadata** (frontmatter: `disable-model-invocation`, etc.).

The master file is always loaded. It is the entry point for the agent.

### reference/*.md (Extensions)

Reference files extend the master by providing:

- **Domain-specific detail** that would bloat the master.
- **Lookup tables** (e.g., language-specific patterns, framework configs).
- **Specialized workflows** that apply in narrow contexts.
- **Examples and templates** too long for inline inclusion.

Reference files are loaded on demand -- only when the agent determines the
context requires them, or when the master explicitly references them.

## Override Semantics

The relationship between master and reference files follows strict rules:

### Rule 1: Reference files ADD, never REPLACE

A reference file must not contradict the master. It adds specificity within the
boundaries the master defines.

```
SKILL.md says:     "Use type hints on all public APIs"
reference/go.md:   "Use Go's built-in type system; exported functions are public APIs"
```

This is valid -- it specializes the general rule for Go without contradicting it.

```
SKILL.md says:     "Always validate input at the API boundary"
reference/perf.md: "Skip validation for internal microservice calls"
```

This is INVALID -- it contradicts the master. If an exception is needed, it must
be documented in the master itself with explicit conditions.

### Rule 2: Master defines the contract, references fill in the details

Think of it as interface vs. implementation:

| Layer | Defines | Example |
|-------|---------|---------|
| Master | "Validate all inputs" | General principle |
| Reference | "In Python, use Pydantic v2 BaseModel with Field validators" | Concrete implementation |

### Rule 3: Conflicts are resolved by the master

If a reference file and the master appear to conflict, the master wins. This
should be treated as a bug in the reference file and corrected.

### Rule 4: References may cross-reference each other

Reference files can link to other reference files, but the dependency graph
should remain shallow (max 2 levels deep). Deep chains make maintenance
difficult.
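
One way to check Rule 4 mechanically is to walk the link graph between reference files. The sketch below is a hypothetical helper, not part of the toolkit; it assumes "2 levels" means a chain of at most two files (`a.md` linking to `b.md`), and it takes the graph as a plain dictionary rather than parsing Markdown.

```python
def chain_depth(graph: dict, node: str, _stack: tuple = ()) -> int:
    """Length of the longest link chain starting at `node` (1 = no links)."""
    if node in _stack:
        raise ValueError("circular reference: " + " -> ".join(_stack + (node,)))
    children = graph.get(node, [])
    if not children:
        return 1
    return 1 + max(chain_depth(graph, c, _stack + (node,)) for c in children)


def check_references(graph: dict, max_depth: int = 2) -> list[str]:
    """Return human-readable problems: cycles and over-deep chains."""
    problems = []
    for node in sorted(graph):
        try:
            depth = chain_depth(graph, node)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        if depth > max_depth:
            problems.append(f"{node}: chain depth {depth} exceeds {max_depth}")
    return problems
```

An empty result means the reference graph is acyclic and shallow enough under this interpretation.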

## When to Split

Split content from `SKILL.md` into `reference/*.md` when:

| Criterion | Threshold |
|-----------|-----------|
| **Total line count** | Master exceeds 300 lines |
| **Distinct sub-domains** | Content covers 3+ distinct domains (languages, frameworks, concerns) |
| **Lookup tables** | Tables with 20+ rows that serve as reference material |
| **Reuse potential** | Content could be useful to multiple skills |
| **Update frequency** | A section changes much more frequently than the rest |

Do NOT split when:

- The master is under 300 lines and covers a single domain.
- The "reference" content is only a few paragraphs.
- Splitting would force the agent to always load multiple files for basic
  operation.

## Examples from Existing Skills

### write-a-prd

```
app/skills/write-a-prd/
  SKILL.md                          # Main PRD creation flow
  reference/visual-companion.md     # Visual/UI-specific PRD guidance
```

- `SKILL.md` defines the interview-driven PRD process, output format, and
  quality criteria.
- `reference/visual-companion.md` extends with guidance for PRDs that involve
  visual interfaces -- design system references, wireframe conventions,
  accessibility requirements.
- The master references it: `"For visual products, see reference/visual-companion.md"`

### clean-code

```
app/skills/clean-code/
  SKILL.md                    # Universal clean code principles
  reference/python.md         # Python-specific patterns
  reference/typescript.md     # TypeScript-specific patterns
  reference/php.md            # PHP-specific patterns
  reference/go.md             # Go-specific patterns
  reference/dart.md           # Dart/Flutter-specific patterns
```

- `SKILL.md` defines language-agnostic principles (naming, SRP, DRY).
- Each `reference/<lang>.md` provides language-specific idioms, linting config,
  and type system patterns.
- The master links to them: `"For Python patterns, see reference/python.md"`

### testing-patterns

```
app/skills/testing-patterns/
  SKILL.md                             # Universal testing principles (AAA, org, targets)
  reference/python-pytest.md           # pytest specifics
  reference/typescript-vitest.md       # Vitest/Jest specifics
  reference/php-phpunit.md             # PHPUnit specifics
  reference/go-testing.md              # Go testing specifics
  reference/flutter-testing.md         # Flutter/Dart testing specifics
```

Same pattern: master defines the universal structure, references specialize per
ecosystem.

## Authoring Guidelines

1. **Master first.** Always write the `SKILL.md` completely before splitting.
   Premature splitting leads to fragmented, hard-to-follow skills.

2. **Explicit cross-references.** Every reference file must be linked from the
   master with a clear sentence explaining when to consult it.

3. **Self-contained references.** A reference file should be useful on its own
   for someone who has already read the master. Do not assume the reader will
   re-read the master alongside it.

4. **Consistent frontmatter.** Reference files do not need frontmatter unless
   they are independently searchable. If they are, use the same YAML format as
   the master.

5. **Naming convention.** Use kebab-case filenames that describe the domain:
   `python.md`, `visual-companion.md`, `database-patterns.md`. Avoid generic
   names like `extra.md` or `notes.md`.

## Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Reference contradicts master | Agent gets conflicting instructions | Move exception to master with conditions |
| Master too thin | Agent lacks context without loading all references | Keep core flow in master, only split detail |
| Circular references | Infinite loading, confused agent | Keep dependency graph acyclic and shallow |
| Unnamed splits | `misc.md`, `extra.md` -- no signal about content | Use descriptive domain-based names |
| Over-splitting | 10+ reference files for a simple skill | Consolidate until the 300-line / 3-domain threshold justifies splitting |

---

## kb/reference/hooks-catalog.md

---
title: "Hooks Catalog"
category: reference
service: ai-toolkit
tags: [hooks, quality, safety, enforcement, settings.json]
version: "1.5.7"
created: "2026-03-27"
last_updated: "2026-05-25"
description: "Complete reference of all ai-toolkit hooks: events, scripts, installation, and runtime behavior."
---

# Hooks Catalog

## Overview

ai-toolkit provides 29 global hook entries across 14 lifecycle events that enforce quality, safety, and workflow rules across all Claude Code sessions. Hooks are merged into `~/.claude/settings.json` on install, with logic in standalone scripts at `~/.softspark/ai-toolkit/hooks/`.

## Supported Surface

`scripts/validate.py` validates both event names and handler shapes before release. The accepted lifecycle surface includes `PostToolUseFailure`, `PostToolBatch`, and `UserPromptExpansion` in addition to the installed ai-toolkit events below.

Supported handler types are `command`, `http`, `prompt`, `agent`, and `mcp_tool`. ai-toolkit ships command hooks by default; non-command handlers are validated so external consumers can safely inject richer hook definitions through `inject-hook`.
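
To make the handler-type check concrete, here is a small sketch of how such validation could look. It is not the code in `scripts/validate.py`; the function name is hypothetical, and the nested `{event: [{"matcher": ..., "hooks": [...]}]}` shape is assumed from the usual `settings.json` hook layout.

```python
SUPPORTED_HANDLER_TYPES = {"command", "http", "prompt", "agent", "mcp_tool"}


def unsupported_handlers(hooks: dict) -> list[tuple[str, str]]:
    """Return (event, handler_type) pairs whose type is not supported."""
    problems = []
    for event, groups in hooks.items():
        for group in groups:
            for handler in group.get("hooks", []):
                handler_type = handler.get("type", "<missing>")
                if handler_type not in SUPPORTED_HANDLER_TYPES:
                    problems.append((event, handler_type))
    return problems
```

An injected hook file would pass this check only when the returned list is empty.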

## Installation

```bash
ai-toolkit install    # copies scripts to ~/.softspark/ai-toolkit/hooks/, merges into settings.json
ai-toolkit update     # re-copies scripts, re-merges (idempotent)
```

**File locations:**
- Scripts: `~/.softspark/ai-toolkit/hooks/*.sh`
- Config: `~/.claude/settings.json` → `hooks` key
- Source: `ai-toolkit/app/hooks/*.sh` + `app/hooks.json`

## Hook Events

### SessionStart — `session-start.sh`

| Field | Value |
|-------|-------|
| Event | `SessionStart` |
| Matcher | `startup\|compact` |
| Script | `~/.softspark/ai-toolkit/hooks/session-start.sh` |
| Fires | Session start + after context compaction |

**Actions:**
1. Injects MANDATORY reminder to follow CLAUDE.md rules
2. Injects REMINDER about tests and documentation
3. Loads session context from `.claude/session-context.md` (if exists)
4. Loads active instincts from `.claude/instincts/*.md` (if any)

By default the hook runs silently: it resets session state, cleans up stale
search flags, and performs its update-notification side effects without
printing informational output to stdout. Set `AI_TOOLKIT_HOOK_VERBOSE=1` to
print the startup reminders and loaded context for debugging;
`AI_TOOLKIT_HOOK_QUIET=1` explicitly keeps it silent.

### Notification — `notify-waiting.sh`

| Field | Value |
|-------|-------|
| Event | `Notification` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/notify-waiting.sh` |
| Fires | Claude Code waiting for user input |

**Action:** Cross-platform desktop notification ("Claude Code needs your attention"):
- macOS: `osascript` (native)
- Linux: `notify-send` (libnotify)
- Windows/WSL: `powershell.exe` WScript popup (5s auto-close)

### PreToolUse (Bash) — `guard-destructive.sh`

| Field | Value |
|-------|-------|
| Event | `PreToolUse` |
| Matcher | `Bash` |
| Script | `~/.softspark/ai-toolkit/hooks/guard-destructive.sh` |
| Fires | Before any Bash command |

**Action:** Blocks (exit 2) commands matching destructive patterns:
- `rm -rf`, `sudo rm`
- `DROP TABLE`, `DROP DATABASE`, `TRUNCATE`
- `format /`, `dd if=`
- `git push --force`
- `chmod -R 777`

**Exemptions (avoid false positives):**
- `git push --force-with-lease` / `--force-if-includes` — the safe force-push variants are allowed.
- A single, non-chained `echo`/`printf`/`git commit`/`git tag` carrying a destructive token as *data* (e.g. a commit message mentioning `DROP TABLE`) is allowed. Chained commands (`&&`, `;`, `|`) are still inspected in full.
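
The block-and-exempt logic can be illustrated in Python. The shipped hook is a shell script, so this is only a sketch of the decision rule: the regular expressions are assumptions that approximate the listed patterns, and chaining is detected naively (a `;` or `|` inside a quoted string also counts as chaining).

```python
import re

DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bsudo\s+rm\b",
    r"\bDROP\s+TABLE\b",
    r"\bDROP\s+DATABASE\b",
    r"\bTRUNCATE\b",
    r"\bformat\s+/",
    r"\bdd\s+if=",
    # Plain --force only; --force-with-lease / --force-if-includes are allowed.
    r"\bgit\s+push\b.*--force(?![-\w])",
    r"\bchmod\s+-R\s+777\b",
]
DATA_ONLY_PREFIX = re.compile(r"^\s*(echo|printf|git\s+commit|git\s+tag)\b")
CHAIN_OPERATOR = re.compile(r"&&|;|\|")


def is_blocked(command: str) -> bool:
    """Return True when the command should be rejected (hook exit code 2)."""
    if DATA_ONLY_PREFIX.match(command) and not CHAIN_OPERATOR.search(command):
        return False  # destructive token appears only as data
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)
```

Checking the exemption first keeps the common case (a commit message that mentions a dangerous command) cheap, while any chained command still goes through the full pattern scan.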

### PreToolUse (file ops) — `guard-path.sh`

| Field | Value |
|-------|-------|
| Event | `PreToolUse` |
| Matcher | `Bash\|Read\|Edit\|Write\|MultiEdit\|Glob\|Grep\|NotebookEdit\|mcp__filesystem__.*` |
| Script | `~/.softspark/ai-toolkit/hooks/guard-path.sh` |
| Fires | Before any file access tool (including Bash, MCP filesystem) |

**Action:** Blocks (exit 2) when a path contains `/Users/<wrong>` or `/home/<wrong>` that doesn't match the actual `$HOME`. Prevents Claude from hallucinating a home directory or confusing similar usernames (common when the username contains non-ASCII characters, as many Polish names do).

**Feedback to Claude:** Tells it to use `~`, `$HOME`, or run `echo $HOME` instead of guessing.
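
A sketch of the home-directory comparison, written in Python for readability even though the hook itself is a shell script. The function name and the regular expression are assumptions made for this example.

```python
import re

# Matches /Users/<name> or /home/<name>, stopping at a slash, space, or quote.
HOME_DIR_PATTERN = re.compile(r"/(Users|home)/([^/\s'\"]+)")


def mismatched_home_dirs(text: str, home: str) -> list[str]:
    """Return home-style directories in `text` that differ from `home`."""
    home = home.rstrip("/")
    found = []
    for match in HOME_DIR_PATTERN.finditer(text):
        candidate = f"/{match.group(1)}/{match.group(2)}"
        if candidate != home and candidate not in found:
            found.append(candidate)
    return found
```

A non-empty result would correspond to the blocking case; paths written with `~` or `$HOME` never match the pattern and therefore pass.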

### UserPromptSubmit — `user-prompt-submit.sh`

| Field | Value |
|-------|-------|
| Event | `UserPromptSubmit` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/user-prompt-submit.sh` |
| Fires | Before Claude starts working on a submitted prompt |

**Action:** Maintains the per-session search-first flag used by Stop enforcement
and can provide a lightweight governance reminder: plan mode for architectural
work, evidence-first debugging, KB-first research, and validation expectations.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`. The bundled `app/hooks.json`
registers this command with `AI_TOOLKIT_HOOK_QUIET=1 AI_TOOLKIT_HOOK_FORMAT=json`.
This keeps the hook visually quiet (`suppressOutput: true`) while still
injecting event-specific JSON context before Claude starts working in runtimes
that consume hidden context:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": "..."
  },
  "suppressOutput": true
}
```

That context is the proactive half of search-first enforcement; the paired
`stop-search-check.sh` remains the corrective half. In plain-text mode,
informational reminders are silent by default and require
`AI_TOOLKIT_HOOK_VERBOSE=1`.
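
The two output modes can be summarized in a short sketch. This is not the hook's source (the hook is a shell script); `render_reminder` is a hypothetical name, and the environment is passed in as a dictionary so the behaviour is easy to test.

```python
import json


def render_reminder(context: str, env: dict) -> str:
    """Return what the hook would print for a given environment."""
    if env.get("AI_TOOLKIT_HOOK_FORMAT") == "json":
        payload = {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                "additionalContext": context,
            },
            "suppressOutput": True,
        }
        return json.dumps(payload)
    # Plain-text mode: informational output is opt-in.
    if env.get("AI_TOOLKIT_HOOK_VERBOSE") == "1":
        return context
    return ""
```

With an empty environment the function returns an empty string, which matches the silent plain-text default.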

Codex-generated hooks intentionally run this script without
`AI_TOOLKIT_HOOK_FORMAT=json` by default because Codex renders
`additionalContext` visibly in the TUI; the search-first flag side effect still
arms the corrective Stop hook.

### UserPromptSubmit (usage tracking) — `track-usage.sh`

| Field | Value |
|-------|-------|
| Event | `UserPromptSubmit` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/track-usage.sh` |
| Fires | Before Claude starts working on a submitted prompt |

**Action:** Records skill invocations (slash commands like `/commit`, `/review`) to `~/.softspark/ai-toolkit/stats.json` for local usage analytics. Non-slash prompts are ignored. Stats writes are best-effort and stay silent if the local state path is not writable.

### PostToolUse (edit feedback) — `post-tool-use.sh`

| Field | Value |
|-------|-------|
| Event | `PostToolUse` |
| Matcher | `Edit\|MultiEdit\|Write` |
| Script | `~/.softspark/ai-toolkit/hooks/post-tool-use.sh` |
| Fires | After file edit/write tool operations |

**Action:** Adds a lightweight reminder to run relevant validation, tests, and documentation updates after edits.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### Stop (quality check) — `quality-check.sh`

| Field | Value |
|-------|-------|
| Event | `Stop` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/quality-check.sh` |
| Fires | After every Claude response |

**Action:** Runs language-appropriate linter:
- Python: `ruff check .`
- TypeScript: `npx tsc --noEmit`
- PHP: `vendor/bin/phpstan analyse`
- Dart: `dart analyze`
- Go: `go vet ./...`

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### Stop (session save) — `save-session.sh`

| Field | Value |
|-------|-------|
| Event | `Stop` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/save-session.sh` |
| Fires | After every Claude response |

**Action:** Writes enriched session context to `.claude/session-context.md` for cross-session persistence. Captures:
- Session ID and last assistant message (first 5 lines)
- Git branch, uncommitted change count, and diff stat (last 5 lines)
- Agent-written checkpoints from `.claude/session-context.md.checkpoints` (if present — written by proactive checkpointing per Constitution Art. I §5)

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### Stop (quality gate) — `quality-gate.sh`

| Field | Value |
|-------|-------|
| Event | `Stop` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/quality-gate.sh` |
| Fires | Before Claude is allowed to finish a response |

**Action:** Runs lint/typecheck. **Blocks stopping (exit 2)** if errors are found, so Claude must continue and fix the issues. Missing local tooling is reported as skipped rather than blocking the session.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### TaskCompleted — `quality-gate.sh`

| Field | Value |
|-------|-------|
| Event | `TaskCompleted` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/quality-gate.sh` |
| Fires | When an Agent Teams task is marked complete |

**Action:** Runs lint/typecheck. **Blocks completion (exit 2)** if errors are found. Strict profile also runs `mypy --strict`.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### SubagentStart — `subagent-start.sh`

| Field | Value |
|-------|-------|
| Event | `SubagentStart` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/subagent-start.sh` |
| Fires | When a subagent is spawned |

**Action:** Reminds subagents to stay narrow in scope, gather evidence first, and return explicit validation notes.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### SubagentStop — `subagent-stop.sh`

| Field | Value |
|-------|-------|
| Event | `SubagentStop` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/subagent-stop.sh` |
| Fires | When a subagent completes |

**Action:** Enforces a concise handoff checklist: findings, files touched, tests run, risks, and docs follow-up.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### PreCompact — `pre-compact.sh`

| Field | Value |
|-------|-------|
| Event | `PreCompact` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/pre-compact.sh` |
| Fires | Before context compaction |

**Actions (prioritized — higher priority items survive tighter token budgets):**
1. **Mandatory reload reminder** — always emitted, instructs Claude to re-read CLAUDE.md and active tasks
2. **Active instincts** — lists each instinct with confidence score and pattern name from `.claude/instincts/*.md`
3. **Session context** — preserves task state from `.claude/session-context.md` (if exists)
4. **Git working state** — branch name, uncommitted change count, last commit (if inside a git repo)
5. **Key decisions** — last 10 lines from `.claude/decisions.md` (if exists)

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### SessionEnd — `session-end.sh`

| Field | Value |
|-------|-------|
| Event | `SessionEnd` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/session-end.sh` |
| Fires | When a Claude session ends |

**Action:** Writes `.claude/session-end.md` with a lightweight handoff note for the next session and reminds the next session to review preserved context.

Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### TeammateIdle — inline

| Field | Value |
|-------|-------|
| Event | `TeammateIdle` |
| Matcher | *(all)* |
| Fires | Agent Teams teammate going idle |

**Action:** Reminds teammate to verify: files modified, tests written, docs updated.

---

## New Hooks (v1.1.0)

### PreToolUse (config guard) — `guard-config.sh`

| Field | Value |
|-------|-------|
| Event | `PreToolUse` |
| Matcher | `Edit\|Write\|MultiEdit` |
| Script | `~/.softspark/ai-toolkit/hooks/guard-config.sh` |
| Fires | Before any file write/edit operation |

**Action:** Blocks (exit 2) edits to linter and formatter config files — `.eslintrc`, `.eslintrc.*`, `eslint.config.*`, `.prettierrc`, `.prettierrc.*`, `prettier.config.*`, `tsconfig.json`, `tsconfig.*.json` — unless the request contains an explicit acknowledgment phrase (e.g. "intentionally editing config"). Returns a human-readable explanation to Claude so it can ask the user for confirmation before retrying.
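
The filename matching can be sketched with glob patterns. This is an illustrative Python version of the rule, not the shell hook itself; the function names are hypothetical and the acknowledgment phrase is the example quoted above, used here as an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PROTECTED_PATTERNS = [
    ".eslintrc", ".eslintrc.*", "eslint.config.*",
    ".prettierrc", ".prettierrc.*", "prettier.config.*",
    "tsconfig.json", "tsconfig.*.json",
]
ACK_PHRASE = "intentionally editing config"  # assumed acknowledgment phrase


def is_protected_config(path: str) -> bool:
    """Match on the file name only, so nested configs are covered too."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in PROTECTED_PATTERNS)


def should_block(path: str, request_text: str) -> bool:
    """Block protected edits unless the request carries the acknowledgment."""
    return is_protected_config(path) and ACK_PHRASE not in request_text.lower()
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.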

### SessionStart — `mcp-health.sh`

| Field | Value |
|-------|-------|
| Event | `SessionStart` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/mcp-health.sh` |
| Fires | Session start |

**Action:** Non-blocking (always exits 0). Reads MCP server definitions from `~/.claude/settings.json` and any local `.mcp.json`. For each configured server, checks whether the required runtime command (`npx`, `uvx`, `docker`, etc.) is available in `$PATH`. Emits warnings for any missing runtimes, including install hints (e.g. "npm install -g npx"). Helps surface MCP misconfiguration early without interrupting the session.
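
The runtime check amounts to a `PATH` lookup per configured server. Below is a minimal Python sketch of that idea; the function name is hypothetical, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, Optional


def missing_runtimes(
    mcp_servers: dict,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Map server name -> runtime command that is not available on PATH."""
    missing = {}
    for name, cfg in mcp_servers.items():
        command = cfg.get("command")  # URL-only servers have no command
        if command and which(command) is None:
            missing[name] = command
    return missing
```

Because the result is only used to print warnings, a caller can always exit 0 regardless of what is missing.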

### PostToolUse (governance) — `governance-capture.sh`

| Field | Value |
|-------|-------|
| Event | `PostToolUse` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/governance-capture.sh` |
| Fires | After any tool use |

**Action:** Non-blocking (always exits 0). Logs security-sensitive operations (Bash commands, file writes to sensitive paths, large writes) to `~/.softspark/ai-toolkit/governance.log` with ISO timestamp, session ID, tool name, and a content excerpt. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### PreCompact — `pre-compact-save.sh`

| Field | Value |
|-------|-------|
| Event | `PreCompact` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/pre-compact-save.sh` |
| Fires | Before context compaction |

**Action:** Saves a timestamped context snapshot to `~/.softspark/ai-toolkit/compactions/YYYY-MM-DD_HH-MM-SS.txt`. Captures session ID, working directory, git branch and status, and environment metadata. Provides an audit trail of what was in context at each compaction point. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### PreToolUse (commit quality) — `commit-quality.sh`

| Field | Value |
|-------|-------|
| Event | `PreToolUse` |
| Matcher | `Bash` |
| Script | `~/.softspark/ai-toolkit/hooks/commit-quality.sh` |
| Fires | Before any Bash command |

**Action:** Non-blocking (always exits 0). Inspects Bash commands containing `git commit`. Extracts the commit message from the `-m` flag and checks it against Conventional Commits format (`type: description`, where type is one of feat/fix/docs/refactor/test/chore/ci/perf/style/revert). Emits an advisory warning if the message does not match — the commit is not blocked, only nudged. Commands without `git commit` or without a `-m` message (e.g. interactive commits) are ignored.
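
The check can be approximated as follows. This sketch assumes the strict `type: description` shape documented above (no scope or `!` handling) and uses `shlex` to pull out the `-m` argument; `extract_message` and `advisory` are hypothetical names, and the warning text is illustrative.

```python
import re
import shlex
from typing import Optional

TYPES = ("feat", "fix", "docs", "refactor", "test",
         "chore", "ci", "perf", "style", "revert")
CONVENTIONAL = re.compile(r"^(?:" + "|".join(TYPES) + r"): \S")


def extract_message(command: str) -> Optional[str]:
    """Return the -m message of a `git commit` command, or None."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: nothing to inspect
        return None
    for i in range(len(tokens) - 1):
        if tokens[i] == "git" and tokens[i + 1] == "commit":
            rest = tokens[i + 2:]
            if "-m" in rest and rest.index("-m") + 1 < len(rest):
                return rest[rest.index("-m") + 1]
            return None
    return None


def advisory(command: str) -> Optional[str]:
    """Return a warning string for non-conforming messages, else None."""
    message = extract_message(command)
    if message is None or CONVENTIONAL.match(message):
        return None
    return "commit message does not match Conventional Commits (type: description)"
```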

### SessionStart — `session-context.sh`

| Field | Value |
|-------|-------|
| Event | `SessionStart` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/session-context.sh` |
| Fires | Session start |

**Action:** Captures an environment snapshot to `~/.softspark/ai-toolkit/sessions/current-context.json`. Records working directory, git branch, git status summary, Node.js version, Python version, and timestamp. Used by other hooks and tools to access session metadata without re-running discovery commands. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

---

## New Hooks (Constitution Art. VI Enforcement)

The hooks in this section turn Constitution Article VI ("Repair Discipline") from prose into executable enforcement. Each one closes a specific gap that previously relied on agent goodwill.

### PreToolUse (revert protection) — `revert-guard.sh`

| Field | Value |
|-------|-------|
| Event | `PreToolUse` |
| Matcher | `Bash` |
| Script | `~/.softspark/ai-toolkit/hooks/revert-guard.sh` |
| Fires | Before any Bash command |

**Action:** Blocks (exit 2) `git checkout/restore -- <file>`, `git reset --hard`, or `git clean -fd` when the affected files were edited in the current session (per `session_state.py` log). Forces the agent to fix root causes instead of reverting work-in-progress (Art. VI.2). Branch switches (`git checkout main`) and reverts on untouched files pass through unchanged.

**Override (one-off):** `CLAUDE_REVERT_OK=1`. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.

### PostToolUse (test cohesion) — `test-cohesion.sh`

| Field | Value |
|-------|-------|
| Event | `PostToolUse` |
| Matcher | `Edit\|MultiEdit\|Write` |
| Script | `~/.softspark/ai-toolkit/hooks/test-cohesion.sh` |
| Fires | After every file edit |

**Action:** Runs the test commands mapped to the edited path via `test-cohesion-map.json`. Lookup order: project-local `.claude/test-cohesion-map.json`, then toolkit default `app/hooks/test-cohesion-map.json`. Blocks (exit 2) when the related test command fails. Runs **only** the related tests, never the full suite.

Map schema:
```json
[
  {
    "match": "src/auth/*.py",
    "tests": ["tests/test_auth.py"],
    "runner": "pytest",
    "command": null
  }
]
```
First-match-wins per file. Built-in runners: `bats`, `pytest`, `vitest`, `jest`. Use `"command"` for full overrides.

**Overrides:** `CLAUDE_SKIP_COHESION=1` (one-off), `CLAUDE_HOOK_BOOTSTRAP=1` (when editing the hook itself). Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.
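
First-match-wins resolution against the map schema above might look like the sketch below. It is not the code in `scripts/test_cohesion.py`: glob semantics (`fnmatch`) and the way a runner plus test paths becomes a command line are assumptions, and `resolve_command` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Optional


def resolve_command(path: str, cohesion_map: list) -> Optional[str]:
    """Return the test command for the first map entry matching `path`."""
    for entry in cohesion_map:
        if not fnmatchcase(path, entry["match"]):
            continue
        # An explicit "command" overrides the runner/tests pair entirely.
        if entry.get("command"):
            return entry["command"]
        # Assumption: built-in runners are invoked as "<runner> <tests...>".
        return " ".join([entry["runner"], *entry["tests"]])
    return None  # no mapping: nothing to run for this file
```

Because the first matching entry wins, more specific patterns should be listed before broad ones such as `src/*`.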

### PostToolUse (loop guard) — `loop-guard.sh`

| Field | Value |
|-------|-------|
| Event | `PostToolUse` |
| Matcher | `Bash\|Edit\|MultiEdit\|Write` |
| Script | `~/.softspark/ai-toolkit/hooks/loop-guard.sh` |
| Fires | After every Bash command or file edit |

**Action:** Advisory only, never blocks. Hashes a `tool|identity` signature of each action (command for Bash; file path + new content for edits) and keeps the last `AI_TOOLKIT_LOOP_WINDOW` (default 6) hashes per session. When the same signature repeats `AI_TOOLKIT_LOOP_THRESHOLD` times (default 3) it emits a `PostToolUse` `additionalContext` advisory telling Claude to reassess instead of retrying. Catches successful-but-identical loops that the `/repeat` circuit breaker (which only counts failures) does not. Edits track content, so normal iterative editing of one file does not trip it. Only short hashes are stored — never raw payloads.

**Tunables:** `AI_TOOLKIT_LOOP_WINDOW`, `AI_TOOLKIT_LOOP_THRESHOLD`. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`; honours `AI_TOOLKIT_DISABLED_HOOKS`.
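
The sliding-window idea can be sketched as below. The defaults mirror the documented tunables; the hash function (SHA-256 truncated to 12 hex characters), in-memory storage, and the `LoopDetector` class are assumptions for illustration, since the real hook persists state per session.

```python
import hashlib
from collections import deque


class LoopDetector:
    """Track the most recent action signatures and flag repeats."""

    def __init__(self, window: int = 6, threshold: int = 3) -> None:
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # oldest hashes fall off

    @staticmethod
    def signature(tool: str, identity: str) -> str:
        """Short hash of `tool|identity`; the raw payload is not kept."""
        raw = f"{tool}|{identity}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()[:12]

    def record(self, tool: str, identity: str) -> bool:
        """Store the action and return True when an advisory is due."""
        sig = self.signature(tool, identity)
        self.recent.append(sig)
        return self.recent.count(sig) >= self.threshold
```

For edits, `identity` would combine the file path and the new content, so successive different edits to one file produce different signatures.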

### PostToolUse (search-first tracker) — `search-tracker.sh`

| Field | Value |
|-------|-------|
| Event | `PostToolUse` |
| Matcher | `mcp__.*__(smart_query\|hybrid_search_kb\|crag_search\|multi_hop_search\|verify_answer)\|WebSearch\|WebFetch` |
| Script | `~/.softspark/ai-toolkit/hooks/search-tracker.sh` |
| Fires | After any search-style tool call |

**Action:** Clears `~/.softspark/ai-toolkit/state/search-required-<session_id>.flag` (per-session, keyed by `session_id` from the hook stdin payload, falling back to `transcript_path` basename, then `default`).

Pairs with `user-prompt-submit.sh` (sets the flag on long technical prompts only when a search provider is detected or strict mode is enabled) and `stop-search-check.sh` (blocks Stop if the calling session's flag is still set). Search provider detection parses actual MCP server names from `mcpServers`, `mcp_servers`, or `mcp` config blocks; hook matchers and permission allowlists do not count as providers.

Codex Stop enforcement also scans the recent `~/.codex/log/codex-tui.log` window for `ToolCall: mcp__...__smart_query` and `tool.name="smart_query"`-style entries because Codex MCP tool calls may not fire the shared `PostToolUse` tracker.

Together the hooks enforce the global CLAUDE.md GOLDEN RULE without breaking offline/no-RAG installs and without cross-session interference when multiple Claude Code windows run in parallel.

Non-blocking (exit 0). Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.
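
The session-key fallback chain described above can be sketched like this. The helper names `session_key` and `flag_path` are hypothetical, and keeping the transcript file extension in the key is an assumption.

```python
import os


def session_key(payload: dict) -> str:
    """Pick the flag key: session_id, then transcript basename, then default."""
    session_id = payload.get("session_id")
    if session_id:
        return str(session_id)
    transcript = payload.get("transcript_path")
    if transcript:
        return os.path.basename(transcript)
    return "default"


def flag_path(payload: dict,
              state_dir: str = "~/.softspark/ai-toolkit/state") -> str:
    """Build the per-session flag path for a hook stdin payload."""
    name = f"search-required-{session_key(payload)}.flag"
    return os.path.join(os.path.expanduser(state_dir), name)
```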

### Stop (search-first enforcement) — `stop-search-check.sh`

| Field | Value |
|-------|-------|
| Event | `Stop` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/stop-search-check.sh` |
| Fires | When Claude finishes a response |

**Action:** If `search-required-<session_id>.flag` for the calling session is still present (no search tool ran during this turn) and a search provider is still detectable, emits `{"decision":"block","reason":"..."}` to continue the conversation with a search-first reminder. If no RAG/Web provider is detected, it clears the stale flag and exits 0, so offline/no-MCP users are not blocked. On Codex, where MCP search tools may not trigger the shared `PostToolUse` tracker, the hook also checks `~/.codex/log/codex-tui.log` for search tool calls after the flag timestamp before blocking. Flags are scoped by `session_id` from the hook stdin payload so a Stop in session B never consumes session A's flag (and vice versa). Stale per-session flags older than 60 minutes are GC'd on the next `SessionStart`.

**Overrides:** `CLAUDE_SKIP_SEARCH_FIRST=1`, `AI_TOOLKIT_SEARCH_FIRST=off`, or `AI_TOOLKIT_SEARCH_FIRST=strict` to force enforcement. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.
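
The decision table implied by the action and overrides above is roughly the following. This is a sketch under assumptions: the `mode` value `"auto"` stands in for an unset `AI_TOOLKIT_SEARCH_FIRST`, the reason text is invented, the Codex log scan is omitted, and `stop_decision` is a hypothetical name.

```python
import json
from typing import Optional, Tuple


def stop_decision(flag_present: bool, provider_detected: bool,
                  mode: str = "auto", skip: bool = False
                  ) -> Tuple[Optional[str], bool]:
    """Return (stdout JSON or None, whether to clear a stale flag)."""
    # Overrides and the no-flag case: allow Stop without touching state.
    if skip or mode == "off" or not flag_present:
        return None, False
    # No RAG/Web provider: drop the stale flag unless strict mode forces it.
    if not provider_detected and mode != "strict":
        return None, True
    payload = {
        "decision": "block",
        "reason": "Run a search tool before answering (search-first rule).",
    }
    return json.dumps(payload), False
```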

### InstructionsLoaded — `instructions-audit.sh`

| Field | Value |
|-------|-------|
| Event | `InstructionsLoaded` |
| Matcher | *(all)* |
| Script | `~/.softspark/ai-toolkit/hooks/instructions-audit.sh` |
| Fires | Whenever CLAUDE.md / `.claude/rules/*.md` is loaded into context |

**Action:** Appends `<ts>\t<memory_type>\t<load_reason>\t<file_path>` to `~/.softspark/ai-toolkit/state/loaded-instructions.log`. Provides audit visibility: when a rule does NOT enter context (silently dropped, token budget, glob miss), the absence is observable. Auto-rotates at 2000 lines.

Non-blocking (exit 0). Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.
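
An append-and-rotate log of this shape could be written as below. The source only states the line format and the 2000-line limit; keeping the newest lines on rotation is an assumption, and `append_audit` plus the sample field values are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_audit(log_path: Path, memory_type: str, load_reason: str,
                 file_path: str, ts: Optional[str] = None,
                 max_lines: int = 2000) -> None:
    """Append one tab-separated record and trim the log to max_lines."""
    ts = ts or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    if log_path.exists():
        lines = log_path.read_text(encoding="utf-8").splitlines()
    lines.append("\t".join([ts, memory_type, load_reason, file_path]))
    # Assumption: rotation keeps only the most recent max_lines records.
    log_path.write_text("\n".join(lines[-max_lines:]) + "\n", encoding="utf-8")
```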

### ConfigChange — `config-desync-guard.sh`

| Field | Value |
|-------|-------|
| Event | `ConfigChange` |
| Matcher | `user_settings` |
| Script | `~/.softspark/ai-toolkit/hooks/config-desync-guard.sh` |
| Fires | When `~/.claude/settings.json` changes |

**Action:** Compares `_source: ai-toolkit` entries between `~/.claude/settings.json` (installed) and the toolkit source `app/hooks.json`. If they diverge (missing/stale entries), emits an advisory to stderr suggesting `ai-toolkit update` or `ai-toolkit doctor --fix`. Non-blocking by design — user's legitimate settings edits never get rejected.

**Override (silence advisory):** `CLAUDE_SKIP_CONFIG_DESYNC=1`. Skipped when `TOOLKIT_HOOK_PROFILE=minimal`.
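
A drift comparison along these lines is sketched below. It assumes both files share the `hooks` → event → entry list layout, that the `_source` tag sits on each entry, and that commands are directly comparable between source and installed copies; `toolkit_entries` and `desync` are hypothetical names.

```python
def toolkit_entries(config: dict) -> set:
    """Collect (event, matcher, command) for entries tagged ai-toolkit."""
    found = set()
    for event, entries in config.get("hooks", {}).items():
        for entry in entries:
            if entry.get("_source") != "ai-toolkit":
                continue  # user-owned entries are never compared
            for hook in entry.get("hooks", []):
                found.add((event, entry.get("matcher", ""),
                           hook.get("command", "")))
    return found


def desync(installed: dict, source: dict) -> dict:
    """Report toolkit entries missing from, or stale in, the installed file."""
    have, want = toolkit_entries(installed), toolkit_entries(source)
    return {"missing": sorted(want - have), "stale": sorted(have - want)}
```

An empty result for both keys means no advisory is needed.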

### Supporting Infrastructure

| Component | Purpose |
|-----------|---------|
| `scripts/session_state.py` | Append-only edit log keyed by session_id. Cleared on SessionStart. Read by revert-guard, test-cohesion, quality-gate. |
| `scripts/test_cohesion.py` | Resolves changed paths → test commands via cohesion map. First-match-wins. Stdlib-only. |
| `app/hooks/test-cohesion-map.json` | Toolkit-default path → tests mapping (used when no project map exists). |
| `app/hooks/_locate-toolkit.sh` | Shared bash helper that exports `$TOOLKIT_DIR` for hooks needing scripts/. |
| `app/hooks/_hook-io.sh` | Shared bash helper that normalizes hook payloads across Claude, Augment, Gemini, Windsurf, and Cursor-style JSON. JSON context output takes precedence over `AI_TOOLKIT_HOOK_QUIET=1`, so quiet hooks can still emit `additionalContext` with `suppressOutput: true`; plain-text output requires `AI_TOOLKIT_HOOK_VERBOSE=1`. |
| `app/hooks/_search-capability.sh` | Shared bash helper that enables search-first blocking only when RAG/Web is configured or strict mode is requested. |

## Runtime Profiles

Set in `.claude/settings.local.json`:

```json
{ "env": { "TOOLKIT_HOOK_PROFILE": "standard" } }
```

| Profile | Behavior |
|---------|----------|
| `minimal` | Only destructive guard + SessionStart |
| `standard` | All hooks (default) |
| `strict` | Standard + mypy --strict on task completion |

**Disabling individual hooks:** set `AI_TOOLKIT_DISABLED_HOOKS` to a comma-separated list of hook names (with or without `.sh`), e.g. `{ "env": { "AI_TOOLKIT_DISABLED_HOOKS": "loop-guard,quality-check" } }`. Listed hooks become no-ops. This covers profile-gated hooks only; the safety guards (`guard-destructive`/`guard-path`/`guard-config`) intentionally cannot be env-disabled — remove them deliberately with `ai-toolkit remove-hook`.
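
Parsing the list is simple enough to sketch. The real logic lives in the shared bash helpers; `disabled_hooks` and `is_disabled` are hypothetical names, and whitespace trimming is an assumption.

```python
# Safety guards that, per the text above, cannot be disabled via env.
ALWAYS_ON = {"guard-destructive", "guard-path", "guard-config"}


def _strip_sh(name: str) -> str:
    """Normalize a hook name by dropping a trailing `.sh`."""
    return name[:-3] if name.endswith(".sh") else name


def disabled_hooks(raw: str) -> set:
    """Parse a comma-separated AI_TOOLKIT_DISABLED_HOOKS value."""
    names = {_strip_sh(item.strip()) for item in raw.split(",")}
    names.discard("")  # tolerate empty values and trailing commas
    return names


def is_disabled(hook_name: str, raw: str) -> bool:
    """Return True when the hook should become a no-op."""
    name = _strip_sh(hook_name)
    return name not in ALWAYS_ON and name in disabled_hooks(raw)
```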

In plain-text mode, non-blocking informational context is suppressed by default;
side effects and blocking decisions still run. Set `AI_TOOLKIT_HOOK_VERBOSE=1`
only when debugging hook context locally. `AI_TOOLKIT_HOOK_QUIET=1` keeps hook
commands explicitly silent; Codex-generated hooks and Claude's bundled
`UserPromptSubmit` entry set it so that prompt hook context stays invisible.

## Architecture

```
~/.softspark/ai-toolkit/
├── rules/          # Registered rules (add-rule.sh)
├── state/          # Per-session runtime state (NEW)
│   ├── session-edits.json         # Append-only edit log per session
│   ├── search-required-<sid>.flag # Per-session: set by user-prompt-submit, cleared by search-tracker/stop-search-check, GC'd at SessionStart (>60min)
│   ├── loaded-instructions.log    # Audit trail of which rules entered context
│   └── test-cohesion-last.log     # Last cohesion test command output
└── hooks/          # Hook scripts (copied on install)
    ├── _profile-check.sh         # Shared: profile skip logic (sourced by hooks)
    ├── _locate-toolkit.sh        # NEW: shared $TOOLKIT_DIR locator
    ├── _hook-io.sh               # NEW: shared multi-editor payload/output adapter
    ├── _search-capability.sh     # NEW: capability-aware search-first enforcement
    ├── session-start.sh
    ├── session-context.sh
    ├── guard-destructive.sh
    ├── guard-path.sh
    ├── guard-config.sh
    ├── revert-guard.sh           # NEW: block revert on session-edited files (Art. VI.2)
    ├── mcp-health.sh
    ├── user-prompt-submit.sh     # extended: arms search-required flag
    ├── post-tool-use.sh          # extended: appends edits to session state
    ├── governance-capture.sh
    ├── test-cohesion.sh          # NEW: runs mapped tests after edits (Art. VI.3)
    ├── test-cohesion-map.json    # NEW: path → tests mapping
    ├── search-tracker.sh         # NEW: clears search-required flag
    ├── quality-check.sh
    ├── quality-gate.sh           # extended: cohesion-tests session edits
    ├── stop-search-check.sh      # NEW: enforces search-first on Stop
    ├── save-session.sh
    ├── subagent-start.sh
    ├── subagent-stop.sh
    ├── track-usage.sh
    ├── pre-compact.sh
    ├── pre-compact-save.sh
    ├── commit-quality.sh
    ├── instructions-audit.sh     # NEW: logs CLAUDE.md / rules loads
    ├── config-desync-guard.sh    # NEW: warns on settings ↔ source drift
    └── session-end.sh

~/.claude/settings.json
└── hooks:          # Hook definitions referencing ~/.softspark/ai-toolkit/hooks/
    ├── SessionStart       → session-start.sh, mcp-health.sh, session-context.sh
    ├── Notification       → notify-waiting.sh
    ├── PreToolUse         → guard-destructive.sh, guard-path.sh, guard-config.sh, commit-quality.sh, revert-guard.sh
    ├── UserPromptSubmit   → user-prompt-submit.sh, track-usage.sh
    ├── PostToolUse        → post-tool-use.sh, governance-capture.sh, test-cohesion.sh, search-tracker.sh
    ├── Stop               → quality-check.sh, save-session.sh, quality-gate.sh, stop-search-check.sh
    ├── TaskCompleted      → quality-gate.sh
    ├── TeammateIdle       → echo (inline)
    ├── SubagentStart      → subagent-start.sh
    ├── SubagentStop       → subagent-stop.sh
    ├── PreCompact         → pre-compact.sh, pre-compact-save.sh
    ├── SessionEnd         → session-end.sh
    ├── InstructionsLoaded → instructions-audit.sh
    └── ConfigChange       → config-desync-guard.sh
```

**Key design decisions:**
- Scripts **copied** (not symlinked) — user can customize without breaking git
- Hooks in `settings.json` (not `hooks.json`) — Claude Code only reads settings files
- `_source: "ai-toolkit"` tag on every entry — allows idempotent merge/strip
- Hooks are **global only** — `--local` does not install hooks into project settings

## Troubleshooting

**Hooks not loading:**
1. Run `/hooks` in Claude Code — lists all active hooks
2. Check `claude --debug hooks` — shows matcher resolution
3. Verify JSON: `python3 -c "import json; json.load(open('$HOME/.claude/settings.json'))"`

**Hook script not found:**
```bash
ls ~/.softspark/ai-toolkit/hooks/     # should list 28 .sh files (plus _profile-check.sh + _locate-toolkit.sh + _hook-io.sh + _search-capability.sh helpers + test-cohesion-map.json)
ai-toolkit update            # re-copies scripts
```

**Legacy cleanup:**
```bash
rm ~/.claude/hooks.json      # old format, no longer used
rm -rf ~/.claude/hooks       # old symlink, no longer used
```

---

## kb/reference/integrations.md

---
title: "AI Toolkit - External Integrations"
category: reference
service: ai-toolkit
tags: [integrations, rules, add-rule]
version: "1.0.5"
created: "2026-03-26"
last_updated: "2026-03-26"
description: "How external repos inject rules into ~/.claude/CLAUDE.md via ai-toolkit"
---

# External Integrations

Repos that register rules with ai-toolkit so they are automatically injected into `~/.claude/CLAUDE.md` on every `update`.

---

## How to Register Rules

Use `add-rule` to register a rule file globally. Every subsequent `ai-toolkit update` picks it up automatically.

```bash
cd /path/to/your-repo
ai-toolkit add-rule ./jira-rules.md
ai-toolkit update   # inject now
```

After registration, `ai-toolkit update` will always re-inject the rule. Registry location: `~/.softspark/ai-toolkit/rules/`.

To unregister a rule (removes from `~/.softspark/ai-toolkit/rules/` and strips the block from `CLAUDE.md`):

```bash
ai-toolkit remove-rule jira-rules
```

---

## How It Works

Rules are injected as marker-delimited blocks, which makes injection idempotent. Rule name = filename without `.md`.

```
<!-- TOOLKIT:jira-rules START -->

...rule content...

<!-- TOOLKIT:jira-rules END -->
```

Content outside markers is never touched. Re-running updates only the marked block.
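
The replace-or-append behavior can be sketched as follows. This is an illustration of the marker technique, not the toolkit's injector; `inject_rule` is a hypothetical name, and appending new blocks at the end of the file is an assumption.

```python
import re


def inject_rule(document: str, name: str, content: str) -> str:
    """Insert or replace the TOOLKIT:<name> block, leaving the rest intact."""
    start = f"<!-- TOOLKIT:{name} START -->"
    end = f"<!-- TOOLKIT:{name} END -->"
    block = f"{start}\n\n{content.strip()}\n\n{end}"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if pattern.search(document):
        # A function replacement avoids backslash processing in `block`.
        return pattern.sub(lambda _match: block, document, count=1)
    separator = "" if not document else ("\n" if document.endswith("\n") else "\n\n")
    return document + separator + block + "\n"
```

Running it twice with the same content returns the same document, which is what makes repeated `update` runs safe.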

---

## Adding a New Integration

1. Create `<name>-rules.md` in your repo with Claude-relevant conventions
2. Register it: `ai-toolkit add-rule ./<name>-rules.md`
3. Verify it appears in: `~/.softspark/ai-toolkit/rules/<name>-rules.md`
4. On next `install` it will be listed in: `Rules injected: ... <name>-rules`
5. Add an entry below documenting the integration

---

## Known Integrations

### rag-mcp

**Rule file:** `rag-mcp.md`
**Marker:** `TOOLKIT:rag-mcp`

Teaches Claude Code the RAG-MCP search protocol: always call `smart_query()` before answering, `kb_id` vs `file_path` distinction, available MCP tools.

```bash
cd /path/to/rag-mcp
ai-toolkit add-rule ./rag-mcp-rules.md
```

### jira-mcp

**Rule file:** `jira-rules.md`
**Marker:** `TOOLKIT:jira-rules`

Teaches Claude Code the Jira MCP tool set: `sync_tasks`, `read_cached_tasks`, `update_task_status`, `log_task_time`, and key rules (sync first, hours only, check transitions).

```bash
cd /path/to/jira-mcp
ai-toolkit add-rule ./jira-rules.md
```

---

## kb/reference/language-packs.md

---
title: "AI Toolkit - Language Plugin Packs"
category: reference
service: ai-toolkit
tags: [plugins, languages, rust, java, csharp, kotlin, swift, ruby]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-03-29"
description: "6 language-specific plugin packs providing knowledge skills for Rust, Java, C#, Kotlin, Swift, and Ruby."
---

# Language Plugin Packs

## Overview

Language packs are domain-scoped plugin packs that provide knowledge skills for specific programming languages. Each pack contains a single knowledge skill with idiomatic patterns, error handling, testing conventions, common frameworks, and performance tips.

## Available Packs

| Pack | Skill | Language | Key Topics |
|------|-------|----------|------------|
| `rust-pack` | `rust-patterns` | Rust | Ownership, borrowing, Cargo, tokio, serde |
| `java-pack` | `java-patterns` | Java | Records, sealed classes, Spring Boot, JUnit 5 |
| `csharp-pack` | `csharp-patterns` | C# / .NET | Nullable refs, async/await, ASP.NET Core, EF Core |
| `kotlin-pack` | `kotlin-patterns` | Kotlin | Coroutines, DSLs, sealed classes, Ktor, MockK |
| `swift-pack` | `swift-patterns` | Swift / iOS | Protocol-oriented, SwiftUI, async/await, SPM |
| `ruby-pack` | `ruby-patterns` | Ruby | Blocks, Rails conventions, RSpec, ActiveRecord |

## Skill Content Sections

Each language skill follows a consistent structure:

1. **Project Structure** — standard directory layout and build tool configuration
2. **Idioms / Code Style** — language-specific patterns and conventions
3. **Error Handling** — error types, patterns, and best practices
4. **Testing Patterns** — test frameworks, assertion libraries, mocking
5. **Common Libraries / Frameworks** — ecosystem essentials
6. **Performance Tips** — optimization techniques and profiling
7. **Build / Package Management** — dependency management and CI

## How Knowledge Skills Work

These skills have `user-invocable: false` in their frontmatter, meaning they are NOT slash commands. Instead, Claude loads them contextually when the conversation topic matches the skill's description trigger.

For example, when a user asks "How do I handle errors in Rust?", Claude automatically loads `rust-patterns` to provide idiomatic Rust error handling guidance.

## Requesting New Language Packs

File an issue with the `language-pack` label. Include:
- Language name
- Key topics to cover
- Popular frameworks/libraries to include

---

## kb/reference/language-rules.md

---
title: "Language Rules System"
category: reference
service: ai-toolkit
tags: [rules, languages, coding-style, testing, patterns, security]
version: "2.0.0"
created: "2026-04-07"
last_updated: "2026-04-28"
description: "Reference for the language-specific rules system: 13 per-language rule sets shipped as knowledge skills, plus a common set inlined into CLAUDE.md."
---

# Language Rules System

## Overview

ai-toolkit ships rule content for 13 languages/platforms plus a language-agnostic common set. Source files live under `app/rules/` and are split into two delivery channels by `ai-toolkit install --local`:

- **Common rules** (`app/rules/common/*.md`): full content is inlined into the project's `.claude/CLAUDE.md` under a single `<!-- TOOLKIT:language-rules START -->` marker. They cover coding-style, git-workflow, performance, security, and testing — concerns that apply regardless of language, so they stay always visible.
- **Per-language rules** (`app/rules/<lang>/*.md`): emitted at build time as `<lang>-rules` knowledge skills under `app/skills/`. Each skill is `user-invocable: false`, so Claude loads it via the Agent Skills progressive-disclosure mechanism only when its description triggers match (file extensions, framework names, or matching keywords in the prompt).

The skills are generated from the rule files via `python3 scripts/generate_language_rules_skills.py`, which is idempotent and rerun-safe. Other editors (Cursor, Windsurf, Cline, Roo, Augment, Codex, Copilot, Antigravity, Gemini, opencode) still receive the full per-language rule content via their own generators in `scripts/dir_rules_shared.py::build_language_rules()` — Claude is the only target where the per-language content is now skill-delivered rather than inlined.

## File Structure

```
app/rules/
├── common/
│   ├── coding-style.md     # KISS, DRY, YAGNI, immutability
│   ├── testing.md          # Universal testing standards
│   ├── git-workflow.md     # Commit conventions
│   ├── performance.md      # Performance guidelines
│   └── security.md         # OWASP, input validation
├── typescript/
│   ├── coding-style.md     # Strict mode, no-any, naming
│   ├── testing.md          # Vitest/Jest patterns
│   ├── patterns.md         # Discriminated unions, utility types
│   ├── frameworks.md       # React hooks, Next.js, lifecycle
│   └── security.md         # XSS prevention, sanitization
├── python/
│   ├── coding-style.md     # PEP 8, type hints, dataclasses
│   ├── testing.md          # pytest, fixtures, parametrize
│   ├── patterns.md         # Python idioms, context managers
│   ├── frameworks.md       # FastAPI/Django lifecycle, SQLAlchemy
│   └── security.md         # SQL injection, SSTI prevention
├── golang/           # same 5-file structure
├── rust/
├── java/
├── kotlin/
├── swift/
├── dart/
├── csharp/
├── php/
├── cpp/
├── ruby/
└── medplum/
```

**Total: 13 per-language directories × 5 files + 1 common directory × 5 files + 3 standalone files** (see README.md for canonical count). Per-language directories ship as `<lang>-rules` knowledge skills; the common directory is inlined into CLAUDE.md.

## Supported Languages

| Language | Directory | Auto-detect Files |
|----------|-----------|------------------|
| Common | `rules/common/` | always included |
| TypeScript | `rules/typescript/` | `package.json`, `tsconfig.json` |
| Python | `rules/python/` | `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` |
| Go | `rules/golang/` | `go.mod` |
| Rust | `rules/rust/` | `Cargo.toml` |
| Java | `rules/java/` | `pom.xml`, `build.gradle`, `build.gradle.kts` |
| Kotlin | `rules/kotlin/` | `build.gradle.kts` |
| Swift | `rules/swift/` | `Package.swift`, `*.xcodeproj` |
| Dart | `rules/dart/` | `pubspec.yaml` |
| C# | `rules/csharp/` | `*.csproj`, `*.sln` |
| PHP | `rules/php/` | `composer.json` |
| C++ | `rules/cpp/` | `CMakeLists.txt`, `Makefile`, `*.cpp` |
| Ruby | `rules/ruby/` | `Gemfile`, `*.gemspec` |
| Medplum | `rules/medplum/` | `medplum.config.mts`, `medplum.config.ts` |

## Rule Categories

| Category | Filename | Content |
|----------|----------|---------|
| `coding-style` | `coding-style.md` | Naming, formatting, idiomatic constructs, linter config |
| `testing` | `testing.md` | Test framework usage, fixture patterns, coverage targets |
| `patterns` | `patterns.md` | Language-specific design patterns and idioms |
| `frameworks` | `frameworks.md` | Recommended framework conventions and lifecycle hooks |
| `security` | `security.md` | Common language-specific vulnerabilities and mitigations |

The `common/` directory also has five files, but `patterns.md` and `frameworks.md` are replaced by `git-workflow.md` and `performance.md`.

## Auto-Detection

`--local` automatically enables language auto-detection. `scripts/install_steps/detect_language.py` uses two-phase detection and merges results from both:

```bash
ai-toolkit install --local     # auto-detects language (--auto-detect is implied)
```

### Phase 1: Marker files (config-level signals)

Scans for configuration files defined in each module's `auto_detect` list in `manifest.json`:

1. `package.json` or `tsconfig.json` → TypeScript
2. `go.mod` → Go
3. `Cargo.toml` → Rust
4. `pubspec.yaml` → Dart
5. `composer.json` → PHP
6. `Gemfile` → Ruby
7. `requirements.txt`, `pyproject.toml`, `setup.py`, or `Pipfile` → Python
8. `pom.xml` or `build.gradle` → Java
9. `build.gradle.kts` → Kotlin
10. `Package.swift` → Swift
11. `*.csproj` or `*.sln` → C#
12. `CMakeLists.txt` or `Makefile` → C++
13. `medplum.config.mts` or `medplum.config.ts` → Medplum

### Phase 2: Source file extensions (actual code presence)

Scans top-level files and one directory level deep for source file extensions (`.py`, `.ts`, `.go`, `.rs`, `.java`, `.kt`, `.swift`, `.dart`, `.cs`, `.php`, `.cpp`, `.rb`, etc.). Skips dependency/build directories (`node_modules`, `venv`, `dist`, `build`, etc.) for speed.

This catches cases where marker files are misleading — e.g., a Python project with a `package.json` only for its npm CLI wrapper will correctly detect both Python (via `.py` files) and TypeScript (via `package.json`).

Both phases contribute; results are merged and deduplicated. Common rules are always injected regardless of detected language.
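
The two phases can be sketched as below. The marker and extension tables are deliberately abbreviated, and the code is not taken from `detect_language.py`; `detect_languages` is a hypothetical name and glob markers such as `*.csproj` are left out.

```python
from pathlib import Path

# Abbreviated tables for illustration; the real lists come from manifest.json.
MARKERS = {
    "package.json": "typescript", "tsconfig.json": "typescript",
    "go.mod": "golang", "Cargo.toml": "rust",
    "requirements.txt": "python", "pyproject.toml": "python",
}
EXTENSIONS = {".py": "python", ".ts": "typescript",
              ".go": "golang", ".rs": "rust"}
SKIP_DIRS = {"node_modules", "venv", "dist", "build"}


def detect_languages(root: Path) -> list:
    """Merge marker-file and source-extension signals into one sorted list."""
    found = set()
    # Phase 1: configuration marker files at the project root.
    for filename, lang in MARKERS.items():
        if (root / filename).is_file():
            found.add(lang)
    # Phase 2: source extensions at top level and one directory deep.
    for entry in root.iterdir():
        if entry.is_file():
            files = [entry]
        elif entry.is_dir() and entry.name not in SKIP_DIRS:
            files = [child for child in entry.iterdir() if child.is_file()]
        else:
            files = []
        for path in files:
            lang = EXTENSIONS.get(path.suffix)
            if lang:
                found.add(lang)
    return sorted(found)
```

Using a set for the merge gives deduplication for free when both phases report the same language.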

## Installation

```bash
# Auto-detect language from project files (default with --local)
ai-toolkit install --local

# Explicitly select a language (implies --local, disables auto-detect)
ai-toolkit install --local --lang typescript

# Multiple languages
ai-toolkit install --local --lang go,python

# Skip auto-detect, install specific modules only
ai-toolkit install --local --modules core,agents
```

The `--lang` flag accepts comma-separated language names and converts them to `rules-<lang>` modules. Common aliases are supported: `go` → `golang`, `c++` → `cpp`, `c#`/`cs` → `csharp`. Using `--lang` implies `--local` and disables auto-detection.
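
The alias handling amounts to a small lookup. In this sketch the alias table is taken from the paragraph above, while case-insensitive matching, deduplication, and the name `lang_to_modules` are assumptions.

```python
# Aliases documented above; any other name passes through unchanged.
ALIASES = {"go": "golang", "c++": "cpp", "c#": "csharp", "cs": "csharp"}


def lang_to_modules(value: str) -> list:
    """Convert a --lang value such as "go,python" into module names."""
    modules = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue
        module = f"rules-{ALIASES.get(name, name)}"
        if module not in modules:  # keep first occurrence, preserve order
            modules.append(module)
    return modules
```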

Common rules are injected into the project `CLAUDE.md` inside a single named marker pair (the per-language markers from v1.x are no longer used):

```
<!-- TOOLKIT:language-rules START -->
# Language Rules

Common (language-agnostic) rules apply to every change in this project.
Language-specific rules live in `<lang>-rules` knowledge skills (e.g.
`python-rules`, `typescript-rules`) and load automatically when their
triggers match.

Detected languages: `python-rules`, `typescript-rules`.

---

... full content of app/rules/common/*.md inlined here ...
<!-- TOOLKIT:language-rules END -->
```

Re-running `install --local` is idempotent — the existing block is replaced, not duplicated. Per-language rules are not injected into `CLAUDE.md` for Claude — they are loaded contextually via their respective `<lang>-rules` knowledge skills.

### Generating language-rules skills

The `<lang>-rules` skills under `app/skills/` are produced by:

```bash
python3 scripts/generate_language_rules_skills.py            # write all
python3 scripts/generate_language_rules_skills.py --check    # dry-run, exit 1 on diff
python3 scripts/generate_language_rules_skills.py --langs python,rust  # subset
```

The generator reads `app/rules/<lang>/*.md`, strips YAML frontmatter, concatenates the categories, and writes `app/skills/<lang>-rules/SKILL.md` with frontmatter:

- `name: <lang>-rules`
- `description: ...` — language label, rule categories, and concrete trigger keywords (file extensions, framework names) so the skill activates reliably when Claude is working on that language.
- `user-invocable: false` — knowledge skill, no slash command.
- `allowed-tools: Read` — the skill body is reference content, not an action.

Rerunning the generator is idempotent. Editing rule files under `app/rules/<lang>/` and rerunning the generator is the canonical way to update a language skill.
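
The strip-and-concatenate step can be illustrated as follows. This is a sketch, not `generate_language_rules_skills.py`: category ordering (alphabetical by filename), the frontmatter regex, and the names `strip_frontmatter` and `build_skill` are assumptions.

```python
import re

# Leading YAML frontmatter: a block fenced by '---' lines at the file start.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def strip_frontmatter(text: str) -> str:
    """Remove a leading YAML frontmatter block, if present."""
    return FRONTMATTER.sub("", text, count=1).lstrip("\n")


def build_skill(lang: str, rule_files: dict, description: str) -> str:
    """Render SKILL.md text from {filename: markdown} rule contents."""
    header = "\n".join([
        "---",
        f"name: {lang}-rules",
        f"description: {description}",
        "user-invocable: false",
        "allowed-tools: Read",
        "---",
    ])
    bodies = [strip_frontmatter(rule_files[name]).rstrip()
              for name in sorted(rule_files)]  # deterministic order
    return header + "\n\n" + "\n\n".join(bodies) + "\n"
```

Deterministic ordering is what lets a `--check` style comparison detect real changes only.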

## Manifest Module Names

Language rules are tracked as modules in `manifest.json`:

| Module | Description |
|--------|-------------|
| `rules-common` | Common coding rules (5 files), included in `standard` profile |
| `rules-typescript` | TypeScript-specific rules |
| `rules-python` | Python-specific rules |
| `rules-golang` | Go-specific rules |
| `rules-rust` | Rust-specific rules |
| `rules-java` | Java-specific rules |
| `rules-kotlin` | Kotlin-specific rules |
| `rules-swift` | Swift-specific rules |
| `rules-dart` | Dart/Flutter-specific rules |
| `rules-csharp` | C#/.NET-specific rules |
| `rules-php` | PHP-specific rules |
| `rules-cpp` | C++-specific rules |
| `rules-ruby` | Ruby-specific rules |
| `rules-medplum` | Medplum/FHIR healthcare platform rules |

## Rules vs Skills

| | Common rules | Per-language rules | Other skills |
|---|---|---|---|
| Source | `app/rules/common/` | `app/rules/<lang>/` | `app/skills/<name>/SKILL.md` |
| Delivery to Claude | Inlined into project `CLAUDE.md` (`--local`) | Generated as `<lang>-rules` knowledge skills, loaded contextually | Loaded contextually by description match |
| Visibility | Always in context | Loaded when triggers match (file extensions, framework names) | Loaded when triggers match |
| Scope | Language-agnostic standards (security, git, testing, perf, style) | Per-language coding-style, frameworks, patterns, security, testing | Domain skills (testing, debugging, RAG, etc.) |
| Install | `ai-toolkit install --local` | Global install (skills directory is symlinked) | Global install |
| Other editors | Inlined into editor-specific rule files | Inlined into editor-specific rule files (still full content, not skills) | N/A |

Per-language content delivered as a knowledge skill is the same Markdown that other editors receive inlined. The split exists only for Claude, where the Agent Skills progressive-disclosure mechanism keeps the system prompt small.

## Related Documentation

- [PATH: kb/reference/manifest-install.md] — module-level install granularity
- [PATH: kb/reference/extension-api.md] — injecting rules from external tools
- [PATH: kb/reference/architecture-overview.md] — overall install model

---

## kb/reference/manifest-install.md

---
title: "Manifest-Driven Install System"
category: reference
service: ai-toolkit
tags: [install, manifest, modules, profiles, auto-detect, state-tracking]
version: "1.0.0"
created: "2026-04-07"
last_updated: "2026-04-07"
description: "Reference for the manifest-driven install system: 17 modules, 4 profiles, auto-detection, and state tracking in ~/.softspark/ai-toolkit/state.json."
---

# Manifest-Driven Install System

## Overview

ai-toolkit's install system supports module-level granularity on top of the existing profile-based install. Instead of choosing only between the predefined profiles (minimal/standard/strict/full), you can select individual modules (specific language rules, MCP templates, etc.) or enable auto-detection of the project language.

All existing `--profile` behavior is preserved and unchanged. The manifest system is an additive opt-in layer.

## Modules

Modules are defined in `manifest.json` at the repository root. There are 17 modules:

| Module | Description | In Profile |
|--------|-------------|-----------|
| `core` | Core hooks and essential skills | minimal, standard, strict, full |
| `agents` | Specialized agents | standard, strict, full |
| `skills` | All skills (task, hybrid, knowledge) | standard, strict, full |
| `rules-common` | Common coding rules (5 files) | standard, strict, full |
| `rules-typescript` | TypeScript-specific rules (5 files) | auto-detect |
| `rules-python` | Python-specific rules (5 files) | auto-detect |
| `rules-golang` | Go-specific rules (5 files) | auto-detect |
| `rules-rust` | Rust-specific rules (5 files) | auto-detect |
| `rules-java` | Java-specific rules (5 files) | auto-detect |
| `rules-kotlin` | Kotlin-specific rules (5 files) | auto-detect |
| `rules-swift` | Swift-specific rules (5 files) | auto-detect |
| `rules-dart` | Dart/Flutter-specific rules (5 files) | auto-detect |
| `rules-csharp` | C#/.NET-specific rules (5 files) | auto-detect |
| `rules-php` | PHP-specific rules (5 files) | auto-detect |
| `rules-cpp` | C++-specific rules (5 files) | auto-detect |
| `rules-ruby` | Ruby-specific rules (5 files) | auto-detect |
| `mcp-templates` | 26 MCP server config templates | strict, full |

## Profiles

Profiles are predefined module sets. They map directly to `--profile` values:

| Profile | Modules |
|---------|---------|
| `minimal` | `core` |
| `standard` | `core`, `agents`, `skills`, `rules-common` |
| `strict` | `core`, `agents`, `skills`, `rules-common`, `mcp-templates` |
| `full` | All modules (same as strict currently; language rules added via `--auto-detect`) |

## CLI

```bash
# Profile-based install (existing behavior, unchanged)
ai-toolkit install --profile standard

# Module-based install (new)
ai-toolkit install --modules core,agents,rules-typescript

# --local implies --auto-detect (language rules auto-detected)
ai-toolkit install --local

# Show currently installed modules and their state
ai-toolkit status

# Incremental update (only re-applies modules with changed content)
ai-toolkit update
```

### --modules

Accepts a comma-separated list of module names. Can be combined with a profile:

```bash
# Start from standard profile, also add TypeScript rules
ai-toolkit install --profile standard --modules rules-typescript
```
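As an illustration of how a profile and `--modules` could combine, here is a minimal sketch. It assumes the two inputs are unioned, with profile modules first and duplicates dropped. `resolve_modules` and `PROFILES` are hypothetical names, not the toolkit's actual implementation; the mapping is copied from the Profiles table.

```python
# Profile-to-module mapping copied from the Profiles table.
# "full" is listed as identical to "strict" for now.
PROFILES = {
    "minimal": ["core"],
    "standard": ["core", "agents", "skills", "rules-common"],
    "strict": ["core", "agents", "skills", "rules-common", "mcp-templates"],
    "full": ["core", "agents", "skills", "rules-common", "mcp-templates"],
}


def resolve_modules(profile=None, modules=""):
    """Return the ordered, de-duplicated module list for one install run.

    Hypothetical helper: `profile` mirrors --profile and `modules` mirrors
    the comma-separated --modules value. Union semantics are an assumption.
    """
    selected = list(PROFILES[profile]) if profile else []
    for name in (part.strip() for part in modules.split(",")):
        if name and name not in selected:
            selected.append(name)
    return selected


print(resolve_modules("standard", "rules-typescript"))
```

An unknown profile name raises `KeyError` in this sketch; a real CLI would more likely report a usage error.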

### --auto-detect

Scans the current working directory for marker files and selects the matching language module. Implemented in `scripts/install_steps/detect_language.py`.

Detection markers per module:

| Module | Detected when these files exist |
|--------|--------------------------------|
| `rules-typescript` | `package.json` or `tsconfig.json` |
| `rules-python` | `requirements.txt`, `pyproject.toml`, `setup.py`, or `Pipfile` |
| `rules-golang` | `go.mod` |
| `rules-rust` | `Cargo.toml` |
| `rules-java` | `pom.xml` or `build.gradle` |
| `rules-kotlin` | `build.gradle.kts` |
| `rules-swift` | `Package.swift` |
| `rules-dart` | `pubspec.yaml` |
| `rules-csharp` | `*.csproj` or `*.sln` |
| `rules-php` | `composer.json` |
| `rules-cpp` | `CMakeLists.txt` or `Makefile` |
| `rules-ruby` | `Gemfile` |
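The marker table can be read as a simple filename-pattern lookup. The sketch below is one way to express it and is not the code in `scripts/install_steps/detect_language.py`. It assumes that only the top level of the directory is inspected and that every matching module is returned; `MARKERS`, `detect_modules`, and `detect_in_directory` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Marker patterns copied from the table above; *.csproj and *.sln are globs.
MARKERS = {
    "rules-typescript": ["package.json", "tsconfig.json"],
    "rules-python": ["requirements.txt", "pyproject.toml", "setup.py", "Pipfile"],
    "rules-golang": ["go.mod"],
    "rules-rust": ["Cargo.toml"],
    "rules-java": ["pom.xml", "build.gradle"],
    "rules-kotlin": ["build.gradle.kts"],
    "rules-swift": ["Package.swift"],
    "rules-dart": ["pubspec.yaml"],
    "rules-csharp": ["*.csproj", "*.sln"],
    "rules-php": ["composer.json"],
    "rules-cpp": ["CMakeLists.txt", "Makefile"],
    "rules-ruby": ["Gemfile"],
}


def detect_modules(filenames):
    """Return every module with at least one marker among the file names."""
    names = list(filenames)
    return [
        module
        for module, patterns in MARKERS.items()
        if any(fnmatchcase(name, pattern) for name in names for pattern in patterns)
    ]


def detect_in_directory(project_dir="."):
    """Apply detect_modules to the regular files directly inside project_dir."""
    root = Path(project_dir)
    return detect_modules(entry.name for entry in root.iterdir() if entry.is_file())
```

Keeping the matching logic separate from the directory listing makes the lookup easy to test without touching the filesystem.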

### status

Lists all currently installed modules with version and install timestamp:

```bash
ai-toolkit status
# Installed modules (from ~/.softspark/ai-toolkit/state.json):
#   core            v1.3.0   installed 2026-04-07T10:00:00Z
#   agents          v1.3.0   installed 2026-04-07T10:00:00Z
#   skills          v1.3.0   installed 2026-04-07T10:00:00Z
#   rules-common    v1.3.0   installed 2026-04-07T10:00:00Z
#   rules-typescript v1.3.0  installed 2026-04-07T10:00:00Z
```

### update

Re-applies installed modules, skipping files whose content hash has not changed since the last install. Implemented in `scripts/install_steps/install_state.py`.

## State Tracking

Installed module state is persisted to `~/.softspark/ai-toolkit/state.json`:

```json
{
  "installed_version": "1.3.0",
  "installed_modules": ["core", "agents", "skills", "rules-common", "rules-typescript"],
  "installed_at": "2026-04-07T10:00:00Z",
  "last_updated": "2026-04-07T10:00:00Z",
  "file_hashes": {
    "app/hooks/session-start.sh": "abc123..."
  }
}
```

- `installed_modules` — used by `update` to know which modules to re-apply
- `file_hashes` — used to skip unchanged files during `update`
- The file is written after every successful install or update
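The skip-unchanged behavior can be sketched as a comparison between recorded and current content hashes. This is an illustration only: the hash algorithm is not stated in this document, so SHA-256 is an assumption, and `content_hash` and `files_to_reapply` are hypothetical names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of file content (SHA-256 is assumed, not confirmed)."""
    return hashlib.sha256(data).hexdigest()


def files_to_reapply(current_files, recorded_hashes):
    """Return the paths whose content differs from the recorded hash.

    current_files: dict mapping path -> current bytes of the source file.
    recorded_hashes: the "file_hashes" mapping loaded from state.json.
    Paths missing from recorded_hashes count as changed.
    """
    return sorted(
        path
        for path, data in current_files.items()
        if recorded_hashes.get(path) != content_hash(data)
    )
```

After a successful run, the new digests would replace the `file_hashes` entries so that the next `update` compares against the latest state.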

## Implementation Files

| File | Purpose |
|------|---------|
| `manifest.json` | Module and profile definitions |
| `scripts/install_steps/detect_language.py` | Auto-detect project language from marker files |
| `scripts/install_steps/install_state.py` | Read/write `~/.softspark/ai-toolkit/state.json` |

## Backward Compatibility

Existing `--profile` usage works identically. The manifest system does not change what gets installed when you use `--profile minimal/standard/strict`. It only adds:

1. `--modules` for granular selection
2. `--auto-detect` for language rules
3. `state.json` tracking for incremental updates
4. `status` command to inspect installed state

No existing install scripts or CI configurations need changes.

## Related Documentation

- [PATH: kb/reference/language-rules.md] — language rules structure and auto-detection detail
- [PATH: kb/reference/mcp-templates.md] — MCP server templates (the `mcp-templates` module)
- [PATH: kb/reference/architecture-overview.md] — overall install model

---

## kb/reference/mcp-editor-compatibility.md

---
title: "AI Toolkit - MCP Editor Compatibility"
category: reference
service: ai-toolkit
tags: [mcp, editors, compatibility, codex, cursor]
version: "1.1.0"
created: "2026-04-12"
last_updated: "2026-05-30"
description: "Official MCP support matrix and native config targets for editors supported by ai-toolkit."
---

# MCP Editor Compatibility

## Overview

ai-toolkit keeps `.mcp.json` as the project-level canonical template format and can render that config into native editor MCP files where the editor exposes a stable, documented configuration surface.

## Supported Native Adapters

| Editor | Scope | Native Config Path | Adapter Behavior |
|--------|-------|--------------------|------------------|
| Claude Code | project + global | `.claude/settings.local.json`, `~/.claude/settings.json` | Merges `mcpServers` while preserving other settings keys |
| Cursor | project + global | `.cursor/mcp.json`, `~/.cursor/mcp.json` | Mirrors `mcpServers` directly |
| GitHub Copilot | project + global | `.github/mcp.json`, `~/.copilot/mcp-config.json` | Adds Copilot-required `type` and `tools` fields |
| Gemini CLI | project + global | `.gemini/settings.json`, `~/.gemini/settings.json` | Merges `mcpServers` into settings JSON |
| Roo Code | project | `.roo/mcp.json` | Mirrors `mcpServers` into the documented project-level MCP file |
| Windsurf | global | `~/.codeium/windsurf/mcp_config.json` | Global-only JSON config |
| Cline | global | `~/.cline/data/settings/cline_mcp_settings.json` | Global-only JSON config |
| Augment | global | `~/.augment/settings.json` | Global-only JSON settings file |
| Codex CLI | global | `~/.codex/config.toml` | Renders JSON templates as TOML `mcp_servers` tables |

## Unsupported for Automatic Install

These editors are still supported by ai-toolkit for rules and instructions, but ai-toolkit does not currently write their MCP config automatically because it has not adopted a stable official file target for them:

| Editor | Reason |
|--------|--------|
| Aider | No verified native MCP config surface was adopted in ai-toolkit |
| Google Antigravity | MCP can be configured via UI/import flows, but no stable file target was adopted in ai-toolkit |

## CLI

```bash
ai-toolkit mcp editors
ai-toolkit mcp install --editor cursor --scope project github --target .
ai-toolkit mcp install --editor codex context7
ai-toolkit mcp remove github --editor cursor --scope project --target .
```

## Install Flow Integration

When `.mcp.json` exists in a project, `ai-toolkit install --local` mirrors its servers into:
- `.claude/settings.local.json`
- `.cursor/mcp.json` when `--editors cursor` is selected
- `.github/mcp.json` when `--editors copilot` is selected
- `.roo/mcp.json` when `--editors roo` is selected

Global-only clients are configured explicitly via `ai-toolkit mcp install --editor ...`.
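The mirroring rule above amounts to a small lookup: the Claude project file is always written, and each selected editor with a project-level MCP file adds one more target. A minimal sketch, with `PROJECT_MCP_FILES` and `local_mirror_targets` as hypothetical names and the paths taken from the list above:

```python
# Project-level MCP files per editor, as listed above.
PROJECT_MCP_FILES = {
    "cursor": ".cursor/mcp.json",
    "copilot": ".github/mcp.json",
    "roo": ".roo/mcp.json",
}


def local_mirror_targets(editors):
    """Return the files that receive the servers from .mcp.json.

    Editors without a project-level MCP file (for example global-only
    clients) are skipped here and need `ai-toolkit mcp install --editor ...`.
    """
    targets = [".claude/settings.local.json"]
    targets += [PROJECT_MCP_FILES[name] for name in editors if name in PROJECT_MCP_FILES]
    return targets
```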

## Related

- [PATH: kb/reference/mcp-templates.md] — template catalog and CLI
- [PATH: kb/reference/extension-api.md] — extension API surface

---

## kb/reference/mcp-templates.md

---
title: "MCP Server Templates"
category: reference
service: ai-toolkit
tags: [mcp, templates, servers, configuration, editors, inject-mcp, external-templates]
version: "1.2.0"
created: "2026-04-07"
last_updated: "2026-05-12"
description: "Reference for 26 built-in MCP server templates, external template injection via inject-mcp, and native editor MCP installation support."
---

# MCP Server Templates

## Overview

ai-toolkit ships 26 ready-to-use MCP server configuration templates in `app/mcp-templates/`. Each template is a JSON file that defines the canonical `mcpServers` block for a specific service. Templates can be merged into the project's `.mcp.json` and rendered into editor-native MCP config files via the `ai-toolkit mcp` CLI subcommand.

**External templates:** Tools outside the toolkit (MCP servers, plugins, custom integrations) can register their own MCP templates via `ai-toolkit inject-mcp <file|url>`. The toolkit caches the template, tags every server with a `_source` field, and propagates the config to every editor that exposes a `global_path`. URL-sourced templates are auto-refreshed on every `ai-toolkit update`. See [PATH: kb/reference/extension-api.md] for the inject-mcp / remove-mcp reference.

## CLI

```bash
ai-toolkit mcp list               # List all available templates
ai-toolkit mcp editors            # List editors with native MCP adapters
ai-toolkit mcp show <name>        # Print a template's JSON config
ai-toolkit mcp add <name>         # Merge a template into .mcp.json
ai-toolkit mcp add <n1> <n2>      # Add multiple templates at once
ai-toolkit mcp install --editor cursor --scope project github --target .
ai-toolkit mcp install --editor codex context7
ai-toolkit mcp remove <name>      # Remove from .mcp.json or native editor config
```

**Implementation:** `scripts/mcp_manager.py`

The `add` command merges the `mcpServers` block from the template into `.mcp.json`. If `.mcp.json` does not exist, it is created. If the server name already exists in `.mcp.json`, the entry is overwritten with the template version.
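The merge semantics described above (create when missing, overwrite on name collision, leave other servers alone) can be sketched on plain dictionaries. This is an illustration, not the code in `scripts/mcp_manager.py`; `merge_template` is a hypothetical name.

```python
import copy


def merge_template(mcp_config, template):
    """Return a new .mcp.json dict with the template's servers merged in.

    mcp_config may be None to model a project without .mcp.json yet.
    Servers with the same name are replaced by the template version;
    all other servers and top-level keys are preserved.
    """
    merged = copy.deepcopy(mcp_config) if mcp_config else {}
    servers = merged.setdefault("mcpServers", {})
    servers.update(copy.deepcopy(template["mcpServers"]))
    return merged
```

Working on copies keeps the caller's data untouched, so a failed write to disk cannot leave a half-merged config in memory.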

The `install` command renders the same canonical template into an editor-native config format:
- JSON clients with `mcpServers` blocks: Claude Code, Cursor, Gemini CLI, Roo Code, Windsurf, Cline, Augment
- JSON clients with additional transport metadata: GitHub Copilot
- TOML clients: Codex CLI (`[mcp_servers.<name>]`)

When `install` runs with `--scope project`, ai-toolkit also updates the project's `.mcp.json` so it remains the source of truth for later syncs.
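To make the JSON-to-TOML step for Codex CLI concrete, here is a minimal sketch that renders an `mcpServers` block as `[mcp_servers.<name>]` tables. It is not the toolkit's renderer: `render_codex_toml` is a hypothetical name, only string and list-of-string fields are handled, and emitting `env` as a sub-table is an assumption about layout.

```python
import json
import re

_BARE_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def _key(name):
    # TOML bare keys allow letters, digits, "_" and "-"; quote anything else.
    return name if _BARE_KEY.match(name) else json.dumps(name, ensure_ascii=False)


def _value(value):
    # JSON string quoting matches TOML basic strings for typical values.
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ", ".join(_value(item) for item in value) + "]"
    raise TypeError(f"unsupported value type in this sketch: {type(value).__name__}")


def render_codex_toml(mcp_servers):
    """Render a canonical mcpServers dict as TOML text."""
    lines = []
    for name, server in mcp_servers.items():
        lines.append(f"[mcp_servers.{_key(name)}]")
        for field, value in server.items():
            if field != "env":
                lines.append(f"{_key(field)} = {_value(value)}")
        env = server.get("env") or {}
        if env:
            lines.append("")
            lines.append(f"[mcp_servers.{_key(name)}.env]")
            for var, val in env.items():
                lines.append(f"{_key(var)} = {_value(val)}")
        lines.append("")
    return "\n".join(lines)
```

A production renderer would also need to merge into an existing `config.toml` rather than emit a fresh document.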

## Editor Support Matrix

| Editor | Scope | Native Config Path | Notes |
|--------|-------|--------------------|-------|
| `claude` | project + global | `.claude/settings.local.json`, `~/.claude/settings.json` | Preserves existing hooks and env keys |
| `cursor` | project + global | `.cursor/mcp.json`, `~/.cursor/mcp.json` | Mirrors canonical `mcpServers` |
| `copilot` | project + global | `.github/mcp.json`, `~/.copilot/mcp-config.json` | Adds `type` and `tools: ["*"]` automatically |
| `gemini` | project + global | `.gemini/settings.json`, `~/.gemini/settings.json` | Uses Gemini CLI `mcpServers` format |
| `roo` | project | `.roo/mcp.json` | Mirrors canonical `mcpServers` into Roo's project MCP file |
| `windsurf` | global | `~/.codeium/windsurf/mcp_config.json` | Global-only official config |
| `cline` | global | `~/.cline/data/settings/cline_mcp_settings.json` | Global-only official config |
| `augment` | global | `~/.augment/settings.json` | Global-only settings file |
| `codex` | global | `~/.codex/config.toml` | Rendered as TOML `mcp_servers` tables |

Project-local `ai-toolkit install --local` also mirrors `.mcp.json` into Claude project settings plus selected project editors that have official repository/workspace MCP files (`cursor`, `copilot`, `roo`).

## Template List

| Name | Description | Required Env Vars |
|------|-------------|-------------------|
| `brave-search` | Web and local search powered by Brave Search API | `BRAVE_API_KEY` |
| `cloudflare` | Cloudflare Workers, KV, D1, R2, and DNS management | `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID` |
| `context7` | Up-to-date library documentation lookup via Context7 | — |
| `custom-template` | Empty template for building a custom MCP server | `API_KEY` (placeholder) |
| `datadog` | Datadog monitoring: metrics, logs, traces, dashboard queries | `DD_API_KEY`, `DD_APP_KEY`, `DD_SITE` |
| `docker` | Docker container and image management, logs, compose operations | — |
| `fetch` | HTTP fetch for web pages and API responses as markdown or raw content | — |
| `filesystem` | Local filesystem access for reading, writing, and searching files | — |
| `git` | Git repository inspection: diffs, logs, branches | — |
| `github` | GitHub API: issues, PRs, repos, code search | `GITHUB_PERSONAL_ACCESS_TOKEN` |
| `google-drive` | Google Drive file search, reading, and management | `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, `GOOGLE_REDIRECT_URI` |
| `google-maps` | Google Maps geocoding, directions, place search | `GOOGLE_MAPS_API_KEY` |
| `grafana` | Grafana dashboard queries, alerting, and data source management | `GRAFANA_URL`, `GRAFANA_API_KEY` |
| `linear` | Linear issue tracker: issues, projects, team workflows | `LINEAR_API_KEY` |
| `memory` | Persistent memory store using a local knowledge graph | — |
| `notion` | Notion workspace: pages, databases, content management | `NOTION_API_KEY` |
| `postgres` | PostgreSQL database access, schema inspection, analysis | — |
| `puppeteer` | Browser automation: screenshots, navigation, web scraping | — |
| `redis` | Redis cache inspection, data management, and monitoring | `REDIS_URL` |
| `sentry` | Sentry error tracking: issue search, event details, alerting | `SENTRY_AUTH_TOKEN`, `SENTRY_ORG` |
| `sequential-thinking` | Step-by-step reasoning and problem decomposition | — |
| `slack` | Slack workspace: channels, messages, users | `SLACK_BOT_TOKEN`, `SLACK_TEAM_ID` |
| `sqlite` | SQLite database access, queries, schema management | — |
| `supabase` | Supabase project management, database queries, edge functions | `SUPABASE_URL`, `SUPABASE_SERVICE_ROLE_KEY` |
| `vercel` | Vercel deployment management, project settings, environment variables | `VERCEL_TOKEN` |

## Template Format

Each template is a JSON file with the following structure:

```json
{
  "name": "example",
  "description": "Human-readable description of what this server provides",
  "mcpServers": {
    "example": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-example"],
      "env": {
        "EXAMPLE_API_KEY": "${EXAMPLE_API_KEY}"
      }
    }
  }
}
```

- `name` — identifier used with `mcp add <name>`
- `description` — shown by `mcp list` and `mcp show`
- `mcpServers` — the block merged verbatim into `.mcp.json`
- `env` values use `${VAR_NAME}` placeholders that must be set in the shell environment or `.env` file before Claude Code starts
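Because the placeholders must resolve from the environment, a quick pre-flight check can list the variables a template needs that are not set. The sketch below is illustrative and not part of the toolkit; `missing_env_vars` is a hypothetical name and the environment is passed in explicitly (for example `os.environ`).

```python
import re

# Matches ${VAR_NAME} placeholders as used in template env values.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(template, environ):
    """Return placeholder names used by the template that are unset or empty."""
    needed = []
    for server in template.get("mcpServers", {}).values():
        for value in server.get("env", {}).values():
            for var in PLACEHOLDER.findall(str(value)):
                if var not in needed:
                    needed.append(var)
    return [var for var in needed if not environ.get(var)]
```

Calling `missing_env_vars(template, os.environ)` before starting the editor gives an early, readable list instead of a server that fails at launch.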

## Example: Adding GitHub and PostgreSQL

```bash
# Add templates
ai-toolkit mcp add github postgres

# Set required env vars (e.g., in .env or shell profile)
export GITHUB_PERSONAL_ACCESS_TOKEN=ghp_...

# Resulting .mcp.json contains both mcpServers entries
```

## Contributing a New Template

1. Create `app/mcp-templates/<name>.json` following the format above.
2. Use `${ENV_VAR}` placeholders for secrets — never hardcode values.
3. Keep the `name` field identical to the filename stem.
4. Run `python3 scripts/validate.py` to verify the file is valid JSON.
5. Add an entry to this document's template list table.
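Steps 2 and 3 can be checked mechanically before opening a pull request. The following sketch is not `scripts/validate.py`; `template_problems` is a hypothetical helper, and it is deliberately stricter than the checklist because it flags every `env` value that is not a pure `${VAR}` placeholder, including non-secret literals.

```python
import re

# A value that consists of exactly one ${VAR_NAME} placeholder.
PLACEHOLDER_ONLY = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def template_problems(filename_stem, template):
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if template.get("name") != filename_stem:
        problems.append("name does not match filename stem")
    servers = template.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        problems.append("mcpServers block is missing or empty")
        return problems
    for server_name, server in servers.items():
        for var, value in server.get("env", {}).items():
            if not PLACEHOLDER_ONLY.fullmatch(str(value)):
                problems.append(f"{server_name}: env value for {var} is not a placeholder")
    return problems
```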

## Related Documentation

- [PATH: kb/reference/extension-api.md] — `mcp add` as part of the extension API
- [PATH: kb/reference/architecture-overview.md] — overall install model
- [PATH: kb/reference/mcp-editor-compatibility.md] — native editor MCP support matrix

---

## kb/reference/medplum-docs-map.md

---
title: "Medplum Documentation Map"
category: reference
service: ai-toolkit
tags: [medplum, fhir, healthcare, ehr, sdk, api, clinical, interoperability]
version: "1.0.0"
created: "2026-04-17"
last_updated: "2026-04-17"
description: "Comprehensive navigable index of Medplum documentation covering SDK, FHIR resources, clinical workflows, security, integrations, and terminology."
---

# Medplum Documentation Map

## Overview

Medplum is an open-source healthcare platform built on FHIR R4. It provides a FHIR-compliant datastore, TypeScript SDK, React component library, bot automation engine, and compliance infrastructure (HIPAA, SOC2, HITRUST).

| Item | Value |
|------|-------|
| Docs | `https://www.medplum.com/docs/` |
| API Base | `https://api.medplum.com/fhir/R4/` |
| App | `https://app.medplum.com/` |
| Storybook | `https://storybook.medplum.com/` |
| GraphiQL | `https://graphiql.medplum.com/` |

### Key Packages

| Package | Purpose |
|---------|---------|
| `@medplum/core` | SDK client, FHIR helpers, FHIRPath, HL7 parsing |
| `@medplum/fhirtypes` | TypeScript type definitions for all FHIR R4 resources |
| `@medplum/react` | React UI components (Mantine 7+, React 18+) |
| `@medplum/mock` | MockClient for unit testing |
| `@medplum/cli` | Command-line interface for FHIR operations |

---

## Documentation Sections

| Section | Path | Description |
|---------|------|-------------|
| FHIR Basics | `/docs/fhir-basics` | Resources, references, search, CodeableConcepts, identifiers, ValueSets, Subscriptions |
| FHIR Datastore | `/docs/fhir-datastore` | CRUD, binary data, batch requests, profiles, history, deduplication, USCDI |
| Search | `/docs/search` | Basic search, advanced parameters, `_filter`, pagination, `_include`/`_revinclude`, chaining |
| Terminology | `/docs/terminology` | CodeSystem, ValueSet, ConceptMap operations; LOINC, SNOMED, ICD-10 |
| GraphQL | `/docs/graphql` | Queries, mutations, connections, nested resolution, reverse references, array filtering |
| React Components | `/docs/react` | MedplumProvider, hooks, Mantine integration, tree-shaking, useSubscription |
| Analytics | `/docs/analytics` | Analytics and reporting capabilities |
| Auth | `/docs/auth` | OAuth2 flows, client credentials, external IDPs, Google, mTLS, MFA, token exchange, sessions |
| User Management | `/docs/user-management` | Project vs server users, registration, invitations |
| Access Control | `/docs/access` | Access policies, compartments, SMART scopes, IP rules, multi-tenant, field-level control |
| AI | `/docs/ai` | AI operations, AWS integration, MCP server |
| Bots | `/docs/bots` | Bot basics, cron jobs, questionnaire handlers, Lambda layers, secrets, webhooks, HL7, PDFs, unit testing |
| Subscriptions | `/docs/subscriptions` | Event-driven notifications, WebSocket, webhook resending |
| CLI | `/docs/cli` | Command-line FHIR operations, external servers |
| Integrations | `/docs/integration` | DoseSpot, Health Gorilla, Stedi, Candid Health, eFax, HL7, FHIRcast, SMART, CDS Hooks, C-CDA |
| Agent | `/docs/agent` | On-prem agent for HL7/DICOM bridging |
| Self-Hosting | `/docs/self-hosting` | Self-hosted deployment |
| Compliance | `/docs/compliance` | HIPAA, SOC2, HITRUST, ONC, CLIA/CAP, CFR11, GMP, ISO9001 |
| API Reference | `/docs/api` | REST endpoints, FHIR resources (150+), operations (40+), datatypes (40+), Medplum custom resources |
| SDK Reference | `/docs/sdk/core` | MedplumClient, utility functions, interfaces, types |

---

## Clinical Workflows

| Workflow | Path | Key Resources |
|----------|------|---------------|
| Intake & Registration | `/docs/intake` | Patient, QuestionnaireResponse, Encounter |
| Charting | `/docs/charting` | Condition, AllergyIntolerance, Observation (vitals), DocumentReference |
| Scheduling | `/docs/scheduling` | Schedule, Slot, Appointment, AppointmentResponse |
| Labs & Imaging | `/docs/labs-imaging` | ServiceRequest, DiagnosticReport, Observation, ImagingStudy |
| Medications | `/docs/medications` | MedicationRequest, Medication, MedicationAdministration |
| Care Plans | `/docs/careplans` | CarePlan, CareTeam, Task, PlanDefinition, Goal |
| Communications | `/docs/communications` | Communication, CommunicationRequest (threads, messaging, SMS) |
| Billing | `/docs/billing` | Claim, Coverage, ExplanationOfBenefit, ChargeItem |

### Clinical Configuration

| Feature | Path | Resources |
|---------|------|-----------|
| Provider Directory | `/docs/administration/provider-directory` | Practitioner, PractitionerRole, Organization, Location |
| Questionnaires | `/docs/questionnaires` | Questionnaire, QuestionnaireResponse, SDC extensions |
| Diagnostic Catalog | `/docs/careplans/diagnostic-catalog` | CodeSystem, ValueSet (LOINC panels) |
| Clinical Protocols | `/docs/careplans/protocols` | PlanDefinition, ActivityDefinition |

---

## SDK Reference — MedplumClient

### CRUD Operations

| Method | Description |
|--------|-------------|
| `createResource(resource)` | Create new FHIR resource (server assigns ID) |
| `readResource(resourceType, id)` | Read resource by type and ID |
| `updateResource(resource)` | Update existing resource (must include ID) |
| `patchResource(resourceType, id, operations)` | Apply JSON Patch operations |
| `deleteResource(resourceType, id)` | Delete resource by type and ID |
| `upsertResource(resource, query)` | Atomic create-or-update via search query |
| `createResourceIfNoneExist(resource, query)` | Conditional create if no match found |

### Search

| Method | Description |
|--------|-------------|
| `search(resourceType, query)` | Execute FHIR search, returns Bundle |
| `searchResources(resourceType, query)` | Returns resource array (unwrapped Bundle) |
| `searchOne(resourceType, query)` | Returns first matching resource |
| `searchResourcePages(resourceType, query)` | Async generator for paginated results |
| `fhirSearchUrl(resourceType, query)` | Build search URL from parameters |

### Authentication

| Method | Description |
|--------|-------------|
| `startLogin(loginRequest)` | Initiate user login flow |
| `startClientLogin(clientId, clientSecret)` | OAuth2 client credentials flow |
| `startGoogleLogin(loginRequest)` | Google Sign-In authentication |
| `setAccessToken(accessToken, refreshToken)` | Manually set auth tokens |
| `setBasicAuth(clientId, clientSecret)` | Configure basic auth |
| `signOut()` | Revoke token and clear cache |
| `isAuthenticated(gracePeriod)` | Check current auth status |
| `getProfile()` | Get current user profile (sync) |
| `getProfileAsync()` | Get current user profile (async fetch) |

### Advanced Operations

| Method | Description |
|--------|-------------|
| `executeBatch(bundle)` | Process batch or transaction Bundle |
| `executeBot(id, body, contentType)` | Run bot by ID or Identifier |
| `graphql(query, operationName, variables)` | Execute GraphQL queries |
| `readHistory(resourceType, id)` | Get all resource versions |
| `readPatientEverything(id)` | Patient $everything operation |
| `validateResource(resource)` | Validate resource against profiles |
| `valueSetExpand(params)` | Expand ValueSet for code lookups |
| `readResourceGraph(resourceType, id, graphName)` | Fetch linked resources via $graph |

### Media & Files

| Method | Description |
|--------|-------------|
| `createBinary(data, filename, contentType)` | Create Binary resource from data |
| `createAttachment(data, filename, contentType)` | Create Attachment element with Binary |
| `createPdf(docDefinition, filename)` | Generate PDF as Binary (pdfmake) |
| `uploadMedia(contents, contentType, filename)` | Upload and create Media resource |
| `download(url)` | Download URL as blob |

### Subscriptions & Real-time

| Method | Description |
|--------|-------------|
| `subscribeToCriteria(criteria, props)` | Subscribe to WebSocket notifications |
| `unsubscribeFromCriteria(criteria, props)` | Unsubscribe from criteria |
| `getSubscriptionManager()` | Access WebSocket subscription manager |

---

## SDK Utility Functions

| Function | Description |
|----------|-------------|
| `createReference(resource)` | Create a FHIR Reference from a resource |
| `getReferenceString(resource)` | Get `ResourceType/id` string |
| `getDisplayString(resource)` | Human-readable display for any resource |
| `formatHumanName(name)` | Format FHIR HumanName as string |
| `formatAddress(address)` | Format FHIR Address as string |
| `formatCodeableConcept(cc)` | Format CodeableConcept as string |
| `formatDate(date)` | Format FHIR date as human-readable |
| `formatDateTime(dateTime)` | Format FHIR dateTime as human-readable |
| `formatQuantity(quantity)` | Human-readable Quantity string |
| `getCodeBySystem(cc, system)` | Find code for a given system in CodeableConcept |
| `setCodeBySystem(cc, system, code)` | Set code for a given system |
| `getIdentifier(resource, system)` | Get identifier value for a system |
| `setIdentifier(resource, system, value)` | Set identifier for a system |
| `getExtension(resource, urls)` | Get extension by URL |
| `getExtensionValue(resource, urls)` | Get extension value by URL |
| `parseReference(ref)` | Parse reference string to ResourceType/ID |
| `resolveId(reference)` | Extract ID from reference |
| `isResource(value)` | Type guard for FHIR resource |
| `isReference(value)` | Type guard for FHIR Reference |
| `deepClone(value)` | Deep clone a FHIR resource |
| `deepEquals(a, b)` | Compare resources (ignoring versionId) |
| `normalizeOperationOutcome(error)` | Normalize error to OperationOutcome |
| `normalizeErrorString(error)` | Normalize error to displayable string |
| `getQuestionnaireAnswers(response)` | Extract answers as map by linkId |
| `evalFhirPath(expression, resource)` | Evaluate FHIRPath expression |
| `validateResource(resource)` | Validate against StructureDefinition |
| `generateId()` | Cross-platform UUID generator |

---

## FHIR Search Syntax

### Parameter Types

| Type | Behavior | Example |
|------|----------|---------|
| `string` | Case-insensitive prefix match | `name=eve` matches Eve, Evelyn |
| `token` | Exact match, supports system namespace | `identifier=http://sys\|val` |
| `date` | Supports comparison prefixes | `birthdate=1940-03-29` |
| `reference` | Links to other resources | `subject=Patient/123` |
| `quantity` | Numeric with units | `value-quantity=gt40` |
| `number` | Plain numeric | `probability=gt0.8` |

### Operators

| Operator | Syntax | Example |
|----------|--------|---------|
| AND | Multiple parameters | `name=Simpson&birthdate=1940-03-29` |
| OR | Comma-separated | `status=completed,cancelled` |

### Modifiers

| Modifier | Purpose | Example |
|----------|---------|---------|
| `:not` | Exclude values | `status:not=completed` |
| `:missing` | Include/exclude absent params | `birthdate:missing=true` |
| `:contains` | Substring match (string only) | `name:contains=eve` |
| `:exact` | Case-sensitive exact match | `name:exact=Eve` |

### Comparison Prefixes (date, quantity, number)

| Prefix | Meaning |
|--------|---------|
| `eq` | Equal (default) |
| `ne` | Not equal |
| `gt` | Greater than |
| `lt` | Less than |
| `ge` | Greater than or equal |
| `le` | Less than or equal |
| `sa` | Starts after |
| `eb` | Ends before |

### Special Parameters

| Parameter | Purpose | Example |
|-----------|---------|---------|
| `_sort` | Sort results (prefix `-` for descending) | `_sort=-_lastUpdated` |
| `_count` | Results per page | `_count=20` |
| `_offset` | Pagination offset | `_offset=40` |
| `_total` | Include total count | `_total=accurate` |
| `_include` | Include forward-referenced resources | `_include=Observation:patient` |
| `_revinclude` | Include backward-referencing resources | `_revinclude=Provenance:target` |
| `_include:iterate` | Recursive inclusion (multi-hop) | `_include:iterate=Patient:general-practitioner` |
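The syntax tables above compose into a single query string: AND is expressed as repeated parameters, OR as comma-separated values, and modifiers and prefixes are part of the parameter name or value. The sketch below builds such a string with the Python standard library. It is an illustration and not the Medplum SDK; `build_search_query` is a hypothetical name, and commas inside a single value are not escaped.

```python
from urllib.parse import urlencode


def build_search_query(resource_type, params):
    """Build a FHIR search string such as 'Patient?name=Simpson'.

    params: list of (name, value) pairs, combined with AND.
    A list or tuple value is joined with commas, which FHIR treats as OR.
    Modifiers (name:contains) and prefixes (ge1940-03-29) pass through as-is.
    """
    pairs = []
    for name, value in params:
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        pairs.append((name, str(value)))
    # Keep ":" and "," readable; everything else is percent-encoded as needed.
    query = urlencode(pairs, safe=":,")
    return f"{resource_type}?{query}" if query else resource_type


print(build_search_query("Patient", [("name:contains", "eve"), ("birthdate", "ge1940-03-29")]))
```

The resulting string is relative to the FHIR base URL, so it would be appended to a base such as `https://api.medplum.com/fhir/R4/`.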

---

## Core FHIR Resources

### Clinical

| Resource | Purpose |
|----------|---------|
| `Patient` | Demographics, identifiers, contacts |
| `Practitioner` | Provider demographics, qualifications |
| `PractitionerRole` | Provider role at organization/location |
| `Organization` | Healthcare organization |
| `Encounter` | Patient visit or interaction |
| `Condition` | Diagnosis or health concern |
| `Observation` | Measurements, vitals, lab results |
| `DiagnosticReport` | Lab/imaging report aggregating observations |
| `ServiceRequest` | Order for a procedure, lab, or referral |
| `MedicationRequest` | Prescription or medication order |
| `AllergyIntolerance` | Allergy or adverse reaction record |
| `Procedure` | Performed clinical procedure |
| `CarePlan` | Treatment plan with activities and goals |
| `CareTeam` | Group of practitioners caring for a patient |
| `Goal` | Patient health objective |
| `Task` | Actionable work item |

### Administrative

| Resource | Purpose |
|----------|---------|
| `Schedule` | Provider availability container |
| `Slot` | Bookable time block within a Schedule |
| `Appointment` | Scheduled visit with participants |
| `Coverage` | Insurance/payer information |
| `Claim` | Billing claim submission |
| `Communication` | Message between participants |
| `Questionnaire` | Form/survey definition |
| `QuestionnaireResponse` | Completed form responses |

### Infrastructure

| Resource | Purpose |
|----------|---------|
| `Bundle` | Collection of resources (transaction, batch, searchset) |
| `Subscription` | Event-driven notification trigger |
| `AuditEvent` | Security/privacy audit log entry |
| `Binary` | Raw binary data (files, images) |
| `DocumentReference` | Metadata about a document/attachment |
| `OperationOutcome` | Processing result with issues/errors |
| `ValueSet` | Set of codes for a specific use |
| `CodeSystem` | Collection of codes in a domain |
| `StructureDefinition` | Resource profile/schema definition |
| `PlanDefinition` | Clinical protocol/workflow template |

### Medplum Custom Resources

| Resource | Purpose |
|----------|---------|
| `Bot` | Serverless function definition |
| `ClientApplication` | OAuth2 client registration |
| `Project` | Top-level tenant/organization container |
| `ProjectMembership` | User membership with role/access policy |
| `AccessPolicy` | Resource-level access control rules |
| `Agent` | On-prem integration agent |
| `UserConfiguration` | User UI preferences |

---

## Security & Identity

### Authentication Flows

| Flow | Use Case | SDK Method |
|------|----------|------------|
| Client Credentials | Service-to-service, backend | `startClientLogin(clientId, secret)` |
| Authorization Code | User-facing web apps | `startLogin()` + `processCode()` |
| Google Sign-In | Google SSO | `startGoogleLogin()` |
| External IDP | Auth0, Cognito, Okta | `signInWithExternalAuth()` |
| JWT Bearer | Server-issued JWT assertion | `startJwtBearerLogin()` |
| Token Exchange | Convert external tokens | `exchangeExternalAccessToken()` |
| mTLS | Certificate-based auth | Server config |

### Access Policy Features

| Feature | Description |
|---------|-------------|
| Resource type rules | Allow/deny per resource type (read, write, create, delete) |
| Read-only fields | `readonlyFields` array on resource type |
| Hidden fields | `hiddenFields` array on resource type |
| Criteria filtering | FHIR search query (e.g., `Patient?address-state=CA`) |
| Compartments | Patient-based data isolation via `_compartment` |
| Parameterized | Variables: `%profile`, `%profile.id`, `%patient`, custom |
| Write constraints | FHIRPath expressions for state machine enforcement |
| SMART scopes | `patient/*.read`, `user/*.write` style scopes |
| IP rules | Restrict by IP address/CIDR |

---

## Automation

### Bot Handler Pattern

```typescript
import { BotEvent, MedplumClient } from '@medplum/core';
import { Patient } from '@medplum/fhirtypes';

export async function handler(medplum: MedplumClient, event: BotEvent): Promise<any> {
  const patient = event.input as Patient;
  // event.secrets — project secrets map
  // event.bot — reference to this Bot resource
  // event.traceId — request correlation ID
  return true;
}
```

### Bot Execution Triggers

| Trigger | Method |
|---------|--------|
| HTTP POST | `POST /fhir/R4/Bot/<ID>/$execute` |
| FHIR Subscription | Subscription criteria → rest-hook to `Bot/<ID>` |
| Cron schedule | Bot with cron expression in properties |
| Manual | Execute button in Medplum App |

### Subscription Pattern

```typescript
// Server-side: create Subscription resource
const sub = await medplum.createResource({
  resourceType: 'Subscription',
  status: 'active',
  reason: 'Run bot when a matching Patient changes', // required by FHIR R4
  criteria: 'Patient?name=Simpson',
  channel: { type: 'rest-hook', endpoint: 'Bot/<BOT_ID>' }
});

// Client-side: WebSocket subscription
medplum.subscribeToCriteria('Patient?name=Simpson');
```

---

## Integrations

| Integration | Path | Purpose |
|-------------|------|---------|
| DoseSpot | `/docs/integration/dosespot` | E-prescribing (enrollment, favorites, Rx) |
| Health Gorilla | `/docs/integration/health-gorilla` | Lab orders, receiving results |
| Stedi | `/docs/integration/stedi` | EDI/X12 eligibility checks |
| Candid Health | `/docs/integration/candid-health` | Revenue cycle management |
| eFax | `/docs/integration/efax` | Fax send/receive |
| HL7 v2 | `/docs/integration/hl7-interfacing` | ADT, ORM/OBR/OBX message interfacing |
| FHIRcast | `/docs/fhircast` | Real-time clinical context synchronization |
| SMART App Launch | `/docs/integration/smart-app-launch` | Embedded app launch framework |
| CDS Hooks | `/docs/integration/cds-hooks` | Clinical decision support at workflow triggers |
| C-CDA | `/docs/integration/c-cda` | Continuity of Care Document export |
| On-Prem Agent | `/docs/agent` | Bridge to on-prem HL7/DICOM systems |
| Log Streaming | `/docs/integration/log-streaming` | External log aggregation |

---

## Terminology Systems

| System | URI | Usage |
|--------|-----|-------|
| LOINC | `http://loinc.org` | Lab tests, vitals, clinical observations |
| SNOMED CT | `http://snomed.info/sct` | Clinical findings, procedures, body structures |
| ICD-10-CM | `http://hl7.org/fhir/sid/icd-10-cm` | Diagnoses, billing codes |
| RxNorm | `http://www.nlm.nih.gov/research/umls/rxnorm` | Medications (ingredients, brands, dose forms) |
| NDC | `http://hl7.org/fhir/sid/ndc` | Drug product codes (packaging level) |
| CPT | `http://www.ama-assn.org/go/cpt` | Procedure billing codes |
| UCUM | `http://unitsofmeasure.org` | Units of measure |
| US NPI | `http://hl7.org/fhir/sid/us-npi` | National Provider Identifier |
| US SSN | `http://hl7.org/fhir/sid/us-ssn` | Social Security Number |

---

## Compliance

| Standard | Path | Scope |
|----------|------|-------|
| HIPAA | `/docs/compliance/hipaa` | PHI protection, BAA, audit logging |
| SOC 2 Type II | `/docs/compliance/soc2` | Security, availability, confidentiality |
| HITRUST | `/docs/compliance/hitrust` | Healthcare security framework |
| ONC | `/docs/compliance/onc` | Health IT certification |
| CLIA/CAP | `/docs/compliance/clia-cap` | Laboratory certification |
| 21 CFR Part 11 | `/docs/compliance/cfr11` | Electronic records/signatures |
| ISO 9001 | `/docs/compliance/iso9001` | Quality management |
| HTI-1/HTI-4 | `/docs/compliance/hti-4` | Health tech interoperability rules |

---

## React Components (@medplum/react)

### Setup

Requires: React 18+, Mantine 7+, PostCSS with Mantine preset, `@medplum/core`, `@medplum/react`.

Provider nesting: `BrowserRouter` → `MedplumProvider` → `MantineProvider`.

### Key Components

| Component | Purpose |
|-----------|---------|
| `<MedplumProvider>` | Provides MedplumClient context to app |
| `<SignInForm>` | Authentication form |
| `<ResourceTable>` | Display resource fields in table |
| `<ResourceForm>` | Edit resource with auto-generated form |
| `<SearchControl>` | Search interface with filters and results |
| `<QuestionnaireForm>` | Render and submit FHIR Questionnaire |
| `<QuestionnaireBuilder>` | Build/edit Questionnaire resources |
| `<ChatControl>` | Communication thread interface |

### Key Hooks

| Hook | Purpose |
|------|---------|
| `useMedplum()` | Access MedplumClient instance |
| `useMedplumContext()` | Access client + profile + loading state |
| `useResource(ref)` | Read resource by reference |
| `useSearch(type, query)` | Execute search with React Suspense |
| `useSubscription(criteria)` | WebSocket subscription with auto-cleanup |

---

## Bundle Transaction Pattern

```typescript
const bundle = await medplum.executeBatch({
  resourceType: 'Bundle',
  type: 'transaction',
  entry: [
    {
      fullUrl: 'urn:uuid:patient-1',
      resource: { resourceType: 'Patient', name: [{ family: 'Smith' }] },
      request: { method: 'POST', url: 'Patient' }
    },
    {
      resource: {
        resourceType: 'Observation',
        status: 'final',  // required by FHIR R4
        subject: { reference: 'urn:uuid:patient-1' },  // internal ref
        code: { coding: [{ system: 'http://loinc.org', code: '8867-4' }] }
      },
      request: { method: 'POST', url: 'Observation' }
    }
  ]
});
```

Key patterns:
- `urn:uuid:` for internal references resolved server-side
- `ifNoneExist` on request for conditional creates
- `ifMatch` with `W/"versionId"` for optimistic concurrency
- Conditional references: `Practitioner?identifier=http://hl7.org/fhir/sid/us-npi|123`
- Async processing via `Prefer: respond-async` header for large bundles
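
The conditional-create and optimistic-concurrency patterns above only change the `request` part of a Bundle entry. A minimal sketch of the entry shapes as plain dictionaries follows; the helper names `conditional_create` and `versioned_update` are hypothetical and not part of any SDK, while `ifNoneExist` and `ifMatch` are the standard FHIR `Bundle.entry.request` fields.

```python
import uuid


def conditional_create(resource: dict, search: str) -> dict:
    """Build a transaction entry that creates the resource only if no match exists."""
    return {
        # Temporary id that other entries in the same bundle can reference.
        "fullUrl": f"urn:uuid:{uuid.uuid4()}",
        "resource": resource,
        "request": {
            "method": "POST",
            "url": resource["resourceType"],
            "ifNoneExist": search,
        },
    }


def versioned_update(resource: dict, version_id: str) -> dict:
    """Build an update entry that only applies to the expected version."""
    return {
        "resource": resource,
        "request": {
            "method": "PUT",
            "url": f"{resource['resourceType']}/{resource['id']}",
            "ifMatch": f'W/"{version_id}"',
        },
    }


entry = conditional_create(
    {"resourceType": "Practitioner"},
    "identifier=http://hl7.org/fhir/sid/us-npi|123",
)
```

Either entry can be placed in the `entry` array of a `transaction` bundle like the one shown earlier.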

---

## GraphQL Patterns

```graphql
# Search with nested resolution
{
  PatientList(name: "Eve", address_city: "Philadelphia") {
    id
    name { family given }
  }
}

# Reverse references
{
  Patient(id: "123") {
    encounters: EncounterList(_reference: patient) {
      id
      status
    }
  }
}

# Inline fragments for reference resolution
{
  DiagnosticReport(id: "456") {
    result {
      resource { ... on Observation { valueQuantity { value unit } } }
    }
  }
}
```

Notes: Search arguments use snake_case (not kebab-case). The `:not`, `:missing`, and `:contains` modifiers are not supported in GraphQL. Schema introspection is disabled by default.

---

## kb/reference/merge-friendly-install-model.md

---
title: "Merge-Friendly Install Model"
category: reference
service: ai-toolkit
tags: [install, merge, hooks, injection, symlinks]
version: "1.0.0"
created: "2026-03-27"
last_updated: "2026-03-28"
description: "Reference description of how ai-toolkit preserves user content while installing toolkit components."
---

# Merge-Friendly Install Model

## Summary

`ai-toolkit` preserves user content while injecting toolkit behavior.

Instead of replacing entire directories or files, the installer uses merge-friendly strategies tailored to each component type.

## Component Strategies

| Component | Strategy | User content behavior |
|-----------|----------|-----------------------|
| `agents/*.md` | per-file symlinks | preserved; user file wins on name conflict |
| `skills/*/` | per-directory symlinks | preserved; user directory wins on name conflict |
| `settings.json` hooks | JSON merge with `_source: ai-toolkit` | preserved; toolkit entries removable |
| `constitution.md` | marker injection | preserved outside markers |
| `ARCHITECTURE.md` | marker injection | preserved outside markers |
| `CLAUDE.md` | marker injection | preserved outside markers |
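
Marker injection, as used for the last three rows, can be sketched as follows. This is an illustration only: the marker strings and the `inject` function are assumptions, and the toolkit's real markers and implementation may differ.

```python
import re

# Hypothetical marker strings; the real toolkit markers may differ.
START = "<!-- ai-toolkit:start -->"
END = "<!-- ai-toolkit:end -->"

_BLOCK = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def inject(existing: str, managed: str) -> str:
    """Insert or refresh the managed block, leaving all other text untouched."""
    block = f"{START}\n{managed.strip()}\n{END}"
    if _BLOCK.search(existing):
        # Update path: replace only the text between the markers.
        return _BLOCK.sub(lambda _match: block, existing, count=1)
    # First install: append the block after the user's own content.
    separator = "" if not existing or existing.endswith("\n") else "\n"
    return f"{existing}{separator}{block}\n"
```

Because only the span between the markers is rewritten, running `inject` twice with the same managed text returns the same file, which is what makes the update flow idempotent.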

## Why this model exists

This avoids two common failure modes:
1. users losing custom agents / skills due to whole-directory symlinks,
2. users losing custom hooks or docs due to full-file replacement.
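
To illustrate how per-file symlinks avoid the first failure mode, here is a minimal sketch. The function name `link_agents` and its return value are hypothetical; it only demonstrates the "user file wins on name conflict" rule from the table above.

```python
from pathlib import Path


def link_agents(toolkit_dir: Path, user_dir: Path) -> list[str]:
    """Symlink each toolkit agent into user_dir; return the names that were skipped."""
    user_dir.mkdir(parents=True, exist_ok=True)
    skipped = []
    for src in sorted(toolkit_dir.glob("*.md")):
        dst = user_dir / src.name
        if dst.is_symlink() and dst.resolve() == src.resolve():
            continue  # already linked by a previous run
        if dst.exists() or dst.is_symlink():
            skipped.append(src.name)  # user file wins on name conflict
            continue
        dst.symlink_to(src.resolve())
    return skipped
```

Since the user's directory is never replaced as a whole, custom agents that live next to the links are left alone.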

## Operational Consequences

### Positive
- reversible installs and uninstalls,
- backward-compatible upgrades,
- safe coexistence of toolkit and user customizations,
- idempotent update flow.

### Trade-offs
- merged / copied artifacts require `ai-toolkit update` to refresh,
- hook merge logic depends on valid JSON and the `_source` tagging convention,
- install behavior is more complex than a simple copy or symlink-only model.
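
The `_source` tagging convention mentioned above can be sketched like this. The function names are hypothetical and the hook entry shape is simplified; the point is that tagged entries can be refreshed or removed without touching entries the user wrote.

```python
import copy

TAG = "ai-toolkit"


def merge_hooks(settings: dict, toolkit_hooks: dict) -> dict:
    """Return settings with toolkit hook entries refreshed and user entries kept."""
    merged = copy.deepcopy(settings)
    hooks = merged.setdefault("hooks", {})
    for event, entries in toolkit_hooks.items():
        # Drop stale toolkit entries, keep anything without the toolkit tag.
        kept = [e for e in hooks.get(event, []) if e.get("_source") != TAG]
        hooks[event] = kept + [{**e, "_source": TAG} for e in entries]
    return merged


def remove_hooks(settings: dict) -> dict:
    """Strip every entry tagged as toolkit-owned (uninstall path)."""
    cleaned = copy.deepcopy(settings)
    hooks = cleaned.get("hooks", {})
    for event in list(hooks):
        hooks[event] = [e for e in hooks[event] if e.get("_source") != TAG]
        if not hooks[event]:
            del hooks[event]
    return cleaned
```

Merging twice yields the same result as merging once, and `remove_hooks` restores the user's original entries.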

## Local Project Setup

Project-local setup uses the same preservation approach for files that should remain repository-specific, especially `CLAUDE.md` and `.claude/settings.local.json`.

## Related Documents

- `kb/reference/distribution-model.md`
- `kb/reference/global-install-model.md`
- `kb/reference/hooks-catalog.md`

---

## kb/reference/opencode-compatibility.md

---
title: "AI Toolkit - opencode Compatibility"
category: reference
service: ai-toolkit
tags: [opencode, compatibility, install, skills, hooks, mcp, plugins]
version: "1.0.0"
created: "2026-04-16"
last_updated: "2026-04-16"
description: "Reference for how ai-toolkit integrates with opencode — AGENTS.md, subagents, slash commands, JS plugin hook bridge, and MCP merge into opencode.json."
---

# AI Toolkit - opencode Compatibility

## Summary

opencode (https://opencode.ai) is the 11th supported editor. `ai-toolkit install --editors opencode` (or `--editors all`) lays down a full native integration: shared `AGENTS.md`, per-agent `.opencode/agents/` files, per-command `.opencode/commands/` files, a JS plugin bridging toolkit Bash hooks to opencode lifecycle events, and MCP server merge into `opencode.json`.

opencode also reads `CLAUDE.md` as a fallback, so a user without the native integration still gets baseline rules. The native path adds subagents, slash commands, hooks, and MCP.

## Local Install Outputs

`ai-toolkit install --local --editors opencode` generates:

- `AGENTS.md` (shared with Codex CLI via distinct marker sections)
- `.opencode/agents/ai-toolkit-*.md` (one per ai-toolkit agent, `mode: subagent`)
- `.opencode/commands/ai-toolkit-*.md` (one per user-invocable skill, required `template: |` frontmatter field)
- `.opencode/plugins/ai-toolkit-hooks.js` (JS plugin bridging Bash hooks)
- `opencode.json` (MCP key merged from `.mcp.json`, user keys preserved)

## Global Install Outputs

`ai-toolkit install --editors opencode` (no `--local`) lays down:

- `~/.config/opencode/AGENTS.md`
- `~/.config/opencode/agents/ai-toolkit-*.md`
- `~/.config/opencode/commands/ai-toolkit-*.md`
- `~/.config/opencode/plugins/ai-toolkit-hooks.js`
- `~/.config/opencode/opencode.json` (MCP merge, user keys preserved)

Files land directly under `~/.config/opencode/` (no `.opencode/` nesting) because that is the global layout opencode expects per https://opencode.ai/docs/config/. Shared hook scripts stay in `~/.softspark/ai-toolkit/hooks/` and are referenced by the global JS plugin.

## Editor Surface Comparison

| Feature            | Claude Code | Codex CLI       | opencode                                  |
|--------------------|-------------|-----------------|-------------------------------------------|
| Rules file         | `CLAUDE.md` | `AGENTS.md`     | `AGENTS.md` + `CLAUDE.md` fallback        |
| Subagents          | Yes         | No              | Yes (`mode: subagent`)                    |
| Slash commands     | Skills      | Adapted skills  | Native commands with frontmatter          |
| MCP                | Yes         | Yes             | Yes (`opencode.json`)                     |
| Lifecycle hooks    | JSON config | `.codex/hooks`  | JS/TS plugins (~30+ events)               |
| Global config dir  | `~/.claude` | `~/.codex`      | `~/.config/opencode`                      |
| Project config dir | `.claude`   | `.agents`       | `.opencode`                               |

## Shared AGENTS.md

opencode and Codex CLI both read `AGENTS.md`. The toolkit emits two distinct marker-bounded sections in a single file, so installing both editors does not clobber either. The Codex section is produced by `generate_codex.py`; the opencode section is produced by `generate_opencode.py`. Both sections reuse `codex_skill_adapter.py` because both editors lack Claude-only orchestration primitives (`Agent`, `TeamCreate`, `TaskCreate`).

## Subagent Translation Model

Each file in `app/agents/*.md` emits a corresponding `.opencode/agents/ai-toolkit-<name>.md` with:

- `description` — copied from the source agent frontmatter
- `mode: subagent` (required)
- `color` — copied when present

The `model` field is deliberately omitted. opencode requires the `provider/model-id` form; ai-toolkit only stores a short alias (`opus`/`sonnet`/`haiku`) which cannot be mapped without assuming a provider. opencode falls back to the user's `default_agent` / top-level `model` config.

opencode exposes these files for `@` autocompletion, and the primary agent can delegate to them.
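
The field mapping can be sketched as below. This is not the code in `generate_opencode.py`; the function names are hypothetical, and the renderer writes naive `key: value` lines where a real generator would use a YAML library for quoting.

```python
def to_opencode_frontmatter(agent: dict) -> dict:
    """Map source agent frontmatter to the fields listed above."""
    out = {"description": agent["description"], "mode": "subagent"}
    if agent.get("color"):
        out["color"] = agent["color"]
    # "model" is deliberately not copied: a short alias such as "opus"
    # cannot be mapped to provider/model-id without assuming a provider.
    return out


def render(frontmatter: dict, body: str) -> str:
    """Render simple key: value frontmatter followed by the agent body."""
    lines = ["---"] + [f"{key}: {value}" for key, value in frontmatter.items()]
    lines += ["---", ""]
    return "\n".join(lines) + body
```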

## Slash Command Translation Model

Only user-invocable skills (`user-invocable: true` or no `disable-model-invocation`) emit to `.opencode/commands/`. Knowledge skills (`user-invocable: false`) are intentionally skipped — they are not intended as commands.

Each command file carries opencode's required `template: |` frontmatter field, built from the SKILL.md body.
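
A rough sketch of that translation, under simplifying assumptions: the selection rule is reduced to "skip when `user-invocable` is false", the function name `to_command` is hypothetical, and the description is written without YAML quoting.

```python
import textwrap
from typing import Optional


def to_command(frontmatter: dict, body: str) -> Optional[str]:
    """Return command file text, or None for knowledge skills."""
    if frontmatter.get("user-invocable") is False:
        return None  # knowledge skills are not emitted as commands
    # Indent the SKILL.md body so it becomes a YAML block scalar.
    template = textwrap.indent(body.strip("\n"), "  ")
    return (
        "---\n"
        f"description: {frontmatter.get('description', '')}\n"
        "template: |\n"
        f"{template}\n"
        "---\n"
    )
```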

## Hook Bridge (JS Plugin)

`.opencode/plugins/ai-toolkit-hooks.js` is a single-file plugin that maps opencode events to the shared Bash hooks in `~/.softspark/ai-toolkit/hooks/`:

| opencode event             | Bash hook(s)                                                       |
|----------------------------|--------------------------------------------------------------------|
| `session.created`          | `session-start.sh` + `session-context.sh` + `mcp-health.sh`        |
| `session.compacted`        | `pre-compact.sh` + `pre-compact-save.sh` (PreCompact equivalent)   |
| `session.deleted`          | `session-end.sh` + `save-session.sh`                               |
| `message.updated`          | `user-prompt-submit.sh` + `track-usage.sh`                         |
| `message.part.updated`     | `user-prompt-submit.sh` + `track-usage.sh`                         |
| `tool.execute.before` (bash) | `guard-destructive.sh` + `commit-quality.sh`                     |
| `tool.execute.after`       | `post-tool-use.sh`                                                 |
| `permission.asked`         | `guard-destructive.sh` (approval-gate bridge)                      |
| `command.executed`         | `post-tool-use.sh`                                                 |

The plugin has a single named export, `AiToolkitHooks`; per the opencode docs, plugins use named exports only (no default export). Hook scripts are invoked via Bun's `$` with the script path bound as a JS constant. opencode event payloads are passed as JSON on stdin and never interpolated into the shell command, so payload data cannot inject shell metacharacters. The toolkit's `exit 2` semantics for PreToolUse guards are preserved and bubble up as the plugin's return code.
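
The real bridge is a JS plugin running under Bun. The same "payload on stdin, fixed argv" principle is sketched here in Python for illustration only; `run_hook` and `is_blocked` are hypothetical names.

```python
import json
import subprocess


def run_hook(script_argv: list[str], payload: dict) -> int:
    """Run a hook with the event payload on stdin and return its exit code."""
    result = subprocess.run(
        script_argv,  # fixed argv list, no shell is involved
        input=json.dumps(payload).encode(),
        capture_output=True,
        check=False,
    )
    return result.returncode


def is_blocked(exit_code: int) -> bool:
    """Exit code 2 is the guard's signal that the action was rejected."""
    return exit_code == 2
```

Because the payload never becomes part of the command line, a value such as `ls; echo $(whoami)` reaches the hook as inert data.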

**Intentionally unmapped events**: `tui.*`, `lsp.*`, `installation.*`, `session.idle/status/updated/error/diff`, `file.edited`, `file.watcher.updated`, `todo.updated`, `shell.env`, `server.connected`, `message.*.removed`, `experimental.*` — no matching Bash hook in the toolkit, or the event is opencode-UI-only.

## MCP Merge (opencode.json)

`generate_opencode_json.py` reads `.mcp.json` and merges its servers under the `mcp` key in `opencode.json`:

- `local` shape entries are translated to opencode's local command shape.
- `remote` shape entries are translated to opencode's remote URL shape.
- User-authored keys in `opencode.json` (outside `mcp`) are preserved.
- Re-running the generator is idempotent.
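
A minimal sketch of such a merge follows. It is not the code in `generate_opencode_json.py`. The input shape (`mcpServers` with `command`/`args`/`env` or `url`) and the output field names (`type`, `command` as a list, `environment`, `enabled`) are assumptions about the two config formats and should be checked against the opencode docs.

```python
import copy


def translate(server: dict) -> dict:
    """Translate one .mcp.json server entry to an assumed opencode shape."""
    if "url" in server:
        return {"type": "remote", "url": server["url"], "enabled": True}
    entry = {
        "type": "local",
        "command": [server["command"], *server.get("args", [])],
        "enabled": True,
    }
    if server.get("env"):
        entry["environment"] = dict(server["env"])
    return entry


def merge_mcp(opencode_cfg: dict, mcp_json: dict) -> dict:
    """Merge servers under the mcp key; every other key is left as-is."""
    merged = copy.deepcopy(opencode_cfg)
    mcp = merged.setdefault("mcp", {})
    for name, server in mcp_json.get("mcpServers", {}).items():
        mcp[name] = translate(server)
    return merged
```

Entries are keyed by server name, so a second run overwrites them with identical values rather than appending duplicates.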

## Auto-Detection

The installer detects opencode as configured when any of these markers exist:

- `opencode.json`
- `.opencode/` directory
- `.opencode/agents/`
- `.opencode/commands/`
- `~/.config/opencode/`

`ai-toolkit update` picks up opencode automatically when detection fires.
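
The detection rule amounts to a small existence check. A sketch, with a hypothetical function name and with `project` and `home` passed in explicitly so it is easy to test:

```python
from pathlib import Path

# Project-level markers from the list above.
PROJECT_MARKERS = (
    "opencode.json",
    ".opencode",
    ".opencode/agents",
    ".opencode/commands",
)


def detect_opencode(project: Path, home: Path) -> bool:
    """Return True when any opencode marker exists."""
    if any((project / marker).exists() for marker in PROJECT_MARKERS):
        return True
    # Global marker: ~/.config/opencode/
    return (home / ".config" / "opencode").is_dir()
```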

## Uninstall & Reset

`scripts/install_steps/ai_tools.py` cleanup only removes ai-toolkit-marked artifacts:

- Generated `.opencode/agents/ai-toolkit-*.md`
- Generated `.opencode/commands/ai-toolkit-*.md`
- Generated `.opencode/plugins/ai-toolkit-hooks.js`
- Managed markers from `AGENTS.md`
- `mcp` key entries injected by the toolkit (user keys preserved)

User-authored opencode files and user-authored `opencode.json` keys are never deleted.

## Behavioral Limits

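- opencode does not expose the full Claude hook event surface; only the events in the mapping table above are bridged. Claude-only events (`TaskCompleted`, `TeammateIdle`, `SubagentStart`, `SubagentStop`, `PreCompact`) have no direct opencode counterpart and are silently skipped; the `PreCompact` hook scripts run only through the `session.compacted` mapping above.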
- Multi-agent orchestration skills (`/orchestrate`, `/workflow`, `/swarm`, `/subagent-development`) run through the Codex adaptation layer — they use opencode subagents and explicit file ownership instead of Claude's `Agent`/`TaskCreate` primitives.

## Verification

The opencode integration is verified by:

1. Generator contract tests for the five `generate_opencode*.py` scripts (bats)
2. MCP merge idempotency and user-key preservation tests
3. Plugin export shape and event coverage tests
4. Auto-detection tests for install / update flow
5. `validate.py --strict` + `audit_skills.py --ci` in CI

## CLI Commands

| Command | Description |
|---------|-------------|
| `ai-toolkit opencode-md` | Generate `AGENTS.md` body for opencode |
| `ai-toolkit opencode-agents` | Generate `.opencode/agents/ai-toolkit-*.md` |
| `ai-toolkit opencode-commands` | Generate `.opencode/commands/ai-toolkit-*.md` |
| `ai-toolkit opencode-plugin` | Generate `.opencode/plugins/ai-toolkit-hooks.js` |
| `ai-toolkit opencode-json` | Merge MCP servers into `opencode.json` |

## Related

- `kb/reference/skills-catalog.md`
- `kb/reference/agents-catalog.md`
- `kb/reference/codex-cli-compatibility.md`
- `kb/reference/architecture-overview.md`
- `kb/reference/global-install-model.md`
- `kb/reference/mcp-editor-compatibility.md`

---

## kb/reference/plugin-pack-conventions.md

---
title: "Plugin Pack Conventions"
category: reference
service: ai-toolkit
tags: [plugins, plugin-packs, conventions, manifests, hooks, policy-packs]
version: "1.0.0"
created: "2026-03-28"
last_updated: "2026-04-13"
description: "Conventions for experimental ai-toolkit plugin packs, policy packs, hook packs, and plugin-creator scaffolding across Claude and Codex runtimes."
---

# Plugin Pack Conventions

## Purpose

`ai-toolkit` now includes experimental plugin packs under `app/plugins/` to formalize a runtime-aware plugin direction for Claude and optional global Codex layering without changing the default core install surface.

## Pack Types

| Type | Purpose | Example |
|------|---------|---------|
| `plugin-pack` | Curated bundle of existing assets by domain | `security-pack`, `research-pack` |
| `policy-pack` | Rules / compliance / governance overlays | future enterprise policy add-ons |
| `hook-pack` | Optional hook modules or observability bundles | status line, output style |

## Directory Contract

```text
app/plugins/<pack-name>/
├── plugin.json
├── README.md
├── hooks/        # optional, executable if present
├── rules/        # optional
├── skills/       # optional
├── agents/       # optional
└── templates/    # optional
```

## Manifest Contract

Required keys:
- `name`
- `description`
- `version`
- `domain`
- `type`
- `status`
- `requires`
- `includes`

`includes` should declare arrays for:
- `agents`
- `skills`
- `rules`
- `hooks`

## Naming Rules

- Pack directory and `name` should use lowercase-hyphen format
- Prefer `*-pack` suffix for curated bundles
- Hook module filenames should be kebab-case and executable
- Experimental packs should declare `"status": "experimental"`
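
The manifest contract and the name rule above can be checked mechanically. The following is an illustrative sketch, not the toolkit's validator; `manifest_errors` is a hypothetical name, and the regex is one reasonable reading of "lowercase-hyphen".

```python
import re

REQUIRED_KEYS = (
    "name", "description", "version", "domain",
    "type", "status", "requires", "includes",
)
INCLUDE_KEYS = ("agents", "skills", "rules", "hooks")
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def manifest_errors(manifest: dict) -> list[str]:
    """Return human-readable problems found in a plugin.json manifest."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    name = manifest.get("name", "")
    if not isinstance(name, str) or not NAME_RE.match(name):
        errors.append("name must be lowercase-hyphen")
    includes = manifest.get("includes")
    if not isinstance(includes, dict):
        includes = {}
    for key in INCLUDE_KEYS:
        if not isinstance(includes.get(key), list):
            errors.append(f"includes.{key} must be an array")
    return errors
```

An empty list means the manifest satisfies the structural rules; semantic checks (for example, that referenced skills exist) are out of scope for this sketch.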

## Adoption Rules

1. Packs are opt-in and must not be auto-installed by `ai-toolkit install`
2. Reuse core agents/skills before duplicating definitions
3. Optional hooks must be documented as opt-in and non-default
4. Policy packs should be additive and marker-injected where possible
5. Keep manifests small and reviewable; use README for narrative guidance

## CLI Management

```bash
ai-toolkit plugin list               # show all 11 packs with install status
ai-toolkit plugin install --editor claude <name>   # Claude global target
ai-toolkit plugin install --editor codex <name>    # Codex global target
ai-toolkit plugin install --editor all --all       # install all 11 packs for both runtimes
ai-toolkit plugin update --editor all --all        # update all installed packs
ai-toolkit plugin clean <name>       # prune data older than 90 days (default)
ai-toolkit plugin clean <name> --days 30  # prune data older than 30 days
ai-toolkit plugin remove --editor codex <name>     # remove from one runtime only
ai-toolkit plugin remove --editor all --all        # remove all installed packs everywhere
ai-toolkit plugin status --editor all              # show installed packs with runtime details
```

### What `plugin install` Does

1. **Parses** `--editor claude|codex|all` (default: `claude`)
2. **Copies** plugin-specific hooks to `~/.softspark/ai-toolkit/hooks/plugin-<pack>-<hook>.sh`
3. **Copies** plugin-specific scripts to `~/.softspark/ai-toolkit/plugin-scripts/<pack>/`
4. **Runs** init scripts if present (e.g. `init_db.py` for memory-pack — safe to re-run, preserves data)
5. **Claude target**: links missing agents/skills into `~/.claude/`, injects plugin-local rules into `~/.claude/CLAUDE.md`, and merges plugin hook entries into `~/.claude/settings.json`
6. **Codex target**: bootstraps global Codex assets in `HOME` (`~/AGENTS.md`, `~/.agents/skills`, `~/.agents/rules`, `~/.codex/hooks.json`) and then layers plugin-specific rules/hooks on top
7. **Records** installed state per runtime in `~/.softspark/ai-toolkit/plugins.json`

### What `plugin update` Does

1. **Removes** existing plugin runtime entries for the selected editor(s) (same as `remove`)
2. **Reinstalls** from the current source (same as `install`)
3. **Preserves plugin data** (e.g. memory-pack SQLite database is never deleted)
4. Shared plugin scripts/hooks are kept if another runtime still has the same pack installed
5. `--all` updates only currently installed packs for the selected runtime(s)

### What `plugin clean` Does

1. **Prunes** old plugin data based on `--days N` (default 90)
2. For memory-pack: deletes observations older than N days, removes orphan sessions, runs VACUUM
3. Shows before/after counts and DB size
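
For memory-pack, the prune step could look roughly like the sketch below. The table and column names (`observations`, `sessions`, `session_id`, `created_at` as ISO-8601 UTC text) are assumptions made for illustration; the actual schema created by `init_db.py` may differ.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def prune(conn: sqlite3.Connection, days: int = 90,
          now: Optional[datetime] = None) -> tuple[int, int]:
    """Delete old observations and orphan sessions; return (before, after) counts."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    before = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
    conn.execute("DELETE FROM observations WHERE created_at < ?", (cutoff,))
    # Sessions with no remaining observations are orphans.
    conn.execute(
        "DELETE FROM sessions WHERE NOT EXISTS "
        "(SELECT 1 FROM observations o WHERE o.session_id = sessions.id)"
    )
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
    after = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
    return before, after
```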

### What `plugin remove` Does

1. **Claude target**: strips plugin hook entries from `~/.claude/settings.json` and removes plugin-local rule sections from `~/.claude/CLAUDE.md`
2. **Codex target**: strips plugin hook entries from `~/.codex/hooks.json` and removes `~/.agents/rules/plugin-<pack>-*.md`
3. **Shared assets** (`~/.softspark/ai-toolkit/hooks/plugin-*`, `plugin-scripts/<pack>/`) are removed only when no remaining runtime still uses that pack
4. **Updates** `plugins.json` state per runtime
5. **Leaves** core agents/skills untouched (they belong to the base install)
6. **Leaves** plugin data intact (e.g. `memory.db` — use `clean` to prune)
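
The shared-asset rule in step 3 is a reference check across runtimes. A sketch, assuming a simplified state shape of `{runtime: [pack, ...]}`; the real `plugins.json` layout is not shown here and may differ.

```python
def remove_pack(state: dict, pack: str, editor: str) -> tuple[dict, bool]:
    """Return (new_state, remove_shared_assets) after removing pack from one runtime."""
    new_state = {
        runtime: [p for p in packs if not (runtime == editor and p == pack)]
        for runtime, packs in state.items()
    }
    # Shared hooks/scripts go away only when no runtime still lists the pack.
    still_used = any(pack in packs for packs in new_state.values())
    return new_state, not still_used
```

Removing `memory-pack` from Codex while Claude still has it installed therefore leaves the shared hook scripts in place.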

### Data Retention (memory-pack)

- **Auto-retention**: `session-summary.sh` hook auto-prunes observations older than 90 days on every session end (configurable via `MEMORY_RETENTION_DAYS` env var)
- **Manual clean**: `ai-toolkit plugin clean memory-pack --days 30`
- **Status**: `ai-toolkit plugin status --editor claude|codex|all` shows runtime-specific install details plus DB size, observation count, and date range where relevant

## Current Experimental Packs

| Pack | Domain | Agents | Skills | Hooks | Description |
|------|--------|--------|--------|-------|-------------|
| `security-pack` | security | 3 | 3 | 2 | Security auditing, threat modeling, OWASP |
| `research-pack` | research | 4 | 4 | 1 | Multi-source research, synthesis, fact-checking |
| `frontend-pack` | frontend | 3 | 3 | 1 | React/Vue/CSS, SEO, design engineering |
| `enterprise-pack` | enterprise | 3 | 3 | 3 | Executive briefings, infra architecture, status |
| `memory-pack` | memory | 0 | 1 | 2 | SQLite persistent memory with FTS5 search |
| `rust-pack` | rust | 0 | 1 | 0 | Rust patterns |
| `java-pack` | java | 0 | 1 | 0 | Java patterns |
| `csharp-pack` | csharp | 0 | 1 | 0 | C# patterns |
| `kotlin-pack` | kotlin | 0 | 1 | 0 | Kotlin patterns |
| `swift-pack` | swift | 0 | 1 | 0 | Swift patterns |
| `ruby-pack` | ruby | 0 | 1 | 0 | Ruby patterns |

## Optional Hook Modules

`enterprise-pack` provides two optional hook modules:
- `hooks/status-line.sh` — status line overlay
- `hooks/output-style.sh` — enterprise reporting style

`memory-pack` provides two hooks:
- `hooks/observation-capture.sh` — captures tool actions to SQLite (PostToolUse)
- `hooks/session-summary.sh` — summarizes session on Stop

These are intentionally excluded from the default install until explicitly enabled via `ai-toolkit plugin install`.

---

## kb/reference/quick-wins-implementation-summary.md

---
title: "Quick Wins Implementation Summary"
category: reference
service: ai-toolkit
tags: [implementation, hooks, cli, benchmark, validation]
version: "1.0.0"
created: "2026-03-28"
last_updated: "2026-04-01"
description: "Reference summary of the quick-win execution slice that became part of the baseline toolkit implementation."
---

# Quick Wins Implementation Summary

## Purpose

This document records the implementation slice that hardened the toolkit around creator workflows, diagnostics, lifecycle hooks, benchmark tooling, and validation.

## Delivered Runtime Features

### Creator workflows
- `app/skills/hook-creator/SKILL.md`
- `app/skills/command-creator/SKILL.md`
- `app/skills/agent-creator/SKILL.md`
- `app/skills/plugin-creator/SKILL.md`

### CLI and diagnostics
- `ai-toolkit doctor`
- `ai-toolkit benchmark-ecosystem`
- `scripts/harvest_ecosystem.py`

### Hook coverage
- `PreCompact`
- `PostToolUse`
- `UserPromptSubmit`
- `SubagentStart`
- `SubagentStop`
- `SessionEnd`

### Validation and benchmarks
- benchmark dashboard JSON
- benchmark harvest JSON
- plugin-pack validation
- benchmark freshness checks in `doctor`
- expanded lifecycle and asset checks in `validate.py`

## Delivered Documentation

Updated baseline docs:
- `README.md`
- `app/ARCHITECTURE.md`
- `kb/reference/architecture-overview.md`
- `kb/reference/hooks-catalog.md`
- `kb/reference/skills-catalog.md`
- `kb/reference/plugin-pack-conventions.md`
- `kb/reference/claude-ecosystem-benchmark-snapshot.md`
- `kb/procedures/maintenance-sop.md`

## Validation Evidence

The implementation is backed by:
- `scripts/validate.py`
- CLI tests
- install tests
- generator tests
- metadata contract tests
- validator negative tests

## Final Outcome

The quick-win slice is no longer a pending execution plan. Its outputs are part of the default toolkit baseline and should be treated as shipped product behavior.

---

## kb/reference/skill-templates.md

---
title: "AI Toolkit - Skill Templates"
category: reference
service: ai-toolkit
tags: [templates, scaffolding, create, skills]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-04-01"
description: "5 skill templates for scaffolding new skills: linter, reviewer, generator, workflow, knowledge."
---

# Skill Templates

## Overview

`ai-toolkit create skill` scaffolds new skills from predefined templates. Each template produces a valid SKILL.md that passes `validate.py`.

## Usage

```bash
ai-toolkit create skill my-skill --template=linter
ai-toolkit create skill my-checker --template=reviewer --description="Review security headers"
```

## Available Templates

| Template | Skill Type | Key Frontmatter | Use When |
|----------|-----------|-----------------|----------|
| `linter` | Task | `disable-model-invocation: true`, `allowed-tools: Bash, Read` | Automated checks, validators |
| `reviewer` | Hybrid | `context: fork`, `agent: code-reviewer` | Code review with agent delegation |
| `generator` | Task | `allowed-tools: Read, Write, Bash, Glob` | File generation, scaffolding |
| `workflow` | Hybrid | `context: fork`, `agent: orchestrator`, `model: opus` | Multi-phase orchestration |
| `knowledge` | Knowledge | `user-invocable: false` | Auto-loaded domain patterns |

## Template Variables

| Variable | Replaced With | Example |
|----------|--------------|---------|
| `{{NAME}}` | Skill name argument | `my-linter` |
| `{{DESCRIPTION}}` | `--description` value or default | `Provides my-linter functionality` |
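
Variable substitution of this kind is plain string replacement. A minimal sketch, with a hypothetical function name and the default description inferred from the example column above:

```python
from typing import Optional


def render_template(template: str, name: str,
                    description: Optional[str] = None) -> str:
    """Fill {{NAME}} and {{DESCRIPTION}} in a SKILL.md template."""
    if description is None:
        # Default inferred from the example in the table above.
        description = f"Provides {name} functionality"
    return (
        template
        .replace("{{NAME}}", name)
        .replace("{{DESCRIPTION}}", description)
    )
```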

## Template Location

Templates are stored in `app/templates/skill/{type}/SKILL.md.template`.

## After Scaffolding

1. Edit the generated `app/skills/{name}/SKILL.md`
2. Add `reference/` or `templates/` subdirectories if needed
3. Run `ai-toolkit validate` to verify

---

## kb/reference/skills-catalog.md

---
title: "AI Toolkit - Skills Catalog"
category: reference
service: ai-toolkit
tags: [skills, domain-knowledge, catalog, task-skills, hybrid-skills]
version: "1.4.3"
created: "2026-03-23"
last_updated: "2026-04-12"
description: "Complete skills catalog with task, hybrid, and knowledge skills. Includes Codex adaptation notes, effort levels, skill-scoped hooks, executable scripts, security auditor, and persona presets."
---

# Skills Catalog

All functionality is unified under skills. Task and hybrid skills are user-invocable as slash commands. Knowledge skills provide domain patterns auto-loaded by agents.

## Skill Tiers

| Tier | Skills | When |
|------|--------|------|
| **1 — Quick single-agent** | `/debug`, `/review`, `/refactor`, `/analyze`, `/docs`, `/plan`, `/explain`, `/tdd`, `/grill-me`, `/triage-issue` | One concern, fast |
| **1.5 — Product planning** | `/write-a-prd` → `/prd-to-plan` → `/prd-to-issues` | Interview-driven PRD → vertical-slice plan → GitHub issues |
| **1.5 — Design & architecture** | `/design-an-interface`, `/architecture-audit`, `/refactor-plan`, `/ubiquitous-language`, `/qa-session` | Parallel sub-agent exploration |
| **2 — Multi-agent workflow** | `/workflow <type>` | Cross-cutting task with known pattern |
| **3 — Custom parallelism** | `/orchestrate`, `/swarm` | No predefined workflow matches |

## Task Skills (30)

Task skills execute a specific action, are invoked via slash commands, and set `disable-model-invocation: true`.

| Skill | Slash Command | Effort | Description |
|-------|---------------|--------|-------------|
| **commit** | `/commit` | medium | Create well-structured git commits (Conventional Commits) |
| **pr** | `/pr` | medium | Create GitHub pull request with template and checks |
| **test** | `/test` | medium | Run tests (auto-detect: pytest, vitest, jest, phpunit, flutter, go, cargo) |
| **build** | `/build` | low | Build the current project (auto-detects project type) |
| **lint** | `/lint` | low | Run linting and type checking (ruff/mypy, eslint/tsc, phpstan, dart analyze) |
| **fix** | `/fix` | low | Autonomously fix failing tests or lint errors (iterative loop) |
| **deploy** | `/deploy` | medium | Deploy to target environment with pre-deployment checks |
| **rollback** | `/rollback` | medium | Safe rollback (git, database migrations, deployments) |
| **migrate** | `/migrate` | medium | Database migration workflow (auto-detect: Alembic, Prisma, Laravel, Django) |
| **ci** | `/ci` | medium | Generate/manage CI/CD pipeline configuration (GitHub Actions, GitLab CI) |
| **panic** | `/panic` | low | EMERGENCY: Immediately halt all autonomous agent operations |
| **index** | `/index` | low | Reindex knowledge base to vector store with change detection |
| **onboard** | `/onboard` | medium | Guided project setup with the toolkit |
| **night-watch** | `/night-watch` | medium | Trigger Night Watchman autonomous maintenance cycle |
| **evolve** | `/evolve` | medium | Trigger Meta-Architect self-optimization cycle |
| **chaos** | `/chaos` | medium | Trigger Chaos Engineering experiment |
| **predict** | `/predict` | medium | Predict impact and risks of code changes |
| **biz-scan** | `/biz-scan` | medium | Scan project for business value opportunities and metric gaps |
| **briefing** | `/briefing` | medium | Generate daily executive summary of system status |
| **evaluate** | `/evaluate` | medium | Evaluate RAG quality using LLM-as-a-Judge methodology |
| **skill-creator** | `/skill-creator` | high | Create new skills following Agent Skills standard |
| **hook-creator** | `/hook-creator` | high | Create new Claude Code hooks with conventions and validation |
| **command-creator** | `/command-creator` | high | Create new slash commands with frontmatter and workflow guidance |
| **agent-creator** | `/agent-creator` | high | Create new specialized agents with trigger and tool selection guidance |
| **plugin-creator** | `/plugin-creator` | high | Create experimental opt-in plugin packs with manifests, conventions, and optional modules |
| **health** | `/health` | medium | Check health of project services (auto-detect) |
| **prd-to-issues** | `/prd-to-issues` | medium | Break PRD into GitHub issues with vertical slices and HITL/AFK tagging |
| **skill-audit** | `/skill-audit` | medium | Scan skills and agents for security risks: dangerous patterns, secrets, excessive permissions |
| **hipaa-validate** | `/hipaa-validate` | medium | Scan codebase for HIPAA compliance issues: PHI exposure, missing audit logging, unencrypted transmission/storage, access control gaps, temp file exposure, and missing BAA references |
| **a11y-validate** | `/a11y-validate` | medium | Scan codebase for accessibility violations: WCAG 2.1 Level AA, EN 301 549, European Accessibility Act (EAA / EU 2019/882). Covers semantics, keyboard, focus, color contrast, forms, media, ARIA, motion, mobile (React Native + Flutter), and EAA accessibility-statement documentation. |
| **seo-validate** | `/seo-validate` | medium | Scan codebase for SEO issues: W3C semantics, meta/OG tags, Schema.org, hreflang, Core Web Vitals (LCP/INP/CLS), resource hints, GEO, SPA/SSG/CSR crawlability, technical SEO, accessibility-for-SEO. Framework-aware (Next/Nuxt/Astro/Gatsby/SvelteKit/Remix/Angular/Vue/static HTML). |
| **mcp-builder** | `/mcp-builder` | high | Build production-grade MCP servers using the 4-phase methodology (research, implement, test, evaluate). TypeScript/Python, stdio/streamable-http. |

## Hybrid Skills (32)

Hybrid skills combine slash-command invocation with domain knowledge that agents reference.

| Skill | Slash Command | Effort | Description |
|-------|---------------|--------|-------------|
| **explore** | `/explore` | medium | Explore and understand codebase structure and tech stack |
| **debug** | `/debug` | medium | Systematic debugging with logs, health checks, diagnostics (Tier 1 — single agent) |
| **review** | `/review` | high | Review code changes: quality, security, performance (Tier 1 — single agent) |
| **plan** | `/plan` | high | Create structured plan with task breakdown and agent assignments |
| **refactor** | `/refactor` | high | Plan and execute code refactoring with safety checks (Tier 1 — single agent) |
| **analyze** | `/analyze` | medium | Analyze code quality, complexity, and patterns |
| **cve-scan** | `/cve-scan` | medium | Scan project dependencies for known CVEs using native audit tools (npm, pip, composer, cargo, go, ruby, dart) |
| **docs** | `/docs` | high | Generate/update docs: README, API docs, architecture notes, changelogs (Tier 1 — single agent) |
| **explain** | `/explain` | medium | Explain architecture of a file/module using Mermaid diagrams |
| **orchestrate** | `/orchestrate` | max | Custom multi-agent parallelism — Tier 3, native in Claude, Codex-adapted to `spawn_agent` workflows |
| **swarm** | `/swarm` | max | Massive parallelism: map-reduce, consensus, relay — Tier 3 |
| **workflow** | `/workflow` | max | 15 predefined multi-agent workflow types — Tier 2, Codex-adapted to native subagent orchestration |
| **instinct-review** | `/instinct-review` | low | Review, curate, and manage learned instincts from past sessions |
| **write-a-prd** | `/write-a-prd` | high | Create PRD through interactive interview, codebase exploration, and module design |
| **prd-to-plan** | `/prd-to-plan` | high | Convert PRD into phased implementation plan using tracer-bullet vertical slices |
| **tdd** | `/tdd` | high | Test-driven development with red-green-refactor loop and vertical slices |
| **design-an-interface** | `/design-an-interface` | high | Generate 3+ radically different interface designs using parallel sub-agents |
| **grill-me** | `/grill-me` | medium | Stress-test a plan through relentless Socratic questioning |
| **ubiquitous-language** | `/ubiquitous-language` | medium | Extract DDD-style ubiquitous language glossary from conversation |
| **refactor-plan** | `/refactor-plan` | high | Create detailed refactor plan with tiny commits via user interview |
| **qa-session** | `/qa-session` | high | Interactive QA session — report bugs conversationally, file GitHub issues |
| **triage-issue** | `/triage-issue` | high | Triage bug with deep codebase exploration and TDD fix plan |
| **architecture-audit** | `/architecture-audit` | high | Discover shallow modules and propose module-deepening refactors |
| **subagent-development** | `/subagent-development` | high | Execute plans with 2-stage review (spec + quality) per task |
| **repeat** | `/repeat` | medium | Autonomous loop with safety controls (Ralph Wiggum pattern) |
| **mem-search** | `/mem-search` | medium | Search past coding sessions via natural language (memory-pack) |
| **persona** | `/persona` | low | Switch engineering persona at runtime (backend-lead, frontend-lead, devops-eng, junior-dev) |
| **council** | `/council` | high | 4-perspective decision evaluation (Advocate, Critic, Pragmatist, User-Proxy) with synthesis and confidence-rated recommendation. Tier 1, orchestrator, `context: fork`. |
| **introspect** | `/introspect` | medium | Agent self-debugging: classify failure pattern, suggest smallest recovery action, emit structured introspection report |
| **brand-voice** | `/brand-voice` | medium | Anti-trope list, voice principles, LLM rhetoric prevention; output modes (`concise` ≤60% tokens, `strict` ≤40%) governing conversational responses. |

### `/workflow` types

| Type | Agents | Use case |
|------|--------|----------|
| `feature-development` | 8 | Full stack feature: plan → backend + frontend + DB + tests + security + docs |
| `backend-feature` | 5 | Backend only: API + logic + DB + tests + security |
| `frontend-feature` | 4 | UI: component + state + tests + docs |
| `api-design` | 7 | API contract → implement → test → benchmark → document |
| `database-evolution` | 7 | Schema change + migration + ORM update + tests + perf + security |
| `test-coverage` | 4 | Boost coverage: map gaps → unit tests + fixtures → review |
| `security-audit` | 7 | Multi-vector: OWASP + code + infra + DB → prioritize → report |
| `codebase-onboarding` | 6 | Read-only: structure + architecture + DB + tests + security → guide |
| `spike` | 7 | Research → feasibility → security + perf → architecture note |
| `debugging` | 5 | Diagnose → fix → test → document |
| `incident-response` | 3 | Triage → fix → postmortem |
| `performance-optimization` | 4 | Profile → optimize → benchmark → document |
| `infrastructure-change` | 5 | Design + implement + security + tests + runbook |
| `application-deploy` | 3 | Deploy → smoke test → release notes |
| `proactive-troubleshooting` | 4 | Investigate → check perf → preventive fix → docs |

## Knowledge Skills - Development (15)

| Skill | Directory | Domain |
|-------|-----------|--------|
| **app-builder** | `skills/app-builder/` | Full-stack application architecture |
| **api-patterns** | `skills/api-patterns/` | REST/GraphQL design, versioning, error handling |
| **database-patterns** | `skills/database-patterns/` | Schema design, indexing, query optimization |
| **flutter-patterns** | `skills/flutter-patterns/` | Flutter/Dart architecture, state management |
| **ecommerce-patterns** | `skills/ecommerce-patterns/` | E-commerce: catalog, cart, checkout, payments |
| **clean-code** | `skills/clean-code/` | Multi-language code quality: Python, TS, PHP, Go, Dart |
| **typescript-patterns** | `skills/typescript-patterns/` | TypeScript/JavaScript patterns for frontend and backend |
| **rust-patterns** | `skills/rust-patterns/` | Ownership, borrowing, error handling, Cargo, tokio, serde |
| **java-patterns** | `skills/java-patterns/` | Records, sealed classes, Stream API, Spring Boot, JUnit 5 |
| **csharp-patterns** | `skills/csharp-patterns/` | Nullable refs, async/await, ASP.NET Core, EF Core |
| **kotlin-patterns** | `skills/kotlin-patterns/` | Coroutines, DSLs, sealed classes, Ktor, MockK |
| **swift-patterns** | `skills/swift-patterns/` | Protocol-oriented, SwiftUI, async/await, SPM |
| **ruby-patterns** | `skills/ruby-patterns/` | Blocks, Rails conventions, RSpec, ActiveRecord |
| **design-engineering** | `skills/design-engineering/` | UI polish, animation craft, easing, transforms, accessibility |
| **documentation-standards** | `skills/documentation-standards/` | KB document conventions, frontmatter validation, category taxonomy |

## Knowledge Skills - Infrastructure (6)

| Skill | Directory | Domain |
|-------|-----------|--------|
| **docker-devops** | `skills/docker-devops/` | Docker, deployment, infrastructure |
| **security-patterns** | `skills/security-patterns/` | OWASP, auth, encryption, vulnerability prevention |
| **ci-cd-patterns** | `skills/ci-cd-patterns/` | GitHub Actions, GitLab CI, Docker builds, Kubernetes |
| **observability-patterns** | `skills/observability-patterns/` | Logging, metrics, tracing, monitoring, SLOs |
| **testing-patterns** | `skills/testing-patterns/` | Multi-language TDD: pytest, vitest, phpunit, go test, flutter |
| **migration-patterns** | `skills/migration-patterns/` | Database migrations, API versioning, zero-downtime |

## Knowledge Skills - AI/RAG (6)

| Skill | Directory | Domain |
|-------|-----------|--------|
| **rag-patterns** | `skills/rag-patterns/` | RAG pipelines, chunking, reranking, evaluation |
| **mcp-patterns** | `skills/mcp-patterns/` | MCP protocol, server/client design, tools |
| **prompt-caching-patterns** | `skills/prompt-caching-patterns/` | Anthropic prompt caching: TTL, breakpoints, hit rate, anti-patterns |
| **json-mode-patterns** | `skills/json-mode-patterns/` | Structured JSON output via tool-use; schema design; partial recovery |
| **content-moderation-patterns** | `skills/content-moderation-patterns/` | Two-stage moderation: pre-filter + LLM classifier; categories; thresholds |
| **model-routing-patterns** | `skills/model-routing-patterns/` | Haiku/Sonnet/Opus routing; escalation; sub-agent delegation; fallback |

## Knowledge Skills - Process (5)

| Skill | Directory | Domain |
|-------|-----------|--------|
| **git-mastery** | `skills/git-mastery/` | Git workflows, branching, conflict resolution |
| **architecture-decision** | `skills/architecture-decision/` | Architecture notes, trade-off analysis, alternatives |
| **performance-profiling** | `skills/performance-profiling/` | Profiling, bottleneck analysis, optimization |
| **research-mastery** | `skills/research-mastery/` | Multi-source research, synthesis, fact-checking |
| **verification-before-completion** | `skills/verification-before-completion/` | Iron Law: evidence-before-claims, no completion without fresh verification |

## Quality Guardrails

### Anti-Rationalization Tables

15 core skills include `## Common Rationalizations` — domain-specific tables of excuses and rebuttals that prevent agent drift and shortcut-taking:

| Skill | Example rationalization blocked |
|-------|---------------------------------|
| `/review` | "Small change, quick scan is enough" |
| `/debug` | "It must be a library bug" |
| `/refactor` | "It works, don't touch it" |
| `/tdd` | "Too simple to test" |
| `/plan` | "Planning is wasted time, just start coding" |
| `/docs` | "The code is self-documenting" |
| `/analyze` | "The linter is green, the code is fine" |
| `security-patterns` | "It's an internal API, security doesn't matter" |
| `testing-patterns` | "Tests slow down development" |
| `api-patterns` | "We'll version the API later" |
| `ci-cd-patterns` | "Manual deploys give us more control" |
| `clean-code` | "It's readable enough" |
| `performance-profiling` | "It feels slow, let me optimize this function" |
| `git-mastery` | "One big commit is simpler" |
| `database-patterns` | "We'll add indexes later when it's slow" |

### Confidence Scoring (`/review`)

The `/review` skill outputs structured findings with:
- **Severity**: critical / major / minor / nit
- **Confidence score**: 1-10 per finding with calibration guide
- **Evidence requirement**: each finding must include file:line + reasoning
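
The three requirements above can be pictured as a small validated record. This is a minimal sketch, not the toolkit's implementation: the `Finding` class, its field names, and the `file:line` regular expression are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

SEVERITIES = ("critical", "major", "minor", "nit")
# Assumed evidence shape: "path/to/file.py:42" (hypothetical format check).
EVIDENCE_RE = re.compile(r"^[^:\s]+:\d+$")


@dataclass(frozen=True)
class Finding:
    severity: str
    confidence: int  # 1-10, per the calibration guide
    location: str    # file:line
    reasoning: str

    def __post_init__(self):
        # Reject findings that do not meet the documented requirements.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 1 <= self.confidence <= 10:
            raise ValueError("confidence must be between 1 and 10")
        if not EVIDENCE_RE.match(self.location):
            raise ValueError("evidence must be given as file:line")
        if not self.reasoning.strip():
            raise ValueError("reasoning must not be empty")


finding = Finding("major", 8, "app/auth.py:42", "Token compared with ==.")
print(finding.severity, finding.confidence, finding.location)
```

Rejecting a finding at construction time means a report cannot contain an entry without evidence or with an out-of-range score.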

### Self-Evaluation — LLM-as-Judge (`/review`)

After completing a review, the agent performs a self-evaluation pass:
1. Verify vs assume — did I read actual code for each finding?
2. Check the inverse — if X is a problem, is NOT-X also a problem elsewhere?
3. Detect anchoring bias — did early findings bias toward similar patterns?
4. Check unhappy paths — error handling, edge cases, failure modes
5. Calibrate confidence — overconfident? re-examine weakest finding

### Agent Verification Checklists

10 key agents include `## Verification Checklist` — exit criteria before presenting results:
`code-reviewer`, `test-engineer`, `security-auditor`, `debugger`, `backend-specialist`, `frontend-specialist`, `database-architect`, `performance-optimizer`, `devops-implementer`, `documenter`.

### Skill Reference Routing

7 core skills include `## Related Skills` sections suggesting logical follow-up skills:
`/review`, `/debug`, `/plan`, `/refactor`, `/tdd`, `/docs`, `/analyze`.

### Intent Capture Interview (`/onboard`)

Before setup, `/onboard` runs a Step 0 interview: 5 targeted questions that capture undocumented project intent and customize the generated `CLAUDE.md`.

---

## Advanced Features

### Effort Levels
- **low**: Mechanical operations (lint, build, fix, panic, index)
- **medium**: Standard operations (most skills)
- **high**: Complex reasoning (review, plan, refactor, docs, skill-creator)
- **max**: Multi-agent orchestration (orchestrate, swarm, workflow)

### Skill-Scoped Hooks
5 skills have lifecycle hooks:
- **commit**: PreToolUse — lint reminder before committing
- **test**: PostToolUse — coverage threshold reminder
- **deploy**: PostToolUse — health check reminder
- **migrate**: PreToolUse — backup reminder before migrations
- **rollback**: PostToolUse — verification reminder after rollback

### Skill Frontmatter Conventions
- `agent: <name>` — delegates to a specialized agent persona
- `context: fork` — runs skill in isolated forked context
- `allowed-tools: ...` — tools available to the agent when processing this skill
- `depends-on: skill-a, skill-b` — declares dependencies on other skills (validated by `validate.py`)

### Codex CLI Adaptation

Codex CLI receives the full skill catalog during `ai-toolkit install --local --editors codex`.

- Native Codex-compatible skills are symlinked directly into `.agents/skills/`
- Claude-oriented orchestration skills are generated as Codex wrappers
- Adapted wrappers translate `Agent`, `Team*`, and `Task*` guidance to `spawn_agent`, `send_input`, `wait_agent`, `close_agent`, and `update_plan`

Common adapted skills:

- `/orchestrate`
- `/workflow`
- `/swarm`
- `/subagent-development`
- `/tdd`

The translated skill content keeps the original support assets (`reference/`,
`scripts/`, `assets/`) while replacing Claude-specific runtime instructions.

See `kb/reference/codex-cli-compatibility.md` for the detailed mapping and hook limits.

### Skill Dependencies (`depends-on`)
Skills can declare dependencies on other skills (primarily knowledge skills) for documentation and validation:
```yaml
depends-on: clean-code, api-patterns
```
- CSV list of skill directory names
- Validated by `validate.py` — each dep must exist as `app/skills/{dep}/SKILL.md`
- Reported in `evaluate_skills.py` quality metrics
- No runtime autoloading — Claude loads knowledge skills contextually based on topic matching
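
The existence check described above could look roughly like the following. This is an illustrative sketch only, not the code in `validate.py`: the function names `parse_depends_on` and `missing_dependencies` are hypothetical, and the skills root is passed in as a parameter so the demo can run against a temporary directory.

```python
import tempfile
from pathlib import Path


def parse_depends_on(value: str) -> list[str]:
    """Split a CSV `depends-on` value into skill directory names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def missing_dependencies(value: str, skills_root: Path) -> list[str]:
    """Return dependencies that lack a `<skills_root>/<dep>/SKILL.md` file."""
    return [
        dep
        for dep in parse_depends_on(value)
        if not (skills_root / dep / "SKILL.md").is_file()
    ]


# Demo against a throwaway directory tree instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "app" / "skills"
    (root / "clean-code").mkdir(parents=True)
    (root / "clean-code" / "SKILL.md").write_text("# clean-code\n")
    print(missing_dependencies("clean-code, api-patterns", root))  # ['api-patterns']
```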

### SLM Compilation (`compile-slm`)

Compiles the full toolkit into a minimal system prompt for local Small Language Models (Ollama, LM Studio, Aider, Continue.dev). Pipeline: Parse → Score → Compress → Pack → Validate → Emit.

| Flag | Purpose |
|------|---------|
| `--model-size` | 7b/8b/14b/32b/70b — auto-selects budget + compression level |
| `--budget` | Token budget override (2K-16K) |
| `--persona` | Boost persona-relevant skills in scoring |
| `--lang` | Include language-specific rules |
| `--format` | Output: raw, ollama, json-string, aider |
| `--dry-run` | Preview included components + token utilization |

The `offline-slm` profile in `manifest.json` installs core only, then compiles.
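
To make the Score and Pack stages concrete, here is a rough sketch of budget-constrained packing. Everything in it is an assumption for illustration: the `Component` type, the 4-characters-per-token estimate, and the greedy strategy are not taken from the `compile-slm` source, which may work differently.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    text: str
    score: float  # higher means more relevant (hypothetical scoring output)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def pack(components: list[Component], budget: int) -> list[Component]:
    """Greedily keep the highest-scoring components that fit the budget."""
    chosen, used = [], 0
    for comp in sorted(components, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(comp.text)
        if used + cost <= budget:
            chosen.append(comp)
            used += cost
    return chosen


components = [
    Component("constitution", "x" * 400, score=1.0),   # ~100 tokens
    Component("clean-code", "x" * 800, score=0.6),     # ~200 tokens
    Component("rag-patterns", "x" * 1200, score=0.3),  # ~300 tokens
]
print([c.name for c in pack(components, budget=350)])  # ['constitution', 'clean-code']
```

In this picture, a flag like `--persona` would raise the score of relevant components before packing, and `--budget` would set the `budget` argument.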

### Executable Scripts (18 total, stdlib-only, JSON output)

| Skill | Script | Purpose |
|-------|--------|---------|
| **commit** | `scripts/pre-commit-check.py` | Staged files, secrets detection |
| **test** | `scripts/detect-runner.py` | Auto-detect test framework |
| **lint** | `scripts/detect-linters.py` | Detect available linters |
| **build** | `scripts/detect-build.py` | Detect build system |
| **deploy** | `scripts/pre_deploy_check.py` | Pre-deployment readiness |
| **rollback** | `scripts/rollback_info.py` | Rollback context |
| **migrate** | `scripts/migration-status.py` | Detect migration tool, status |
| **ci** | `scripts/ci-detect.py` | Detect CI platform |
| **fix** | `scripts/error-classifier.py` | Classify lint/test errors |
| **pr** | `scripts/pr-summary.py` | Generate PR title/description |
| **review** | `scripts/diff-analyzer.py` | Parse git diff, categorize files |
| **debug** | `scripts/error-parser.py` | Parse stack traces |
| **explore** | `scripts/visualize.py` | Interactive HTML codebase tree |
| **explain** | `scripts/dependency-graph.py` | Import graph → Mermaid |
| **docs** | `scripts/doc-inventory.py` | Inventory docs, measure coverage |
| **refactor** | `scripts/refactor-scan.py` | Detect code smells |
| **health** | `scripts/health_check.py` | JSON health report |
| **analyze** | `scripts/complexity.py` | Code complexity metrics |

---

## kb/reference/skills-unification.md

---
title: "Skills Unification Model"
category: reference
service: ai-toolkit
tags: [skills, commands, architecture, classification]
version: "1.0.0"
created: "2026-03-25"
last_updated: "2026-03-28"
description: "Reference explanation of why ai-toolkit standardizes on the Agent Skills format for slash-command behavior."
---

# Skills Unification Model

## Summary

`ai-toolkit` standardizes on the Agent Skills directory format for all reusable slash-command behavior.

The toolkit no longer treats commands and skills as separate implementation models. Instead, it uses one consistent format that covers three skill types:

- task skills,
- hybrid skills,
- knowledge skills.

## Why this model is used

The Agent Skills format supports capabilities that plain command markdown files do not:
- richer frontmatter,
- progressive disclosure,
- bundled scripts,
- templates and reference files,
- cross-tool compatibility.

## Classification

| Type | Frontmatter signal | Purpose |
|------|--------------------|---------|
| Task skill | `disable-model-invocation: true` | explicit user-triggered actions |
| Hybrid skill | default invocation | user-invocable + agent-usable workflows |
| Knowledge skill | `user-invocable: false` | auto-loaded patterns and conventions |
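
The table can be read as a simple decision rule over parsed frontmatter. The sketch below assumes the frontmatter has already been loaded into a dictionary; the function name `classify_skill` is hypothetical, and checking the task signal first when both keys are present is an assumption, not documented behavior.

```python
def classify_skill(frontmatter: dict) -> str:
    """Map parsed SKILL.md frontmatter to a skill type per the table above."""
    if frontmatter.get("disable-model-invocation") is True:
        return "task"
    if frontmatter.get("user-invocable") is False:
        return "knowledge"
    # Neither signal set: default invocation, so the skill is hybrid.
    return "hybrid"


print(classify_skill({"name": "commit", "disable-model-invocation": True}))  # task
print(classify_skill({"name": "clean-code", "user-invocable": False}))       # knowledge
print(classify_skill({"name": "review"}))                                     # hybrid
```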

## Consequences

### Positive
- one mental model for reusable behavior,
- easier validation,
- simpler install logic,
- better alignment with Claude Code ecosystem conventions.

### Trade-offs
- more directories than a flat commands model,
- stronger need for naming and frontmatter conventions,
- documentation and generators must stay synchronized with counts.

## Related Documents

- `kb/reference/skills-catalog.md`
- `kb/reference/architecture-overview.md`

---

## kb/reference/stats.md

---
title: "AI Toolkit - Usage Statistics"
category: reference
service: ai-toolkit
tags: [stats, usage, tracking, analytics]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-03-29"
description: "Local usage tracking for skill invocations. CLI command, JSON format, hook mechanism."
---

# Usage Statistics

## Overview

`ai-toolkit stats` reports how often each skill is invoked via slash commands. All data is local, stored in `~/.softspark/ai-toolkit/stats.json`, with no telemetry and no network calls.

## CLI Commands

```bash
ai-toolkit stats           # Show usage table (sorted by count)
ai-toolkit stats --reset   # Clear all stats
ai-toolkit stats --json    # Output raw JSON
```

## How It Works

A `UserPromptSubmit` hook (`track-usage.sh`) fires on every prompt. When the prompt starts with `/skill-name`, it increments the counter in `stats.json`.

### Hook Details
- **Event**: `UserPromptSubmit`
- **Script**: `~/.softspark/ai-toolkit/hooks/track-usage.sh`
- **Detection**: `grep -oE '^/[a-z][a-z0-9-]*'`
- **Storage**: Atomic write via python3 `os.replace()`
- **Overhead**: ~50ms (python3 startup + JSON read/write)
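
The detection and storage steps can be sketched in Python as follows. This is not the contents of `track-usage.sh`; it only mirrors the documented regex and the `os.replace()` write pattern. The helper names `detect_skill` and `record_usage` are hypothetical, and the demo writes to a temporary directory rather than the real stats file.

```python
import json
import os
import re
import tempfile
from datetime import datetime
from pathlib import Path

SKILL_RE = re.compile(r"^/[a-z][a-z0-9-]*")  # same pattern as the grep above


def detect_skill(prompt: str):
    """Return the skill name if the prompt starts with a slash command."""
    match = SKILL_RE.match(prompt)
    return match.group(0)[1:] if match else None


def record_usage(stats_path: Path, skill: str) -> dict:
    """Increment the counter for `skill` and write the file atomically."""
    stats = json.loads(stats_path.read_text()) if stats_path.exists() else {}
    entry = stats.setdefault(skill, {"count": 0, "last_used": ""})
    entry["count"] += 1
    entry["last_used"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=stats_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(stats, handle, indent=2)
    os.replace(tmp_name, stats_path)
    return stats


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stats.json"
    skill = detect_skill("/commit fix typo in README")
    if skill:
        print(record_usage(path, skill)["commit"]["count"])  # 1
```

Creating the temporary file in the same directory as the target keeps the final `os.replace()` on one filesystem, which is what makes the swap atomic.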

## JSON Format

```json
{
  "commit": {
    "count": 42,
    "last_used": "2026-03-29 14:30:00"
  },
  "review": {
    "count": 15,
    "last_used": "2026-03-28 09:12:00"
  }
}
```
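
Because the file is plain JSON, it is easy to consume from other tools. The snippet below is a minimal sketch that loads data in this shape from an inline string, sorts it by count, and derives the totals. The column widths are arbitrary and not the CLI's actual formatting.

```python
import json

raw = """
{
  "commit": {"count": 42, "last_used": "2026-03-29 14:30:00"},
  "review": {"count": 15, "last_used": "2026-03-28 09:12:00"}
}
"""

stats = json.loads(raw)
# Sort skills by descending invocation count.
rows = sorted(stats.items(), key=lambda item: item[1]["count"], reverse=True)
for name, entry in rows:
    print(f"{name:<30}{entry['count']:>7}  {entry['last_used']}")

print("Total invocations:", sum(entry["count"] for entry in stats.values()))
print("Unique skills:", len(stats))
```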

## Output Example

```
AI Toolkit Usage Stats
========================

Skill                           Count  Last Used
------------------------------------------------------------
commit                             42  2026-03-29 14:30:00
review                             15  2026-03-28 09:12:00
debug                               8  2026-03-27 16:45:00

Total invocations: 65
Unique skills: 3

File: ~/.softspark/ai-toolkit/stats.json
Reset: ai-toolkit stats --reset
```

---

## kb/reference/supported-tools-registry.md

---
title: "Supported Tools Registry"
category: reference
service: ai-toolkit
tags: [editors, platforms, generators, integration, ecosystem]
version: "1.3.0"
created: "2026-04-23"
last_updated: "2026-05-30"
description: "Human-readable view of scripts/ecosystem_tools.json — the canonical list of tools ai-toolkit integrates with (Claude Code + 11 editors), their documentation URLs, config paths, our generators, and tracked capability markers."
---

# Supported Tools Registry

The canonical data lives in **`scripts/ecosystem_tools.json`** and is consumed by `scripts/ecosystem_doctor.py`. This document is a derived view: when the JSON changes, update the tables here too.

## Tool Count: 12

1 primary runtime (Claude Code) + 11 editor integrations.

---

## Primary Runtime

### Claude Code

| Field | Value |
|-------|-------|
| ID | `claude-code` |
| Docs | https://platform.claude.com/docs/en/claude-code |
| Release notes | https://github.com/anthropics/claude-code/releases |
| Config paths | `~/.claude/settings.json`, `.claude/settings.local.json`, `CLAUDE.md`, `.claude/agents/*.md`, `.claude/skills/*/SKILL.md`, `~/.claude/themes/*.json` (v2.1.118+) |
| Our generators | — (Claude Code is the primary target; toolkit content ships directly as `.md` files and `settings.json` merges) |
| Tracked hook events | Core: `SessionStart`, `SessionEnd`, `UserPromptSubmit`, `Notification`, `MessageDisplay`. Tool: `PreToolUse`, `PostToolUse`, `PostToolUseFailure`, `PostToolBatch`. Turn: `Stop`, `StopFailure`, `UserPromptExpansion`. Subagent: `SubagentStart`, `SubagentStop`. Compaction: `PreCompact`, `PostCompact`. Permissions: `PermissionRequest`, `PermissionDenied`. Elicitation: `Elicitation`, `ElicitationResult`. Teams: `TaskCreated`, `TaskCompleted`, `TeammateIdle`. Worktrees/env: `WorktreeCreate`, `WorktreeRemove`, `CwdChanged`, `FileChanged`, `ConfigChange`. Setup: `Setup`, `InstructionsLoaded` |
| Tracked handler types | `command`, `prompt`, `agent`, `mcp_tool` |
| Other capabilities | slash commands, MCP server/client, sub-agent, output style, `SKILL.md` (≥500 lines warn) |
| Version probe | `claude --version` |

---

## Editor Integrations

### Cursor

| Field | Value |
|-------|-------|
| ID | `cursor` |
| Docs | https://cursor.com/docs |
| Changelog | https://cursor.com/changelog |
| Stable docs mirror | https://cursor.com/llms.txt (all doc pages have .md twins) |
| Config paths | `.cursorrules`, `.cursor/rules/*.mdc`, `.cursor/rules/*.md`, `AGENTS.md`, `.cursor/mcp.json`, `~/.cursor/mcp.json`, `.cursor/skills/*/SKILL.md`, `~/.cursor/skills/*/SKILL.md`, `.cursor/agents/*.md`, `.cursor/hooks.json` |
| Compat read paths | `.agents/skills/`, `~/.agents/skills/`, `.claude/skills/`, `~/.claude/skills/`, `.codex/skills/`, `~/.codex/skills/` |
| Our generators | `scripts/generate_cursor_rules.py`, `scripts/generate_cursor_mdc.py`, `scripts/generate_cursor_hooks.py` (profile=full), `scripts/generate_cursor_agents.py` (profile=full), `scripts/generate_cursor_skills.py` (profile=full pointer) |
| Tracked capabilities | `cursorrules`, `.cursor/rules`, `AGENTS.md`, `mcp.json`, Composer, Agent Mode, hooks.json, subagents, skills, plugins |

### Windsurf

| Field | Value |
|-------|-------|
| ID | `windsurf` |
| Docs | https://docs.windsurf.com |
| Changelog | https://windsurf.com/changelog |
| Stable docs mirror | https://docs.windsurf.com/llms.txt + per-page .md twins |
| Config paths | `.windsurfrules`, `.windsurf/rules/*.md`, `.windsurf/workflows/*.md`, `.windsurf/skills/*/SKILL.md`, `AGENTS.md`, `~/.codeium/windsurf/memories/global_rules.md`, `~/.codeium/windsurf/skills/*/SKILL.md`, `~/.codeium/windsurf/mcp_config.json` |
| Compat read paths | `.agents/skills/`, `~/.agents/skills/`, (with Claude Code config-reading) `.claude/skills/`, `~/.claude/skills/` |
| Our generators | `scripts/generate_windsurf.py`, `scripts/generate_windsurf_rules.py`, `scripts/generate_windsurf_hooks.py` (profile=full), `scripts/generate_windsurf_skills.py` (global + profile=full pointer) |
| Tracked capabilities | Cascade, `windsurfrules`, `AGENTS.md`, activation triggers (`always_on`/`glob`/`model_decision`), workflows, skills, MCP, memories, hooks |
| Activation modes emitted | always_on (agents/security/quality), glob (testing + language rules), model_decision (code-style/workflow) |

### GitHub Copilot

| Field | Value |
|-------|-------|
| ID | `github-copilot` |
| Docs | https://docs.github.com/en/copilot |
| Release notes | https://github.blog/changelog/label/copilot/ |
| Config paths | `.github/copilot-instructions.md`, `.github/instructions/*.instructions.md`, `.github/prompts/*.prompt.md`, `AGENTS.md` |
| Our generators | `scripts/generate_copilot.py` |
| Tracked capabilities | `copilot-instructions.md`, Copilot Chat, Copilot Workspace, Copilot cloud agent, `applyTo`, custom agents, prompt files, `instructions.md`, MCP |
| Tier notes | Custom agents (`.github/agents/*.agent.md`) and repo-level MCP config are Pro/Pro+/Business/Enterprise only and intentionally not integrated (class C per ecosystem-sync SOP). |

### Gemini CLI

| Field | Value |
|-------|-------|
| ID | `gemini-cli` |
| Docs | https://github.com/google-gemini/gemini-cli/tree/main/docs |
| Release notes | https://github.com/google-gemini/gemini-cli/releases |
| Config paths | `GEMINI.md`, `.gemini/settings.json`, `~/.gemini/settings.json`, `.gemini/commands/*.toml`, `.gemini/skills/*/SKILL.md`, `.agents/skills/*/SKILL.md`, `.gemini/extensions/gemini-extension.json` |
| Our generators | `scripts/generate_gemini.py`, `scripts/generate_gemini_hooks.py` (profile>=standard), `scripts/generate_gemini_commands.py` (profile=full), `scripts/generate_gemini_skills.py` (profile=full) |
| Tracked capabilities | `GEMINI.md`, `mcpServers`, tools, `settings.json`, `BeforeTool`, `AfterTool`, `BeforeAgent`, `AfterAgent`, `BeforeModel`, `SessionStart`, `SessionEnd`, `Stop`, `SKILL.md`, `activate_skill`, custom commands, `gemini-extension.json` |
| Version probe | `gemini --version` |
| Latest upstream | v0.39.0 (2026-04-23) |

### Cline

| Field | Value |
|-------|-------|
| ID | `cline` |
| Docs | https://docs.cline.bot |
| Release notes | https://github.com/cline/cline/releases |
| Config paths | `.clinerules/*.md` (compat), `.clinerules/workflows/*.md` (compat workflows), `.cline/rules/*.md`, `.cline/hooks/`, `.cline/skills/*/SKILL.md`, `~/.cline/rules/*.md`, `~/.cline/hooks/`, `~/.cline/skills/*/SKILL.md`, `~/.cline/data/settings/cline_mcp_settings.json` |
| Our generators | `scripts/generate_cline.py`, `scripts/generate_cline_rules.py`, `scripts/generate_cline_skills.py` |
| Tracked capabilities | `clinerules`, Plan Mode, Act Mode, MCP, custom modes, workflows, hooks, skills, subagents, conditional rules |
| Notes | Conditional rules (`paths:` YAML frontmatter) are emitted for testing and language-specific rules since 2026-04. Project rules still use `.clinerules/` for compatibility; the documented `~/.cline/rules/` path is used for global install. Skills are emitted as a pointer catalogue in `profile=full` and global installs. |
| Global install | `ai-toolkit install --editors cline` writes documented global rules under `~/.cline/rules/` and a skill pointer under `~/.cline/skills/`; MCP remains managed by `ai-toolkit mcp install --editor cline`. |

### Roo Code

| Field | Value |
|-------|-------|
| ID | `roo-code` |
| Docs | https://docs.roocode.com |
| Release notes | https://github.com/RooCodeInc/Roo-Code/releases |
| Config paths | `.roomodes`, `.roo/rules/*.md`, `.roo/rules-{slug}/*.md`, `.roo/mcp.json`, `~/.roo/rules/`, `~/.roo/custom_modes.yaml`, `mcp_settings.json` (global via Roo settings UI) |
| Our generators | `scripts/generate_roo_modes.py`, `scripts/generate_roo_rules.py` |
| Tracked capabilities | `roomodes`, custom modes, Code Actions, MCP, Orchestrator mode, `whenToUse`, `description`, `roleDefinition`, `groups` |
| Notes | `.roomodes` now includes `description` and `whenToUse` for every mode (since 2026-04). YAML `.roomodes` is upstream-preferred but not yet emitted — JSON is still accepted by Roo. Global install writes only `~/.roo/rules/` because the exact global MCP settings path is UI-managed. |

### Aider

| Field | Value |
|-------|-------|
| ID | `aider` |
| Docs | https://aider.chat/docs |
| Changelog | https://aider.chat/HISTORY.html |
| Config paths | `.aider.conf.yml`, `CONVENTIONS.md`, `~/.aider.conf.yml` |
| Our generators | `scripts/generate_aider_conf.py`, `scripts/generate_conventions.py` |
| Tracked capabilities | `.aider.conf.yml`, `CONVENTIONS.md`, `architect`, `auto-accept-architect`, `read`, `lint-cmd`, `test-cmd`, `commit-prompt`, `attribute-co-authored-by`, `chat-language`, `commit-language`, `watch-files`, `auto-commits` |
| Global install | `ai-toolkit install --editors aider` creates `~/.aider.conf.yml` only when absent and always refreshes `~/.aider-ai-toolkit-CONVENTIONS.md`; existing YAML is preserved. |
| Version probe | `aider --version` |
| Latest upstream | v0.86.1 (Aug 2025) |

### Augment

| Field | Value |
|-------|-------|
| ID | `augment` |
| Docs | https://docs.augmentcode.com |
| Changelog | https://www.augmentcode.com/changelog |
| Config paths | `.augment/rules/*.md`, `.augment/guidelines.md` (legacy), `.augment/agents/*.md`, `.augment/commands/*.md`, `.augment/skills/*/SKILL.md`, `~/.augment/rules/*.md`, `~/.augment/settings.json`, `/etc/augment/settings.json` |
| Our generators | `scripts/generate_augment.py`, `scripts/generate_augment_rules.py`, `scripts/generate_augment_agents.py` (profile=full), `scripts/generate_augment_commands.py` (profile=full), `scripts/generate_augment_hooks.py` (profile=full, HOME-scoped), `scripts/generate_augment_skills.py` (profile=full) |
| Tracked capabilities | `.augment`, Agent mode, Next Edit, MCP, context engine, Auggie CLI, `always_apply`, `agent_requested`, subagents, custom commands, `SKILL.md`, `PreToolUse`, `PostToolUse`, `SessionStart`, `SessionEnd`, `Stop`, ACP Mode |
| SPA caveat | Mintlify Next.js SPA; use `https://docs.augmentcode.com/<path>.md` siblings (discoverable via `/llms.txt`) for machine reads |

### Google Antigravity

| Field | Value |
|-------|-------|
| ID | `google-antigravity` |
| Docs | https://antigravity.google/docs (JavaScript SPA — use bundle strings / sitemap to verify) |
| Changelog | https://antigravity.google/changelog (SPA; changelog entries embedded in main-*.js) |
| Config paths | `.agent/rules/*.md`, `.agent/workflows/*.md`, `.agent/skills/*/SKILL.md`, `AGENTS.md`, `GEMINI.md` |
| Our generators | `scripts/generate_antigravity.py` (rules + workflows + skill pointer) |
| Tracked capabilities | Antigravity, agent manager, artifacts, MCP, workflows, rules, skills, `AGENTS.md`, `GEMINI.md`, agent permissions |
| Doc access note | Docs are JS-SPA — verify via `main-*.js` bundle strings or community skill repos. `WebFetch` returns an empty shell. |

### Codex CLI

| Field | Value |
|-------|-------|
| ID | `codex-cli` |
| Docs | https://github.com/openai/codex (redirects from developers.openai.com/codex) |
| Release notes | https://github.com/openai/codex/releases |
| Config paths | `AGENTS.md`, `.agents/rules/*.md`, `.agents/skills/*/SKILL.md`, `.codex/hooks.json`, `~/.codex/config.toml` |
| Our generators | `scripts/generate_codex.py`, `scripts/generate_codex_rules.py`, `scripts/generate_codex_hooks.py`, `scripts/generate_codex_skills.py` (opt-in via `--codex-skills`) |
| Tracked hook events | Upstream canonical (codex-rs `HookEventName` enum): `PreToolUse`, `PostToolUse`, `PermissionRequest`, `PreCompact`, `PostCompact`, `SessionStart`, `UserPromptSubmit`, `SubagentStart`, `SubagentStop`, `Stop` (10 events). We currently wire a subset: `SessionStart`, `PreToolUse`, `PermissionRequest`, `UserPromptSubmit`, `Stop`. |
| Tracked handler types | `command` (emitted by default); `prompt` and `agent` available upstream but authored by hand |
| Other capabilities | `AGENTS.md`, `config.toml`, `mcp_servers`, sandbox policies, `.agents/skills/*/SKILL.md` (native Codex skill discovery path) |
| Version probe | `codex --version` |

### opencode

| Field | Value |
|-------|-------|
| ID | `opencode` |
| Docs | https://opencode.ai/docs |
| Release notes | https://github.com/sst/opencode/releases |
| Config paths | `opencode.json`, `.opencode/agents/*.md`, `.opencode/commands/*.md`, `.opencode/plugins/*`, `.opencode/skills/*/SKILL.md` (v1.14+), `AGENTS.md`, `.claude/skills/*/SKILL.md` (fallback discovery) |
| Our generators | `scripts/generate_opencode.py`, `scripts/generate_opencode_agents.py`, `scripts/generate_opencode_commands.py`, `scripts/generate_opencode_json.py`, `scripts/generate_opencode_plugin.py` |
| Tracked plugin events | `session.created`, `session.compacted`, `session.deleted`, `message.updated`, `tool.execute.before`, `tool.execute.after`, `permission.asked`, `command.executed` |
| Other capabilities | `opencode.json` config, primary + subagent modes, `@`-mention subagents, `/`-invocation commands, MCP (local + remote), plugin hooks in JS/TS, native `SKILL.md` discovery with Claude-compatible fallback, `permission.skill.*` matrix |
| Version probe | `opencode --version` |

---

## How the Registry is Consumed

```
┌─────────────────────────┐
│  ecosystem_tools.json   │  ← authoritative config (this doc mirrors it)
└──────────┬──────────────┘
           │ read
           ▼
┌─────────────────────────┐     ┌─────────────────────────────────┐
│  ecosystem_doctor.py    │◄───►│  ecosystem-doctor-snapshot.json │  (last-seen state)
└──────────┬──────────────┘     └─────────────────────────────────┘
           │ emits
           ▼
    Drift report  (JSON or text)  →  human review  →  generator updates  →  commit
```
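
The comparison step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the logic of `ecosystem_doctor.py`: it assumes that "drift" means a difference between a tool's currently observed capability list and the list stored in the snapshot, and the function names and data shapes are hypothetical.

```python
def diff_capabilities(observed, last_seen):
    """Compare two capability lists and report what changed."""
    now, before = set(observed), set(last_seen)
    return {"added": sorted(now - before), "removed": sorted(before - now)}


def drift_report(observed_by_tool, snapshot_by_tool):
    """Build a per-tool drift report; tools without changes are omitted."""
    report = {}
    for tool_id, observed in observed_by_tool.items():
        # A tool missing from the snapshot is treated as having no last-seen state.
        delta = diff_capabilities(observed, snapshot_by_tool.get(tool_id, []))
        if delta["added"] or delta["removed"]:
            report[tool_id] = delta
    return report


# Hypothetical inputs keyed by tool ID (shape assumed for illustration).
observed = {"aider": ["read", "lint-cmd", "watch-files"], "codex-cli": ["AGENTS.md"]}
snapshot = {"aider": ["read", "lint-cmd"], "codex-cli": ["AGENTS.md"]}
print(drift_report(observed, snapshot))
```

An empty report corresponds to the "clean state" that the doctor is expected to confirm after a baseline.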

---

## Adding a New Tool

1. Append an entry to `scripts/ecosystem_tools.json` with all required fields (schema: `schema_version: 1`).
2. Add a generator under `scripts/generate_<tool>_*.py` (or link to an existing one).
3. Update this registry doc with a new section matching the format above.
4. Baseline the snapshot: `python3 scripts/ecosystem_doctor.py --update --tool <id>`.
5. Run the doctor again to confirm clean state: `python3 scripts/ecosystem_doctor.py --tool <id> --format text`.
6. Update the tool count at the top of this document.

---

## Removing a Tool

1. Delete the tool's entry from `scripts/ecosystem_tools.json`.
2. Delete its section from this document.
3. Delete its snapshot entry from `benchmarks/ecosystem-doctor-snapshot.json` by hand; `--update` does not currently prune stale entries.
4. Decide whether to keep the generator (`scripts/generate_<tool>_*.py`) for backwards compatibility or delete it.
5. Remove references from `README.md`, `manifest.json` `description` field, and `kb/procedures/maintenance-sop.md` `Supported editors` line.
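
Because stale snapshot entries are not pruned automatically, step 3 could be scripted along these lines. This is a sketch only: it assumes the snapshot is a JSON object keyed by tool ID, which this document does not confirm, and `prune_snapshot` is a hypothetical helper rather than part of the toolkit.

```python
def prune_snapshot(snapshot, registry_ids):
    """Split a snapshot into entries to keep and tool IDs to drop.

    snapshot:     dict keyed by tool ID (assumed shape)
    registry_ids: tool IDs still present in the registry
    """
    keep = set(registry_ids)
    kept = {tool_id: state for tool_id, state in snapshot.items() if tool_id in keep}
    removed = sorted(set(snapshot) - keep)
    return kept, removed


# Hypothetical example: "old-tool" was removed from the registry.
snapshot = {"aider": {"version": "v0.86.1"}, "old-tool": {"version": "1.0"}}
kept, removed = prune_snapshot(snapshot, ["aider", "augment"])
print(removed)
```

Returning the removed IDs separately makes it easy to review what would be deleted before writing the file back.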

---

## Related

- [Ecosystem Sync SOP](../procedures/ecosystem-sync-sop.md) — how to use the doctor
- [MCP Editor Compatibility](./mcp-editor-compatibility.md) — MCP-specific subset
- `scripts/ecosystem_tools.json` — source of truth
- `scripts/ecosystem_doctor.py` — drift detector
- `benchmarks/ecosystem-doctor-snapshot.json` — last-seen state

---

## kb/reference/sync.md

---
title: "AI Toolkit - Config Sync"
category: reference
service: ai-toolkit
tags: [sync, gist, portability, config, backup]
version: "1.0.0"
created: "2026-03-29"
last_updated: "2026-03-29"
description: "Sync ai-toolkit config to/from GitHub Gist for cross-machine portability."
---

# Config Sync

## Overview

`ai-toolkit sync` exports and imports your toolkit configuration (rules, stats) via GitHub Gist or local files. Zero infrastructure — uses `gh` CLI for Gist operations.

## Commands

```bash
ai-toolkit sync --export              # JSON snapshot to stdout
ai-toolkit sync --push                # Create/update secret Gist
ai-toolkit sync --pull [gist-id]      # Pull from Gist and apply
ai-toolkit sync --import <file|url>   # Import from file or URL
```

## What Gets Synced

| Data | Included | Source |
|------|----------|--------|
| Custom rules | Yes | `~/.softspark/ai-toolkit/rules/*.md` |
| Usage stats | Yes | `~/.softspark/ai-toolkit/stats.json` |
| Toolkit version | Yes (metadata) | `package.json` |
| Agents/skills | No | Installed via `npm` |
| Hooks | No | Installed via `ai-toolkit install` |

## Workflow

### First machine (export)
```bash
ai-toolkit sync --push
# Creates secret Gist, saves ID to ~/.softspark/ai-toolkit/.gist-id
```

### Second machine (import)
```bash
ai-toolkit sync --pull abc123def456   # Use gist ID from first push
# Subsequent pulls: ai-toolkit sync --pull  (uses saved ID)
```
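
The "explicit ID first, saved ID second" behaviour of `--pull` can be sketched as below. The path comes from the workflow above; the function name, its signature, and the error message are assumptions made for illustration, not the toolkit's actual code.

```python
from pathlib import Path

# Location named in the workflow above.
GIST_ID_FILE = Path.home() / ".softspark" / "ai-toolkit" / ".gist-id"


def resolve_gist_id(cli_arg, id_file=GIST_ID_FILE):
    """Prefer an ID passed on the command line, then fall back to the saved one."""
    if cli_arg and cli_arg.strip():
        return cli_arg.strip()
    if id_file.is_file():
        saved = id_file.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    raise ValueError("no gist ID given and none saved; pass one to --pull")


# An explicit ID wins even when no saved file exists.
print(resolve_gist_id("abc123def456", Path("/nonexistent/.gist-id")))
```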

## Requirements

- `--export` / `--import`: No external dependencies
- `--push` / `--pull`: Requires [gh CLI](https://cli.github.com) + `gh auth login`

## JSON Schema

```json
{
  "schema_version": 1,
  "exported_at": "2026-03-29T14:00:00+00:00",
  "toolkit_version": "1.0.0",
  "rules": {
    "rule-name": "# Rule content..."
  },
  "stats": {
    "commit": { "count": 42, "last_used": "..." }
  }
}
```
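
Before importing a snapshot from a file or URL, a structural check against the schema above can catch truncated or hand-edited files. The following is a minimal sketch; it assumes that all five top-level keys are required and that `schema_version` must equal 1, neither of which this document states explicitly.

```python
from datetime import datetime

# Top-level keys and the types shown in the schema example (required-ness assumed).
EXPECTED_TYPES = {
    "schema_version": int,
    "exported_at": str,
    "toolkit_version": str,
    "rules": dict,
    "stats": dict,
}


def validate_snapshot(snapshot):
    """Return a list of problems; an empty list means the snapshot looks well-formed."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in snapshot:
            errors.append(f"missing key: {key}")
        elif not isinstance(snapshot[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    if errors:
        return errors
    if snapshot["schema_version"] != 1:
        errors.append("unsupported schema_version")
    try:
        datetime.fromisoformat(snapshot["exported_at"])
    except ValueError:
        errors.append("exported_at is not an ISO 8601 timestamp")
    for name, body in snapshot["rules"].items():
        if not isinstance(body, str):
            errors.append(f"rule {name!r} is not a string")
    return errors


example = {
    "schema_version": 1,
    "exported_at": "2026-03-29T14:00:00+00:00",
    "toolkit_version": "1.0.0",
    "rules": {"rule-name": "# Rule content..."},
    "stats": {"commit": {"count": 42, "last_used": "..."}},
}
print(validate_snapshot(example))
```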

## Security

- Gists are created as **secret**: they are unlisted and not searchable, but anyone who has the URL can read them
- Rules may contain project-specific instructions — review before sharing
- No credentials or tokens are stored in the snapshot

---

## kb/reference/unique-features.md

---
title: "Unique Features & Differentiators"
category: reference
service: ai-toolkit
tags: [features, differentiators, constitution, hooks, security, tdd, memory]
created: "2026-04-13"
last_updated: "2026-05-12"
description: "Detailed description of ai-toolkit's unique features: constitution enforcement, hooks system, security scanning, effort budgeting, quality gates, and more."
---

# Unique Features & Differentiators

## 1. Machine-Enforced Constitution

Unlike other toolkits that put safety rules in documentation only, ai-toolkit enforces a 6-article constitution via hooks. The hooks actually **block** execution of:
- Mass deletion (`rm -rf`, `DROP TABLE`)
- Blind overwrites of uncommitted work
- Any action that could cause irreversible data loss
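
To make the blocking idea concrete, here is a small sketch of the kind of check a destructive-command guard might perform. It is written in Python purely for illustration (the toolkit's hooks are Bash scripts), covers only the two examples named above, and is not the logic of `guard-destructive.sh`.

```python
import re
import shlex

DROP_TABLE = re.compile(r"\bdrop\s+table\b", re.IGNORECASE)


def is_destructive(command):
    """Return True for `DROP TABLE` statements and for `rm` with recursive + force flags."""
    if DROP_TABLE.search(command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fall back to a plain whitespace split.
        tokens = command.split()
    for index, token in enumerate(tokens):
        if token == "rm":
            # Collect short flags after `rm`. This is deliberately conservative:
            # flags of later commands on the same line are counted too.
            flags = "".join(
                t[1:] for t in tokens[index + 1:]
                if t.startswith("-") and not t.startswith("--")
            )
            if ("r" in flags or "R" in flags) and "f" in flags:
                return True
    return False


for cmd in ("rm -rf build", "rm notes.txt", "psql -c 'DROP TABLE users'"):
    print(cmd, "->", "block" if is_destructive(cmd) else "allow")
```

A real guard would need a wider pattern list and a way to report the reason back to the agent; this sketch only shows the classification step.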

## 2. Hooks as Executable Scripts

Hook logic lives in `app/hooks/*.sh` — not inline JSON one-liners. Scripts are copied to `~/.softspark/ai-toolkit/hooks/` on install and referenced from `~/.claude/settings.json`. Easy to read, debug, and extend.

**14 lifecycle events / 28 global hook entries:**

| Event | Script | Action |
|-------|--------|--------|
| SessionStart | `session-start.sh` | MANDATORY rules reminder + session context + instincts + reset session-edit state |
| SessionStart | `mcp-health.sh` | Check MCP server command availability (non-blocking warning) |
| SessionStart | `session-context.sh` | Capture environment snapshot to `~/.softspark/ai-toolkit/sessions/current-context.json` |
| Notification | `notify-waiting.sh` | Cross-platform desktop notification |
| PreToolUse | `guard-destructive.sh` | Block `rm -rf`, `DROP TABLE`, etc. |
| PreToolUse | `guard-path.sh` | Block wrong-user path hallucination |
| PreToolUse | `guard-config.sh` | Block edits to linter/formatter config files unless explicitly requested |
| PreToolUse | `commit-quality.sh` | Advisory validation of git commit messages |
| PreToolUse | `revert-guard.sh` | Block `git checkout/restore/reset --hard/clean` on files edited this session (Art. VI.2) |
| UserPromptSubmit | `user-prompt-submit.sh` | Prompt governance reminder + arm search-first flag only when RAG/Web is available |
| UserPromptSubmit | `track-usage.sh` | Record skill invocations to local stats |
| PostToolUse | `post-tool-use.sh` | Lightweight validation reminders + append edit to session state |
| PostToolUse | `governance-capture.sh` | Log security-sensitive operations to JSONL |
| PostToolUse | `test-cohesion.sh` | Run cohesion-mapped tests after edits; block on failure (Art. VI.3) |
| PostToolUse | `search-tracker.sh` | Clear search-first flag when smart_query/hybrid_search_kb/Web* runs |
| Stop | `quality-check.sh` | Multi-language lint (ruff/tsc/phpstan/dart/go) |
| Stop | `save-session.sh` | Persist session context for cross-session continuity |
| Stop | `quality-gate.sh` | Block final response on lint/type errors + cohesion tests for session edits |
| Stop | `stop-search-check.sh` | Continue conversation if search-first rule was skipped and a provider exists; no-op offline |
| TaskCompleted | `quality-gate.sh` | Block task completion on lint/type errors |
| SubagentStart | `subagent-start.sh` | Narrow-scope reminder for spawned subagents |
| SubagentStop | `subagent-stop.sh` | Completion checklist for subagent handoff |
| PreCompact | `pre-compact.sh` | Smart compaction: prioritized context |
| PreCompact | `pre-compact-save.sh` | Save timestamped context backup |
| SessionEnd | `session-end.sh` | Persist a session-end handoff note |
| InstructionsLoaded | `instructions-audit.sh` | Append every CLAUDE.md / rules load to audit log (which rules actually entered context) |
| ConfigChange | `config-desync-guard.sh` | Warn when `~/.claude/settings.json` drifts from `app/hooks.json` (advisory) |
| TeammateIdle | *(inline)* | Completeness reminder |

**5 skill-scoped hooks:**

| Skill | Hook | Action |
|-------|------|--------|
| `/commit` | Pre | Run linter, block on failure |
| `/test` | Post | Coverage check, report threshold |
| `/deploy` | Post | Health check, rollback if degraded |
| `/migrate` | Pre | Backup verification |
| `/rollback` | Post | State verification |

## 3. Security Scanning

Two complementary security tools:

**`/skill-audit`** — scan skills and agents for code-level risks:

```bash
/skill-audit                              # Interactive (Claude remediation)
python3 scripts/audit_skills.py --ci      # CI mode: exit 1 on HIGH
```

Detects: `eval()`/`exec()`, hardcoded secrets, permission issues, bash risks.

**`/cve-scan`** — scan project dependencies for known CVEs:

```bash
/cve-scan                                 # Auto-detect ecosystems, scan all
python3 app/skills/cve-scan/scripts/cve_scan.py          # Direct invocation
python3 app/skills/cve-scan/scripts/cve_scan.py --json   # Machine-readable
```

Supports: npm, pip, composer, cargo, go, ruby, dart. Uses native audit tools — zero external deps.

**Severity levels:** HIGH (blocks CI), WARN (should fix), INFO (review)
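
The "HIGH blocks CI" rule maps naturally to a process exit code. The sketch below assumes findings are dictionaries with a `severity` key, which is a hypothetical shape chosen for illustration and not the actual output format of `audit_skills.py`.

```python
from collections import Counter

# Only HIGH findings fail the build; WARN and INFO are reported but do not block.
BLOCKING = {"HIGH"}


def summarize(findings):
    """Return (counts per severity, exit code) for a list of findings."""
    counts = Counter(finding["severity"] for finding in findings)
    exit_code = 1 if any(counts[level] for level in BLOCKING) else 0
    return dict(counts), exit_code


findings = [
    {"severity": "WARN", "message": "broad file permissions"},
    {"severity": "HIGH", "message": "hardcoded secret"},
]
counts, code = summarize(findings)
print(counts, code)
```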

## 4. Effort-Based Model Budgeting

Every skill declares an effort level used for model token budgeting:
- `low` — lint, build, fix (fast, cheap)
- `medium` — debug, analyze, ci
- `high` — review, plan, refactor, docs
- `max` — orchestrate, swarm, workflow

## 5. Multi-Language Quality Gates

The `Stop` hook runs after every response and checks 5 languages:

| Language | Lint | Type Check |
|----------|------|-----------|
| Python | ruff | mypy --strict |
| TypeScript | ESLint/tsc | tsc --noEmit |
| PHP | phpstan | phpstan |
| Dart | dart analyze | dart analyze |
| Go | go vet | go vet |
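
One way to picture the gate is as a lookup from edited files to the checks in the table. The sketch below only builds the command list and runs nothing. The file-extension mapping and the exact arguments (`ruff check`, `eslint .`, `phpstan analyse`, `go vet ./...`) are assumptions for illustration; the hook itself may invoke the tools differently.

```python
from pathlib import PurePath

# (lint command, type-check command) per language, following the table above.
CHECKS = {
    "python": (["ruff", "check"], ["mypy", "--strict"]),
    "typescript": (["eslint", "."], ["tsc", "--noEmit"]),
    "php": (["phpstan", "analyse"], ["phpstan", "analyse"]),
    "dart": (["dart", "analyze"], ["dart", "analyze"]),
    "go": (["go", "vet", "./..."], ["go", "vet", "./..."]),
}

# Assumed extension-to-language mapping.
EXTENSIONS = {".py": "python", ".ts": "typescript", ".tsx": "typescript",
              ".php": "php", ".dart": "dart", ".go": "go"}


def commands_for(edited_files):
    """Return the de-duplicated checks needed for the languages that were edited."""
    commands = []
    for path in edited_files:
        language = EXTENSIONS.get(PurePath(path).suffix)
        if language is None:
            continue  # Not a language covered by the gate.
        for command in CHECKS[language]:
            if command not in commands:
                commands.append(command)
    return commands


print(commands_for(["app/main.py", "web/index.ts", "README.md"]))
```

Languages where one tool covers both columns (PHP, Dart, Go) yield a single command after de-duplication.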

## 6. Iron Law Enforcement

Three skills enforce non-negotiable quality gates with anti-rationalization tables:

| Skill | Iron Law | What it prevents |
|-------|----------|-----------------|
| `/tdd` | `NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST` | Code written before test? Delete it. Start over. |
| `debugging-tactics` | `NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST` | 4-phase debugging: root cause → pattern → hypothesis → fix. |
| `verification-before-completion` | `NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE` | Gate: IDENTIFY → RUN → READ → VERIFY → CLAIM. |

Additionally, **15 core skills** include `## Common Rationalizations` tables — domain-specific excuses with rebuttals that prevent agent drift.

## 7. Confidence Scoring & Self-Evaluation (`/review`)

The `/review` skill outputs findings with per-issue confidence scores (1-10) and severity classification (critical/major/minor/nit). After completing a review, an LLM-as-Judge self-evaluation pass checks for blind spots (anchoring bias, assumption vs verification, missing unhappy paths) and recalibrates the confidence scores.

## 8. Agent Verification Checklists

10 key agents include `## Verification Checklist` — exit criteria that MUST be met before presenting results:

| Agent | Key exit criteria |
|-------|------------------|
| `code-reviewer` | Every finding has file:line + evidence, not just opinion |
| `security-auditor` | Each finding includes proof-of-concept or exploit path |
| `test-engineer` | No empty/placeholder tests, mocks only at boundaries |
| `debugger` | Root cause identified, regression test added |
| `backend-specialist` | Input validation, error format, query optimization |
| `frontend-specialist` | Empty/loading/error states, accessibility, responsive |
| `database-architect` | Migration tested on prod-like volume, rollback tested |
| `performance-optimizer` | Baseline measured, profiler evidence attached |
| `devops-implementer` | Dry run passed, rollback documented, no hardcoded secrets |
| `documenter` | Code examples runnable, no placeholders, valid links |

## 9. Skill Reference Routing

7 core skills include `## Related Skills` sections that suggest logical follow-up skills:

```
/review → found issues? → /debug, /tdd, /cve-scan, /analyze
/debug  → bug fixed?   → /review, /tdd, /workflow incident-response
/plan   → approved?    → /orchestrate, /write-a-prd, /grill-me
```

## 10. Two-Stage Review (`/subagent-development`)

Per-task review pipeline inspired by [obra/superpowers](https://github.com/obra/superpowers):

```
Implementer → Spec Compliance Review → Code Quality Review → Next Task
```

- Implementer reports: `DONE` / `DONE_WITH_CONCERNS` / `NEEDS_CONTEXT` / `BLOCKED`
- Spec reviewer: all requirements met, nothing extra, nothing missing
- Quality reviewer: SOLID, naming, error handling, tests, security

## 11. Ralph Wiggum Loop (`/repeat`)

Autonomous agent loop with safety controls:

```bash
/repeat 5m /test          # run tests every 5 min until all pass
/repeat --iterations 3 /review   # max 3 review passes
```

| Safety Control | Default |
|----------------|---------|
| Max iterations | 5 |
| Circuit breaker | 3 consecutive failures → halt |
| Min interval | 1 minute |
| Exit detection | DONE / COMPLETE / ALL PASS |
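
The interaction of these four controls can be sketched as a small loop. This is an illustration under stated assumptions, not the `/repeat` implementation: a "failure" is modelled as the step raising an exception, exit detection is a plain text match on the step's output, and `run_loop` with its parameters is a hypothetical name.

```python
import re
import time

MIN_INTERVAL_S = 60  # Min interval: 1 minute
EXIT_MARKERS = re.compile(r"\b(DONE|COMPLETE|ALL PASS)\b")


def run_loop(step, interval_s=60, max_iterations=5, breaker=3, sleep=time.sleep):
    """Run `step(iteration)` repeatedly and return (outcome, iterations used)."""
    interval_s = max(interval_s, MIN_INTERVAL_S)  # Enforce the minimum interval.
    consecutive_failures = 0
    for iteration in range(1, max_iterations + 1):
        try:
            output = step(iteration)
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= breaker:
                return ("halted", iteration)  # Circuit breaker tripped.
        else:
            consecutive_failures = 0
            if EXIT_MARKERS.search(output):
                return ("done", iteration)  # Exit marker detected.
        if iteration < max_iterations:
            sleep(interval_s)
    return ("exhausted", max_iterations)


# Simulated test runs; sleep is stubbed out so the example finishes instantly.
outputs = iter(["3 failed", "1 failed", "ALL PASS"])
print(run_loop(lambda i: next(outputs), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable without waiting a full minute between iterations.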

## 12. Persistent Memory (`memory-pack` plugin)

SQLite-based session memory (opt-in plugin pack):

| Component | Purpose |
|-----------|---------|
| `observation-capture.sh` | PostToolUse hook that captures tool actions to SQLite |
| `session-summary.sh` | Stop hook that AI-compresses session observations |
| `mem-search` skill | FTS5 full-text search across past sessions |
| `<private>` tags | Content between tags stripped before storage |
| Progressive disclosure | Summary (~500 tok) → relevant (~2k tok) → full |
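
The privacy and search rows can be illustrated together: strip `<private>` spans first, then index what remains with FTS5. The sketch below uses Python's standard `sqlite3` module and assumes the local SQLite build includes FTS5. The table and column names are hypothetical and do not describe the plugin's actual schema.

```python
import re
import sqlite3

# Non-greedy match so that each <private>...</private> pair is removed separately.
PRIVATE_BLOCK = re.compile(r"<private>.*?</private>", re.DOTALL)


def strip_private(text):
    """Remove every <private>...</private> span, tags included.

    An opening tag without a closing tag is left untouched.
    """
    return PRIVATE_BLOCK.sub("", text)


def open_memory():
    """Create an in-memory database with a full-text index (requires FTS5)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE observations USING fts5(session_id, body)")
    return conn


def store(conn, session_id, text):
    """Store an observation after stripping private content."""
    conn.execute(
        "INSERT INTO observations (session_id, body) VALUES (?, ?)",
        (session_id, strip_private(text)),
    )


def search(conn, query):
    """Full-text search across stored observations, best match first."""
    rows = conn.execute(
        "SELECT session_id, body FROM observations "
        "WHERE observations MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


print(strip_private("ran migration <private>token=abc</private> ok"))
```

Stripping before the insert means private content never reaches disk, which is simpler to reason about than filtering at query time.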

## 13. Persona Presets

4 engineering personas that adjust Claude's communication style per role:

| Persona | Focus | Key Skills |
|---------|-------|------------|
| `backend-lead` | System design, scalability, data integrity | `/workflow backend-feature`, `/tdd` |
| `frontend-lead` | Component architecture, a11y, Core Web Vitals | `/design-an-interface`, `/review` |
| `devops-eng` | IaC, CI/CD, blast radius, rollback safety | `/workflow infrastructure-change`, `/deploy` |
| `junior-dev` | Step-by-step explanations, learning focus | `/explain`, `/explore`, `/debug` |

Persistent via `--persona` at install time, or session-scoped via `/persona` runtime command.

## 14. Visual Brainstorming Companion

Optional browser-based companion for `/write-a-prd` and `/design-an-interface`:
- Ephemeral Node.js HTTP server (auto-kills after 30min idle)
- Dark theme, responsive, zero external dependencies
- Per-question routing: mockups/diagrams → browser, text/conceptual → terminal

## 15. KB Integration Protocol

Agents follow a research-before-action protocol enforced via rules:
1. `smart_query()` or `hybrid_search_kb()` before any technical answer
2. Source citation mandatory (`[PATH: kb/...]`)
3. Strict order: KB → Files → External Docs → General Knowledge
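
The strict lookup order amounts to a fallback chain. The sketch below is illustrative only: the provider callables, their return shape, and the `answer` function are hypothetical, and only the KB citation format `[PATH: kb/...]` is taken from the protocol above.

```python
# Sources are consulted in this order; the first one with a result wins.
SOURCE_ORDER = ("kb", "files", "external_docs", "general_knowledge")


def answer(question, providers):
    """Return the first hit in protocol order, or None when nothing answers.

    providers maps a source name to a callable that returns
    (text, path) on a hit or None on a miss (assumed interface).
    """
    for source in SOURCE_ORDER:
        lookup = providers.get(source)
        hit = lookup(question) if lookup else None
        if hit:
            text, path = hit
            result = {"source": source, "text": text}
            if source == "kb":
                result["citation"] = f"[PATH: {path}]"  # Mandatory for KB answers.
            return result
    return None


providers = {
    "kb": lambda q: ("Use `ai-toolkit sync --push`.", "kb/reference/sync.md"),
    "files": lambda q: ("found in source", "scripts/sync.py"),
}
print(answer("How do I sync config?", providers))
```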

---

## kb/reference/windows-support.md

---
title: "Windows Support"
category: reference
service: ai-toolkit
tags: [windows, wsl, install, dependencies, hooks]
created: "2026-04-24"
last_updated: "2026-04-24"
description: "Windows support model for ai-toolkit: WSL, Git Bash, dependency detection, and hook runtime constraints."
---

# Windows Support

ai-toolkit supports Windows through two practical modes:

1. **WSL recommended** — best compatibility for Bash hooks, POSIX paths, symlinks, and editor configs.
2. **Native Windows with Git Bash** — supported for CLI usage when Bash is available on `PATH`.

## Dependency Detection

`scripts/check_deps.py` emits install hints for Windows package managers:

| Manager | Command Prefix |
|---------|----------------|
| winget | `winget install` |
| Chocolatey | `choco install -y` |
| Scoop | `scoop install` |

Required dependency package IDs:

| Dependency | winget | Chocolatey | Scoop |
|------------|--------|------------|-------|
| Python 3 | `Python.Python.3` | `python` | `python` |
| Git | `Git.Git` | `git` | `git` |
| Node.js | `OpenJS.NodeJS` | `nodejs` | `nodejs` |
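
Combining the two tables gives the full install hint. The sketch below shows one way that composition could look; the function names and the detection order are assumptions, and this is not the code in `scripts/check_deps.py`.

```python
import shutil

# Command prefixes and package IDs copied from the tables above.
PREFIXES = {
    "winget": "winget install",
    "choco": "choco install -y",
    "scoop": "scoop install",
}
PACKAGE_IDS = {
    "Python 3": {"winget": "Python.Python.3", "choco": "python", "scoop": "python"},
    "Git": {"winget": "Git.Git", "choco": "git", "scoop": "git"},
    "Node.js": {"winget": "OpenJS.NodeJS", "choco": "nodejs", "scoop": "nodejs"},
}


def detect_manager(which=shutil.which):
    """Return the first package manager found on PATH (order is an assumption)."""
    for manager in ("winget", "choco", "scoop"):
        if which(manager):
            return manager
    return None


def install_hint(dependency, manager):
    """Build the install command for one missing dependency."""
    return f"{PREFIXES[manager]} {PACKAGE_IDS[dependency][manager]}"


print(install_hint("Python 3", "winget"))
print(install_hint("Git", "choco"))
```

Passing `which` as a parameter lets the detection logic be exercised on any platform, not only on Windows.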

## Hook Runtime

ai-toolkit hooks are Bash scripts. On Windows, use WSL or Git Bash so Claude Code can execute `~/.softspark/ai-toolkit/hooks/*.sh`.

Cross-platform hooks should keep the Bash entrypoint small and delegate complex work to Python or Node when Windows behavior diverges.

## Verification

```bash
ai-toolkit doctor
python3 scripts/check_deps.py
python3 scripts/validate.py
```

The Windows support contract is covered by `tests/test_windows_support.bats`.

---

## kb/troubleshooting/README.md

---
title: "Troubleshooting"
service: ai-toolkit
category: troubleshooting
tags: [troubleshooting, debugging]
last_updated: "2026-03-25"
---

# Troubleshooting

Problem resolution guides. Guides will be added here as they are created.

---

