madarX Vision · CTO Brief
madarX OS Vision · v0.1
2026-05-19
CTO Brief · v0.1 · Executive Review

An orchestrator for agents, not another chatbot.

madarX OS is a local-first agent runtime and control plane. A main orchestrator delegates to specialist sub-agents over structured tool calls, with capability-scoped policy gates, durable approval queues, and an append-only audit chain enforced at the storage layer. It is foundation-model-agnostic (Anthropic, OpenAI, Ollama, Hermes, Claude Code CLI) and exposes MCP-compatible adapters. It is designed for regulated deployment at the standard of Abu Dhabi Global Market (ADGM): on the customer's hardware, behind their KMS, with zero outbound egress unless explicitly authorized.

§ 01 · Why now
Market · moment · posture

The market is moving from chat to control. We sell the control plane.

Position

The current generation of AI products optimizes for single-turn assistance: a user issues a prompt, receives an answer, then manually integrates the output into a downstream workflow. Enterprise and regulated use cases need three properties that chat does not provide — persistent agents that execute recurring work, human approval at every external boundary, and a verifiable record of intent, action, and outcome. madarX OS delivers all three on infrastructure the operator controls, with no data leaving the perimeter unless explicitly authorized.

Architecturally, the product sits between the foundation-model layer (Anthropic, OpenAI, Ollama, Hermes, Claude Code CLI) and the end-user surfaces (chat, voice via OpenAI Realtime WebRTC, scheduled runs, WhatsApp via Baileys). The differentiation is not the LLM; it is the Agent-Computer Interface (ACI) and the policy plane wrapping it. Four primitives are shipped and load-bearing:

(1) A four-tier risk taxonomy (low → medium → high → irreversible_external) in which the top tier is never auto-approved. Trust-mode auto bypasses approval for high and below, but the irreversible_external gate stands, so no LLM can authorize money movement, code merges, deploys, deletes, or external sends without a human in the loop. This is the load-bearing safety claim.
(2) An append-only compliance audit chain enforced at the SQLite trigger layer (BEFORE UPDATE / BEFORE DELETE triggers raise): WORM-style integrity at the storage primitive, not the application layer.
(3) Per-tool destination allowlists keyed by scope (recipient_domain, repo_full_name, table), shaped like capability tokens, composed with sliding-window rate limits in which main-agent and per-agent counters are independent.
(4) An AES-256-GCM secrets vault with KMS-replaceable master-key derivation, plus a wrapFetch egress recorder that logs host / method / status / bytes / duration per outbound call (no payloads: data minimization by construction).

Remaining work is productization: artifact taxonomy with hybrid retrieval, multi-tenant teams, A2A federation, and the hosting matrix.
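
The approval gate, the scope-keyed allowlist, and the independent rate-limit counters can be sketched together. This is a minimal TypeScript sketch under stated assumptions, not the shipped implementation: the trust-mode value "manual", the names gate, destinationAllowed, and SlidingWindowLimiter, and the rule that manual mode queues every call are hypothetical. Only the four tier names, the behaviour of trust-mode auto, and the scope keys come from the text above.

```typescript
// Risk tiers named in the brief, lowest to highest.
type RiskTier = "low" | "medium" | "high" | "irreversible_external";
// "auto" is the trust mode from the brief; "manual" is a hypothetical name.
type TrustMode = "auto" | "manual";
type Decision = "auto_approve" | "queue_for_human";

// The top tier is checked first, so no trust mode can bypass it.
function gate(tier: RiskTier, mode: TrustMode): Decision {
  if (tier === "irreversible_external") return "queue_for_human";
  // Assumption: outside trust-mode auto, every call waits for a human.
  return mode === "auto" ? "auto_approve" : "queue_for_human";
}

// Default-deny destination allowlist keyed by scope (e.g. recipient_domain).
type Allowlist = Record<string, string[]>;

function destinationAllowed(list: Allowlist, scope: string, value: string): boolean {
  return (list[scope] ?? []).includes(value);
}

// Sliding-window limiter; each key keeps its own list of timestamps.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if the key is under its limit.
  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const ok = recent.length < this.limit;
    if (ok) recent.push(nowMs);
    this.hits.set(key, recent);
    return ok;
  }
}
```

Because the irreversible_external check precedes the trust-mode check, gate("irreversible_external", "auto") still returns "queue_for_human". Keying the limiter by "main" and by "agent:<id>" (hypothetical key names) gives counters that do not affect each other.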

Differentiator

Local-first by default.

Runs on your laptop, your server, or your data center. The cloud version is a convenience, never a dependency.

Wedge

ADGM, then UAE, then GCC.

Auditable systems map directly to existing regulatory expectations. We enter through the financial free zone that already mandates the controls we built.

§ 02 · Artifact taxonomy
Dashboard · Library · Archive

Every document has a surface, a space, and a story.

Three states

Every artifact produced by an agent — a report, a brief, a draft message, a generated UI — moves through a three-stage lifecycle. Dashboard is the operator's working surface: recent runs, pinned items, the live control view. Library is the searchable corpus, navigable by project, space, author, and tag. Archive is retention storage: out of the agent's default retrieval scope, recoverable on request, never silently deleted. Transitions between stages are one-click metadata moves; the underlying file and citation address remain stable.

surface is a lifecycle bucket driving retrieval scoring: dashboard = hot working set, library = warm searchable corpus, archive = cold. New CHECK-constrained column on artifacts with a partial index (project_id, space_id, created_at DESC) WHERE surface='dashboard' for the live-pane query, and a covering composite on (project_id, space_id, surface, created_at DESC) for cross-surface scans. Surface transitions are metadata-only mutations — disk paths stay stable so citation URIs never break; audit rows record the transition. Pinned items get a pinned_at timestamp and are excluded from auto-eviction. The dashboard cap policy is a heartbeat-task GC driven by a per-project soft/hard cap, not a hot-path eviction.
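
A surface transition as a metadata-only mutation might look like the following. This is an illustrative sketch: the Artifact and AuditRow field names and the moveSurface function are hypothetical, chosen to mirror the columns mentioned above, and a real implementation would write both records in a single database transaction.

```typescript
type Surface = "dashboard" | "library" | "archive";

interface Artifact {
  id: string;
  surface: Surface;
  path: string;            // disk path, never rewritten by a transition
  uri: string;             // citation URI, never rewritten by a transition
  pinnedAt: number | null; // epoch ms, null when not pinned
}

interface AuditRow {
  artifactId: string;
  from: Surface;
  to: Surface;
  at: number; // epoch ms
}

// Returns the updated record plus the audit row describing the move.
// Only the surface field changes; path and uri are copied untouched.
function moveSurface(
  a: Artifact,
  to: Surface,
  nowMs: number
): { artifact: Artifact; audit: AuditRow } {
  return {
    artifact: { ...a, surface: to },
    audit: { artifactId: a.id, from: a.surface, to, at: nowMs },
  };
}
```

Returning the audit row alongside the new record keeps the two writes paired, so a caller cannot apply the move without also holding the row that documents it.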

Surface · Dashboard

Live and visible

New runs land here. The operator reviews them. Pin to keep them in view.

  • Default for every new artifact
  • Sorted by recency
  • Auto-evicts past the cap
Surface · Library

Filed away

Navigable by folder, space, subject, and author. The searchable corpus of completed work.

  • Cleared from dashboard
  • Full-text searchable
  • Folder + tag navigation
Surface · Archive

Cold storage

Out of the way. Out of agent default search. Recoverable any time.

  • Retention policy default
  • Reversible to library
  • Never deleted silently
§ 03 · Projects and Spaces
A second axis of organization

Projects for clients. Spaces for contexts inside them.

Hierarchy

Projects are the top-level scope — one per client, product, or initiative. Inside a project, work splits across distinct contexts: Operations, Marketing, Legal Review, Q3 Audit. Spaces formalize that split as a second axis of organization. Each space carries its own dashboard view, its own assigned agents, its own document scope, its own access list, and its own retrieval boundary. The result is a clean separation of concerns without inflating the project list.

Spaces are retrieval partitions and memory namespaces: each space scopes its own FTS5 query window, its own agent memory store, and its own access list, so cross-context leakage between e.g. Legal Review and Marketing is structurally impossible without an explicit handoff. New spaces table (id, project_id, slug, name, parent_space_id, status, …); the dangling tasks.workspace_id column is repointed to tasks.space_id in a one-shot migration; space_id flows into artifacts, agents, schedules, heartbeat_tasks, and the retrieval tools. Nullable space_id means project root. Schema supports nesting via parent_space_id; UX ships flat in v1 to avoid combinatorial explosion in retrieval scopes (see Decision 5).
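
The retrieval boundary can be pictured as an exact-match filter on (project_id, space_id) applied before any ranking. The sketch below is an assumption-laden illustration: the type and function names are hypothetical, and it assumes that the project-root scope (space_id null) sees only project-root rows instead of the union of all spaces, which the brief does not specify.

```typescript
interface Scoped {
  id: string;
  projectId: string;
  spaceId: string | null; // null means project root
}

interface RetrievalScope {
  projectId: string;
  spaceId: string | null;
}

// A row is visible only when both project and space match exactly.
function inScope(row: Scoped, scope: RetrievalScope): boolean {
  return row.projectId === scope.projectId && row.spaceId === scope.spaceId;
}

// Applied before ranking, so out-of-scope rows never reach the agent.
function scopedCandidates<T extends Scoped>(rows: T[], scope: RetrievalScope): T[] {
  return rows.filter((r) => inScope(r, scope));
}
```

Filtering before ranking means a query issued inside Legal Review never scores a Marketing artifact at all, which is the property the paragraph above relies on.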

§ 04 · Smart referencing
Self-describing artifacts

The cheapest token is the one the agent never reads.

Metadata

When an agent retrieves "the latest pricing study on Veo3", it should resolve the target in a single step. Without structured metadata, the agent opens many candidates and consumes context-window budget before finding the right one — driving cost up and grounding quality down. We attach a self-describing header to every artifact: subject line, agent-written summary, author list, tags, source, and a stable citation address. Retrieval reads the summary, selects the correct artifact, and only then loads the full content.

The retrieval problem at agent scale is token economics: an agent that opens ten files to find the one it needs burns context-window budget and increases hallucination surface. We solve it with self-describing artifacts. New columns on artifacts: subject, summary (LLM-written at creation, the cite-without-load primitive), source_kind, source_uri, authors_json, tags_json, parent_artifact_id, version, lifecycle_state, retention_until, pinned_at, last_referenced_at. Sibling tables artifact_authors and artifact_links form a provenance DAG, so any agent-generated claim is traceable back to its source artifacts. Sparse retrieval ships first: an FTS5 virtual table over subject + summary + tags + content_head with tokenize = 'porter unicode61', BM25-ranked. Phase D upgrades to hybrid retrieval by joining FTS5 with dense embeddings (sqlite-vec / Voyage / Cohere embed-v3) via Reciprocal Rank Fusion. Citation-grade madarx:// URIs map cleanly onto Anthropic citations and OpenAI structured-output reference fields, so the same URI works inside the model's response, inside the dashboard, and across federation peers.
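
Reciprocal Rank Fusion merges ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows the standard formula with ranks starting at 1. The function name, the use of artifact ids as strings, and the constant k = 60 (a commonly used default, not a value stated in this brief) are assumptions.

```typescript
// Fuses several ranked id lists (best first) into one ranking.
// Each list contributes 1 / (k + rank) per id, with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: a sparse (BM25) ranking fused with a dense (embedding) ranking.
const sparse = ["a", "b", "c"];
const dense = ["b", "c", "a"];
const fused = reciprocalRankFusion([sparse, dense]); // ["b", "a", "c"]
```

RRF needs only rank positions, so the BM25 scores and the embedding similarities never have to be normalized onto a common scale.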

Column

subject

The canonical topic line. "Veo3 pricing tier comparison — Q2 2026."

Column

summary

Two sentences written by the agent at creation. Surfaces in search without opening the file.

Column

authors_json

Who made it, who reviewed it, who approved it. Agent or human, traceable.

Column

tags_json

Free tags, FTS-indexed. The kind of search a librarian would do.

Column

source_kind

Where it came from. Agent run, user upload, WhatsApp, federation peer.

§ 05 · AI-native folder layout
Deterministic · indexable · cheap to crawl

A folder layout an agent can read in two kilobytes.

Filesystem

Every project, space, and surface uses an identical directory shape. An INDEX.md manifest at each level enumerates the artifacts in scope: title, summary, author, timestamp, citation URI. An agent reads the manifest first, selects the relevant artifacts, and only loads the files it needs — a pattern borrowed from Anthropic's file-search and tree-grounded retrieval work. The same tree is navigable by a human in Finder or VS Code without translation.

Deterministic layout is an ACI design choice: the agent's filesystem read is a single INDEX.md manifest per level, embeddable in the system prompt or fetched on demand by a file-search tool — so the agent locates artifacts in O(depth) reads, not O(n). Helpers in lib/artifact-paths.js compute canonical paths workspace/projects/<slug>/spaces/<space>/<surface>/<file>. lib/artifact-indexer.js rebuilds INDEX.md at project and space level on artifact mutation, debounced 1 s and queued through a heartbeat task so bursty writes don't hot-loop the indexer. scripts/rebuild-indexes.mjs handles cold rebuilds. The pattern resembles Anthropic's file-search and tree-grounded retrieval — citation paths are stable, the directory shape is identical across deployments, and a Claude Code sub-agent or external MCP client can crawl the tree without bespoke instructions.

workspace/
  projects/
    <project-slug>/
      INDEX.md                      # project-level agent index
      spaces/
        <space-slug>/
          INDEX.md                  # space-level agent index
          dashboard/                # surface = dashboard
            <artifact-id>--slug.md
          library/                  # surface = library
            <yyyy>/<mm>/
              <artifact-id>--slug.md
          archive/                  # surface = archive
            <yyyy>/<mm>/
              <artifact-id>--slug.md
        _project-root/              # space_id IS NULL
          dashboard/ library/ archive/
  uploads/                          # Disk uploads
    <yyyy>/<mm>/<sha256>.<ext>
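
A canonical path helper in the spirit of lib/artifact-paths.js could be sketched as follows. This is not the contents of that file: the function name, the input shape, the use of the creation date in UTC for the <yyyy>/<mm> segments, the reading of "slug" in the file name as a per-artifact slug, and the placement of _project-root under spaces/ are all assumptions drawn from the tree above.

```typescript
type Surface = "dashboard" | "library" | "archive";

interface PathInput {
  projectSlug: string;
  spaceSlug: string | null; // null means project root
  surface: Surface;
  artifactId: string;
  slug: string;
  createdAt: Date;
}

// Builds workspace/projects/<slug>/spaces/<space>/<surface>/[<yyyy>/<mm>/]<file>.
function artifactPath(input: PathInput): string {
  const space = input.spaceSlug ?? "_project-root";
  const parts = ["workspace", "projects", input.projectSlug, "spaces", space, input.surface];
  // Library and archive are bucketed by year and month; dashboard is flat.
  if (input.surface !== "dashboard") {
    const yyyy = String(input.createdAt.getUTCFullYear());
    const mm = String(input.createdAt.getUTCMonth() + 1).padStart(2, "0");
    parts.push(yyyy, mm);
  }
  parts.push(`${input.artifactId}--${input.slug}.md`);
  return parts.join("/");
}
```

Deriving the path from a fixed set of fields is what keeps the directory shape identical across deployments: two installations given the same artifact metadata compute the same location.
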
§ 06 · Teams and Federation
Multi-user · multi-org · multi-orchestrator

One operator per orchestrator. Orchestrators speak to other orchestrators.

Topology

Inside one organization, a single madarX instance runs one main orchestrator and a team of specialist agents. The orchestrator delegates; the human operator approves the actions that matter. When two organizations collaborate, their orchestrators communicate across a cryptographically signed peering — neither side merges its data with the other. Agents on one side can request artifacts and deliverables from agents on the other. Trust is explicit, scope-bounded, and revocable.

Federation is our A2A (agent-to-agent) protocol with explicit capability scoping. Auth foundations are shipped: organizations, users, org_memberships, teams, team_memberships; agents and spaces gain team_id; approval_requests extends with phase and required_approver_role for multi-stage approval routing. The federation primitive is federation_peers(local_org_id, remote_endpoint_url, remote_org_pubkey, trust_label, scopes_json, status) where scopes_json is a manifest of the tools and project scopes this peer may delegate against — capability-token semantics. Transport is HTTPS POST to /api/federation/delegate with ed25519-signed payloads pinned to the peer pubkey; replay protection via signed nonce + monotonic clock. The delegate_to_peer tool sits at the irreversible_external tier — never auto-fired, always human-approved, even in trust-mode auto. Receivers materialize the brief as a sandboxed heartbeat task scoped to the delegated org; completion posts artifacts back to /api/federation/inbox with source_kind='federation' and a provenance link to the originating peer URI. Peer agents never execute on the local box. The protocol exchanges structured task briefs and artifacts, not code or weights — closing the obvious supply-chain attack vector that ad-hoc A2A schemes leave open.
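
On the receiving side, the checks described above (signature, peer status, scope, replay) compose into a short decision function. The sketch below is an illustration under assumptions: signature verification is taken as an input flag instead of being implemented, the scopes_json shape ({ tools, projects }), the status values, the verdict strings, and all type and function names are hypothetical, and one ReplayGuard instance is assumed per peer.

```typescript
interface PeerScopes {
  tools: string[];
  projects: string[];
}

interface FederationPeer {
  status: "active" | "revoked"; // hypothetical status values
  scopes: PeerScopes;           // assumed shape of scopes_json
}

interface DelegateRequest {
  tool: string;
  project: string;
  nonce: string;
  issuedAtMs: number;
}

// Rejects a reused nonce or a timestamp that does not move forward.
class ReplayGuard {
  private readonly seen = new Set<string>();
  private lastIssuedAtMs = -Infinity;

  accept(nonce: string, issuedAtMs: number): boolean {
    if (this.seen.has(nonce) || issuedAtMs <= this.lastIssuedAtMs) return false;
    this.seen.add(nonce);
    this.lastIssuedAtMs = issuedAtMs;
    return true;
  }
}

type Verdict = "accepted" | "bad_signature" | "peer_inactive" | "out_of_scope" | "replayed";

// signatureValid stands in for an ed25519 check against the pinned peer key.
function authorizeDelegation(
  peer: FederationPeer,
  req: DelegateRequest,
  guard: ReplayGuard,
  signatureValid: boolean
): Verdict {
  if (!signatureValid) return "bad_signature";
  if (peer.status !== "active") return "peer_inactive";
  const toolOk = peer.scopes.tools.includes(req.tool);
  const projectOk = peer.scopes.projects.includes(req.project);
  if (!toolOk || !projectOk) return "out_of_scope";
  return guard.accept(req.nonce, req.issuedAtMs) ? "accepted" : "replayed";
}
```

The replay guard runs last so that a request rejected for signature or scope reasons does not consume a nonce or advance the clock.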

§ 07 · Hosting matrix
One codebase · three shapes

Sell hosted convenience. Ship the same code for on-prem.

Deployment

Some customers want managed onboarding in minutes. Others must keep every byte inside their own infrastructure for residency or compliance reasons. madarX OS serves both from a single codebase. The deployment shape — managed cloud, customer-owned cloud, or on-premises appliance — is a configuration choice, not a separate product. Features and behaviors are identical across shapes.

One binary, three deployment shapes. Behavior gates on env: MADARX_HOST_MODE, MADARX_AUTH_MODE, MADARX_FEDERATION_ENABLED, MADARX_ARTIFACT_ENCRYPTION. Closed-source slices are additive packages (@madarx/cloud-billing, @madarx/cloud-quotas) with no runtime dependency in the OSS image. BYOK via per-org config (org.supabase_url + service key in customer-controlled KMS — we never see it). SQLite is the default store; Postgres adapter is the shape-C upgrade path for HA. Supply-chain hardening: deterministic Docker builds, CycloneDX SBOM on every release, cosign-signed images, optional in-toto attestations for ADGM-grade procurement. Residency invariants enforced at the auth + storage layer — the runtime never knows which region until env injection at boot, so the same image runs unmodified in UAE, EU, or air-gapped. Distribution: npm create madarx-app interactive installer, docker pull madarx/os:latest, hosted madarX.cloud signup.
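
Gating behavior on environment variables at boot might be sketched like this. Only the four variable names and the convention MADARX_FEDERATION_ENABLED=1 appear in this brief. The host-mode values ("cloud", "byo", "onprem"), the fallback values, the treatment of "0" as the encryption off switch, and the loadConfig name are hypothetical placeholders.

```typescript
// Hypothetical values standing in for shapes A, B, and C.
type HostMode = "cloud" | "byo" | "onprem";

interface RuntimeConfig {
  hostMode: HostMode;
  authMode: string;
  federationEnabled: boolean;
  artifactEncryption: boolean;
}

// Takes the environment as a plain object so it can be tested without process.env.
function loadConfig(env: Record<string, string | undefined>): RuntimeConfig {
  const hostMode = env.MADARX_HOST_MODE ?? "onprem"; // assumed fallback
  if (hostMode !== "cloud" && hostMode !== "byo" && hostMode !== "onprem") {
    throw new Error(`Unknown MADARX_HOST_MODE: ${hostMode}`);
  }
  return {
    hostMode,
    authMode: env.MADARX_AUTH_MODE ?? "local", // assumed fallback
    // Federation stays off unless the operator sets the flag to "1".
    federationEnabled: env.MADARX_FEDERATION_ENABLED === "1",
    // Assumption: encryption is on unless explicitly set to "0".
    artifactEncryption: env.MADARX_ARTIFACT_ENCRYPTION !== "0",
  };
}
```

Failing fast on an unknown host mode keeps a mistyped variable from silently selecting a less restrictive deployment shape.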

Shape | Who runs it | Where the data lives | Best for
A. madarX.cloud | madarX | Managed Supabase, UAE region · Vercel FRA1/IAD1 | Individuals, small teams, ADGM pilots without on-prem mandate.
B. BYO cloud | Customer | Customer's own Supabase + Vercel account | Companies that want managed infrastructure in their region, their account.
C. On-prem appliance | Customer ops team | Customer's server · local network only | ADGM regulated entities, financial services, government, sovereign deployments.
§ 08 · Roadmap
Five phases · each shippable behind a flag

From taxonomy to federation in five releases.

Sequence
  • A · Data model: Columns, tables, FTS5 index, migrations, backfill. No UI change yet.
  • B · Folder layout: Path helpers, INDEX.md generator, file relocation migration.
  • C · Retrieval tools: FTS search tools, URI resolver, archive/move/pin tools. Prompt updates.
  • D · UI surfaces: Dashboard filter, Library tabs, Space pages, design review.
  • E · Teams + federation + hosting: Team-scoped agents, signed peering, installer matrix, ADGM doc.
Each phase gated by an eval suite measuring retrieval precision, tool-call accuracy, and approval-bypass attempts. No phase ships without green npm run check, green /qa, and a green eval pass.
§ 09 · Six decisions
Open questions with the CTO recommendation

Six calls to make before code goes in.

Decide

Six decisions materially shape the product. Each card below presents the options, the trade-offs, and the recommended call. Recommendations are the current CTO position; alternative paths remain viable and reversible if subsequent evidence warrants a different choice.

Each decision is framed by three constraints in priority order: (1) minimize attack surface, so every default starts in the most restrictive posture (default-deny allowlists, federation off, secrets vault required, irreversible_external never auto-approved); (2) maximize interoperability through MCP-shaped adapters, a standardized tool registry, foundation-model-agnostic engines, and stable citation URIs; (3) preserve forward optionality, so where two options are close, pick the one that does not foreclose the other via a breaking migration. Each recommendation is the call I would make today as CTO; alternatives stay reversible.

01
Which open-source license do we ship under?
Apache-2.0 · MIT · BSL · AGPL

The license decides who can adopt us, who can fork us, and what we can charge for later. Once we pick, we cannot retract — only re-license forward.

A
Apache-2.0: permissive, OSI-approved, includes an explicit patent grant. Most enterprise legal teams already accept it. Lets us run a paid hosted edition on the same code. Recommended.
B
MIT: also permissive and shorter, but with no explicit patent grant. Fine for small projects; large enterprises sometimes prefer Apache's clearer patent language.
C
BSL (Business Source License): not OSS by the OSI definition. HashiCorp/MariaDB style: source-available with a delayed conversion. Kills enterprise adoption and community trust.
D
AGPL-3.0: copyleft. Forces hosted forks to open-source their changes. Discourages adoption inside companies with restrictive legal posture.
CTO call
Apache-2.0.

Largest possible adoption surface, explicit patent grant (matters for an AI product), and a proven path for OSS companies running a paid managed edition (Supabase is one example). MIT is the runner-up; the upgrade cost to Apache later is small but not free. Avoid BSL and AGPL until we have brand strength to absorb the friction.

02
Do we offer "we host the UI, you host the data"?
Hybrid mode for shape A+ · critical for ADGM residency

Some customers want our hosted UI (frictionless onboarding, no DevOps) but cannot let their data leave their region or their account. A hybrid mode runs the UI on madarX.cloud but talks to a customer-controlled Supabase project for storage and auth.

A
Yes, ship hybrid. Per-org config (org.supabase_url, anon key, service key in customer-controlled KMS). Customer keeps data in their UAE-region Supabase; we never see it. Recommended.
B
No, force the binary choice. Either hosted-with-our-data, or full BYO cloud / on-prem. Simpler engineering, lower legal surface, but loses every ADGM pilot that wants both convenience and residency.
CTO call
Yes — hybrid mode. Phase E item.

This is the single feature that unlocks ADGM-grade customers without forcing them onto on-prem. Engineering cost is low because the auth + data layer already abstracts over Supabase URL/key — we just promote it to a per-org setting. Legal cost is bounded by a clean DPA: their data, their key, our software running in front. The "no" option costs us the highest-margin segment.

03
Is federation on or off by default?
Cross-orchestrator agent-to-agent delegation

Federation lets one organization's main agent delegate a task to another organization's main agent. It is the team-of-teams story. It is also a new attack surface.

A
Off by default, enabled with MADARX_FEDERATION_ENABLED=1. Customers opt in deliberately, configure peers, sign keys, and grant scopes. Recommended.
B
On by default, with zero peers configured. Discoverable but inert. Lower configuration friction, higher chance of misconfigured peering ending up live.
CTO call
Off by default. Feature flag MADARX_FEDERATION_ENABLED.

Security follows the principle of least surprise. A regulator reviewing our on-prem deployment should see zero outbound federation traffic by default. Customers who want it flip one env var and add peers explicitly. This also matches our existing tool-marketplace pattern: every external integration is disabled until the operator chooses it.

04
What is the default auto-archive policy?
How long does library content stay hot?

Artifacts in the library that nobody references eventually pile up. An auto-archive policy moves cold items to the archive surface so agent search stays fast and the library stays browsable.

A
30 days of no reference. Aggressive; keeps the library lean; users may be disoriented when items leave the active surface unexpectedly.
B
90 days of no reference. The default. Per-project override (30 / 90 / 180 / never). Pinned items never archive. Recommended.
C
180 days. Conservative; library grows large; works for very low-volume customers.
D
Never auto-archive. Pure manual control. Higher index cost over time, more guesswork in agent retrieval.
CTO call
90 days, per-project override.

Matches the quarterly cadence most knowledge work runs on. The archive is one click and one query away, so a wrong move is recoverable. Per-project override means a regulated team (legal, finance) can pick "never" without changing global behavior. Pinned items are immune.
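
The recommended policy reduces to a small predicate over last_referenced_at and pinned_at. The sketch below assumes epoch-millisecond timestamps and hypothetical names (ArchiveAfterDays, LibraryItem, shouldArchive); the 90-day default, the 30 / 90 / 180 / never overrides, and the pinned exemption come from the decision above.

```typescript
// Per-project override values from the decision card.
type ArchiveAfterDays = 30 | 90 | 180 | "never";

const DAY_MS = 24 * 60 * 60 * 1000;

interface LibraryItem {
  lastReferencedAt: number; // epoch ms
  pinnedAt: number | null;  // epoch ms, null when not pinned
}

// True when an unpinned item has gone unreferenced for the policy window.
function shouldArchive(
  item: LibraryItem,
  nowMs: number,
  policy: ArchiveAfterDays = 90
): boolean {
  if (policy === "never" || item.pinnedAt !== null) return false;
  return nowMs - item.lastReferencedAt >= policy * DAY_MS;
}
```

A regulated team that selects "never" short-circuits the check entirely, so no clock arithmetic can archive its content by accident.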

05
Flat spaces or nested sub-spaces?
Information architecture under projects

The schema supports nested spaces (a parent_space_id). The UX question is whether to ship that nesting in v1 or keep spaces flat under each project.

A
Flat in v1, schema-ready for nested. One layer of spaces under each project. Navigation stays simple. We can add nesting later without a migration. Recommended.
B
Nested from day one. Sub-spaces inside spaces. More flexible, but the tree gets complex fast and the URL/breadcrumb design balloons.
CTO call
Flat in v1. Schema keeps the door open.

Most customers will live with project → space → folder. Nested spaces are a power-user feature we add once we see a real workload that needs them. Shipping flat lets us validate the IA before complicating the navigation, the URI scheme, and the access-control inheritance chain.

06
Is there a hard cap on dashboard artifacts per project?
Preventing dashboard pollution as agents scale

The dashboard surface is for live work. If every scheduled agent push lands there forever, the dashboard becomes unusable. A cap auto-evicts oldest non-pinned items to the library so the dashboard stays a control surface, not an inbox graveyard.

A
100 soft, 200 hard. Past 100, the dashboard shows a "you have N items beyond the visible cap" banner. Past 200, the oldest non-pinned items auto-clear to the library. Configurable per project. Recommended.
B
No cap, manual cleanup. Trust the user to clear. Works for disciplined teams; will rot for everyone else.
C
Strict 50. Aggressive surface management. Cleaner dashboard, more user friction during high-volume bursts.
CTO call
100 soft, 200 hard. Pinned items are never evicted.

The dashboard is the command surface, not an archive. The soft cap warns; the hard cap protects. Pinned items are sacred. Per-project override means a team that wants 500 can have it; the default protects everyone else.
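
One way to express the soft and hard caps is a planning function that a periodic task could run per project. This is a sketch under assumptions: the names are hypothetical, pinned items are assumed to count toward the caps while never being evicted, and eviction is assumed to bring the dashboard back down to the hard cap and no further.

```typescript
interface DashboardItem {
  id: string;
  createdAt: number;       // epoch ms
  pinnedAt: number | null; // null when not pinned
}

interface Caps {
  soft: number;
  hard: number;
}

interface GcPlan {
  showBanner: boolean;     // true once the soft cap is exceeded
  beyondSoftCap: number;   // the N in "you have N items beyond the visible cap"
  moveToLibrary: string[]; // ids to clear, oldest first
}

function planDashboardGc(
  items: DashboardItem[],
  caps: Caps = { soft: 100, hard: 200 }
): GcPlan {
  const beyondSoftCap = Math.max(0, items.length - caps.soft);
  const excess = Math.max(0, items.length - caps.hard);
  // Pinned items are never candidates; the oldest unpinned items go first.
  const evictable = items
    .filter((i) => i.pinnedAt === null)
    .sort((a, b) => a.createdAt - b.createdAt);
  return {
    showBanner: beyondSoftCap > 0,
    beyondSoftCap,
    moveToLibrary: evictable.slice(0, excess).map((i) => i.id),
  };
}
```

Returning a plan instead of mutating records directly lets the same function back both the banner in the UI and the background cleanup.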

§ 10 · Open Source Strategy
Earning market trust by opening the core

Open the core. Sell the convenience. Earn the trust.

Go-to-market

Open source is the trust mechanism for security-sensitive infrastructure. madarX OS releases the orchestrator core under Apache-2.0 with installation paths for laptop, server, and on-prem appliance. The hosted edition is the managed convenience path; the OSS distribution is the sovereign path. Competitive differentiation is built on agent quality, depth of integrations, and the auditability of the governance plane — not on vendor lock-in.

The OSS surface is the load-bearing trust artifact: runtime, SQLite schema, agent registry, marketplace domain layer, tool adapters with their allowlists and rate limits, the A2A federation protocol, and the installer. Adapters expose an MCP-compatible tool shape so external MCP clients (Claude Desktop, IDE agents, custom harnesses) can consume the same registry. Engine layer is foundation-model-agnostic — Anthropic, OpenAI, Ollama, Hermes, Claude Code CLI all plug in behind a unified runAgentTick contract; switching engines is a per-agent config change, not a rewrite. Closed slices are optional add-ons (billing, quotas, telemetry aggregation, premium connectors). Repos: madarx-os (core, Apache-2.0), create-madarx-app (installer), madarx-docs (Diataxis structure on Vercel), madarx-cloud (private — billing, quotas, multi-tenant control plane).

GitHub launch

Public Apache-2.0 repository, documented README, working npm create madarx-app installer, Show HN and launch post on day one.

Docs site

Diataxis structure: tutorials for first-time install, how-tos per integration, reference for every tool and table, explanations of the trust model.

Community

Discord for builders, monthly office hours, public roadmap on GitHub Projects, RFCs for every breaking change.

Regulator wedge

ADGM partnership: position madarX OS as the reference architecture for AI agent governance in regulated UAE entities. Free for non-commercial regulator pilots.

Hosted edition

madarX.cloud: hosted UI, hybrid mode for BYO data, free tier for individuals, team and enterprise pricing for collaboration features.

Appliance image

Docker image and signed installer for on-prem. One-page hardening guide. Compliance-grade boot checks shipped (npm run check:disk-enc).

Marketplace ecosystem

The existing tool + skill + bundle marketplace ships open so third parties publish adapters. Open the registry. Gate publishing behind review.

Customer wins as proof

Publish one detailed ADGM-regulated deployment case study per quarter. Real names, real numbers, real audit excerpts.

Source · specs/active/artifact-organization-and-team-orchestration.md