🔀 A3M Router

One prompt in. The right model out. An open-source AI gateway that routes every query to the cheapest capable model across 47+ LLM providers.

✅ 99.5% Routing Accuracy 📡 47+ Providers 💰 62% Cost Savings ⚡ Zero ML · 19.5KB MIT License
# Install & run in 10 seconds
npm install adaptive-memory-multi-model-router
npx a3m-router serve
# Proxy running at http://localhost:8787
99.5%
±1 Tier Routing Accuracy
62%
Cost Savings vs Premium
47+
LLM Providers
30%+
Cache Hit Rate
19.5KB
Package Size
<1ms
Routing Decision

🔥 What Makes A3M Different

Everyone does sequential fallback. A3M is the first to do parallel multi-LLM execution with result merging.
Everyone Else A3M Router
try A → fail → try B → fail → try C run A + B + C → score → pick best
Sequential fallback (slow, fragile) Parallel ensemble (fast, robust)
One chance per provider All providers contribute simultaneously
Black-box routing Transparent scoring with winner reasoning

Parallel Ensemble Quality Gain

Metric Single Best Provider A3M Ensemble Gain
Answer quality (1-10) 6.5 8.2 +26%
Specificity (code/nums) 58% 79% +21pp
Hallucination rate 4.2% 1.8% -57%
Multi-step accuracy 72% 91% +19pp

✨ Features

Parallel Ensemble

Run multiple providers simultaneously. Score results on specificity, structure, relevance. Return the best answer with full provenance.

🎯

Intelligent Routing

12 keyword signals across 5 dimensions classify query complexity. Routes to cheapest capable model. 99.5% ±1 tier accuracy.

🧠

Adaptive Memory

Online learning updates model quality scores using exponential moving average. No retraining. Learns which models work for your query types.

💰

Hard Budget Enforcement

Per-user/team budgets with hard caps. Real-time spend dashboard. Alerts at 50%/80%/100%. Per-provider cost breakdown.

🔄

Intelligent Failover

Provider health scoring (latency + error rate). Circuit breaker (3 failures → 60s cooldown). Automatic fallback chain. Chinese provider handling.

💾

Semantic Cache

Embedding-based cache lookup with configurable similarity threshold. 30%+ hit rate. Per-route TTL support.

🛡️

Security Guardrails

17-pattern prompt injection detection. PII redaction. Content filtering. Hallucination checks. Rate limiting.

📊

Cost Analytics

Per-provider cost breakdown. Budget vs actual dashboard. Projected savings. Monthly/yearly reports.

🧰 How It Works

A3M Router combines multi-signal routing, semantic caching, and load balancing to route queries to the cheapest capable model.

SignalKeywords / FeatureMax Score
Domain Detectionlegal, medical, security, finance+0.35
Task Indicatorscode, math, translate, creative+0.25
Query StructureMultiple clauses, length >200, qualifiers+0.20
Action Verb Intensitydesign, analyze, optimize+0.20
Multi-Step Detectionfirst...then...finally+0.15
complexity 0.00 ── 0.19 ── 0.44 ── 1.00
              free   cheap    mid   premium

Real-World Examples

QueryScoreTierRoutes To
"What is 2+2?"0.10freetaste-1 ($0)
"Write a Python sort"0.33cheapllama-3.3-70b ($0.20/M)
"Review contract liability"0.87premiumclaude-3.5-sonnet ($1.50/M)
"Design oncology trial"1.00premiumgpt-4o ($2.50/M)

💰 Cost Savings

Save 62% on API costs. A3M routes ~50% of queries to free tier, ~35% to cheap tier.
Monthly QueriesAll-PremiumA3M RouterYou SaveAnnualized
10K$34$12$22 (65%)$261
100K$341$124$217 (64%)$2,604
1M$3,411$1,236$2,175 (64%)$26,100

💡 Comparison

Feature A3M Router LiteLLM OpenRouter Portkey
Parallel ensemble
Confidence scoring
Routing accuracy99.5% ±1ManualManualManual
Self-hosted
Semantic cache
Budget enforcement
Circuit breaker
Package size19.5 KB~50 MBAPI-only~30 MB
TypeScript SDK
CLI
LicenseMITApache 2.0CustomMIT

🚀 Quick Start

TypeScript / Node.js

// Install
npm install adaptive-memory-multi-model-router

// Use the SDK
import { A3MRouter } from 'adaptive-memory-multi-model-router/sdk';

const router = new A3MRouter();
const decision = router.route("Write a Python function to sort an array");
console.log(decision.model, decision.tier, decision.cost);
// → groq/llama-3.3-70b cheap 0.0004

Python SDK

# Install
pip install a3m-router

# Use it
from a3m import A3MRouter

async with A3MRouter() as router:
    decision = await router.route("Write a Python function to sort an array")
    print(decision.model, decision.tier, decision.cost)
    # → groq/llama-3.3-70b cheap 0.0004

OpenAI-Compatible Proxy

# Start the proxy
npx a3m-router serve
# → Proxy running at http://localhost:8787

# Works with ANY OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8787/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

📖 Full Quick Start Guide