🚀 Quick Start
Get A3M Router running in under a minute. This page covers everything you need: the TypeScript SDK, the Python SDK, the CLI, and the OpenAI-compatible proxy.
⏱ 1 Minute Setup
1. Install
# TypeScript / Node.js
npm install adaptive-memory-multi-model-router
# Or Python
pip install a3m-router
2. Start the Proxy
npx a3m-router serve
# → Proxy running at http://localhost:8787
# → Dashboard at http://localhost:8787/dashboard
3. Make Your First Request
curl http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Hello!"}]}'
✅ Done! Your queries are now being intelligently routed to the cheapest capable model across 47+ providers.
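The same first request can also be issued from a script. The following is a minimal sketch using only Python's standard library; it assumes the proxy is listening on the default address shown in step 2 and that it answers in the OpenAI response shape. The helper names `build_chat_request` and `send_chat` are illustrative and are not part of A3M.

```python
import json
import urllib.request

# Default proxy address from step 2; change it if you started the proxy elsewhere.
PROXY_URL = "http://localhost:8787/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str, timeout: float = 30.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the proxy is running and replies in the OpenAI response shape.
    """
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `send_chat("Hello!")` with the proxy running should return the reply text; building the request separately makes it easy to inspect the payload without a live server.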
📦 Drop-in OpenAI Replacement
No code changes beyond the base URL. Point your existing OpenAI SDK at A3M's proxy URL.
Python
from openai import OpenAI

# Just change the base_url; everything else stays the same
client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="not-needed",
)

# All your existing code works; A3M routes each request to the cheapest capable provider
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
TypeScript / Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8787/v1',
  apiKey: 'not-needed',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
🔨 TypeScript SDK
Route a Query
import { A3MRouter } from 'adaptive-memory-multi-model-router/sdk';
const router = new A3MRouter();
// Route a query — returns model + tier + cost + complexity
const decision = router.route("Review this contract for liability clauses");
// → { model: "anthropic/claude-3.5-sonnet", tier: "premium",
// cost: 0.008, complexity: 0.87, isExpert: true }
// Analyze why it chose that model
const features = router.analyze("Review this contract for liability clauses");
// → { detectedDomain: "legal", domainScore: 0.35, hasCode: false,
// requiresReasoning: true, complexity: 0.87 }
Parallel Ensemble Execution
import { executeEnsemble, recordFeedback } from 'adaptive-memory-multi-model-router/ensemble';

// systemPrompt, context, and the call* provider functions are supplied by your application.
// Run against multiple providers simultaneously
const result = await executeEnsemble(
  "Explain how vector databases work",
  systemPrompt,
  context,
  { nvidia: callNvidia, groq: callGroq, openai: callOpenAI },
  { providers: ['nvidia', 'groq', 'openai'], timeoutMs: 30000 }
);
console.log(`Winner: ${result.winner}`); // → nvidia
console.log(`Score: ${result.scores.nvidia}`); // → 75
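Conceptually, a parallel ensemble sends the prompt to every provider at once, drops the ones that fail or time out, scores the remaining answers, and returns the best one. The sketch below shows that pattern in plain Python with `asyncio`. It is not A3M's implementation: `run_ensemble`, the fake providers, and `placeholder_score` (which ranks answers by length) are hypothetical stand-ins, and the source does not document how real scores are computed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Provider = Callable[[str], Awaitable[str]]


def placeholder_score(answer: str) -> float:
    """Hypothetical scorer: longer answers score higher, capped at 100."""
    return float(min(len(answer), 100))


async def run_ensemble(prompt: str, providers: Dict[str, Provider],
                       timeout_s: float = 30.0) -> dict:
    """Call every provider concurrently and return the best-scoring answer."""

    async def call(fn: Provider) -> Optional[str]:
        try:
            return await asyncio.wait_for(fn(prompt), timeout=timeout_s)
        except Exception:
            # A timeout or provider error removes that provider from the race.
            return None

    names = list(providers)
    answers = await asyncio.gather(*(call(providers[n]) for n in names))
    scores = {n: placeholder_score(a) for n, a in zip(names, answers) if a is not None}
    if not scores:
        raise RuntimeError("all providers failed or timed out")
    winner = max(scores, key=scores.get)
    return {"winner": winner, "scores": scores, "answer": answers[names.index(winner)]}


# Fake providers standing in for real API calls.
async def fast(prompt: str) -> str:
    return "short"


async def slow(prompt: str) -> str:
    await asyncio.sleep(1)  # exceeds the 0.05 s timeout used below
    return "x" * 200


async def good(prompt: str) -> str:
    return "a much longer answer"


result = asyncio.run(
    run_ensemble("Explain how vector databases work",
                 {"fast": fast, "slow": slow, "good": good}, timeout_s=0.05)
)
```

Here `slow` is cut off by the timeout, so only `fast` and `good` are scored and `good` wins on length.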
Budget Enforcement
import { BudgetManager } from 'adaptive-memory-multi-model-router/billing';

const budgets = new BudgetManager({
  monthlyLimit: 500, // $500/month hard cap
  alerts: [0.5, 0.8, 1.0],
  perTeamLimits: {
    'engineering': 200,
    'product': 150,
  }
});

budgets.getSpendBreakdown();
// → { total: 340.50, byTeam: { engineering: 180, ... }, byProvider: {...} }
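To make the alert thresholds concrete, the following sketch reads `alerts: [0.5, 0.8, 1.0]` as fractions of a limit and reports which of them a given spend has reached. That reading is an assumption, since the source does not spell out the semantics, and `crossed_alerts` and `remaining` are hypothetical helpers rather than `BudgetManager` methods.

```python
from typing import List, Sequence


def crossed_alerts(spend: float, limit: float, thresholds: Sequence[float]) -> List[float]:
    """Return the thresholds (fractions of the limit) that spend has reached."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    used = spend / limit
    return [t for t in sorted(thresholds) if used >= t]


def remaining(spend: float, limit: float) -> float:
    """Budget left before the hard cap, never negative."""
    return max(0.0, limit - spend)


# Figures taken from the example above.
ALERTS = [0.5, 0.8, 1.0]
overall = crossed_alerts(340.50, 500, ALERTS)     # 68.1% of the monthly limit
engineering = crossed_alerts(180, 200, ALERTS)    # 90% of the team limit
```

With the numbers from the example, the overall budget has passed only the 50% alert, while the engineering team has passed both the 50% and 80% alerts.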
🐍 Python SDK
import asyncio

from a3m import A3MRouter


async def main():
    async with A3MRouter() as router:
        # Route without executing
        decision = await router.route("Write a Python function to sort an array")
        print(decision.model, decision.tier, decision.cost)
        # → groq/llama-3.3-70b cheap 0.0004

        # Execute via OpenAI-compatible chat
        response = await router.chat("What is 2+2?", model="auto")
        print(response["choices"][0]["message"]["content"])


asyncio.run(main())
🖥️ CLI
| Command | Description |
|---|---|
| `npx a3m-router route "Explain quantum computing"` | Route a query to the best model |
| `npx a3m-router serve --port 8787` | Start the OpenAI-compatible proxy |
| `npx a3m-router benchmark` | Run routing accuracy tests |
| `npx a3m-router health` | Check all provider health status |
| `npx a3m-router cost` | Show cost analytics and budget usage |
| `npx a3m-router compare "What is AI?"` | Compare all providers side-by-side |
# Route a query
$ npx a3m-router route "Design a clinical trial for oncology"
🔀 Routing Decision:
Query: "Design a clinical trial for oncology"
Complexity: 1.00 (premium)
Tier: premium
Route to: openai/gpt-4o ($2.50/1M tokens)
Signals: medical(+0.35) + design(+0.20) + multi-step(+0.15)
# Check costs
$ npx a3m-router cost
💰 Cost Analytics
Total Spend: $127.45 / $500.00 budget
Groq: $42.30 33%
DeepSeek: $51.20 40%
Claude: $28.90 23%
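The `Signals` line printed by `a3m-router route` lists additive weights, but the output does not say how they combine into the final complexity. One plausible reading is a base score plus the signal weights, clamped to the range 0 to 1 and then mapped to a tier. The sketch below illustrates that reading only: the base score of 0.30, the tier cut-offs, and the `standard` tier name are hypothetical values chosen for illustration, not A3M's actual algorithm.

```python
from typing import Dict

# Hypothetical values: neither the base score nor the cut-offs come from A3M.
BASE_SCORE = 0.30
TIER_CUTOFFS = [(0.75, "premium"), (0.40, "standard"), (0.0, "cheap")]


def complexity(signals: Dict[str, float], base: float = BASE_SCORE) -> float:
    """Add signal weights to a base score and clamp the result to [0, 1]."""
    total = base + sum(signals.values())
    return round(min(1.0, max(0.0, total)), 2)


def tier_for(score: float) -> str:
    """Map a complexity score to the first tier whose cut-off it meets."""
    for cutoff, tier in TIER_CUTOFFS:
        if score >= cutoff:
            return tier
    return "cheap"


# Signal weights copied from the routing output above.
signals = {"medical": 0.35, "design": 0.20, "multi-step": 0.15}
score = complexity(signals)
tier = tier_for(score)
```

Under these assumed values the oncology query reaches the clamped maximum and lands in the premium tier, which is consistent with the output shown, though other formulas would fit it equally well.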
🌐 REST API
# Get routing decision (no LLM call)
curl -s http://localhost:8787/v1/route \
-H "Content-Type: application/json" \
-d '{"query": "Write a Python function"}' | jq .
# Chat completion (OpenAI format)
curl -s http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}'
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat (streaming + non-streaming) |
| POST | /v1/completions | OpenAI text completions |
| POST | /v1/route | Routing decision without LLM call |
| GET | /v1/models | List available models with pricing |
| GET | /health | Provider health + cost summary |
| GET | /dashboard | Cost analytics dashboard |
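When streaming is requested from `/v1/chat/completions`, an OpenAI-compatible server sends server-sent events: lines of the form `data: {json chunk}` ending with `data: [DONE]`. The sketch below assumes the proxy follows that convention and shows how the text fragments could be reassembled; `iter_stream_text` is an illustrative helper, and the sample lines are hand-written rather than captured from A3M.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample in the OpenAI streaming chunk shape.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_stream_text(sample))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read as they arrive.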
⚙️ Configuration
Router Options
| Option | Type | Default | Description |
|---|---|---|---|
| `defaultModel` | `string` | none | Fallback model when routing is ambiguous |
| `maxCostPerQuery` | `number` | none | Max cost per query in USD |
| `preferSpeedOverQuality` | `boolean` | `false` | Prefer fast models over higher quality |
| `providers` | `string[]` | all | Restrict routing to these provider IDs |
Environment Variables
# .env — set your API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-...
GROQ_API_KEY=gsk_...
DEEPSEEK_API_KEY=sk-...
# Optional: restrict to specific providers
A3M_PROVIDERS=groq,openai,deepseek
# Optional: set a monthly budget
A3M_MONTHLY_BUDGET=500
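Before starting the proxy it can help to confirm that these variables are actually set in your shell. The sketch below is a small standalone check, not A3M's own configuration loader (its parsing rules are not documented here); `read_settings` is a hypothetical helper, and it assumes the variables have already been exported, since Python does not read `.env` files on its own.

```python
import os
from typing import Mapping

# API key variables listed in the .env example above.
KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY", "DEEPSEEK_API_KEY")


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize the A3M-related variables found in the given environment."""
    providers_raw = env.get("A3M_PROVIDERS", "")
    providers = [p.strip() for p in providers_raw.split(",") if p.strip()]
    budget_raw = env.get("A3M_MONTHLY_BUDGET", "").strip()
    budget = float(budget_raw) if budget_raw else None
    keys_set = [name for name in KEY_VARS if env.get(name)]
    return {"providers": providers, "monthly_budget": budget, "keys_set": keys_set}


# Example with an explicit mapping instead of the real environment.
example = read_settings({
    "A3M_PROVIDERS": "groq,openai,deepseek",
    "A3M_MONTHLY_BUDGET": "500",
    "GROQ_API_KEY": "gsk_example",
})
```

Only the names of the keys that are set are reported, so the summary can be printed or logged without exposing secrets.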
🏫 Next Steps
🔧 Configuration Guide
All configuration options, provider setup, and advanced usage.