📖 API Reference
Complete reference for the A3M Router — TypeScript SDK, Python SDK, CLI, REST API, and integrations.
TypeScript SDK
Installation
npm install adaptive-memory-multi-model-router
Import
import { A3MRouter } from 'adaptive-memory-multi-model-router/sdk';
Constructor
const router = new A3MRouter(config?: A3MRouterConfig);
| Option | Type | Default | Description |
|---|---|---|---|
| defaultModel | string | none | Fallback model when routing is ambiguous |
| maxCostPerQuery | number | none | Max cost per query in USD |
| preferSpeedOverQuality | boolean | false | Prefer fast models over higher quality |
| providers | string[] | all | Restrict routing to these provider IDs |
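Read as a type, the options table might look like the sketch below. It is transcribed from the table, not taken from the package's published declarations, and it assumes every field is optional because the constructor accepts `config?`. The value chosen for maxCostPerQuery is arbitrary.

```typescript
// Hypothetical shape transcribed from the options table; the package's
// real A3MRouterConfig declaration may differ.
interface A3MRouterConfig {
  defaultModel?: string;            // fallback model when routing is ambiguous
  maxCostPerQuery?: number;         // USD
  preferSpeedOverQuality?: boolean; // documented default: false
  providers?: string[];             // documented default: all providers
}

// Example values only. The model and provider IDs are reused from
// examples elsewhere on this page.
const exampleConfig: A3MRouterConfig = {
  defaultModel: "groq/llama-3.3-70b-versatile",
  maxCostPerQuery: 0.01,
  preferSpeedOverQuality: true,
  providers: ["groq", "deepseek"],
};
```

An object like this would be passed as the single constructor argument.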
Methods
route(query: string): RoutingResult
Route a query to the best available model. Returns the decision without executing the query.
const decision = router.route("Write a Python function to sort a list");
console.log(decision.model); // "groq/llama-3.3-70b-versatile"
console.log(decision.tier); // "cheap"
console.log(decision.cost); // 0.00035
console.log(decision.complexity); // 0.35
| Field | Type | Description |
|---|---|---|
| model | string | Selected model identifier (e.g. groq/llama-3.3-70b-versatile) |
| tier | 'free' \| 'cheap' \| 'mid' \| 'premium' | Cost tier |
| cost | number | Estimated cost in USD |
| complexity | number | Query complexity score (0.0–1.0) |
| reasoning | string | Human-readable routing reason |
| fallbackModels | string[] | Alternative models in priority order |
| isFree | boolean | Whether the selected model is free |
| isExpert | boolean | Whether this is an expert-level query |
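Because fallbackModels is listed in priority order, a caller can derive a full try-order from a single decision. The sketch below assumes the result shape given in the table. `modelsToTry` is a hypothetical helper, not part of the SDK, and the sample values for reasoning, fallbackModels, isFree, and isExpert are made up.

```typescript
type Tier = "free" | "cheap" | "mid" | "premium";

// Hypothetical shape transcribed from the field table above.
interface RoutingResult {
  model: string;
  tier: Tier;
  cost: number;
  complexity: number;
  reasoning: string;
  fallbackModels: string[];
  isFree: boolean;
  isExpert: boolean;
}

// Selected model first, then the alternatives in their listed priority order.
function modelsToTry(decision: RoutingResult): string[] {
  return [decision.model, ...decision.fallbackModels];
}

// model, tier, cost, and complexity come from the route() example;
// the remaining values are illustrative only.
const sampleDecision: RoutingResult = {
  model: "groq/llama-3.3-70b-versatile",
  tier: "cheap",
  cost: 0.00035,
  complexity: 0.35,
  reasoning: "illustrative only",
  fallbackModels: ["deepseek/deepseek-chat"],
  isFree: false,
  isExpert: false,
};
```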
routeBatch(queries: string[]): RoutingResult[]
Route multiple queries at once.
const decisions = router.routeBatch([
"What is 2+2?",
"Design a distributed database",
"Translate to French"
]);
recommend(task: string): RoutingResult
Get a model recommendation for a task category.
const rec = router.recommend("code generation");
console.log(rec.model); // "deepseek/deepseek-chat"
analyze(query: string): QueryFeatures
Extract detailed features from a query for debugging.
const features = router.analyze("Design a secure authentication system");
console.log(features.complexity); // 0.72
console.log(features.detected_domain); // "security"
console.log(features.domain_score); // 0.30
| Field | Type | Description |
|---|---|---|
| complexity | number | Overall complexity score (0.0–1.0) |
| length | number | Word count |
| has_code | boolean | Code-related keywords detected |
| has_math | boolean | Math-related keywords detected |
| is_multilingual | boolean | Non-ASCII characters detected |
| is_translation | boolean | Translation request detected |
| is_creative | boolean | Creative writing request |
| requires_reasoning | boolean | Analytical/reasoning verbs detected |
| is_security | boolean | Security domain keywords |
| is_devops | boolean | DevOps/infrastructure keywords |
| is_data | boolean | Data/ML keywords |
| detected_domain | string | Best matching domain |
| domain_score | number | Domain match confidence |
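When debugging a routing decision, it can help to see at a glance which boolean detectors fired. The sketch below assumes the QueryFeatures shape from the table. `activeFlags` is a hypothetical helper, and the flag values in the sample object are invented for illustration.

```typescript
// Hypothetical shape transcribed from the field table above.
interface QueryFeatures {
  complexity: number;
  length: number;
  has_code: boolean;
  has_math: boolean;
  is_multilingual: boolean;
  is_translation: boolean;
  is_creative: boolean;
  requires_reasoning: boolean;
  is_security: boolean;
  is_devops: boolean;
  is_data: boolean;
  detected_domain: string;
  domain_score: number;
}

// The boolean detector fields, in table order.
const FLAG_KEYS = [
  "has_code", "has_math", "is_multilingual", "is_translation", "is_creative",
  "requires_reasoning", "is_security", "is_devops", "is_data",
] as const;

// Names of all detectors that are set to true.
function activeFlags(features: QueryFeatures): string[] {
  return FLAG_KEYS.filter((key) => features[key]);
}

// complexity, detected_domain, and domain_score come from the analyze()
// example; the flag values are made up.
const sampleFeatures: QueryFeatures = {
  complexity: 0.72, length: 5,
  has_code: false, has_math: false, is_multilingual: false,
  is_translation: false, is_creative: false, requires_reasoning: true,
  is_security: true, is_devops: false, is_data: false,
  detected_domain: "security", domain_score: 0.3,
};
```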
serve(port?: number): Promise<string>
Start the OpenAI-compatible proxy server.
const proxyURL = await router.serve(8787);
console.log(proxyURL); // "http://localhost:8787/v1"
// Now use with any OpenAI SDK
import OpenAI from 'openai';
const client = new OpenAI({ baseURL: proxyURL, apiKey: 'not-needed' });
Ensemble Module
import { executeEnsemble, recordFeedback } from 'adaptive-memory-multi-model-router/ensemble';
// Run multiple providers in parallel
const result = await executeEnsemble(
"Explain vector databases",
systemPrompt, context,
{ nvidia: callNvidia, groq: callGroq },
{ providers: ['nvidia', 'groq'], timeoutMs: 30000 }
);
console.log(`Winner: ${result.winner}`); // "nvidia"
console.log(`Reasoning: ${result.reasoning}`);
// Track historical accuracy
let history = {};
history = recordFeedback('nvidia', true, history);
history = recordFeedback('groq', false, history);
// → { nvidia: { good: 1, bad: 0 }, groq: { good: 0, bad: 1 } }
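The history object shown above is a plain tally per provider. The sketch below is a guess at the bookkeeping involved, inferred only from that output shape. `recordFeedbackSketch` and `accuracy` are hypothetical names, not the library's implementation.

```typescript
type FeedbackHistory = Record<string, { good: number; bad: number }>;

// Returns a new history with one more good or bad vote for `provider`.
// The input object is left untouched.
function recordFeedbackSketch(
  provider: string,
  wasGood: boolean,
  previousHistory: FeedbackHistory,
): FeedbackHistory {
  const previous = previousHistory[provider] ?? { good: 0, bad: 0 };
  return {
    ...previousHistory,
    [provider]: {
      good: previous.good + (wasGood ? 1 : 0),
      bad: previous.bad + (wasGood ? 0 : 1),
    },
  };
}

// Fraction of positive feedback, or undefined for a provider with no votes.
function accuracy(provider: string, feedbackHistory: FeedbackHistory): number | undefined {
  const entry = feedbackHistory[provider];
  if (!entry || entry.good + entry.bad === 0) return undefined;
  return entry.good / (entry.good + entry.bad);
}

let feedback: FeedbackHistory = {};
feedback = recordFeedbackSketch("nvidia", true, feedback);
feedback = recordFeedbackSketch("groq", false, feedback);
```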
Budget Module
import { BudgetManager } from 'adaptive-memory-multi-model-router/billing';
const budgets = new BudgetManager({
monthlyLimit: 500,
alerts: [0.5, 0.8, 1.0],
perTeamLimits: { 'engineering': 200 }
});
budgets.onAlert((alert) => {
console.log(`${alert.type}: ${alert.team} at ${alert.percentage}%`);
});
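The alerts array reads naturally as fractions of the monthly limit. Under that assumption, the sketch below shows which thresholds a single spend update would cross. `crossedThresholds` is a hypothetical helper and does not describe how BudgetManager decides when to fire alerts.

```typescript
// Thresholds (as fractions of `limit`) that are crossed when cumulative
// spend moves from `before` to `after`. A threshold already reached at
// `before` is not reported again.
function crossedThresholds(
  before: number,
  after: number,
  limit: number,
  thresholds: number[],
): number[] {
  return thresholds.filter((t) => before < t * limit && after >= t * limit);
}

// With monthlyLimit: 500 and alerts: [0.5, 0.8, 1.0], the alert levels
// sit at 250, 400, and 500 USD.
const crossedByBigJump = crossedThresholds(240, 410, 500, [0.5, 0.8, 1.0]);
```

Comparing against both the previous and the new total keeps each alert from firing more than once per crossing.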
Failover Module
import { HealthScoreManager, CircuitBreaker } from 'adaptive-memory-multi-model-router/failover';
// Provider health scoring
const health = new HealthScoreManager({
latencyWeight: 0.6, errorRateWeight: 0.4
});
health.getScore('groq'); // → 0.85
// Circuit breaker
const cb = new CircuitBreaker({
failureThreshold: 3, cooldownMs: 60000,
fallbackChain: ['groq', 'deepseek', 'openai']
});
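For readers unfamiliar with the pattern, the sketch below shows a generic circuit breaker driven by the same two numbers (failureThreshold and cooldownMs), plus a walk over a fallback chain. It is a textbook version under stated assumptions, not the library's CircuitBreaker. `CircuitBreakerSketch`, `firstAvailable`, and the injectable clock are hypothetical.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreakerSketch {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be tested without waiting.
  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the cooldown, let a trial request through.
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  allowRequest(): boolean {
    return this.state() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    const wasHalfOpen = this.state() === "half-open";
    this.failures += 1;
    // Open on reaching the threshold, or re-open if the trial request failed.
    if (wasHalfOpen || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}

// First provider in the chain whose breaker allows traffic.
// Providers without a breaker are treated as available.
function firstAvailable(
  chain: string[],
  breakers: Map<string, CircuitBreakerSketch>,
): string | undefined {
  return chain.find((provider) => breakers.get(provider)?.allowRequest() ?? true);
}
```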
Semantic Cache
import { SemanticCache } from 'adaptive-memory-multi-model-router/cache';
const cache = new SemanticCache({
maxSize: 1000,
similarityThreshold: 0.92,
ttl: 3600000,
perRouteTTL: { 'legal/*': 86400000 }
});
cache.getStats(); // { hits: 1, misses: 1, hitRate: 0.5 }
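A semantic cache matches on meaning rather than exact text, typically by comparing embedding vectors against a similarity threshold. The sketch below illustrates that idea with cosine similarity and a single TTL. It assumes embeddings are supplied by the caller, ignores maxSize and perRouteTTL, and is not the library's SemanticCache. All names in it are hypothetical.

```typescript
interface CacheEntry {
  embedding: number[];
  value: string;
  expiresAt: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class SemanticCacheSketch {
  private entries: CacheEntry[] = [];
  private hits = 0;
  private misses = 0;
  private readonly similarityThreshold: number;
  private readonly ttlMs: number;

  constructor(similarityThreshold: number, ttlMs: number) {
    this.similarityThreshold = similarityThreshold;
    this.ttlMs = ttlMs;
  }

  set(embedding: number[], value: string, now: number): void {
    this.entries.push({ embedding, value, expiresAt: now + this.ttlMs });
  }

  // Returns the most similar unexpired entry at or above the threshold.
  get(embedding: number[], now: number): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = this.similarityThreshold;
    for (const entry of this.entries) {
      if (entry.expiresAt <= now) continue; // expired
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    if (best) this.hits += 1;
    else this.misses += 1;
    return best?.value;
  }

  getStats(): { hits: number; misses: number; hitRate: number } {
    const total = this.hits + this.misses;
    return { hits: this.hits, misses: this.misses, hitRate: total === 0 ? 0 : this.hits / total };
  }
}
```

A higher threshold trades hit rate for fewer false matches between queries that only look alike.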
Query-Type Presets
import { createPresetRouter, DEFAULT_PRESETS } from 'adaptive-memory-multi-model-router/presets';
const router = createPresetRouter();
const preset = router.classify("Write a Python function");
// → 'code'
preset.provider; // 'nvidia'
preset.temperature; // 0.2
preset.ensemble; // true
Episodic Memory
import { EpisodicMemoryStore } from 'adaptive-memory-multi-model-router/memory';
const memory = new EpisodicMemoryStore(1000, './.a3m-memory.json');
memory.storeEntry({
task: { description: "Build a REST API", type: "code", complexity: 0.7 },
result: { success: true, duration_ms: 45000 },
agent: { id: "codex", model: "gpt-4o" }
});
memory.getStats(); // { total_entries: 142, success_rate: 0.94 }
🐍 Python SDK
Installation
pip install a3m-router
Usage
import asyncio
from a3m import A3MRouter

async def main():
    async with A3MRouter() as router:
        # Route without executing
        decision = await router.route("Write a Python function to sort an array")
        print(decision.model, decision.tier, decision.cost)
        # Execute via OpenAI-compatible chat
        response = await router.chat("What is 2+2?", model="auto")
        print(response["choices"][0]["message"]["content"])
        # Batch routing
        decisions = await router.route_batch([
            "Hello",
            "Write code",
            "Medical advice",
        ])

asyncio.run(main())
🖥️ CLI
All CLI commands are accessible via npx a3m-router.
| Command | Description |
|---|---|
| route <query> | Route a query to the best model |
| serve [--port PORT] | Start OpenAI-compatible proxy server |
| benchmark | Run routing accuracy tests |
| health | Check all provider health status |
| cost | Show cost analytics and budget usage |
| compare <query> | Compare all providers side-by-side |
| models | List available models with pricing |
# Start the proxy
$ npx a3m-router serve
┌──────────────────────────────────────────────────────┐
│ A3M Router v2.9.2 │
│ 🔀 Intelligent LLM Gateway │
├──────────────────────────────────────────────────────┤
│ ✅ Proxy: http://localhost:8787 │
│ ✅ Dashboard: http://localhost:8787/dashboard │
│ ✅ Health: http://localhost:8787/health │
└──────────────────────────────────────────────────────┘
[GROQ] ✅ 145ms | [DEEPSEEK] ✅ 230ms | [KIMI] ✅ 312ms
🧠 Memory: 1,247 queries cached | 💰 Today: $2.34 / $50.00 budget
🌐 REST API
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat (streaming + non-streaming) |
| POST | /v1/completions | OpenAI text completions |
| POST | /v1/route | Routing decision without LLM call |
| GET | /v1/models | List available models with pricing |
| GET | /health | Provider health + cost summary |
| GET | /dashboard | Cost analytics dashboard |
POST /v1/route
Get a routing decision without making an LLM call.
curl -s http://localhost:8787/v1/route \
-H "Content-Type: application/json" \
-d '{"query": "Design a clinical trial"}' | jq .
{
"model": "openai/gpt-4o",
"tier": "premium",
"cost": 0.0025,
"complexity": 1.0,
"reasoning": "Medical domain (+0.35) + design verb (+0.20) + multi-step (+0.15)"
}
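The same request can be assembled from TypeScript. The sketch below only builds the URL and request options that mirror the curl call and sends nothing. `buildRouteRequest` is a hypothetical helper, and the default base URL is the one used in the examples on this page.

```typescript
interface RouteRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Mirrors the curl example: a JSON body with a single `query` field,
// posted to /v1/route.
function buildRouteRequest(
  query: string,
  baseUrl: string = "http://localhost:8787",
): RouteRequest {
  return {
    url: `${baseUrl}/v1/route`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    },
  };
}
```

In a runtime that provides fetch, the result can be passed on as fetch(request.url, request.init).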
POST /v1/chat/completions
Standard OpenAI-compatible chat completions endpoint. Pass model: "auto" for intelligent routing.
curl -s http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"stream": true
}'
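With stream: true, the OpenAI API returns server-sent events: one `data:` line per JSON chunk, ending with `data: [DONE]`. Assuming the proxy uses the same framing, the sketch below extracts the generated text from a fully buffered response body. `collectStreamedText` is a hypothetical helper. A production client would also need to handle lines split across network chunks.

```typescript
// Concatenates choices[0].delta.content from every chunk in an
// OpenAI-style SSE body. Expects the whole body as one string.
function collectStreamedText(sseBody: string): string {
  let text = "";
  for (const rawLine of sseBody.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as {
      choices?: { delta?: { content?: string } }[];
    };
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

// Hand-written sample in the OpenAI chunk format, for illustration only.
const sampleStream = [
  'data: {"choices":[{"delta":{"role":"assistant"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":" world"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");
```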
GET /v1/models
List all available models with pricing information.
curl -s http://localhost:8787/v1/models | jq .
GET /health
Check provider health status and cost summary.
curl -s http://localhost:8787/health | jq .
🤝 OpenAI SDK Compatibility
A3M Router exposes an OpenAI-compatible API. Point any OpenAI SDK at the proxy's base URL and pass model: "auto" to let the router choose the model.
# Python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8787/v1", api_key="not-needed")
response = client.chat.completions.create(model="auto", messages=[...])
// TypeScript
import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'http://localhost:8787/v1', apiKey: 'not-needed' });
Compatible with: LangChain, Vercel AI SDK, LlamaIndex, any OpenAI SDK.
⚙️ Configuration
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| GROQ_API_KEY | Groq API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| NVIDIA_API_KEY | NVIDIA API key |
| GOOGLE_API_KEY | Google Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| COHERE_API_KEY | Cohere API key |
| A3M_PROVIDERS | Restrict to specific providers (comma-separated) |
| A3M_MONTHLY_BUDGET | Monthly budget in USD |
| A3M_PORT | Proxy server port (default: 8787) |
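The three A3M_* variables carry structured values: a comma-separated list, a number, and a port with a default. The sketch below shows one way an application could parse them before starting the proxy. `readA3MEnv` and `A3MEnvSettings` are hypothetical and do not describe how the router itself reads its environment.

```typescript
interface A3MEnvSettings {
  providers: string[] | null;      // null means no restriction
  monthlyBudgetUsd: number | null; // null means no budget configured
  port: number;
}

function readA3MEnv(env: Record<string, string | undefined>): A3MEnvSettings {
  // "groq, deepseek" -> ["groq", "deepseek"]; empty items are dropped.
  const providerList = (env["A3M_PROVIDERS"] ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  const rawBudget = env["A3M_MONTHLY_BUDGET"];
  const monthlyBudgetUsd = rawBudget === undefined ? null : Number(rawBudget);
  if (monthlyBudgetUsd !== null && !Number.isFinite(monthlyBudgetUsd)) {
    throw new Error(`A3M_MONTHLY_BUDGET is not a number: ${rawBudget}`);
  }

  const port = Number(env["A3M_PORT"] ?? "8787"); // documented default
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`A3M_PORT is not a valid port: ${env["A3M_PORT"]}`);
  }

  return {
    providers: providerList.length > 0 ? providerList : null,
    monthlyBudgetUsd,
    port,
  };
}
```

In Node.js the function would be called as readA3MEnv(process.env).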
⚠️ Error Handling
Retry Configuration
import { RetryManager } from 'adaptive-memory-multi-model-router/retry';
const retry = new RetryManager({
providers: {
'openai': { timeout: 30000, maxRetries: 3, baseDelay: 1000 },
'anthropic': { timeout: 45000, maxRetries: 3, baseDelay: 1000 },
'groq': { timeout: 15000, maxRetries: 2, baseDelay: 500 },
},
backoffMultiplier: 2,
jitter: 0.3,
rateLimitHandling: 'retry-after',
});
| Option | Type | Default | Description |
|---|---|---|---|
| timeout | number | 30000 | Request timeout in ms |
| maxRetries | number | 3 | Maximum retry attempts |
| baseDelay | number | 1000 | Initial backoff delay in ms |
| backoffMultiplier | number | 2 | Exponential backoff multiplier |
| jitter | number | 0.3 | Jitter factor (±30%) |
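To make the interaction of baseDelay, backoffMultiplier, and jitter concrete, the sketch below computes the wait before each retry using the common formula baseDelay × multiplier^attempt, scaled by a random factor within ±jitter. This is the standard exponential backoff calculation and is assumed rather than confirmed to match what RetryManager does. `backoffDelayMs` is a hypothetical helper and counts attempts from 0.

```typescript
// Delay in ms before retry number `attempt` (0-based).
// `random` is injectable so the jitter can be made deterministic in tests.
function backoffDelayMs(
  attempt: number,
  baseDelay: number,
  backoffMultiplier: number,
  jitter: number,
  random: () => number = Math.random,
): number {
  const exponential = baseDelay * Math.pow(backoffMultiplier, attempt);
  // random() is in [0, 1), so the factor is in [1 - jitter, 1 + jitter).
  const jitterFactor = 1 + (random() * 2 - 1) * jitter;
  return Math.round(exponential * jitterFactor);
}

// With the defaults from the table and the jitter pinned to its midpoint,
// the first three delays are 1000, 2000, and 4000 ms.
const noJitter = () => 0.5;
const firstThreeDelays = [0, 1, 2].map((attempt) =>
  backoffDelayMs(attempt, 1000, 2, 0.3, noJitter),
);
```

Jitter spreads retries from many clients over time so they do not all hit a recovering provider at the same instant.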
For the full configuration reference, see the Configuration Guide on GitHub.
📦 Package Exports
| Import Path | Description |
|---|---|
| adaptive-memory-multi-model-router | Main package (all exports) |
| adaptive-memory-multi-model-router/sdk | Clean high-level API (A3MRouter class) |
| adaptive-memory-multi-model-router/cache | Semantic cache module |
| adaptive-memory-multi-model-router/guardrails | Security guardrails (injection, PII) |
| adaptive-memory-multi-model-router/cost | Cost analytics and tracking |
| adaptive-memory-multi-model-router/analytics | Usage analytics |
| adaptive-memory-multi-model-router/memory | Episodic memory store |
| adaptive-memory-multi-model-router/langchain | LangChain integration |
| adaptive-memory-multi-model-router/providers | Provider definitions and pricing |
| adaptive-memory-multi-model-router/server | HTTP server |
| adaptive-memory-multi-model-router/ensemble | Parallel ensemble execution (P0) |
| adaptive-memory-multi-model-router/presets | Query-type presets (P1) |
| adaptive-memory-multi-model-router/billing | Budget enforcement |
| adaptive-memory-multi-model-router/failover | Circuit breaker + health scoring |
| adaptive-memory-multi-model-router/retry | Per-provider retry logic |