📖 API Reference

Complete reference for the A3M Router — TypeScript SDK, Python SDK, CLI, REST API, and integrations.

TypeScript SDK

Installation

npm install adaptive-memory-multi-model-router

Import

import { A3MRouter } from 'adaptive-memory-multi-model-router/sdk';

Constructor

const router = new A3MRouter(config?: A3MRouterConfig);

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| defaultModel | string | | Fallback model when routing is ambiguous |
| maxCostPerQuery | number | | Max cost per query in USD |
| preferSpeedOverQuality | boolean | false | Prefer fast models over higher quality |
| providers | string[] | all | Restrict routing to these provider IDs |

Methods

route(query: string): RoutingResult

Route a query to the best available model. Returns the decision without executing the query.

const decision = router.route("Write a Python function to sort a list");
console.log(decision.model);     // "groq/llama-3.3-70b-versatile"
console.log(decision.tier);      // "cheap"
console.log(decision.cost);      // 0.00035
console.log(decision.complexity); // 0.35

| Field | Type | Description |
| --- | --- | --- |
| model | string | Selected model identifier (e.g. groq/llama-3.3-70b-versatile) |
| tier | 'free' \| 'cheap' \| 'mid' \| 'premium' | Cost tier |
| cost | number | Estimated cost in USD |
| complexity | number | Query complexity score (0.0–1.0) |
| reasoning | string | Human-readable routing reason |
| fallbackModels | string[] | Alternative models in priority order |
| isFree | boolean | Whether the selected model is free |
| isExpert | boolean | Whether this is an expert-level query |
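
To make the field list concrete, here is a minimal sketch of a type that mirrors it, plus a helper that walks the fallback list. The names `Tier`, `candidateModels`, and `sample` are hypothetical and not part of the SDK, the SDK's own exported type may differ, and the sample values are illustrative.

```typescript
// Hypothetical type mirroring the documented RoutingResult fields.
type Tier = 'free' | 'cheap' | 'mid' | 'premium';

interface RoutingResult {
    model: string;
    tier: Tier;
    cost: number;             // estimated cost in USD
    complexity: number;       // 0.0 to 1.0
    reasoning: string;
    fallbackModels: string[]; // priority order
    isFree: boolean;
    isExpert: boolean;
}

// Hypothetical helper: the selected model first, then the fallbacks in
// priority order, skipping any model the caller knows to be unavailable.
function candidateModels(decision: RoutingResult, unavailable: string[] = []): string[] {
    return [decision.model, ...decision.fallbackModels]
        .filter((m) => unavailable.indexOf(m) === -1);
}

// Illustrative values only.
const sample: RoutingResult = {
    model: 'groq/llama-3.3-70b-versatile',
    tier: 'cheap',
    cost: 0.00035,
    complexity: 0.35,
    reasoning: 'example value',
    fallbackModels: ['deepseek/deepseek-chat'],
    isFree: false,
    isExpert: false,
};
```

A caller could try each entry of `candidateModels(decision)` in turn until one provider responds.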

routeBatch(queries: string[]): RoutingResult[]

Route multiple queries at once.

const decisions = router.routeBatch([
    "What is 2+2?",
    "Design a distributed database",
    "Translate to French"
]);

recommend(task: string): RoutingResult

Get a model recommendation for a task category.

const rec = router.recommend("code generation");
console.log(rec.model); // "deepseek/deepseek-chat"

analyze(query: string): QueryFeatures

Extract detailed features from a query for debugging.

const features = router.analyze("Design a secure authentication system");
console.log(features.complexity);       // 0.72
console.log(features.detected_domain);  // "security"
console.log(features.domain_score);     // 0.30

| Field | Type | Description |
| --- | --- | --- |
| complexity | number | Overall complexity score (0.0–1.0) |
| length | number | Word count |
| has_code | boolean | Code-related keywords detected |
| has_math | boolean | Math-related keywords detected |
| is_multilingual | boolean | Non-ASCII characters detected |
| is_translation | boolean | Translation request detected |
| is_creative | boolean | Creative writing request |
| requires_reasoning | boolean | Analytical/reasoning verbs detected |
| is_security | boolean | Security domain keywords |
| is_devops | boolean | DevOps/infrastructure keywords |
| is_data | boolean | Data/ML keywords |
| detected_domain | string | Best matching domain |
| domain_score | number | Domain match confidence |
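
As a rough illustration of how three of these fields could be derived from the descriptions above, the sketch below computes `length`, `has_code`, and `is_multilingual`. This is not the router's actual implementation: `basicFeatures`, `BasicFeatures`, and the keyword list are assumptions made for the example.

```typescript
// Hypothetical subset of QueryFeatures.
interface BasicFeatures {
    length: number;           // word count
    has_code: boolean;        // code-related keyword present
    is_multilingual: boolean; // any non-ASCII character present
}

// Assumed keyword list, for illustration only.
const CODE_KEYWORDS = ['function', 'python', 'code', 'api', 'bug'];

function basicFeatures(query: string): BasicFeatures {
    // Split on whitespace for the word count; drop empty strings.
    const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
    // Tokenize on non-alphanumerics so "api" does not match inside "capital".
    const tokens = query.toLowerCase().split(/[^a-z0-9]+/);
    return {
        length: words.length,
        has_code: CODE_KEYWORDS.some((k) => tokens.indexOf(k) !== -1),
        is_multilingual: /[^\x00-\x7F]/.test(query),
    };
}
```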

serve(port?: number): Promise<string>

Start the OpenAI-compatible proxy server.

const proxyURL = await router.serve(8787);
console.log(proxyURL); // "http://localhost:8787/v1"

// Now use with any OpenAI SDK
import OpenAI from 'openai';
const client = new OpenAI({ baseURL: proxyURL, apiKey: 'not-needed' });

Ensemble Module

import { executeEnsemble, recordFeedback } from 'adaptive-memory-multi-model-router/ensemble';

// systemPrompt, context, callNvidia, and callGroq are defined by the caller (not shown)
// Run multiple providers in parallel
const result = await executeEnsemble(
    "Explain vector databases",
    systemPrompt, context,
    { nvidia: callNvidia, groq: callGroq },
    { providers: ['nvidia', 'groq'], timeoutMs: 30000 }
);
console.log(`Winner: ${result.winner}`);  // "nvidia"
console.log(`Reasoning: ${result.reasoning}`);

// Track historical accuracy
let history = {};
history = recordFeedback('nvidia', true, history);
history = recordFeedback('groq', false, history);
// → { nvidia: { good: 1, bad: 0 }, groq: { good: 0, bad: 1 } }
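
The history shape shown in the comment above can be reproduced with a small pure function. The sketch below is an assumption about how such bookkeeping might work, not the module's source; `recordFeedbackSketch`, `FeedbackHistory`, and `accuracy` are hypothetical names.

```typescript
type FeedbackHistory = Record<string, { good: number; bad: number }>;

// Returns a new history object; the input is left unmodified.
function recordFeedbackSketch(
    provider: string,
    wasGood: boolean,
    history: FeedbackHistory,
): FeedbackHistory {
    const prev = history[provider] || { good: 0, bad: 0 };
    const next = {
        good: prev.good + (wasGood ? 1 : 0),
        bad: prev.bad + (wasGood ? 0 : 1),
    };
    return { ...history, [provider]: next };
}

// Hypothetical derived metric: share of good outcomes, or null with no data.
function accuracy(history: FeedbackHistory, provider: string): number | null {
    const entry = history[provider];
    if (!entry || entry.good + entry.bad === 0) return null;
    return entry.good / (entry.good + entry.bad);
}
```

Returning a fresh object on each call matches the reassignment pattern (`history = recordFeedback(...)`) used in the example above.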

Budget Module

import { BudgetManager } from 'adaptive-memory-multi-model-router/billing';

const budgets = new BudgetManager({
    monthlyLimit: 500,
    alerts: [0.5, 0.8, 1.0],
    perTeamLimits: { 'engineering': 200 }
});

budgets.onAlert((alert) => {
    console.log(`${alert.type}: ${alert.team} at ${alert.percentage}%`);
});
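
The `alerts` array reads as fractions of the limit. One plausible way to decide which alerts fire is to check which thresholds a spend update crosses, as sketched below. `crossedThresholds` is a hypothetical helper, and the real `BudgetManager` may evaluate alerts differently.

```typescript
// Returns the thresholds (fractions of the limit) that are crossed when
// spend moves from previousSpend to currentSpend. A threshold already
// reached before the update is not reported again.
function crossedThresholds(
    previousSpend: number,
    currentSpend: number,
    limit: number,
    thresholds: number[],
): number[] {
    return thresholds.filter(
        (t) => previousSpend < t * limit && currentSpend >= t * limit,
    );
}
```

With `monthlyLimit: 500` and `alerts: [0.5, 0.8, 1.0]`, spend moving from $240 to $260 crosses only the 50% threshold at $250.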

Failover Module

import { HealthScoreManager, CircuitBreaker } from 'adaptive-memory-multi-model-router/failover';

// Provider health scoring
const health = new HealthScoreManager({
    latencyWeight: 0.6, errorRateWeight: 0.4
});
health.getScore('groq'); // → 0.85

// Circuit breaker
const cb = new CircuitBreaker({
    failureThreshold: 3, cooldownMs: 60000,
    fallbackChain: ['groq', 'deepseek', 'openai']
});
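
For readers unfamiliar with the pattern, the sketch below shows one common way a circuit breaker can combine `failureThreshold`, `cooldownMs`, and `fallbackChain`. It is a generic illustration under assumed semantics, not the package's `CircuitBreaker`; the class and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```typescript
class SimpleCircuitBreaker {
    private failures: Record<string, number> = {};
    private openedAt: Record<string, number> = {};

    constructor(
        private failureThreshold: number,
        private cooldownMs: number,
        private fallbackChain: string[],
    ) {}

    // Count a failure; open the circuit once the threshold is reached.
    recordFailure(provider: string, now: number): void {
        const count = (this.failures[provider] || 0) + 1;
        this.failures[provider] = count;
        if (count >= this.failureThreshold) {
            this.openedAt[provider] = now;
        }
    }

    // A success resets the failure count and closes the circuit.
    recordSuccess(provider: string): void {
        this.failures[provider] = 0;
        delete this.openedAt[provider];
    }

    // The circuit is open only while the cooldown has not yet elapsed.
    isOpen(provider: string, now: number): boolean {
        const opened = this.openedAt[provider];
        return opened !== undefined && now - opened < this.cooldownMs;
    }

    // First provider in the chain whose circuit is not open, or null.
    pick(now: number): string | null {
        for (const p of this.fallbackChain) {
            if (!this.isOpen(p, now)) return p;
        }
        return null;
    }
}
```

After the cooldown, the provider becomes eligible again; a further failure reopens the circuit immediately because the failure count is still at the threshold.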

Semantic Cache

import { SemanticCache } from 'adaptive-memory-multi-model-router/cache';

const cache = new SemanticCache({
    maxSize: 1000,
    similarityThreshold: 0.92,
    ttl: 3600000,
    perRouteTTL: { 'legal/*': 86400000 }
});

cache.getStats(); // { hits: 1, misses: 1, hitRate: 0.5 }
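
To show what `similarityThreshold` and `ttl` could mean in practice, here is a minimal sketch of a similarity-based lookup over precomputed embedding vectors. It assumes cosine similarity and a linear scan, and it leaves out `maxSize` and `perRouteTTL`; the package's `SemanticCache` may work differently. `TinySemanticCache` and `cosineSimilarity` are hypothetical names, and how embeddings are produced is outside the sketch.

```typescript
interface CacheEntry {
    embedding: number[];
    response: string;
    storedAt: number; // ms timestamp
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class TinySemanticCache {
    private entries: CacheEntry[] = [];
    private hits = 0;
    private misses = 0;

    constructor(private similarityThreshold: number, private ttl: number) {}

    set(embedding: number[], response: string, now: number): void {
        this.entries.push({ embedding, response, storedAt: now });
    }

    // Returns the most similar unexpired entry at or above the threshold.
    get(embedding: number[], now: number): string | null {
        let best: CacheEntry | null = null;
        let bestScore = -1;
        for (const e of this.entries) {
            if (now - e.storedAt > this.ttl) continue; // expired
            const score = cosineSimilarity(embedding, e.embedding);
            if (score >= this.similarityThreshold && score > bestScore) {
                best = e;
                bestScore = score;
            }
        }
        if (best) {
            this.hits++;
            return best.response;
        }
        this.misses++;
        return null;
    }

    getStats() {
        const total = this.hits + this.misses;
        return { hits: this.hits, misses: this.misses, hitRate: total === 0 ? 0 : this.hits / total };
    }
}
```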

Query-Type Presets

import { createPresetRouter, DEFAULT_PRESETS } from 'adaptive-memory-multi-model-router/presets';

const router = createPresetRouter();
const preset = router.classify("Write a Python function");
// → the 'code' preset
preset.provider;     // 'nvidia'
preset.temperature;  // 0.2
preset.ensemble;     // true

Episodic Memory

import { EpisodicMemoryStore } from 'adaptive-memory-multi-model-router/memory';

const memory = new EpisodicMemoryStore(1000, './.a3m-memory.json');
memory.storeEntry({
    task: { description: "Build a REST API", type: "code", complexity: 0.7 },
    result: { success: true, duration_ms: 45000 },
    agent: { id: "codex", model: "gpt-4o" }
});
memory.getStats(); // { total_entries: 142, success_rate: 0.94 }

🐍 Python SDK

Installation

pip install a3m-router

Usage

import asyncio

from a3m import A3MRouter


async def main():
    async with A3MRouter() as router:
        # Route without executing
        decision = await router.route("Write a Python function to sort an array")
        print(decision.model, decision.tier, decision.cost)

        # Execute via OpenAI-compatible chat
        response = await router.chat("What is 2+2?", model="auto")
        print(response["choices"][0]["message"]["content"])

        # Batch routing
        decisions = await router.route_batch([
            "Hello",
            "Write code",
            "Medical advice"
        ])


# "async with" is only valid inside a coroutine, so run it via asyncio
asyncio.run(main())

🖥️ CLI

All CLI commands are accessible via npx a3m-router.

| Command | Description |
| --- | --- |
| route <query> | Route a query to the best model |
| serve [--port PORT] | Start OpenAI-compatible proxy server |
| benchmark | Run routing accuracy tests |
| health | Check all provider health status |
| cost | Show cost analytics and budget usage |
| compare <query> | Compare all providers side-by-side |
| models | List available models with pricing |

# Start the proxy
$ npx a3m-router serve
┌──────────────────────────────────────────────────────┐
│                     A3M Router v2.9.2                      │
│                🔀 Intelligent LLM Gateway                 │
├──────────────────────────────────────────────────────┤
│  ✅ Proxy:     http://localhost:8787                      │
│  ✅ Dashboard: http://localhost:8787/dashboard             │
│  ✅ Health:    http://localhost:8787/health               │
└──────────────────────────────────────────────────────┘

[GROQ]  ✅ 145ms  |  [DEEPSEEK]  ✅ 230ms  |  [KIMI]  ✅ 312ms
🧠 Memory: 1,247 queries cached | 💰 Today: $2.34 / $50.00 budget

🌐 REST API

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI-compatible chat (streaming + non-streaming) |
| POST | /v1/completions | OpenAI text completions |
| POST | /v1/route | Routing decision without LLM call |
| GET | /v1/models | List available models with pricing |
| GET | /health | Provider health + cost summary |
| GET | /dashboard | Cost analytics dashboard |

POST /v1/route

Get a routing decision without making an LLM call.

curl -s http://localhost:8787/v1/route \
  -H "Content-Type: application/json" \
  -d '{"query": "Design a clinical trial"}' | jq .

{
  "model": "openai/gpt-4o",
  "tier": "premium",
  "cost": 0.0025,
  "complexity": 1.0,
  "reasoning": "Medical domain (+0.35) + design verb (+0.20) + multi-step (+0.15)"
}
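
The same call can be made from TypeScript. The sketch below builds the request and validates the response shape shown above; `buildRouteRequest`, `parseRouteResponse`, and `RouteResponse` are hypothetical helpers, and the field list is taken from the example response only, so a real response may carry more fields.

```typescript
// Fields taken from the example response above.
interface RouteResponse {
    model: string;
    tier: string;
    cost: number;
    complexity: number;
    reasoning: string;
}

// Builds the URL and fetch-style options for POST /v1/route.
function buildRouteRequest(query: string, baseURL: string) {
    return {
        url: baseURL.replace(/\/+$/, '') + '/v1/route',
        init: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ query }),
        },
    };
}

// Checks that parsed JSON has the expected fields before it is used.
function parseRouteResponse(data: unknown): RouteResponse {
    if (typeof data !== 'object' || data === null) {
        throw new Error('expected a JSON object');
    }
    const { model, tier, cost, complexity, reasoning } = data as Record<string, unknown>;
    if (
        typeof model !== 'string' || typeof tier !== 'string' ||
        typeof reasoning !== 'string' || typeof cost !== 'number' ||
        typeof complexity !== 'number'
    ) {
        throw new Error('unexpected /v1/route response shape');
    }
    return { model, tier, cost, complexity, reasoning };
}
```

In a runtime with a global `fetch`, the two helpers combine as `const { url, init } = buildRouteRequest(q, 'http://localhost:8787')`, followed by `parseRouteResponse(await (await fetch(url, init)).json())`.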

POST /v1/chat/completions

Standard OpenAI-compatible chat completions endpoint. Pass model: "auto" for intelligent routing.

curl -s http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": true
  }'
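
With `"stream": true`, OpenAI-style endpoints reply with server-sent events: `data: {...}` lines carrying `choices[0].delta.content` fragments, ending with `data: [DONE]`. Assuming the proxy follows that format, the sketch below reassembles the text from a fully buffered response body. `extractStreamedText` is a hypothetical helper.

```typescript
// Concatenates the content deltas from a buffered SSE response body.
function extractStreamedText(sseBody: string): string {
    let text = '';
    for (const rawLine of sseBody.split('\n')) {
        const line = rawLine.trim();
        if (line.indexOf('data:') !== 0) continue; // skip blank lines and comments
        const payload = line.slice('data:'.length).trim();
        if (payload === '[DONE]') break; // end-of-stream sentinel
        const chunk = JSON.parse(payload);
        const delta = chunk.choices && chunk.choices[0] && chunk.choices[0].delta;
        if (delta && typeof delta.content === 'string') {
            text += delta.content;
        }
    }
    return text;
}
```

A live stream can split an event across network chunks, so production code should buffer until a blank line before parsing; the OpenAI SDKs do this internally.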

GET /v1/models

List all available models with pricing information.

curl -s http://localhost:8787/v1/models | jq .

GET /health

Check provider health status and cost summary.

curl -s http://localhost:8787/health | jq .

🤝 OpenAI SDK Compatibility

A3M Router exposes an OpenAI-compatible API. Point any OpenAI SDK at the proxy URL and pass model: "auto" to enable routing.

# Python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8787/v1", api_key="not-needed")
response = client.chat.completions.create(model="auto", messages=[...])

// TypeScript
import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'http://localhost:8787/v1', apiKey: 'not-needed' });

Compatible with: LangChain, Vercel AI SDK, LlamaIndex, and any other OpenAI-compatible SDK.

⚙️ Configuration

Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| GROQ_API_KEY | Groq API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| NVIDIA_API_KEY | NVIDIA API key |
| GOOGLE_API_KEY | Google Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| COHERE_API_KEY | Cohere API key |
| A3M_PROVIDERS | Restrict to specific providers (comma-separated) |
| A3M_MONTHLY_BUDGET | Monthly budget in USD |
| A3M_PORT | Proxy server port (default: 8787) |
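
As an illustration of how the three `A3M_*` variables could be interpreted, the sketch below parses them from a plain key/value object. The parsing rules (whitespace trimming, `null` when unset, rejecting non-numeric values) are assumptions; `readSettings` and `EnvSettings` are hypothetical and not part of the package.

```typescript
interface EnvSettings {
    providers: string[] | null;   // null means no restriction
    monthlyBudget: number | null; // USD, null means unset
    port: number;
}

function readSettings(env: Record<string, string | undefined>): EnvSettings {
    const rawProviders = env['A3M_PROVIDERS'];
    const providers = rawProviders
        ? rawProviders.split(',').map((p) => p.trim()).filter((p) => p.length > 0)
        : null;

    const rawBudget = env['A3M_MONTHLY_BUDGET'];
    const monthlyBudget = rawBudget ? Number(rawBudget) : null;
    if (monthlyBudget !== null && isNaN(monthlyBudget)) {
        throw new Error('A3M_MONTHLY_BUDGET must be a number');
    }

    const rawPort = env['A3M_PORT'];
    const port = rawPort ? Number(rawPort) : 8787; // documented default
    if (isNaN(port)) {
        throw new Error('A3M_PORT must be a number');
    }

    return { providers, monthlyBudget, port };
}
```

In Node.js this would be called as `readSettings(process.env)`.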

⚠️ Error Handling

Retry Configuration

import { RetryManager } from 'adaptive-memory-multi-model-router/retry';

const retry = new RetryManager({
    providers: {
        'openai': { timeout: 30000, maxRetries: 3, baseDelay: 1000 },
        'anthropic': { timeout: 45000, maxRetries: 3, baseDelay: 1000 },
        'groq': { timeout: 15000, maxRetries: 2, baseDelay: 500 },
    },
    backoffMultiplier: 2,
    jitter: 0.3,
    rateLimitHandling: 'retry-after',
});

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| timeout | number | 30000 | Request timeout in ms |
| maxRetries | number | 3 | Maximum retry attempts |
| baseDelay | number | 1000 | Initial backoff delay in ms |
| backoffMultiplier | number | 2 | Exponential backoff multiplier |
| jitter | number | 0.3 | Jitter factor (±30%) |
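
To see how `baseDelay`, `backoffMultiplier`, and `jitter` interact, the sketch below computes the wait before each retry using the usual exponential-backoff formula, `baseDelay * backoffMultiplier^attempt`, scaled by a random factor within ±jitter. The exact formula used by `RetryManager` is not documented here, so treat this as an assumption; `backoffDelay` is a hypothetical helper, and the random source is injectable so the result can be checked.

```typescript
// attempt is 0-based: attempt 0 is the wait before the first retry.
function backoffDelay(
    attempt: number,
    baseDelay: number,
    backoffMultiplier: number,
    jitter: number,
    random: () => number = Math.random,
): number {
    const exponential = baseDelay * Math.pow(backoffMultiplier, attempt);
    // random() in [0, 1) maps to a factor in [1 - jitter, 1 + jitter).
    const factor = 1 + jitter * (2 * random() - 1);
    return Math.round(exponential * factor);
}
```

With the defaults from the table and jitter centered (factor 1), the waits are 1000 ms, 2000 ms, and 4000 ms; with jitter 0.3 each lands within 30% of those values.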

For the full configuration reference, see the Configuration Guide on GitHub.

📦 Package Exports

| Import Path | Description |
| --- | --- |
| adaptive-memory-multi-model-router | Main package (everything) |
| adaptive-memory-multi-model-router/sdk | Clean high-level API (A3MRouter class) |
| adaptive-memory-multi-model-router/cache | Semantic cache module |
| adaptive-memory-multi-model-router/guardrails | Security guardrails (injection, PII) |
| adaptive-memory-multi-model-router/cost | Cost analytics and tracking |
| adaptive-memory-multi-model-router/analytics | Usage analytics |
| adaptive-memory-multi-model-router/memory | Episodic memory store |
| adaptive-memory-multi-model-router/langchain | LangChain integration |
| adaptive-memory-multi-model-router/providers | Provider definitions and pricing |
| adaptive-memory-multi-model-router/server | HTTP server |
| adaptive-memory-multi-model-router/ensemble | Parallel ensemble execution (P0) |
| adaptive-memory-multi-model-router/presets | Query-type presets (P1) |
| adaptive-memory-multi-model-router/billing | Budget enforcement |
| adaptive-memory-multi-model-router/failover | Circuit breaker + health scoring |
| adaptive-memory-multi-model-router/retry | Per-provider retry logic |
