# Sogni SDK - LLM Context Index

> AI-friendly documentation for the Sogni SDK (JavaScript/Node.js)
> Full documentation: llms-full.txt | TypeScript types: dist/index.d.ts
> npm: @sogni-ai/sogni-client | GitHub: https://github.com/Sogni-AI/sogni-client

## Quick Reference

### Installation
```bash
npm install @sogni-ai/sogni-client
```

### Minimal Image Generation
```javascript
import { SogniClient } from '@sogni-ai/sogni-client';

// Option 1: API key auth (recommended) — no login() needed
const sogni = await SogniClient.createInstance({ appId: 'my-app-uuid', apiKey: 'your-api-key' });

// Option 2: Username/password auth
// const sogni = await SogniClient.createInstance({ appId: 'my-app-uuid' });
// await sogni.account.login('username', 'password');

await sogni.projects.waitForModels();

const project = await sogni.projects.create({
  type: 'image',
  modelId: 'flux1-schnell-fp8',
  positivePrompt: 'A cat wearing a hat',
  numberOfMedia: 1,
  steps: 4,
  guidance: 1
});

const urls = await project.waitForCompletion();
console.log(urls[0]); // Image URL (valid 24 hours)
```

### Minimal Video Generation
```javascript
const project = await sogni.projects.create({
  type: 'video',
  network: 'fast', // Required for video
  modelId: 'wan_v2.2-14b-fp8_t2v_lightx2v',
  positivePrompt: 'Ocean waves at sunset',
  numberOfMedia: 1,
  duration: 5, // seconds
  fps: 16
});

const urls = await project.waitForCompletion();
console.log(urls[0]); // Video URL (valid 24 hours)
```

---

## Index of Topics

1. **Authentication & Setup** - Client initialization, login, network types
2. **Image Generation** - Text-to-image, img2img, ControlNets
3. **Video Generation (WAN 2.2)** - t2v, i2v, s2v, animate-move, animate-replace
4. **Video Generation (LTX-2.3 / LTX-2)** - Recommended video models with different fps behavior
5. **Audio Generation (ACE-Step 1.5)** - Text-to-music with optional lyrics
6. **LLM Text Generation** - Chat completions, streaming, multi-turn conversations
7. **LLM Tool Calling** - Function calling with custom tools and Sogni platform tools
8. **Vision Chat** - Multimodal image understanding with VLM (scene description, OCR, object detection, visual analysis)
9. **Events & Progress** - Real-time tracking, completion handling
10. **Discovering Models & Options** - Available models, size presets, samplers
11. **Cost Estimation** - Estimating project cost before submission
12. **Key Types Summary** - Core project parameter types

---

## 1. Authentication & Setup

### Client Creation with API Key (Recommended)

To get an API key, log in to dashboard.sogni.ai and open the username dropdown in the top-right corner.

```javascript
const sogni = await SogniClient.createInstance({
  appId: 'unique-uuid',     // Required - identifies your app
  network: 'fast',          // 'fast' (GPU) or 'relaxed' (Mac)
  apiKey: 'your-api-key'    // Auto-authenticates, no login() needed
});
```

### Client Creation with Username/Password
```javascript
const sogni = await SogniClient.createInstance({
  appId: 'unique-uuid',
  network: 'fast'
});
await sogni.account.login(username, password);
```

### API Key vs Username/Password
- **API key**: Pass `apiKey` to `createInstance()`. Auto-authenticates via WebSocket. No `login()` call needed. Most REST API calls (balance, profile, etc.) available. Sensitive operations (withdrawals, staking, 2FA) not available.
- **Username/password**: Call `sogni.account.login()` after creating instance. Full REST API access.

### Network Types
- `fast` - High-end GPUs, faster, more expensive. **Required for video.**
- `relaxed` - Mac devices, cheaper. Image only.
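
The two rules above can be checked before a project is submitted. The following is a minimal sketch; `assertNetworkSupports` is a hypothetical helper, not an SDK function, and it only encodes what this list states (`relaxed` is image only).

```javascript
// Hypothetical pre-flight check (not part of the SDK).
// Encodes only the rule stated above: "relaxed" handles images only.
const RELAXED_MEDIA_TYPES = new Set(['image']);

function assertNetworkSupports(network, mediaType) {
  if (network !== 'fast' && network !== 'relaxed') {
    throw new Error(`Unknown network: ${network}`);
  }
  if (network === 'relaxed' && !RELAXED_MEDIA_TYPES.has(mediaType)) {
    throw new Error(`Network "relaxed" does not support ${mediaType}; use "fast"`);
  }
  return true;
}
```

Calling `assertNetworkSupports('relaxed', 'video')` throws, while `assertNetworkSupports('fast', 'video')` returns `true`.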

---

## 2. Image Generation

### Basic Parameters
```javascript
{
  type: 'image',
  modelId: string,           // e.g., 'flux1-schnell-fp8'
  positivePrompt: string,
  negativePrompt?: string,
  stylePrompt?: string,
  numberOfMedia: number,     // How many images
  steps?: number,            // 4 for Flux, 20-40 for SD
  guidance?: number,         // 1 for Flux, 7.5 for SD
  sizePreset?: string,       // or 'custom' with width/height
  width?: number,
  height?: number,
  seed?: number,
  sampler?: string,
  scheduler?: string,
  outputFormat?: 'png' | 'jpg'
}
```

### With Starting Image (img2img)
```javascript
{
  type: 'image',
  startingImage: fs.readFileSync('./input.png'), // Node.js: import fs from 'node:fs'
  startingImageStrength: 0.5  // 0-1, higher = more influence from the starting image
}
```

### With ControlNet
```javascript
{
  type: 'image',
  controlNet: {
    name: 'canny' | 'depth' | 'openpose' | 'lineart' | ...,
    image: imageBuffer,
    strength: 0.8,
    mode: 'balanced' | 'prompt_priority' | 'cn_priority'
  }
}
```

---

## 3. Video Generation (WAN 2.2)

### CRITICAL: WAN 2.2 FPS Behavior
WAN models always generate at 16fps internally. The `fps` parameter only controls post-render interpolation:
- `fps: 16` - No interpolation, output is 16fps
- `fps: 32` - Frames doubled via interpolation

- Frame calculation: `duration * 16 + 1`
- Example: 5 seconds = 81 frames (regardless of the `fps` setting)
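
As a worked version of the formula above, this sketch computes the generated frame count for a WAN 2.2 clip. `wanFrameCount` is a hypothetical helper name used for illustration only.

```javascript
// WAN 2.2 always renders at 16 fps internally; `fps` only affects interpolation.
const WAN_BASE_FPS = 16;

// Hypothetical helper: number of frames generated for a clip of the given length.
function wanFrameCount(durationSeconds) {
  return durationSeconds * WAN_BASE_FPS + 1;
}

console.log(wanFrameCount(5)); // 81
```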

### Model IDs
| Workflow | Speed Model | Quality Model |
|----------|-------------|---------------|
| Text-to-Video | wan_v2.2-14b-fp8_t2v_lightx2v | wan_v2.2-14b-fp8_t2v |
| Image-to-Video | wan_v2.2-14b-fp8_i2v_lightx2v | wan_v2.2-14b-fp8_i2v |
| Sound-to-Video | wan_v2.2-14b-fp8_s2v_lightx2v | wan_v2.2-14b-fp8_s2v |
| Animate-Move | wan_v2.2-14b-fp8_animate-move_lightx2v | - |
| Animate-Replace | wan_v2.2-14b-fp8_animate-replace_lightx2v | - |

### Workflow Asset Requirements
| Workflow | referenceImage | referenceAudio | referenceVideo |
|----------|----------------|----------------|----------------|
| t2v | - | - | - |
| i2v | Required | - | - |
| s2v | Required | Required | - |
| animate-move | Required | - | Required |
| animate-replace | Required | - | Required |
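
The requirements table can be turned into a pre-flight check. This is a sketch under two assumptions: the workflow name is inferred from the model ID pattern seen in the Model IDs table, and `missingAssets` is a hypothetical helper rather than anything the SDK provides.

```javascript
// Required assets per workflow, copied from the table above.
const REQUIRED_ASSETS = new Map([
  ['t2v', []],
  ['i2v', ['referenceImage']],
  ['s2v', ['referenceImage', 'referenceAudio']],
  ['animate-move', ['referenceImage', 'referenceVideo']],
  ['animate-replace', ['referenceImage', 'referenceVideo']]
]);

// Assumption: WAN 2.2 IDs look like wan_v2.2-14b-fp8_<workflow>[_lightx2v].
function workflowFromModelId(modelId) {
  const match = /^wan_v2\.2-14b-fp8_(.+?)(?:_lightx2v)?$/.exec(modelId);
  return match ? match[1] : null;
}

// Hypothetical helper: returns the names of required assets that are not set.
function missingAssets(params) {
  const required = REQUIRED_ASSETS.get(workflowFromModelId(params.modelId));
  if (!required) {
    throw new Error(`Unrecognized WAN 2.2 model: ${params.modelId}`);
  }
  return required.filter((key) => params[key] == null);
}
```

For example, an `s2v` request that supplies only `referenceImage` yields `['referenceAudio']`.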

### Image-to-Video Example
```javascript
import fs from 'node:fs';

const project = await sogni.projects.create({
  type: 'video',
  network: 'fast',
  modelId: 'wan_v2.2-14b-fp8_i2v_lightx2v',
  positivePrompt: 'camera slowly zooms in',
  referenceImage: fs.readFileSync('./image.png'),
  duration: 5,
  fps: 16,
  numberOfMedia: 1
});
```

### Sound-to-Video (Lip Sync)
```javascript
import fs from 'node:fs';

const project = await sogni.projects.create({
  type: 'video',
  network: 'fast',
  modelId: 'wan_v2.2-14b-fp8_s2v_lightx2v',
  referenceImage: fs.readFileSync('./face.jpg'),
  referenceAudio: fs.readFileSync('./speech.m4a'),
  audioStart: 0,       // Start position in audio
  audioDuration: 5,    // Seconds of audio to use
  duration: 5,
  fps: 16,
  numberOfMedia: 1
});
```

---

## 4. Video Generation (LTX-2.3 / LTX-2)

### LTX FPS Behavior (Different from WAN!)
LTX models generate at the actual specified FPS (1-60 range). No interpolation.
- Frame calculation: `duration * fps + 1`
- Frame count must follow: `1 + n*8` (1, 9, 17, 25, 33, ...)

Example: 5 seconds at 24fps = 121 frames
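
The two LTX rules above can be combined into a quick check. This is a sketch only: the helper names are hypothetical, and rounding to the nearest valid count is an assumption made for illustration (this document does not say whether the SDK or server adjusts frame counts for you).

```javascript
// Frames generated for an LTX clip: duration * fps + 1 (fps range 1-60).
function ltxFrameCount(durationSeconds, fps) {
  if (fps < 1 || fps > 60) throw new RangeError('fps must be between 1 and 60');
  return durationSeconds * fps + 1;
}

// Valid counts have the form 1 + n*8 (1, 9, 17, 25, 33, ...).
function isValidLtxFrameCount(frames) {
  return Number.isInteger(frames) && frames >= 1 && (frames - 1) % 8 === 0;
}

// Assumption: pick the closest valid count when the raw value does not fit.
function nearestValidLtxFrameCount(frames) {
  return Math.max(1, 1 + Math.round((frames - 1) / 8) * 8);
}

console.log(ltxFrameCount(5, 24));           // 121
console.log(isValidLtxFrameCount(121));      // true
console.log(isValidLtxFrameCount(151));      // false (5 seconds at 30fps)
console.log(nearestValidLtxFrameCount(151)); // 153
```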

### LTX Model IDs
- Speed models (`_distilled` suffix): 8-step, faster
- Quality models (`_dev` suffix or no suffix): 20-step, best quality

**LTX-2.3 22B (Recommended):**
| Workflow | Fast | Quality |
|----------|------|---------|
| Text-to-Video | `ltx23-22b-fp8_t2v_distilled` | `ltx23-22b-fp8_t2v_dev` |
| Image-to-Video | `ltx23-22b-fp8_i2v_distilled` | `ltx23-22b-fp8_i2v_dev` |
| Audio-to-Video | `ltx23-22b-fp8_a2v_distilled` | `ltx23-22b-fp8_a2v_dev` |
| Image+Audio-to-Video | `ltx23-22b-fp8_ia2v_distilled` | `ltx23-22b-fp8_ia2v_dev` |

**Video-to-Video ControlNet (LTX-2 only):**
| Workflow | Fast | Quality |
|----------|------|---------|
| Video-to-Video (ControlNet) | `ltx2-19b-fp8_v2v_distilled` | `ltx2-19b-fp8_v2v` |

IMPORTANT: ControlNet (canny/pose/depth/detailer) requires a `_v2v` model, NOT `_i2v`.

---

## 5. Audio Generation (ACE-Step 1.5)

### Model Variants
| Model ID | Name | Description |
|----------|------|-------------|
| ace_step_1.5_turbo | Fast & Catchy | Quick generation, best quality sound |
| ace_step_1.5_sft | More Control | More accurate lyrics, less stable |

### Text-to-Music (Instrumental)
```javascript
const project = await sogni.projects.create({
  type: 'audio',
  modelId: 'ace_step_1.5_turbo',
  positivePrompt: 'Upbeat electronic dance music with synth leads',
  numberOfMedia: 1,
  duration: 30,       // 10-600 seconds
  bpm: 128,           // 30-300
  keyscale: 'C major',
  timesignature: '4', // 4/4 time
  steps: 8,
  outputFormat: 'mp3'
});
const urls = await project.waitForCompletion();
```

### Text-to-Music (With Lyrics)
```javascript
const project = await sogni.projects.create({
  type: 'audio',
  modelId: 'ace_step_1.5_sft',
  positivePrompt: 'Soft acoustic folk ballad',
  lyrics: 'Verse 1:\nWalking down a quiet road...',
  language: 'en',
  numberOfMedia: 2,   // Generate 2 versions
  duration: 60,
  bpm: 90,
  keyscale: 'A minor',
  composerMode: true,
  creativity: 0.85,
  promptStrength: 2.0
});
```

### Key Parameters
| Parameter | Range | Default | Notes |
|-----------|-------|---------|-------|
| duration | 10-600 | 30 | Seconds of audio |
| bpm | 30-300 | 120 | Beats per minute |
| keyscale | key + scale | C major | e.g., "A minor", "F# major" |
| timesignature | 2, 3, 4, 6 | 4 | Time signature |
| lyrics | string | - | Omit for instrumental |
| language | code | en | 51 languages supported |
| composerMode | boolean | true | AI composer mode |
| promptStrength | 0-10 | 2.0 | Prompt adherence |
| creativity | 0-2 | 0.85 | Composition temperature |
| steps | 4-16 | 8 | Inference steps |
| outputFormat | mp3/wav/flac | mp3 | Audio format |
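
The ranges in the table can be checked client-side before a request is sent. This is a minimal sketch; `validateAudioParams` is a hypothetical helper, and the SDK or server may apply its own validation.

```javascript
// Numeric ranges copied from the Key Parameters table above.
const AUDIO_RANGES = {
  duration: [10, 600],
  bpm: [30, 300],
  promptStrength: [0, 10],
  creativity: [0, 2],
  steps: [4, 16]
};
const TIME_SIGNATURES = ['2', '3', '4', '6'];
const OUTPUT_FORMATS = ['mp3', 'wav', 'flac'];

// Hypothetical helper: returns a list of human-readable problems (empty = OK).
function validateAudioParams(params) {
  const errors = [];
  for (const [key, [min, max]] of Object.entries(AUDIO_RANGES)) {
    const value = params[key];
    if (value === undefined) continue; // omitted, so the default applies
    if (typeof value !== 'number' || value < min || value > max) {
      errors.push(`${key} must be a number between ${min} and ${max}`);
    }
  }
  if (params.timesignature !== undefined &&
      !TIME_SIGNATURES.includes(String(params.timesignature))) {
    errors.push(`timesignature must be one of ${TIME_SIGNATURES.join(', ')}`);
  }
  if (params.outputFormat !== undefined &&
      !OUTPUT_FORMATS.includes(params.outputFormat)) {
    errors.push(`outputFormat must be one of ${OUTPUT_FORMATS.join(', ')}`);
  }
  return errors;
}
```

Running it on `{ duration: 5, bpm: 400 }` reports two problems, one per out-of-range field.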

---

## 6. LLM Text Generation

The Sogni SDK supports LLM text generation via the Supernet, providing an OpenAI-compatible chat completions API.

### Chat Completion (Non-Streaming)
```javascript
const response = await sogni.chat.completions.create({
  model: 'qwen3.5-35b-a3b-gguf-q4km',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing briefly.' }
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 0.9
});
console.log(response.choices[0].message.content);
```

### Streaming Chat Completion
```javascript
const stream = await sogni.chat.completions.create({
  model: 'qwen3.5-35b-a3b-gguf-q4km',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  max_tokens: 4096,
  stream: true
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content || '');
}
```

### Key Parameters
| Parameter | Type | Default | Notes |
|-----------|------|---------|-------|
| model | string | qwen3.5-35b-a3b-gguf-q4km | LLM model ID |
| messages | array | - | Chat messages (system/user/assistant/tool roles) |
| max_tokens | number | 4096 | Maximum output tokens |
| temperature | number | 0.7 | Sampling temperature (0-2) |
| top_p | number | 0.9 | Nucleus sampling (0-1) |
| frequency_penalty | number | 0 | Repetition penalty (-2 to 2) |
| presence_penalty | number | 0 | Topic penalty (-2 to 2) |
| stream | boolean | false | Enable token-by-token streaming |

### Thinking Mode
Enable model reasoning/thinking with `chat_template_kwargs`:
- Thinking enabled: Model shows `<think>` blocks with reasoning steps
- Thinking disabled: Direct responses only
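
When thinking is enabled, an application often wants to show the final answer separately from the reasoning. The sketch below assumes the reasoning arrives inline in the message content wrapped in `<think>...</think>` tags, as described above; `splitThinking` is a hypothetical helper.

```javascript
// Hypothetical helper: separates <think> blocks from the visible answer.
function splitThinking(content) {
  const thoughts = [];
  const answer = content
    .replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner) => {
      thoughts.push(inner.trim());
      return '';
    })
    .trim();
  return { thoughts, answer };
}

const { thoughts, answer } = splitThinking('<think>2+2 is 4</think>\nThe answer is 4.');
console.log(thoughts); // [ '2+2 is 4' ]
console.log(answer);   // The answer is 4.
```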

---

## 7. LLM Tool Calling (Function Calling)

Define custom tools the LLM can invoke. The LLM returns structured tool call arguments; you execute the function and feed results back.

### Tool Definition (OpenAI-compatible format)
```javascript
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }
}];

const response = await sogni.chat.completions.create({
  model: 'qwen3.5-35b-a3b-gguf-q4km',
  messages: [{ role: 'user', content: "What's the weather in Austin?" }],
  tools,
  tool_choice: 'auto'
});
// Check response.choices[0].message.tool_calls for tool invocations
```
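
The "execute and feed results back" step can be sketched as follows. It assumes tool calls follow the OpenAI shape (`id`, `function.name`, and `function.arguments` as a JSON string) and that results go back as `role: 'tool'` messages. All helper names are hypothetical, and handlers are synchronous here for brevity (an async handler would need `await`).

```javascript
// Hypothetical helper: normalizes tool calls from an assistant message.
function parseToolCalls(message) {
  return (message.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    // Assumption: arguments arrive as a JSON string, as in the OpenAI format.
    args: typeof call.function.arguments === 'string'
      ? JSON.parse(call.function.arguments || '{}')
      : (call.function.arguments ?? {})
  }));
}

// Hypothetical helper: wraps a tool result as a message for the next request.
function toolResultMessage(callId, result) {
  return { role: 'tool', tool_call_id: callId, content: JSON.stringify(result) };
}

// Runs each requested tool and returns the message list for the follow-up call.
function withToolResults(messages, assistantMessage, handlers) {
  const followUp = [...messages, assistantMessage];
  for (const call of parseToolCalls(assistantMessage)) {
    const known = Object.prototype.hasOwnProperty.call(handlers, call.name);
    const result = known
      ? handlers[call.name](call.args)
      : { error: `Unknown tool: ${call.name}` };
    followUp.push(toolResultMessage(call.id, result));
  }
  return followUp;
}
```

The returned array would then be passed as `messages` in a second chat completion request so the model can phrase its final answer using the tool output.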

### Sogni Platform Tools — Media Generation via Chat
Combine LLM tool calling with Sogni's generation APIs to create media from natural language:

- **Image Generation** — "Create an image of a cyberpunk city at night"
- **Video Generation** — "Generate a video of ocean waves at sunset"
- **Music Generation** — "Compose a jazz song about the rain"

The LLM detects media intent, enhances the prompt, and calls Sogni's image/video/audio generation APIs directly. See `workflow_text_chat_sogni_tools.mjs` for the complete implementation.

Default generation models used by platform tools:
| Media Type | Model | Description |
|------------|-------|-------------|
| Image | z_image_turbo_bf16 | Z-Image Turbo, ultra-fast 8-step |
| Video | ltx23-22b-fp8_t2v_distilled | LTX-2.3 text-to-video, fast |
| Audio | ace_step_1.5_turbo | ACE-Step 1.5 Turbo, fast music |

---

## 8. Vision Chat (Multimodal Image Understanding)

The SDK supports multimodal vision chat via VLM (Vision-Language Model) workers on the Sogni network. Send images alongside text messages for scene description, OCR, object detection, visual analysis, and multi-image comparison.

### VLM Model
| Model ID | Name | Description |
|----------|------|-------------|
| `qwen3.5-35b-a3b-gguf-q4km` | Qwen3.5 35B VLM | Vision-language model with 32K context |

### Multimodal Message Format
```javascript
const messages = [
  { role: 'system', content: 'You are a visual analysis assistant.' },
  {
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: 'data:image/jpeg;base64,...' } },
      { type: 'text', text: 'Describe this image in detail.' }
    ]
  }
];

const stream = await sogni.chat.completions.create({
  model: 'qwen3.5-35b-a3b-gguf-q4km',
  messages,
  max_tokens: 4096,
  stream: true
});
```
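
The `data:image/jpeg;base64,...` URL in the message above can be built from raw image bytes. This is a minimal Node.js sketch; `toDataUrl` and `imageMessage` are hypothetical helpers, and the MIME type must match the actual file.

```javascript
// Hypothetical helper: encodes image bytes (Buffer or Uint8Array) as a data URL.
function toDataUrl(imageBytes, mimeType = 'image/jpeg') {
  return `data:${mimeType};base64,${Buffer.from(imageBytes).toString('base64')}`;
}

// Hypothetical helper: builds a user message in the multimodal format shown above.
function imageMessage(imageBytes, question, mimeType) {
  return {
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: toDataUrl(imageBytes, mimeType) } },
      { type: 'text', text: question }
    ]
  };
}
```

For a file on disk, the bytes could come from `fs.readFileSync('./photo.jpg')`.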

### Supported Capabilities
- **Scene description** — Detailed image descriptions including subjects, colors, lighting, mood
- **OCR / Text extraction** — Extract visible text with layout preservation
- **Object detection** — Identify objects with location, size, and spatial relationships
- **Structured analysis** — Subject, composition, lighting, color, technical, mood, style, context
- **Multi-image comparison** — Compare two images across multiple aspects

See `workflow_text_chat_vision.mjs` for a complete interactive vision chat implementation.

---

## 9. Events & Progress

### Promise-based
```javascript
const urls = await project.waitForCompletion();
```

### Event-based
```javascript
project.on('progress', (percent) => console.log(`${percent}%`));
project.on('jobCompleted', (job) => console.log(job.resultUrl));
project.on('completed', (urls) => console.log('Done:', urls));
project.on('failed', (error) => console.error(error));
```

### Job-level Events
```javascript
project.jobs[0].on('progress', ({ step, stepCount }) => {
  console.log(`Step ${step}/${stepCount}`);
});
```
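
The project-level events can be folded into a single state object, which is convenient for driving a progress bar or status line. This sketch assumes the event names and payloads shown in the snippets above; `trackProject` is a hypothetical helper that works with any object exposing `on(name, handler)`.

```javascript
// Hypothetical helper: mirrors project events into a plain state object.
function trackProject(project) {
  const state = { percent: 0, resultUrls: [], status: 'running', error: null };
  project.on('progress', (percent) => { state.percent = percent; });
  project.on('jobCompleted', (job) => { state.resultUrls.push(job.resultUrl); });
  project.on('completed', () => { state.status = 'completed'; state.percent = 100; });
  project.on('failed', (error) => { state.status = 'failed'; state.error = error; });
  return state;
}
```

The returned object is updated in place as events arrive, so it can be polled or rendered at any time.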

---

## 10. Discovering Models & Options

### List Available Models
```javascript
const models = await sogni.projects.waitForModels();
// or, once models have loaded, read the cached list:
const cachedModels = sogni.projects.availableModels;
```

### Get Size Presets
```javascript
const presets = await sogni.projects.getSizePresets('fast', 'flux1-schnell-fp8');
```

### Get Sampler/Scheduler Options
```javascript
const options = await sogni.projects.getModelOptions('flux1-schnell-fp8');
console.log(options.sampler.allowed);
console.log(options.scheduler.allowed);
```

---

## 11. Cost Estimation

```javascript
const cost = await sogni.projects.estimate({
  network: 'fast',
  model: 'flux1-schnell-fp8',
  imageCount: 1,
  stepCount: 4,
  previewCount: 0
});
console.log(cost.sogni, cost.usd);
```

---

## 12. Key Types Summary

```typescript
type ProjectParams = ImageProjectParams | VideoProjectParams | AudioProjectParams;

interface ImageProjectParams {
  type: 'image';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  // ... see llms-full.txt for complete list
}

interface VideoProjectParams {
  type: 'video';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  duration?: number;
  fps?: number;
  referenceImage?: File | Buffer | Blob;
  referenceAudio?: File | Buffer | Blob;
  referenceVideo?: File | Buffer | Blob;
  // ... see llms-full.txt for complete list
}

interface AudioProjectParams {
  type: 'audio';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  duration?: number;
  bpm?: number;
  lyrics?: string;
  language?: string;
  // ... see llms-full.txt for complete list
}
```

---

## Full Documentation

For complete API reference, all parameters, and advanced usage:
- **llms-full.txt** - Comprehensive guide in this repository
- **https://sdk-docs.sogni.ai** - TypeDoc API documentation
- **https://github.com/Sogni-AI/sogni-client/tree/main/examples** - Working examples
