Voice Mode

Use your voice to interact with Clew Code via microphone input and OpenAI Whisper transcription.

Overview

Voice mode captures audio from your microphone, transcribes it using OpenAI's Whisper API, and feeds the transcribed text to the agent as regular input.

Setup

1. Set OpenAI API Key

export OPENAI_API_KEY=sk-...

2. Install SoX (required for recording)

macOS:

brew install sox

Ubuntu/Debian:

sudo apt-get install sox

Windows:
Download from sox.sourceforge.net
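
Both prerequisites can be verified before activating voice mode. The following is a minimal sketch, not code from Clew Code: `checkVoicePrerequisites` and `PreflightResult` are hypothetical names, and it assumes that SoX is invoked as `sox` and that `sox --version` exits with status 0 when the tool is installed.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical preflight helper; names are illustrative, not Clew Code's API.
export interface PreflightResult {
  ok: boolean;
  problems: string[];
}

export function checkVoicePrerequisites(
  env: Record<string, string | undefined> = process.env,
  soxCommand: string = "sox", // assumption: SoX is reachable under this name
): PreflightResult {
  const problems: string[] = [];

  // Step 1: the OpenAI API key must be exported in the environment.
  if (!env.OPENAI_API_KEY?.trim()) {
    problems.push("OPENAI_API_KEY is not set");
  }

  // Step 2: SoX must be runnable. A missing binary surfaces as probe.error.
  const probe = spawnSync(soxCommand, ["--version"], { stdio: "ignore" });
  if (probe.error || probe.status !== 0) {
    problems.push(`could not run "${soxCommand}"; is SoX installed and on PATH?`);
  }

  return { ok: problems.length === 0, problems };
}
```

Calling `checkVoicePrerequisites()` with no arguments inspects the current environment; each entry in `problems` names one missing prerequisite.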

Usage

❯ /voice              # activate voice mode
❯ /voice status       # check voice session status
❯ /voice off          # deactivate voice mode

Once voice mode is active, press F2 to start recording, then press F2 again to stop.
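
The command and key behavior above amounts to a small state machine. The sketch below is illustrative only: `parseVoiceCommand` and `VoiceSessionState` are hypothetical names, and it assumes that F2 does nothing while voice mode is inactive and that `/voice off` also stops any recording in progress.

```typescript
// Hypothetical types and names, for illustration only.
export type VoiceCommand = "on" | "status" | "off";

// Maps "/voice", "/voice status" and "/voice off" to a command; anything else is null.
export function parseVoiceCommand(input: string): VoiceCommand | null {
  const [head, sub, ...rest] = input.trim().split(/\s+/);
  if (head !== "/voice" || rest.length > 0) return null;
  if (sub === undefined) return "on";
  if (sub === "status" || sub === "off") return sub;
  return null;
}

export class VoiceSessionState {
  active = false;
  recording = false;

  // Applies a command and returns a one-line status string.
  apply(command: VoiceCommand): string {
    if (command === "on") this.active = true;
    if (command === "off") {
      this.active = false;
      this.recording = false; // assumption: deactivating discards an open recording
    }
    const mode = this.active ? "active" : "inactive";
    return `voice mode ${mode}${this.recording ? ", recording" : ""}`;
  }

  // F2 handler: toggles recording, but only while voice mode is active.
  toggleRecording(): boolean {
    if (this.active) this.recording = !this.recording;
    return this.recording;
  }
}
```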

How It Works

User presses F2
      ↓
VoiceRecorder.start() → Microphone → PCM audio chunks
      ↓
VoiceRecorder.stop() → Buffer.concat → WAV conversion
      ↓
VoiceWhisper.transcribe() → OpenAI Whisper API
      ↓
Session.handleInput(transcribedText) → Normal command flow
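
The `Buffer.concat` and WAV conversion step can be sketched as follows. This is one plausible implementation rather than the contents of `voiceRecorder.ts`: the function name `pcmChunksToWav` is hypothetical, and the defaults (16 kHz, mono, 16-bit linear PCM) are assumptions, since the source does not state the recording format.

```typescript
import { Buffer } from "node:buffer";

// Joins raw PCM chunks and prepends a standard 44-byte RIFF/WAVE header.
// Defaults are assumed values, not confirmed by the Clew Code source.
export function pcmChunksToWav(
  chunks: Buffer[],
  sampleRate: number = 16000,
  channels: number = 1,
  bitsPerSample: number = 16,
): Buffer {
  const pcm = Buffer.concat(chunks);
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);             // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);     // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

The header fields must match the format the recorder actually produces; a mismatched sample rate yields audio that plays at the wrong speed and transcribes poorly.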

Key Files

| File | Purpose |
|------|---------|
| src/tools/voiceRecorder.ts | Microphone recording via node-record-lpcm16 |
| src/tools/voiceWhisper.ts | OpenAI Whisper API for speech-to-text |
| src/tools/voiceTools.ts | Voice session management (start/stop/transcribe) |
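
As an illustration of the speech-to-text step, the sketch below posts a WAV recording to OpenAI's audio transcription endpoint. It is not the contents of `voiceWhisper.ts`: the function names are hypothetical, and it assumes Node 18 or later (global `fetch`, `FormData`, and `Blob`), the `whisper-1` model, and the default JSON response format with a `text` field.

```typescript
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical request shape, kept explicit so it can be inspected without a network call.
export interface TranscriptionRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: FormData };
}

export function buildTranscriptionRequest(audio: Blob, apiKey: string): TranscriptionRequest {
  const form = new FormData();
  form.append("file", audio, "recording.wav"); // the filename is an arbitrary choice
  form.append("model", "whisper-1");           // assumed model name
  return {
    url: TRANSCRIPTION_URL,
    init: {
      method: "POST",
      // fetch derives the multipart Content-Type (with boundary) from the FormData body.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

export async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const { url, init } = buildTranscriptionRequest(audio, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Transcription failed: HTTP ${response.status}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```

Separating request construction from the network call keeps the first function testable offline; only `transcribe` needs a valid API key.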