🛸 Lokutor JavaScript SDK

The Lokutor JavaScript SDK is a modern, TypeScript-first library designed for high-performance audio applications in both the Browser and Node.js.

Installation

npm install @lokutor/sdk

Initialization

import { LokutorClient } from '@lokutor/sdk';

const lokutor = new LokutorClient({
  apiKey: 'YOUR_API_KEY',
  baseUrl: 'https://api.lokutor.ai' // Optional: defaults to production
});

🎙️ Text-to-Speech (TTS) Module

The tts module handles all speech synthesis tasks.

1. Simple Synthesis (Batch)

Perfect for short texts where you need the full audio at once.
const { audio, visemes, duration } = await lokutor.tts.synthesize("Hello world!", {
  voice: 'F1',
  quality: 'high',
  outputFormat: 'mp3_22050',
  includeVisemes: true
});

// audio is base64-encoded audio data
Parameters:
  • text (required): Text to synthesize (1-50,000 characters)
  • voice (required): Voice ID - M1, M2, F1, F2
  • quality: ultra_fast, fast, medium, high, ultra_high (default: medium)
  • speed: Speech speed multiplier, 0.5-2.0 (default: 1.05)
  • outputFormat: pcm_22050, mp3_22050, ulaw_8000 (default: pcm_22050)
  • includeVisemes: Include viseme timing data (default: false)
  • language: en or es (default: en)
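Since the returned audio is base64-encoded, you typically decode it to raw bytes before writing it to disk or feeding it to a decoder. A minimal Node.js sketch (the response shape follows the example above; the decoding itself is standard Node and not SDK-specific):

```typescript
// Decode a base64 audio payload into raw bytes.
// Works for any outputFormat (mp3_22050, pcm_22050, ulaw_8000).
function decodeAudio(base64Audio: string): Buffer {
  return Buffer.from(base64Audio, 'base64');
}

// Usage in Node (assuming `audio` is the base64 string from synthesize):
//   fs.writeFileSync('hello.mp3', decodeAudio(audio));
```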

2. Real-Time Streaming

Use this for the lowest possible latency. Note that, unlike synthesize, the voice ID is passed as the first argument rather than in the options object.
const stream = lokutor.tts.stream("M1", "This is a long sentence being streamed in real-time.", {
  quality: 'ultra_fast',
  includeVisemes: true
});

for await (const chunk of stream) {
  // chunk.audio: ArrayBuffer (or base64 string, depending on transport)
  // chunk.visemes: Viseme[] (if enabled)
  const size = chunk.audio instanceof ArrayBuffer ? chunk.audio.byteLength : chunk.audio.length;
  console.log("Received chunk:", size, "bytes");
}
Note: The streaming endpoint requires the voice ID as the first parameter, as it’s part of the API path: POST /api/tts/{voice_id}/stream
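If you also want the complete audio once the stream ends (for caching or saving), you can accumulate the chunks as they arrive. A sketch of joining ArrayBuffer chunks into one contiguous buffer (this helper is not part of the SDK):

```typescript
// Concatenate streamed ArrayBuffer chunks into one contiguous buffer.
function concatChunks(chunks: ArrayBuffer[]): ArrayBuffer {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out.buffer;
}
```

Collect each chunk.audio into an array inside the for-await loop, then call concatChunks once the loop completes.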

3. Async Long-Form Jobs

For texts longer than 5,000 characters, use the async jobs API.
const task = await lokutor.tts.createAsyncJob("F1", "Your very long text here...", {
  quality: 'high'
});

console.log("Task ID:", task.task_id);

// Poll for status
const status = await lokutor.tts.getTaskStatus(task.task_id);
if (status.status === 'completed') {
  console.log("Download URL:", status.download_url);
}
Note: The async endpoint requires the voice ID as the first parameter: POST /api/tts/{voice_id}/async
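In practice you poll in a loop with a delay until the job reaches a terminal state, rather than checking once. A sketch with an injected status function so it is not tied to the SDK (the status strings follow the example above; the interval and attempt limits are arbitrary assumptions):

```typescript
type TaskStatus = { status: string; download_url?: string };

// Poll until the task reaches a terminal state, or give up after maxAttempts.
async function waitForTask(
  getStatus: () => Promise<TaskStatus>,
  intervalMs = 2000,
  maxAttempts = 150
): Promise<TaskStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const s = await getStatus();
    if (s.status === 'completed' || s.status === 'failed') return s;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Task did not finish in time');
}

// Usage:
//   const done = await waitForTask(() => lokutor.tts.getTaskStatus(task.task_id));
//   if (done.status === 'completed') console.log(done.download_url);
```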

🗣️ Voice Chat Module (Flagship)

The voiceChat module is a high-level wrapper around Lokutor’s bidirectional WebSocket protocol. It manages the conversation state engine and audio processing.

Creating a Session

const session = lokutor.voiceChat.createSession({
  voiceId: 'F1',
  quality: 'ultra_fast',
  includeVisemes: true,
  language: 'en'
});

Event Handlers

  • connected - Fired when the WS connection is established. Data: void
  • agentSpeaking - Agent started sending audio chunks. Data: boolean
  • viseme - Real-time viseme for lip sync. Data: { id: number, offset_ms: number }
  • interrupted - Fired when the agent stops due to user speech. Data: void
  • textDelta - Partial transcript of user speech (real-time). Data: { text: string }
  • configured - Configuration confirmation from server. Data: { voice_id, quality, sample_rate, include_visemes, language }
  • error - Any session or network error. Data: { message: string }
  • disconnected - Connection closed. Data: void

Starting a Conversation (Browser)

// 1. Establish WebSocket and register listeners
await session.start();

session.on('viseme', (v) => {
  myAvatar.animate(v.id);
});

session.on('textDelta', (data) => {
  console.log('User is saying:', data.text);
});

// 2. Start Microphone capture and Transcription
// This automatically sends user audio to Lokutor
await session.startTranscription();

WebSocket Message Format

The WebSocket connection uses JSON messages with the following structure.

Client → Server:
// Configure session
{
  type: "configure",
  data: {
    voice_id: "F1",
    quality: "ultra_fast",
    include_visemes: true,
    language: "en"
  },
  message_id: "optional-id"
}

// Send audio chunk
{
  type: "audio",
  data: {
    audio: "base64-encoded-pcm-data",
    sample_rate: 16000
  }
}

// Send text message
{
  type: "text",
  data: {
    text: "Hello, how are you?"
  }
}

// Ping
{
  type: "ping",
  message_id: "ping-1"
}
Server → Client:
// Audio chunk
{
  type: "audio_chunk",
  data: {
    audio: "base64-encoded-audio",
    sample_rate: 22050,
    visemes: [{ id: 0, offset_ms: 0 }] // if enabled
  },
  message_id: "optional-id"
}

// Audio stream ended
{
  type: "audio_end",
  message_id: "optional-id"
}

// Audio interrupted
{
  type: "audio_interrupted",
  data: {
    reason: "user_speaking" | "speech_start"
  }
}

// Partial transcript
{
  type: "text_delta",
  data: {
    text: "partial user transcript..."
  }
}

// Configuration confirmed
{
  type: "configured",
  data: {
    voice_id: "F1",
    quality: "ultra_fast",
    sample_rate: 22050,
    include_visemes: true,
    language: "en"
  }
}

// Pong response
{
  type: "pong",
  message_id: "ping-1"
}

// Error
{
  type: "error",
  data: {
    message: "Error description"
  }
}
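If you implement this protocol directly instead of using the voiceChat module, small builder functions keep the client-to-server message shapes consistent. A sketch mirroring the field names in the listing above (the helpers themselves are not part of the SDK):

```typescript
type ClientMessage = { type: string; data?: unknown; message_id?: string };

// "configure" message: selects voice, quality, visemes, and language.
function configureMessage(
  voiceId: string,
  quality: string,
  includeVisemes = false,
  language = 'en',
  messageId?: string
): ClientMessage {
  return {
    type: 'configure',
    data: { voice_id: voiceId, quality, include_visemes: includeVisemes, language },
    message_id: messageId,
  };
}

// "audio" message: one base64-encoded PCM chunk.
function audioMessage(base64Pcm: string, sampleRate = 16000): ClientMessage {
  return { type: 'audio', data: { audio: base64Pcm, sample_rate: sampleRate } };
}

// "text" message: a plain text turn.
function textMessage(text: string): ClientMessage {
  return { type: 'text', data: { text } };
}

// Usage with a raw WebSocket:
//   ws.send(JSON.stringify(configureMessage('F1', 'ultra_fast', true)));
```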

Authentication

WebSocket connections support multiple authentication methods:
// Option 1: JWT in query parameter (recommended for browsers)
const wsUrl = `wss://api.lokutor.ai/ws/voice-chat?token=${jwtToken}`;

// Option 2: API key in query parameter
const wsUrl = `wss://api.lokutor.ai/ws/voice-chat?api_key=${apiKey}`;

// Option 3: JWT in Authorization header
headers: { 'Authorization': `Bearer ${jwtToken}` }

// Option 4: API key in header
headers: { 'xi-api-key': apiKey }
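For the query-parameter variants, the credential should be URL-encoded rather than interpolated raw. A small sketch using the standard URL API (the endpoint path is taken from the examples above; this helper is not part of the SDK):

```typescript
// Build a WebSocket URL carrying either a JWT or an API key in the query string.
function buildVoiceChatUrl(base: string, auth: { token?: string; apiKey?: string }): string {
  const url = new URL('/ws/voice-chat', base);
  if (auth.token) url.searchParams.set('token', auth.token);
  else if (auth.apiKey) url.searchParams.set('api_key', auth.apiKey);
  return url.toString();
}
```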

🛠️ Audio Utilities

The SDK also exports the internal audio utilities its modules are built on, so you can reuse them in custom implementations.

LokutorPlayer (Browser Only)

A PCM/MP3 player optimized for streaming with a built-in Jitter Buffer.
import { LokutorPlayer } from '@lokutor/sdk';

const player = new LokutorPlayer(0.05); // 50ms jitter buffer
await player.playChunk(pcmData, 22050);

// Stop immediately on user interrupt
player.stopImmediately();

LokutorRecorder (Browser Only)

A microphone recorder that captures audio and delivers Int16 PCM chunks at the requested sample rate.
import { LokutorRecorder } from '@lokutor/sdk';

const recorder = new LokutorRecorder(16000); // 16kHz
await recorder.start((pcm) => {
  console.log("Int16 PCM Chunk:", pcm);
});
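The WebSocket protocol above expects audio as base64-encoded PCM, while the recorder delivers Int16 chunks, so a conversion step sits between the two. A sketch of that encoding for Node (in the browser you would build the string with btoa instead; this helper is not part of the SDK, and Int16Array byte order follows the platform, which is little-endian on effectively all targets):

```typescript
// Encode an Int16Array PCM chunk as base64, the format expected by the
// WebSocket "audio" message ({ audio, sample_rate }).
function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  return Buffer.from(bytes).toString('base64');
}

// Usage inside the recorder callback (sketch):
//   await recorder.start((pcm) => {
//     const payload = pcmToBase64(pcm); // send in an "audio" message
//   });
```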

🎭 Voices & Models

Retrieve available assets for your application.
const voices = await lokutor.voices.list();
const models = await lokutor.voices.listModels();
Available Voices:
  • M1 - Male voice 1
  • M2 - Male voice 2
  • F1 - Female voice 1
  • F2 - Female voice 2
Supported Languages:
  • en - English
  • es - Spanish

Advanced: Latency & Jitter

The JS SDK implements Precise Scheduling. When a chunk arrives, it is scheduled in the Web Audio API timeline at currentTime + jitterBuffer. This prevents gaps in audio caused by transient network jitter.
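The scheduling decision reduces to a small piece of arithmetic: each chunk starts at the later of (a) the end of the previously scheduled chunk and (b) now plus the jitter buffer. A sketch of that logic in isolation from the Web Audio API (the SDK's internal implementation may differ):

```typescript
// Compute when the next chunk should start playing (all times in seconds).
//   currentTime:  AudioContext.currentTime
//   scheduledEnd: end time of the last scheduled chunk (0 if none)
//   jitterBuffer: safety margin, e.g. 0.05 for 50 ms
function nextStartTime(currentTime: number, scheduledEnd: number, jitterBuffer: number): number {
  return Math.max(scheduledEnd, currentTime + jitterBuffer);
}

// After scheduling, the new scheduledEnd advances by the chunk's duration:
// a chunk of N samples at rate R lasts N / R seconds.
```

Back-to-back chunks therefore play gaplessly (case a), while a chunk arriving after a network stall is pushed out by the jitter buffer instead of starting late mid-word (case b).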

TypeScript Interfaces

export interface Viseme {
  id: number;       // Azure standard index (0-21)
  offset_ms: number; // Offset from start of audio chunk
}

export type Quality = 'ultra_fast' | 'fast' | 'medium' | 'high' | 'ultra_high';

export type OutputFormat = 'pcm_22050' | 'mp3_22050' | 'ulaw_8000';

export type Language = 'en' | 'es';

export interface SynthesizeOptions {
  voice: 'M1' | 'M2' | 'F1' | 'F2';
  quality?: Quality;
  speed?: number; // 0.5 - 2.0
  outputFormat?: OutputFormat;
  includeVisemes?: boolean;
  language?: Language;
}

Error Handling

try {
  const { audio } = await lokutor.tts.synthesize("Hello", {
    voice: 'M1',
    quality: 'high'
  });
} catch (error: any) {
  if (error.status === 401) {
    console.error('Authentication failed');
  } else if (error.status === 429) {
    console.error('Rate limit exceeded');
  } else {
    console.error('TTS error:', error.message);
  }
}
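A common pattern on top of this is retrying 429s with exponential backoff while failing fast on everything else. A sketch with an injected operation so it stays SDK-agnostic (the numeric status field follows the example above; the retry counts and delays are arbitrary assumptions):

```typescript
// Retry an async operation on 429 responses with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (error: any) {
      const retryable = error?.status === 429 && attempt < maxRetries;
      if (!retryable) throw error; // 401s etc. surface immediately
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage:
//   const result = await withRetry(() =>
//     lokutor.tts.synthesize("Hello", { voice: 'M1', quality: 'high' }));
```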