Overview

Real-time streaming is built for applications that require immediate responsiveness, such as conversational AI, voice assistants, and interactive gaming. Our WebSocket-based API delivers audio with sub-50ms latency, streaming raw PCM audio chunks as they’re generated.

WebSocket API Reference

Connection

Initiate a WebSocket connection to wss://api.lokutor.com/ws. Include your API key as a query parameter in the WebSocket URL:
  • URL: wss://api.lokutor.com/ws?api_key=your-api-key
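
As a minimal sketch, the URL can be assembled with a small helper (`buildWsUrl` is a hypothetical name, not part of the API) so the key is always URL-encoded:

```javascript
// Hypothetical helper: builds the connection URL with the api_key query parameter.
function buildWsUrl(apiKey) {
  return `wss://api.lokutor.com/ws?api_key=${encodeURIComponent(apiKey)}`;
}

// In a browser (or Node 22+, which ships a global WebSocket):
// const ws = new WebSocket(buildWsUrl(myApiKey));
```

Encoding the key guards against characters that are not URL-safe.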

Synthesis Request (Client to Server)

Send a JSON-formatted string to start synthesis.

Request Schema

{
  "text": "Hello, how can I help you today?",
  "voice": "M1",
  "lang": "en",
  "speed": 1.05,
  "steps": 5,
  "version": "versa-1.0",
  "visemes": true
}
| Parameter | Type    | Required | Default   | Description |
| --------- | ------- | -------- | --------- | ----------- |
| text      | string  | Yes      | -         | The text to be synthesized into speech. |
| voice     | string  | No       | M1        | Voice ID. See Available Voices. |
| lang      | string  | No       | en        | ISO language code. Options: en, es, ko, pt, fr. |
| speed     | float   | No       | 1.05      | Synthesis speed multiplier (0.5 to 2.0 recommended). |
| steps     | int     | No       | 5         | Denoising steps. Higher = higher quality, higher latency. |
| version   | string  | No       | versa-1.0 | Model version to use. |
| visemes   | boolean | No       | false     | Enable/disable high-fidelity lipsync data generation. |
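
A request payload can be built and validated client-side before sending. The helper below is an illustrative sketch (`buildSynthesisRequest` is not part of the API); its defaults and ranges mirror the table above:

```javascript
// Hypothetical helper: validates parameters against the documented schema and
// returns the JSON string to send over the socket. Defaults mirror the docs.
function buildSynthesisRequest({ text, voice = "M1", lang = "en", speed = 1.05,
                                 steps = 5, version = "versa-1.0", visemes = false }) {
  if (!text) throw new Error("text is required");
  const langs = ["en", "es", "ko", "pt", "fr"];
  if (!langs.includes(lang)) throw new Error(`unsupported lang: ${lang}`);
  if (speed < 0.5 || speed > 2.0) throw new Error("speed outside recommended 0.5-2.0 range");
  return JSON.stringify({ text, voice, lang, speed, steps, version, visemes });
}
```

Validating locally surfaces bad parameters before they cost a round trip.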

Synthesis Response (Server to Client)

The server responds with a stream of messages.
  • Binary Messages (Audio): Multiple binary chunks containing raw PCM audio.
    • Format: Signed 16-bit PCM (S16LE).
    • Sample Rate: 44,100 Hz.
    • Channels: 1 (Mono).
    • Byte Order: Little Endian.
  • Text Message (JSON Visemes): If visemes: true is requested, the server sends JSON arrays containing lipsync metadata synchronized with the audio stream.
  • Text Message “EOS”: Sent when the synthesis for the current request is complete.
  • Text Message “ERR: <message>”: Sent if an error occurs during processing.
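
The three text-message shapes can be told apart with a small dispatcher. This is a sketch, assuming viseme payloads are the only JSON text messages (per the list above); `classifyTextMessage` is a placeholder name:

```javascript
// Sketch: classify an incoming WebSocket text message into one of the three
// documented shapes: "EOS", "ERR: <message>", or a JSON viseme array.
function classifyTextMessage(data) {
  if (data === "EOS") return { kind: "eos" };
  if (data.startsWith("ERR:")) return { kind: "error", message: data.slice(4).trim() };
  return { kind: "visemes", visemes: JSON.parse(data) };
}
```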

Viseme Data Format

When visemes is enabled, the server will yield metadata in the following format:
[
  {
    "v": 10,       // Character index in the input text
    "c": "w",      // Character being spoken
    "t": 0.418     // Timestamp in seconds, relative to the start of the utterance
  }
]
These messages are sent as WebSocket Text Messages and are interleaved with the binary audio chunks. They always represent the alignment for the immediately following audio data.
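
Because `t` is relative to the start of the utterance, driving an animation requires converting each viseme into an absolute playback time. A minimal sketch, assuming you record the clock time (e.g. `AudioContext.currentTime`) at which the utterance's audio begins (`scheduleVisemes` is a hypothetical helper):

```javascript
// Hypothetical helper: converts relative viseme timestamps into absolute
// playback times, given the clock time at which the utterance's audio started.
function scheduleVisemes(visemes, utteranceStartTime) {
  return visemes.map(({ v, c, t }) => ({
    charIndex: v,
    char: c,
    fireAt: utteranceStartTime + t, // e.g. an AudioContext.currentTime value
  }));
}
```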

Available Voices

| ID      | Gender | Description |
| ------- | ------ | ----------- |
| F1 - F5 | Female | Various feminine tones, from professional to casual. |
| M1 - M5 | Male   | Various masculine tones, from deep baritone to energetic. |
> [!TIP]
> Use F1 or M1 for the best general-purpose performance and naturalness.

Best Practices for Low Latency

  1. Persistent Connections: Keep the WebSocket connection open for multiple requests to avoid handshake latency.
  2. Buffer Management: Since we stream audio in small chunks (~20-40ms), ensure your client-side player has a small buffer (e.g., 100-200ms) to handle network jitter without gaps.
  3. Optimized Steps: Use steps: 3 for extreme low latency (Real-time agents) or steps: 10 for high-quality content generation.
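
The buffering guidance in point 2 can be sketched as a small jitter buffer that holds S16LE mono chunks until a target amount of audio is queued. This is an illustrative structure, not a required API; the 150 ms target sits inside the 100-200 ms range suggested above:

```javascript
// Sketch of a minimal jitter buffer for the raw S16LE mono stream: hold
// chunks until ~150 ms of audio is queued before starting playback.
const SAMPLE_RATE = 44100;       // Hz, per the audio format above
const BYTES_PER_SAMPLE = 2;      // signed 16-bit mono

class JitterBuffer {
  constructor(targetMs = 150) {
    this.targetMs = targetMs;
    this.chunks = [];
    this.bufferedBytes = 0;
  }
  push(chunk) {                  // chunk: ArrayBuffer of S16LE samples
    this.chunks.push(chunk);
    this.bufferedBytes += chunk.byteLength;
  }
  bufferedMs() {                 // duration of queued audio in milliseconds
    return (this.bufferedBytes / BYTES_PER_SAMPLE / SAMPLE_RATE) * 1000;
  }
  ready() {                      // enough buffered to absorb network jitter?
    return this.bufferedMs() >= this.targetMs;
  }
}
```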

Error Codes

  • 401 Unauthorized: Missing or invalid api_key query parameter.
  • 429 Too Many Requests: Rate limit exceeded.
  • 503 Service Unavailable: Server overload or maintenance.
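
For 429 and 503 responses, a common client-side pattern is to reconnect with capped exponential backoff. The delays below are illustrative defaults, not values specified by the API:

```javascript
// Sketch: capped exponential backoff for retrying after 429/503.
// baseMs and capMs are illustrative, not values from the API docs.
function backoffMs(attempt, baseMs = 250, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In production you would typically add random jitter to each delay so many clients do not retry in lockstep.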

Implementation Examples

A minimal browser client that handles all of the documented message types:

// Assumes playAudio(buffer) is your own function that queues raw PCM for playback.
const ws = new WebSocket(`wss://api.lokutor.com/ws?api_key=your-api-key`);
ws.binaryType = 'arraybuffer'; // receive audio as ArrayBuffer instead of Blob

ws.onopen = () => {
  ws.send(JSON.stringify({
    text: "Hello, world!",
    voice: "M1"
  }));
};

ws.onmessage = (event) => {
  if (typeof event.data === 'string') {
    if (event.data === 'EOS') {
      console.log('Done');
    } else if (event.data.startsWith('ERR:')) {
      console.error('Synthesis error:', event.data.slice(4).trim());
    } else {
      // Viseme metadata (JSON array), sent only when visemes: true
      const visemes = JSON.parse(event.data);
      console.log('Visemes:', visemes);
    }
  } else {
    // Binary message: raw S16LE PCM audio
    playAudio(event.data);
  }
};