Understanding the Latency Chain
The “End-to-End” latency is the sum of:
- User Audio to Server: Network latency for streaming the user’s speech.
- STT (Transcription): Time to turn audio into text.
- LLM (Thinking): Time for the model to generate the first token.
- TTS (Lokutor): Time to generate the first audio chunk (TTFB).
- Playback Buffer: The safety margin your player keeps to prevent stuttering.
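To make the sum concrete, here is a minimal sketch that adds up the five stages and reports which one dominates. The class name, field names, and the millisecond values in the usage note are illustrative assumptions, not measurements or part of any Lokutor SDK.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Per-stage latencies in milliseconds (hypothetical names, illustrative only)."""
    network_ms: float            # User Audio to Server
    stt_ms: float                # STT (Transcription)
    llm_first_token_ms: float    # LLM time to first token
    tts_first_chunk_ms: float    # TTS time to first audio chunk (TTFB)
    playback_buffer_ms: float    # Playback safety margin

    def end_to_end_ms(self) -> float:
        # End-to-end latency is the plain sum of every stage in the chain.
        return (
            self.network_ms
            + self.stt_ms
            + self.llm_first_token_ms
            + self.tts_first_chunk_ms
            + self.playback_buffer_ms
        )

    def largest_stage(self) -> str:
        # Name of the stage contributing the most latency.
        stages = vars(self)
        return max(stages, key=stages.get)
```

With made-up values such as `LatencyBudget(40, 150, 300, 90, 60)`, `end_to_end_ms()` gives 640 and `largest_stage()` points at the LLM stage, which suggests where to look first when trimming the budget.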
Lokutor Optimizations
1. Adjusting Denoising Steps
The steps parameter in our API directly controls the compute time.
- Use steps: 3 for extreme low latency.
- Use steps: 5 as the sweet spot for quality vs. speed.
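As a rough sketch of how the two recommendations might be applied in client code, the helper below picks a steps value and serializes a request body. Only the steps key and the values 3 and 5 come from this page; the function names, the "text" field, and the JSON payload shape are assumptions for illustration, so check the API reference for the actual request format.

```python
import json

# Values recommended above.
EXTREME_LOW_LATENCY_STEPS = 3
BALANCED_STEPS = 5


def choose_steps(prioritize_latency: bool) -> int:
    """Return 3 when latency matters most, otherwise the balanced value 5."""
    return EXTREME_LOW_LATENCY_STEPS if prioritize_latency else BALANCED_STEPS


def build_tts_request(text: str, prioritize_latency: bool) -> str:
    """Serialize a hypothetical request body; only "steps" is a documented name."""
    payload = {
        "text": text,  # assumed field name
        "steps": choose_steps(prioritize_latency),
    }
    return json.dumps(payload)
```

Keeping the choice in one function makes it easy to switch between the two settings per conversation, for example lowering steps only for short replies where responsiveness matters more than audio quality.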