How it Works
When you use the LokutorVoiceAgentClient, an interruption follows this sequence:
- User Speaks: Your application continues to stream audio input.
- VAD Detection: Our Voice Activity Detection (VAD) identifies that the user has started speaking.
- Interrupt Signal: The server immediately stops generating audio for the current response.
- Buffer Clearing: Your client application must clear its local playback buffer to stop the audio immediately.
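The buffer-clearing step can be sketched as a small thread-safe queue that the playback thread drains and the interruption path empties in one call. This is a minimal illustration, not part of the Lokutor SDK: the class name `PlaybackBuffer` and its methods are hypothetical, and the chunks are assumed to be raw audio bytes.

```python
import collections
import threading


class PlaybackBuffer:
    """Hypothetical thread-safe FIFO of audio chunks waiting to be played."""

    def __init__(self) -> None:
        self._chunks = collections.deque()
        self._lock = threading.Lock()

    def push(self, chunk: bytes) -> None:
        """Queue a chunk received from the server."""
        with self._lock:
            self._chunks.append(chunk)

    def pop(self):
        """Return the next chunk to play, or None if the buffer is empty."""
        with self._lock:
            return self._chunks.popleft() if self._chunks else None

    def clear(self) -> int:
        """Drop every queued chunk and return how many were discarded."""
        with self._lock:
            dropped = len(self._chunks)
            self._chunks.clear()
            return dropped
```

Calling `clear()` as soon as the interruption arrives means the playback thread finds nothing left to play on its next `pop()`, so stale audio from the interrupted response is never heard.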
Implementation Guide
Using the SDK (Recommended)
The Lokutor SDKs handle the interruption signal automatically at the protocol level. However, you must ensure your audio playback queue is cleared.
Manual WebSocket Implementation
If you are communicating directly via WebSocket, you should listen for an interruption event:
- Drop all chunks: Discard any binary chunks currently in your network buffer.
- Kill Playback: Stop the current audio playback stream immediately.
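The two steps above might look like the following message handler. It is a sketch under stated assumptions: the interruption is assumed to arrive as a JSON text frame of the form `{"type": "interrupt"}`, which is a placeholder and not the documented Lokutor message format, and `InterruptionHandler` and `stop_playback` are hypothetical names. Check the protocol reference for the actual event name and payload.

```python
import json


class InterruptionHandler:
    """Hypothetical handler for frames received over the WebSocket."""

    def __init__(self, stop_playback) -> None:
        # stop_playback: callable supplied by the application that halts
        # the audio output device immediately.
        self._stop_playback = stop_playback
        self.pending = []  # binary audio chunks received but not yet played

    def on_message(self, message) -> None:
        # Binary frames are assumed to carry audio for the current response.
        if isinstance(message, (bytes, bytearray)):
            self.pending.append(bytes(message))
            return

        # Text frames are assumed to be JSON control events.
        event = json.loads(message)
        if event.get("type") == "interrupt":  # ASSUMED event name
            self.pending.clear()    # drop all chunks
            self._stop_playback()   # kill playback
```

Pass each frame from your WebSocket library to `on_message`. The handler only covers chunks already received; how to recognize late chunks of the interrupted response that are still in flight depends on the protocol and is not addressed here.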
Best Practices
- Latency is Key: The faster you detect the user’s voice and kill the agent’s playback, the more natural the conversation feels. For the fastest possible response, we recommend running a local VAD (such as Silero or WebRTC VAD) in addition to our server-side detection.
- Graceful Fade: Instead of a hard cut, a very fast (50 ms) linear volume fade-out can make the interruption feel less jarring to the user.
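A linear fade-out of this kind can be sketched with the standard library alone. The example assumes the audio about to be played is mono 16-bit PCM in the machine's native byte order; the function name `fade_out_pcm16` and its parameters are illustrative, and the sample rate must match whatever your audio pipeline actually uses.

```python
import array


def fade_out_pcm16(pcm: bytes, sample_rate: int, fade_ms: int = 50) -> bytes:
    """Return the first fade_ms of pcm with volume ramped linearly to zero.

    Assumes mono, 16-bit signed samples in native byte order. Audio beyond
    the fade window is discarded, since playback stops after the fade.
    """
    samples = array.array("h")
    samples.frombytes(pcm)

    # Number of samples in the fade window, capped at what is available.
    n = min(len(samples), sample_rate * fade_ms // 1000)

    for i in range(n):
        gain = 1.0 - (i + 1) / n  # falls linearly, reaching 0.0 on the last sample
        samples[i] = int(samples[i] * gain)

    del samples[n:]  # nothing is played after the fade completes
    return samples.tobytes()
```

On interruption, you could pass the audio that was about to be played through this function, play the result, and then clear the rest of the playback buffer. At 50 ms the added delay is small, but it still counts toward the response latency discussed above.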