In a natural conversation, people often talk over each other. “Barge-in” is the ability for a user to interrupt the AI while it is speaking. Implementing this correctly is crucial for making an AI feel “real.”

How it Works

When using the Lokutor VoiceAgentClient, the system is designed to handle this loop:
  1. User Speaks: Your application continues to stream audio input.
  2. VAD Detection: Our Voice Activity Detection (VAD) identifies that the user has started speaking.
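  3. Interrupt Signal: The server immediately stops generating audio for the current response and notifies the client (the 'interrupted' status in the SDK, or an agent_interrupted event over WebSocket).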
  4. Buffer Clearing: Your client application must clear its local playback buffer to stop the audio immediately.

Implementation Guide

The Lokutor SDKs handle the interruption signal automatically at the protocol level. However, you must ensure your audio playback queue is cleared.
client.onStatus((status) => {
  if (status === 'interrupted') {
    // 1. Stop your local speaker/audio player
    audioPlayer.stop();
    // 2. Clear any pending audio chunks
    audioBuffer.clear();
    console.log("Agent was interrupted by user.");
  }
});
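The snippet above assumes that audioPlayer and audioBuffer already exist. As a minimal sketch of what the buffer side could look like, the following uses a hypothetical PlaybackQueue class and handleStatus helper. Neither is part of the Lokutor SDK, and the player object is assumed to expose a stop() method as in the example above.

```javascript
// Hypothetical helper, not part of the Lokutor SDK: a FIFO queue of audio
// chunks waiting to be played.
class PlaybackQueue {
  constructor() {
    this.chunks = []; // pending chunks, oldest first
  }

  // Number of chunks that have not been played yet.
  get size() {
    return this.chunks.length;
  }

  // Queue a chunk received from the agent.
  enqueue(chunk) {
    this.chunks.push(chunk);
  }

  // Return the next chunk to play, or null when nothing is pending.
  next() {
    return this.chunks.length > 0 ? this.chunks.shift() : null;
  }

  // Drop everything that has not been played yet; returns how many were dropped.
  clear() {
    const dropped = this.chunks.length;
    this.chunks.length = 0;
    return dropped;
  }
}

// Hypothetical status handler: on 'interrupted', stop the player first, then
// clear the queue. Returns the number of dropped chunks (0 for other statuses).
function handleStatus(status, player, queue) {
  if (status !== 'interrupted') {
    return 0;
  }
  player.stop();
  return queue.clear();
}
```

With these in place, the callback shown earlier could be reduced to client.onStatus((status) => handleStatus(status, audioPlayer, queue)). Stopping the player before clearing the queue avoids a window in which the player pulls one more chunk after the interruption.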

Manual WebSocket Implementation

If you are communicating directly via WebSocket, you should listen for an interruption event:
{
  "type": "event",
  "event": "agent_interrupted",
  "reason": "user_speech_detected"
}
Upon receiving this:
  1. Drop All Chunks: Discard any binary chunks currently in your network buffer.
  2. Kill Playback: Stop the current audio playback stream immediately.
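The two steps above can be sketched as a single message handler. This is a minimal illustration under stated assumptions: text frames carry JSON events shaped like the example above, every non-string frame is treated as an audio chunk, and createInterruptHandler and stopPlayback are hypothetical names that do not come from the Lokutor protocol or SDK.

```javascript
// Hypothetical helper: routes raw WebSocket frames and reacts to the
// agent_interrupted event. stopPlayback is a caller-supplied callback.
function createInterruptHandler(stopPlayback) {
  const pending = []; // binary audio chunks received but not yet played

  function onMessage(data) {
    // Assumption: any non-string frame is an audio chunk.
    if (typeof data !== 'string') {
      pending.push(data);
      return 'audio';
    }

    let message;
    try {
      message = JSON.parse(data);
    } catch {
      return 'ignored'; // not valid JSON, so not an event we understand
    }

    if (message && message.type === 'event' && message.event === 'agent_interrupted') {
      pending.length = 0; // 1. drop all buffered chunks
      stopPlayback();     // 2. stop the current playback stream
      return 'interrupted';
    }
    return 'ignored';
  }

  return { onMessage, pending };
}
```

In a browser, this could be wired up with the standard WebSocket API by setting socket.binaryType = 'arraybuffer' and assigning socket.onmessage = (e) => handler.onMessage(e.data).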

Best Practices

  • Latency is Key: The faster you detect the user’s voice and kill the agent’s playback, the more natural it feels. We recommend a local VAD (like Silero or WebRTC VAD) in addition to our server-side detection for the fastest possible response.
  • Graceful Fade: Instead of a hard cut, a very fast (about 50 ms) linear volume fade-out can make the interruption feel less jarring to the user.
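To illustrate the graceful fade, here is a minimal sketch that applies a linear gain ramp to the audio that was about to be played. It assumes mono Float32 PCM samples and a known sample rate; applyLinearFadeOut is a hypothetical helper, not a Lokutor API.

```javascript
// Hypothetical helper: returns the faded tail to play after an interruption.
// It keeps at most fadeMs of the remaining audio, scales it with a gain that
// falls linearly to 0, and discards everything after that.
function applyLinearFadeOut(samples, sampleRate, fadeMs = 50) {
  const fadeSamples = Math.min(
    samples.length,
    Math.round((sampleRate * fadeMs) / 1000)
  );
  const out = new Float32Array(fadeSamples);
  for (let i = 0; i < fadeSamples; i++) {
    // Gain goes from just under 1 on the first sample to exactly 0 on the last.
    const gain = 1 - (i + 1) / fadeSamples;
    out[i] = samples[i] * gain;
  }
  return out;
}
```

On interruption, you would play the returned tail in place of the rest of the buffer. If playback already runs through the Web Audio API, scheduling gainNode.gain.linearRampToValueAtTime(0, audioContext.currentTime + 0.05) on a GainNode should achieve a similar effect without touching the samples.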