# Interruption Engine (Reference: https://docs.iqra.bot/build/agent/interruption)
The **Interruption Engine** is the referee of the conversation. It manages the delicate balance between "Listening" and "Speaking."
## Visualizing the Logic [#visualizing-the-logic]
Before configuring parameters, it helps to understand how the agent makes decisions in real-time when it hears a sound.
***
## 1. Turn-End Detection [#1-turn-end-detection]
**"Are you done talking?"**
How does the agent know the user has finished their sentence and is waiting for a response?
* **VAD (Voice Activity):** **Fastest.** Detects silence. If the user stops making sound for X milliseconds (e.g., 500ms), the turn ends.
* **Transcription:** **Balanced.** Uses the STT provider to check for a grammatically complete sentence (Final Segment).
* **LLM Decision:** **Context-aware.** Sends the transcript to an LLM. It understands that a user saying *"I want..."* is not done, even if they pause for 2 seconds.
* **ML (Smart Turn):** **Experimental / High Fidelity.** Uses a specialized ML model to predict turn-end based on tone and prosody.
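The VAD strategy above can be sketched as a simple silence timer over voiced/unvoiced audio frames. This is a minimal illustration, not the engine's actual implementation; the 20 ms frame size and 500 ms threshold are assumptions for the example.

```python
# Minimal sketch of VAD-based turn-end detection (illustrative only).
SILENCE_THRESHOLD_MS = 500   # end the turn after this much continuous silence
FRAME_MS = 20                # assumed duration of each audio frame

def detect_turn_end(frames: list[bool]) -> bool:
    """frames: True = voice detected in that frame, False = silence.
    Returns True once trailing silence reaches the threshold."""
    silence_ms = 0
    for voiced in frames:
        if voiced:
            silence_ms = 0          # any speech resets the silence timer
        else:
            silence_ms += FRAME_MS
            if silence_ms >= SILENCE_THRESHOLD_MS:
                return True         # user is considered done talking
    return False

# 300 ms of speech followed by 600 ms of silence -> turn ends
print(detect_turn_end([True] * 15 + [False] * 30))  # → True
```

Note how any voiced frame resets the timer: this is why a user who pauses mid-sentence ("I want...") gets cut off under pure VAD, which the LLM and Smart Turn strategies exist to fix.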
### Deep Dive: ML Smart Turn (Pipecat) [#deep-dive-ml-smart-turn-pipecat]
This strategy uses the open-source **[Smart Turn](https://github.com/pipecat-ai/smart-turn)** model by Pipecat AI. It analyzes the raw audio stream to detect when a user has finished speaking based on intonation, not just silence.
While this is the "Gold Standard" for natural conversation, it is currently **Experimental**. Known limitations:
* **Numbers & Emails:** It may struggle to detect turn-end correctly when a user is reciting long number sequences or spelling emails.
* **Language Support:** Check the [GitHub Repo](https://github.com/pipecat-ai/smart-turn) for the latest supported languages.
* **Recommendation:** Use standard VAD for IVR/Data Collection bots. Use Smart Turn for conversational/support bots.
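The recommendation above can be expressed as a simple strategy selector. The enum values and bot-type strings are illustrative assumptions, not a shipped API.

```python
# Sketch of choosing a turn-end strategy per bot type (illustrative names).
from enum import Enum

class TurnEndStrategy(Enum):
    VAD = "vad"
    TRANSCRIPTION = "transcription"
    LLM = "llm"
    SMART_TURN = "smart_turn"

def recommended_strategy(bot_type: str) -> TurnEndStrategy:
    """IVR/data-collection bots stay on VAD (reliable for numbers and
    spelled-out emails); conversational bots can opt into Smart Turn."""
    if bot_type in ("ivr", "data_collection"):
        return TurnEndStrategy.VAD
    return TurnEndStrategy.SMART_TURN

print(recommended_strategy("ivr").value)      # → vad
print(recommended_strategy("support").value)  # → smart_turn
```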
***
## 2. Interruption Handling (Barge-In) [#2-interruption-handling-barge-in]
**"Did you just interrupt me?"**
When the agent is speaking and the user makes a sound, the agent must decide whether to stop or ignore it.
### Pause Triggers [#pause-triggers]
How much "noise" triggers the agent to pause?
* **Pause via VAD:** Triggered by continuous sound duration (e.g., user speaks for 400ms). Best for low latency.
* **Pause via Word Count:** Triggered only after X words are transcribed. Prevents pausing for coughs or door slams.
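The word-count trigger can be sketched in a few lines: the agent only pauses once the interim transcript contains enough real words, so a cough or door slam (which produces no transcript) is ignored. The threshold and function name are illustrative assumptions.

```python
# Sketch of a word-count pause trigger (illustrative, not the engine's API).
PAUSE_WORD_COUNT = 2  # assumed threshold: pause only after 2 transcribed words

def should_pause(partial_transcript: str) -> bool:
    """Pause only when the interim STT transcript has enough real words."""
    return len(partial_transcript.split()) >= PAUSE_WORD_COUNT

print(should_pause(""))           # noise only, nothing transcribed → False
print(should_pause("uh"))         # single filler word → False
print(should_pause("wait stop"))  # two words → True
```

The trade-off versus the VAD trigger is latency: waiting for transcribed words is slower than reacting to raw sound, but far more robust against non-speech noise.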
### AI Verification (False Positive Protection) [#ai-verification-false-positive-protection]
This feature prevents the agent from stopping due to "Backchanneling" (the user saying "Uh-huh", "Yes", or "Right" to show they are listening).
**How it works:**
1. User speaks > Agent **Pauses**.
2. The audio is transcribed and sent to a fast LLM.
3. **The Check:** The LLM compares the user's input against the Agent's current speech.
* *Input:* "Wait, stop." > **Agent Stops** and handles the new query.
* *Input:* "I see..." > **Agent Resumes** speaking exactly where it left off.
If the interruption was valid (Agent Stops), you can enable **"Include Interrupted Speech in Next Turn"**. This ensures that whatever the agent *didn't* get to say is fed back into the LLM context, so the logic isn't lost.
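The verification step above can be sketched as a classifier returning STOP or RESUME. In production this decision comes from a fast LLM given the user's input and the agent's current speech; here a small keyword heuristic stands in for the model so the control flow is runnable. All names and the backchannel list are illustrative assumptions.

```python
# Sketch of the AI-verification check. A keyword heuristic stands in for
# the fast LLM used in production (illustrative only).
BACKCHANNELS = {"uh-huh", "mm-hmm", "yes", "yeah", "right", "i see", "ok", "okay"}

def classify_interruption(user_text: str) -> str:
    """Return 'RESUME' for backchanneling, 'STOP' for a real interruption."""
    normalized = user_text.lower().strip(" .!?,")
    if normalized in BACKCHANNELS:
        return "RESUME"   # user is just acknowledging; keep speaking
    return "STOP"         # genuine interruption; stop and handle the query

print(classify_interruption("I see..."))     # → RESUME
print(classify_interruption("Wait, stop."))  # → STOP
```

On STOP, the engine discards the rest of the queued TTS; with "Include Interrupted Speech in Next Turn" enabled, that unspoken remainder is appended to the LLM context instead of being thrown away.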
***
## 3. Turn-by-Turn Mode [#3-turn-by-turn-mode]
**Disable Interruptions**
For specific use cases (like taking dictation, conducting a formal interview, or legacy radio-style comms), you might want to disable interruptions entirely.
* **Behavior:** The agent will ignore all user audio until it has finished playing its current TTS response.
* **Use Case:** Highly disciplined workflows or environments with extreme background noise (Construction sites, Busy streets).
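The behavior amounts to a gate that drops all inbound user audio while TTS playback is active. This is a conceptual sketch with assumed names, not the engine's internals.

```python
# Sketch of Turn-by-Turn gating: user audio is discarded while the agent
# is speaking, and forwarded to STT otherwise (illustrative only).
class TurnByTurnGate:
    def __init__(self) -> None:
        self.agent_speaking = False  # set True while TTS playback is active

    def on_user_audio(self, chunk: bytes):
        """Drop audio mid-TTS; pass it through when the agent is listening."""
        if self.agent_speaking:
            return None   # interruptions disabled: discard the audio
        return chunk      # agent is listening: forward to STT

gate = TurnByTurnGate()
gate.agent_speaking = True
print(gate.on_user_audio(b"hello"))   # → None (ignored during playback)
gate.agent_speaking = False
print(gate.on_user_audio(b"hello"))   # → b'hello'
```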
***
## Future Vision: Dynamic Context Adaptation [#future-vision-dynamic-context-adaptation]
Currently, Interruption and Turn-End settings are **Static**—they apply globally to the Agent for the entire duration of the call.
However, we understand that conversation flow changes based on context.
* **Scenario A (Yes/No):** *"Do you want to proceed?"* > Requires short silence threshold (e.g., 400ms) for a snappy feel.
* **Scenario B (Data Entry):** *"Please spell your email address."* > Requires long silence threshold (e.g., 1500ms) because users pause while spelling.
We are actively developing **Context-Aware Turn Taking**. This will allow the Agent (or specific Script Nodes) to programmatically adjust the VAD sensitivity, silence duration, and interruption strategy mid-call based on the specific question being asked.
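As a thought experiment, per-node overrides for the two scenarios above might look like the following. The field names and node keys are hypothetical; this feature has not shipped, so nothing here reflects a real API.

```python
# Hypothetical sketch of per-node turn-taking overrides for the planned
# Context-Aware Turn Taking feature (all names are assumptions).
from dataclasses import dataclass

@dataclass
class TurnTakingConfig:
    silence_threshold_ms: int          # VAD silence before the turn ends
    interruption_enabled: bool = True  # barge-in allowed at this node?

# Scenario A: snappy yes/no confirmation.  Scenario B: slow data entry.
NODE_CONFIGS = {
    "yes_no": TurnTakingConfig(silence_threshold_ms=400),
    "data_entry": TurnTakingConfig(silence_threshold_ms=1500),
}

def config_for(node: str) -> TurnTakingConfig:
    return NODE_CONFIGS[node]

print(config_for("data_entry").silence_threshold_ms)  # → 1500
```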