Caching Strategy
Reduce latency and API costs for your agent with semantic caching.
The Caching tab allows you to connect your agent to shared Cache Groups. By reusing previously generated audio and text, you can serve repeat requests with near-zero latency and significantly reduce API costs.
How it Works
When the agent decides to speak or search, it first checks whether it has performed this exact task before.
- Scenario: Agent needs to say "Welcome to Medina Dental."
- Check: Is this phrase in the assigned Audio Cache Group?
- Hit: Yes -> Play the file instantly (0ms latency, $0 cost).
- Miss: No -> Call ElevenLabs -> Stream Audio -> Save to Cache for next time.
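The hit/miss flow above can be sketched as follows. This is a hypothetical illustration, not the platform's actual implementation: `synthesize_speech` stands in for the ElevenLabs TTS call, and the on-disk layout keyed by a hash is an assumption.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("audio_cache")  # assumed storage location for this sketch

def cache_key(phrase: str, voice_id: str) -> str:
    # Key on both phrase and voice so different voices never collide.
    return hashlib.sha256(f"{voice_id}:{phrase}".encode()).hexdigest()

def get_audio(phrase: str, voice_id: str, synthesize_speech) -> bytes:
    path = CACHE_DIR / f"{cache_key(phrase, voice_id)}.mp3"
    if path.exists():                  # Hit: play instantly, no API call
        return path.read_bytes()
    audio = synthesize_speech(phrase)  # Miss: call the TTS provider
    CACHE_DIR.mkdir(exist_ok=True)
    path.write_bytes(audio)            # Save to cache for next time
    return audio
```

The second request for the same phrase and voice is served from disk, so the TTS provider is only billed once per unique phrase.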
Configuration
To use caching, assign a Cache Group from your Data Layer to each function of the agent you want to accelerate.
1. Audio Cache (TTS)
Priority: Critical.
- Function: Stores generated MP3 files.
- Strategy: Assign a group here to skip TTS generation for common phrases (Greetings, Standard Questions).
- Warning: If you change the Agent's Voice ID, you must clear or reassign this group; otherwise the agent will speak with mixed voices (old cached voice + new live voice).
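The mixed-voices hazard is easy to reproduce in a toy sketch. In this hypothetical example the cache keys on the phrase alone (names and shapes here are illustrative, not the platform's), so audio generated under the old Voice ID keeps playing after a voice change until the group is cleared:

```python
# Cache keyed on phrase only -- the source of the mixed-voices hazard.
audio_cache = {"Welcome!": b"audio-from-OLD-voice"}

def speak(phrase: str, voice_id: str, tts):
    if phrase in audio_cache:
        return audio_cache[phrase]  # stale hit: the OLD voice plays back
    audio = tts(phrase, voice_id)
    audio_cache[phrase] = audio     # new phrases use the NEW voice
    return audio

# The fix: clear (or swap) the group before going live with the new voice.
audio_cache.clear()
```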
2. Embedding Cache (RAG)
Priority: High.
- Function: Stores the vector embeddings computed for text chunks.
- Strategy: Assign a group here to save money on embedding models (e.g., OpenAI text-embedding-3). If users ask similar questions, the cached embedding is reused.
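A minimal sketch of the embedding cache, under assumed names: `embed` stands in for a call to an embedding model such as OpenAI text-embedding-3, and keying on a hash of the normalized text is an illustrative choice (it only catches questions that are identical after normalization; true semantic matching would compare vectors).

```python
import hashlib

embedding_cache: dict[str, list] = {}

def get_embedding(text: str, embed) -> list:
    # Normalize so trivially different phrasings share a key.
    key = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    if key not in embedding_cache:       # Miss: pay for the model call once
        embedding_cache[key] = embed(text)
    return embedding_cache[key]          # Hit: reuse the stored vector
```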
3. Message Cache (LLM)
Priority: Low / Experimental.
- Function: Stores the LLM's text response.
- Strategy: Only use this for Static FAQs (e.g., "Business Hours").
- Risk: Using this for dynamic conversation can lead to "Context Blindness": the agent repeats a generic cached answer when a specific one was needed.
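One way to keep the message cache safe is to restrict it to a whitelist of static FAQs so no context-dependent turn is ever served from cache. This sketch is an assumption about how you might structure that, with `call_llm` standing in for the live model call:

```python
# Only whitelisted, context-free questions are ever answered from cache.
STATIC_FAQS = {
    "what are your business hours": "We are open Monday to Friday, 9am-5pm.",
}

def answer(user_message: str, call_llm) -> str:
    key = user_message.strip().lower().rstrip("?!. ")
    if key in STATIC_FAQS:          # Safe: answer never depends on context
        return STATIC_FAQS[key]
    return call_llm(user_message)   # Everything else stays dynamic
```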
Shared Intelligence
Cache Groups are reusable. You can assign the same Audio Cache Group to 10 different agents if they all use the same Voice ID. This means Agent B benefits from what Agent A has already learned.
Auto-Caching (Learning Mode)
By default, the cache is read-only. To make the agent "learn" over time, enable Auto-Cache.
- Behavior: Every time the agent generates new content (that wasn't in the cache), it saves it to the group automatically.
- Result: The first user gets standard latency. The second user gets instant responses.
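The difference between read-only and Learning Mode reduces to whether misses are written back. The `auto_cache` flag below is an assumed mapping of the toggle onto code, not the platform's API:

```python
def generate(prompt: str, cache: dict, produce, auto_cache: bool = False):
    if prompt in cache:
        return cache[prompt]     # Hit: instant response for later users
    result = produce(prompt)     # Miss: the first user pays full latency
    if auto_cache:               # Learning Mode: write the miss back
        cache[prompt] = result
    return result
```

With `auto_cache=False` every request regenerates; with `auto_cache=True` the first request populates the group and every later identical request is a hit.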
Manage Your Data
Want to view the saved files, set expiry dates (TTL), or manually pre-warm the cache?
Go to the Data & Knowledge > Caching page.