# Knowledge Base (RAG) (Reference: https://docs.iqra.bot/build/knowledge/rag)

The **Knowledge Base** is your agent's long-term memory. It allows the AI to "read" your proprietary documents (PDFs, Manuals, Policies) and answer questions based *only* on that data.

Iqra AI uses a **RAG (Retrieval-Augmented Generation)** pipeline. We do not retrain the model; instead, we inject relevant information into the context window in real time.

The Ingestion Pipeline [#the-ingestion-pipeline]

It is important to understand how we treat your data. **We do not store your raw files.** Once uploaded, a document is converted into text, chunked, and embedded.

<Mermaid
  chart="graph LR
    File[Upload File] --> Extract[Text Extraction]
    Extract --> Chunk[Chunking Strategy]
    Chunk --> Edit[User Review / Edit]
    Edit --> Embed[Embedding Model]
    Embed --> Vector[Vector Database]
    
    style Edit fill:#f59e0b,stroke:#333,stroke-width:2px,color:#fff
    style Vector fill:#2563eb,stroke:#333,stroke-width:2px,color:#fff"
/>
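The flow in the diagram can be pictured with a small, self-contained sketch. It is illustrative only: `Chunk`, `embed`, and `ingest` are hypothetical names, the hashed bag-of-words function is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database. It is not Iqra AI's implementation.

```python
from dataclasses import dataclass
import hashlib
import math


@dataclass
class Chunk:
    text: str
    vector: list[float]


def embed(text: str, dims: int = 16) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(raw_text: str, max_len: int = 200) -> list[Chunk]:
    """Extracted text -> chunks -> embeddings. The raw file itself is not kept."""
    chunks = [raw_text[i:i + max_len] for i in range(0, len(raw_text), max_len)]
    # A real pipeline would pause here so the user can review or edit the chunks.
    return [Chunk(text=c, vector=embed(c)) for c in chunks if c.strip()]


# Hypothetical extracted text: 380 characters, so two chunks at max_len=200.
index = ingest("Refunds are processed within 14 days. " * 10)
```

Only the chunk text and its vector survive ingestion in this sketch, which mirrors the point above that the uploaded file is not what gets searched.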

***

1. Creating a Group [#1-creating-a-group]

Documents are organized into **Groups** (e.g., "HR Policies", "Technical Manuals"). Retrieval settings are defined at the Group level.

Chunking Strategies [#chunking-strategies]

How should we split your documents?

<Cards>
  <Card icon={<Split />} title="General Chunking">
    Splits text linearly based on character count.

    * **Best for:** Simple text files, FAQs.
    * **Settings:** Max Chunk Length (e.g., 500 chars), Overlap.
  </Card>

  <Card icon={<FileText />} title="Parent-Child Chunking">
    **High Precision.** Splits documents into small "Child" chunks for precise searching, but retrieves the larger "Parent" chunk for context.

    * **Best for:** Complex documents where a single sentence loses meaning without its surrounding paragraph.
  </Card>
</Cards>
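To make the difference between the two strategies concrete, here is a minimal sketch. The function names, the paragraph-as-parent rule, and the word-overlap search are assumptions chosen for illustration; the platform's actual splitting and search logic may differ.

```python
import re


def general_chunks(text: str, max_len: int = 500, overlap: int = 50) -> list[str]:
    """Split text linearly by character count; neighbours share `overlap` characters."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    chunks, start, step = [], 0, max_len - overlap
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def parent_child_chunks(text: str, child_len: int = 80) -> list[tuple[str, str]]:
    """Return (child, parent) pairs: each paragraph is a parent, small slices are children."""
    pairs = []
    for parent in (p.strip() for p in text.split("\n\n")):
        if not parent:
            continue
        for child in general_chunks(parent, max_len=child_len, overlap=0):
            pairs.append((child, parent))
    return pairs


def retrieve_parent(query: str, pairs: list[tuple[str, str]]) -> str:
    """Match against the small child chunks, but hand back the larger parent."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    best = max(pairs, key=lambda pair: len(words(query) & words(pair[0])))
    return best[1]


doc = "The warranty covers parts.\n\nReturns need a receipt."
pairs = parent_child_chunks(doc, child_len=12)
context = retrieve_parent("receipt", pairs)  # the full second paragraph
```

The child chunk that matches `"receipt"` is only a fragment of a sentence; returning its parent paragraph is what gives the model enough surrounding context.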

Retrieval Configuration [#retrieval-configuration]

* **Vector Search:** Matches semantic meaning (concepts).
* **Full Text:** Matches exact keywords.
* **Hybrid Search (Recommended):** Combines both scores for best results.
* **Reranking:** Re-orders the top results using a high-precision model to ensure the most relevant chunk is first.
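
As a rough illustration of how these modes relate, the sketch below fuses a vector score and a keyword score with a weighted sum, then re-orders the shortlist with a second scorer. The weighted-sum formula, the `alpha` parameter, and all function names are assumptions; the source only states that hybrid search combines both scores.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, chunks, alpha=0.5, top_k=3):
    """chunks: list of (text, vector). Score = alpha * vector + (1 - alpha) * keyword."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


def rerank(query, results, scorer, keep=1):
    """Re-order the shortlist with a slower, more precise scorer and keep the best."""
    return sorted(results, key=lambda r: scorer(query, r[1]), reverse=True)[:keep]


# Hypothetical chunks with hand-written 2-D vectors instead of real embeddings.
chunks = [
    ("Refund policy: 14 days", [1.0, 0.0]),
    ("Shipping takes 3 days", [0.0, 1.0]),
    ("Refunds need a receipt", [0.8, 0.6]),
]
results = hybrid_search("refund policy", [1.0, 0.0], chunks)
# keyword_score stands in for a real reranking model here.
best = rerank("refund policy", results, scorer=keyword_score)
```

In this toy data, "Refunds need a receipt" shares no exact keyword with the query but still ranks second thanks to its vector score, which is the case hybrid search is meant to cover.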

***

2. Managing Documents [#2-managing-documents]

Once a group is created, you can upload and manage the data.

<Steps>
  <Step>
    Upload & Pre-processing [#upload--pre-processing]

    Upload PDF, DOCX, TXT, or MD files. You can enable **Cleaning Rules** to automatically strip URLs, emails, or excessive whitespace during extraction.
  </Step>

  <Step>
    Chunk Management (Crucial) [#chunk-management-crucial]

    After processing, the file exists only as a **List of Text Chunks**.

    * **Edit Chunks:** If the PDF parser garbled a table, click the chunk and fix the text manually.
    * **Add Chunks:** You can manually add a text block (e.g., a quick policy update) without uploading a file.
    * **Delete Chunks:** Remove irrelevant footers or legal disclaimers that confuse the AI.
  </Step>
</Steps>
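
The Cleaning Rules mentioned in the upload step can be thought of as a few regular-expression passes over the extracted text. The patterns and the `clean` function below are a simplified sketch under that assumption, not the exact rules the platform applies.

```python
import re

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SPACES_RE = re.compile(r"[ \t]+")
BLANK_LINES_RE = re.compile(r"\n{3,}")


def clean(text, strip_urls=True, strip_emails=True, collapse_whitespace=True):
    """Apply the enabled cleaning rules to extracted text."""
    if strip_urls:
        text = URL_RE.sub("", text)
    if strip_emails:
        text = EMAIL_RE.sub("", text)
    if collapse_whitespace:
        text = SPACES_RE.sub(" ", text)          # runs of spaces/tabs -> one space
        text = BLANK_LINES_RE.sub("\n\n", text)  # 3+ newlines -> one blank line
        text = text.strip()
    return text


raw = "Contact  us at help@example.com or   visit https://example.com/docs today."
cleaned = clean(raw)
```

Because rules like these remove text outright, the result can read awkwardly (here the sentence loses its address and link), which is one reason to review chunks after processing.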

***

3. Connecting to Agent [#3-connecting-to-agent]

A Knowledge Base has no effect until an Agent can access it.

You must link your Knowledge Base Group to an Agent in the **[Agent Studio](/build/agent/intelligence)**.

<Callout type="info" title="Search Triggers">
  Simply linking the KB doesn't mean the agent searches it every time. You must define a **Search Strategy** (e.g., *Always Search*, *Smart Classifier*, or *Script Tool*).

  Read the **[Agent Intelligence Guide](/build/agent/intelligence#knowledge-base)** to configure *when* the agent searches.
</Callout>

***

Roadmap: Future Capabilities [#roadmap-future-capabilities]

We are actively expanding our Knowledge Engine.

<Cards>
  <Card icon={<RefreshCw />} title="Dynamic Data Sources">
    **Live Sync.** Instead of manual uploads, connect to **Google Drive**, **Notion**, or a **Website URL**. The system will periodically re-crawl and re-index the data to keep the agent up to date automatically.
  </Card>

  <Card icon={<Network />} title="GraphRAG">
    **Knowledge Graph.** Moving beyond simple vectors, we plan to map relationships between entities (e.g., "Product A *is compatible with* Product B"). This would allow the agent to answer complex reasoning questions that standard RAG struggles with.
  </Card>
</Cards>