Knowledge Base
Multi-format ingest · Three chunking strategies · Vectorization · ACL · Retrieval tuning
The knowledge base is an Agent's factual foundation. Structure and vectorize enterprise documents so the Agent retrieves before answering.
→ Unfamiliar with RAG? Start with RAG fundamentals.
Multi-Format Ingest
| Format category | Supported |
|---|---|
| Documents | PDF · Word · TXT · Markdown · PPT |
| Tables | CSV · Excel |
| Structured | JSON |
| Web | URL crawl · REST API |
| Database | Relational / non-relational |
Processing Pipeline
You can view status, re-parse, and preview intermediate results for each step on the document detail page.
Three Chunking Strategies
| Strategy | Behavior | Best for |
|---|---|---|
| Fixed length | Cut every N tokens with overlap | Simple fallback, general purpose |
| Semantic split | Split on natural boundaries (paragraph / chapter / sentence) | PDF / Markdown / policy docs |
| Smart split | Use an LLM to detect semantic boundaries | High-quality scenarios; slightly higher cost |
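As a rough illustration of the fixed-length strategy, the sketch below cuts a token list into windows that share a configurable overlap. The function name `chunk_fixed` is hypothetical, and a whitespace split stands in for whatever tokenizer the platform actually uses; the default values mirror the chunk size and overlap defaults documented on this page.

```python
def chunk_fixed(tokens, chunk_size=1000, overlap=200):
    """Split tokens into windows of chunk_size that share `overlap` tokens.

    Hypothetical helper: the platform's real chunker is not documented here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace tokens are an assumption made only for this example.
words = "the quick brown fox jumps over the lazy dog again".split()
pieces = chunk_fixed(words, chunk_size=4, overlap=1)
```

Each window starts `chunk_size - overlap` tokens after the previous one, so the last token of one chunk reappears as the first token of the next when the overlap is 1.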
Where to Adjust Chunking
Knowledge base → Settings → Chunking strategy:
- Chunk size (tokens): default 1000, range 200–4000
- Overlap (tokens): default 200, range 0–500
- Strategy: fixed / semantic / smart
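The ranges above can be captured as a small validated settings object. This is only a sketch: `ChunkingSettings` is a hypothetical name, not a platform API, and the rule that overlap must be smaller than chunk size is an assumption that this page does not state.

```python
from dataclasses import dataclass

STRATEGIES = ("fixed", "semantic", "smart")


@dataclass(frozen=True)
class ChunkingSettings:
    """Hypothetical container for the three chunking settings."""
    strategy: str           # no default: the page does not name one
    chunk_size: int = 1000  # documented default, range 200-4000
    overlap: int = 200      # documented default, range 0-500

    def __post_init__(self):
        if self.strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {STRATEGIES}")
        if not 200 <= self.chunk_size <= 4000:
            raise ValueError("chunk_size must be between 200 and 4000 tokens")
        if not 0 <= self.overlap <= 500:
            raise ValueError("overlap must be between 0 and 500 tokens")
        # Assumption (not stated in the docs): overlap below chunk size.
        if self.overlap >= self.chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")


settings = ChunkingSettings("semantic", chunk_size=1500)
```

Validating up front makes an out-of-range value fail at configuration time rather than during ingest.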
Vectorization
| Setting | Default | Notes |
|---|---|---|
| Embedding model | Org default | Set the org default at Model platform · Default model configuration |
| Multilingual | Chinese-English mix works out of the box | Recommended: a multilingual embedding |

| Custom embedding | Self-hosted supported under Private | See Model deployment |
Access Control (ACL)
Knowledge base permissions have 4 levels:
| Permission | What you can do |
|---|---|
| View | See that it exists, but cannot search it |
| Use | Can be retrieved by Agents / Workflows |
| Edit | Upload / delete documents · modify settings |
| Manage | Everything in Edit, plus change ACL · delete the knowledge base |
Subjects can be users, roles, or departments. Permissions are inherited from org-level resource policies and can be refined inside the knowledge base.
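One way to picture the four levels is as an ordered scale with a knowledge-base grant refining the inherited org-level one. The sketch below makes two assumptions the page does not confirm: that each level includes the ones below it, and that a grant set inside the knowledge base replaces the inherited grant. All names (`Permission`, `REQUIRED`, `is_allowed`, the subject strings) are hypothetical.

```python
from enum import IntEnum


class Permission(IntEnum):
    VIEW = 1    # see that the knowledge base exists; no search
    USE = 2     # can be retrieved by Agents / Workflows
    EDIT = 3    # upload / delete documents, modify settings
    MANAGE = 4  # change ACL, delete the knowledge base


# Minimum level assumed for each illustrative action.
REQUIRED = {
    "see": Permission.VIEW,
    "retrieve": Permission.USE,
    "upload_document": Permission.EDIT,
    "change_acl": Permission.MANAGE,
}


def effective_permission(subject, org_grants, kb_grants):
    """Knowledge-base grant wins; otherwise fall back to the org-level grant."""
    return kb_grants.get(subject, org_grants.get(subject))


def is_allowed(subject, action, org_grants, kb_grants):
    granted = effective_permission(subject, org_grants, kb_grants)
    return granted is not None and granted >= REQUIRED[action]


org_grants = {"dept:support": Permission.USE, "user:alice": Permission.VIEW}
kb_grants = {"user:alice": Permission.EDIT}  # refined inside the knowledge base
```

With these grants, `user:alice` can upload documents because the knowledge-base grant refines her inherited View level, while `dept:support` can retrieve but cannot change the ACL.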
Versioning and Approval
- Version tracking: each document change creates a version snapshot
- Approval flow (optional): when enabled, added or modified documents must be approved before they enter the vector store
Using It in RAG
A knowledge base can be invoked in three ways:
| Method | Description |
|---|---|
| Passive retrieval | Once bound to an Agent, retrieves automatically each turn |
| Active query | Add a knowledge retrieval node in Chatflow / Workflow with explicit TopK / filters / sorting |
| Knowledge recommendations | The Workbench chat panel recommends related documents in the right rail |
Three-Step Retrieval Tuning
Work through the steps in order, starting with the one that usually has the most effect:
1 · Adjust TopK
Default 5. Try this first.
- Retrieved results off-topic → try 8 or 10
- Answers too long and unfocused → back to 5
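To make the effect of TopK concrete, here is a minimal sketch of similarity search over embedded chunks. It assumes cosine similarity and plain Python lists as vectors; the platform's actual scoring method and storage are not described on this page, and `top_k` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """Return the k (score, text) pairs most similar to the query vector.

    `chunks` is a list of (text, vector) pairs; k=5 mirrors the default TopK.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy two-dimensional vectors, purely illustrative.
chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
results = top_k([1.0, 0.0], chunks, k=2)
```

Raising `k` passes more chunks to the Agent, which helps when the right passage ranks just outside the cutoff but adds noise when the answer is already near the top.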
2 · Add a Reranker
Enable Reranker in the knowledge retrieval node to re-order the initial top-K results. This typically improves precision noticeably, at the cost of slightly higher latency.
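Conceptually, reranking is a second scoring pass over the candidates that the first retrieval returned. The sketch below uses a toy term-overlap scorer purely as a stand-in; a real reranker is normally a dedicated model, and nothing here reflects the platform's implementation. Both function names are hypothetical.

```python
def term_overlap(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, score_fn=term_overlap, keep=None):
    """Re-order first-pass candidates by a second score; optionally truncate."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked if keep is None else ranked[:keep]


first_pass = [
    "shipping times vary",
    "refund policy is 30 days",
    "our refund desk",
]
reranked = rerank("refund policy days", first_pass, keep=2)
```

The extra pass scores every candidate again, which is where the added latency comes from.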
3 · Adjust Chunking
Adjust chunking only if the first two steps haven't solved the problem:
| Symptom | Adjust |
|---|---|
| Key info split across chunks | Increase chunk size to 1500–2000 |
| Chunks too broad | Reduce to 500–800 |
| Tables/code fragmented | Enable smart split |
Usage Tracking
Knowledge base detail → Usage:
- Which Agents / Workflows / Chatflows reference it
- Retrieval hit rate, empty hits
- User feedback (satisfied / not satisfied)
- Knowledge gap detection (frequent but low-hit queries — tells you which docs to add)
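Knowledge gap detection can be thought of as a filter over the retrieval log: keep queries that are asked often but rarely return anything. The sketch below assumes a hypothetical log of `(query, hit_count)` pairs and illustrative thresholds; the platform's actual log format and cutoffs are not documented here.

```python
from collections import defaultdict


def find_gaps(log, min_count=3, max_hit_rate=0.5):
    """Return (query, times_asked, hit_rate) for frequent, low-hit queries.

    `log` is an iterable of (query, hit_count) pairs; a retrieval counts as a
    hit when hit_count > 0. Thresholds are illustrative assumptions.
    """
    stats = defaultdict(lambda: [0, 0])  # query -> [times asked, times hit]
    for query, hit_count in log:
        entry = stats[query.strip().lower()]
        entry[0] += 1
        if hit_count > 0:
            entry[1] += 1

    gaps = []
    for query, (asked, hits) in stats.items():
        rate = hits / asked
        if asked >= min_count and rate <= max_hit_rate:
            gaps.append((query, asked, rate))
    # Most frequently asked first, then lowest hit rate.
    gaps.sort(key=lambda g: (-g[1], g[2]))
    return gaps


log = [("vpn setup", 0)] * 3 + [("leave policy", 2)] * 4 + [("expense limit", 0)]
gaps = find_gaps(log)
```

In this toy log only "vpn setup" qualifies: it is asked three times and never hits, whereas "expense limit" is asked too rarely and "leave policy" always hits.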
Data Source vs Knowledge Base
They are two separate layers of abstraction. Don't mix them:
| | Data Source | Knowledge Base |
|---|---|---|
| Role | File storage + parsing | Retrieval service |
| Vectorized? | No | Yes |
| How to use | Read directly in Workflow / feed to KB | Retrieved by Agent / Workflow |
| Structured data | Suitable | Not suitable |
Anti-Patterns
- Dumping a database table to CSV and stuffing it into the knowledge base — use data source + SQL tool instead
- Putting real-time data (orders, inventory) in the knowledge base — call live APIs via HTTP tool
- A single huge knowledge base for everything — split by business domain for better control
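The first anti-pattern is easier to see with an example. An aggregate question over tabular data is answered exactly by a query, whereas vector retrieval over CSV rows would only return whichever rows happen to look similar to the question. The sketch below uses an in-memory SQLite table as a stand-in for a data source with a SQL tool; the `orders` schema and the `total_by_status` helper are hypothetical.

```python
import sqlite3


def total_by_status(conn, status):
    """Return (order_count, total_amount) for one status via a parameterized query."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders WHERE status = ?",
        (status,),
    ).fetchone()
    return row[0], row[1]


# Hypothetical table standing in for a structured data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (status, amount) VALUES (?, ?)",
    [("paid", 120.0), ("paid", 80.0), ("refunded", 40.0)],
)

paid_count, paid_total = total_by_status(conn, "paid")
```

The query counts and sums every matching row, which a TopK-limited retrieval cannot guarantee.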
Next Steps
- Hands-on → First Agent · with knowledge base
- Data ETL → Data source
- Tuning → RAG fundamentals