Knowledge Base

Multi-format ingest · Three chunking strategies · Vectorization · ACL · Retrieval tuning

The knowledge base is an Agent's factual foundation: it structures and vectorizes enterprise documents so the Agent can retrieve relevant content before answering.

→ Unfamiliar with RAG? Start with RAG fundamentals.

Multi-Format Ingest

Format category	Supported
Documents	PDF · Word · TXT · Markdown · PPT
Tables	CSV · Excel
Structured	JSON
Web	URL crawl · REST API
Database	Relational / non-relational

Processing Pipeline

Ingest → format detection → content parsing → smart chunking → embedding → index build → ready

You can view status, re-parse, and preview intermediate results for each step on the document detail page.
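The stages above can be pictured as a chain in which each step's output feeds the next, with a status recorded per step. That is the kind of information the detail page exposes. This is a minimal sketch only: `PipelineRun`, the stage names, and the stage functions are hypothetical stand-ins, not the platform's internals.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical stage names mirroring the documented pipeline order.
STAGES = ["format_detection", "content_parsing", "chunking", "embedding", "index_build"]


@dataclass
class PipelineRun:
    """Per-document run that records each stage's status and output."""
    status: Dict[str, str] = field(default_factory=lambda: {s: "pending" for s in STAGES})
    outputs: Dict[str, Any] = field(default_factory=dict)

    def run(self, doc: Any, stage_fns: Dict[str, Callable[[Any], Any]]) -> Optional[Any]:
        data = doc
        for stage in STAGES:
            try:
                data = stage_fns[stage](data)
            except Exception:
                self.status[stage] = "failed"  # later stages stay "pending"
                return None
            self.status[stage] = "done"
            self.outputs[stage] = data  # intermediate result, available for preview
        return data

    @property
    def ready(self) -> bool:
        return all(v == "done" for v in self.status.values())
```

In this sketch, re-parsing a document would simply mean calling `run` again after fixing whichever stage failed.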

Three Chunking Strategies

Strategy	Behavior	Best for
Fixed length	Cut every N tokens with overlap	Simple fallback, general purpose
Semantic split	Split on natural boundaries (paragraph / chapter / sentence)	PDF / Markdown / policy docs
Smart split	Use an LLM to detect semantic boundaries	High-quality scenarios; slightly higher cost
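The first two strategies are easy to sketch. The code below makes two assumptions. First, it treats whitespace-separated words as tokens. Second, it treats blank lines as the only semantic boundary. The platform's actual tokenizer is not documented here, and neither is its semantic splitter, which also uses chapter and sentence boundaries. Smart split needs an LLM and is not sketched.

```python
from typing import List


def fixed_length_chunks(text: str, size: int, overlap: int) -> List[str]:
    """Windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def semantic_chunks(text: str, size: int) -> List[str]:
    """Pack whole paragraphs (blank-line separated) into chunks of at most
    `size` words; a single oversized paragraph becomes its own chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for p in paragraphs:
        n = len(p.split())
        if current and count + n > size:
            chunks.append("\n\n".join(current))  # flush before overflowing
            current, count = [], 0
        current.append(p)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Fixed length can cut a sentence in half, which the overlap partly compensates for. The paragraph-packing version never splits a paragraph, so chunk sizes vary.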

Defaults are good enough

The default strategy is smart chunking. Don't tune chunking parameters unless you have explicit metrics showing that retrieval quality is lacking.

Where to Adjust Chunking

Knowledge base → Settings → Chunking strategy:

Chunk size (tokens): default 1000, range 200–4000
Overlap (tokens): default 200, range 0–500
Strategy: fixed / semantic / smart
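The documented defaults and ranges can be captured as a small validated config. This is a sketch only: `ChunkingSettings` and its field names are hypothetical, not a platform API. The rule that overlap must be smaller than chunk size is an added assumption, because the documented ranges alone would allow, for example, a 200-token chunk with a 500-token overlap.

```python
from dataclasses import dataclass

STRATEGIES = {"fixed", "semantic", "smart"}


@dataclass(frozen=True)
class ChunkingSettings:
    chunk_size: int = 1000   # tokens; documented range 200-4000
    overlap: int = 200       # tokens; documented range 0-500
    strategy: str = "smart"  # documented default

    def __post_init__(self):
        if not 200 <= self.chunk_size <= 4000:
            raise ValueError("chunk_size must be in 200-4000")
        if not 0 <= self.overlap <= 500:
            raise ValueError("overlap must be in 0-500")
        if self.overlap >= self.chunk_size:
            # Assumption, not from the docs: overlap >= chunk size would stall chunking.
            raise ValueError("overlap must be smaller than chunk_size")
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy}")
```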

Vectorization

Setting	Default	Notes
Embedding model	Org default	Set the org default at Model platform · Default model configuration
Multilingual	Chinese-English mix works out of the box	Recommended: a multilingual embedding
Custom embedding	Self-hosted models supported under Private deployment	See Model deployment

Switching embedding model = re-embed the entire knowledge base

Vectors produced by different embedding models live in incompatible spaces, so switching models reprocesses every document. The process cannot be interrupted once started.
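An index built with one model can only be queried with vectors from that same model, which is why a switch forces re-embedding. The sketch below assumes a toy in-memory index that records the model it was built with. `VectorIndex`, `rebuild_index`, and the model names are hypothetical.

```python
import math
from typing import Callable, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


class VectorIndex:
    """Toy index that remembers which embedding model produced its vectors."""

    def __init__(self, model: str, dim: int):
        self.model, self.dim = model, dim
        self.vectors: Dict[str, List[float]] = {}

    def _check(self, vec: List[float], model: str) -> None:
        # A vector from another model is meaningless in this space even if
        # its dimension happens to match, so reject it outright.
        if model != self.model or len(vec) != self.dim:
            raise ValueError(f"index built with {self.model}; re-embed to switch")

    def add(self, doc_id: str, vec: List[float], model: str) -> None:
        self._check(vec, model)
        self.vectors[doc_id] = vec

    def search(self, query: List[float], model: str, k: int = 5) -> List[str]:
        self._check(query, model)
        ranked = sorted(self.vectors, key=lambda d: cosine(query, self.vectors[d]), reverse=True)
        return ranked[:k]


def rebuild_index(model: str, dim: int, docs: Dict[str, str],
                  embed: Callable[[str], List[float]]) -> VectorIndex:
    """Old vectors cannot be converted, only regenerated from the source text."""
    index = VectorIndex(model, dim)
    for doc_id, text in docs.items():
        index.add(doc_id, embed(text), model)
    return index
```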

Access Control (ACL)

Knowledge base permissions have 4 levels:

Permission	What you can do
View	See that it exists, but cannot search it
Use	Can be retrieved by Agents / Workflows
Edit	Upload / delete documents · modify settings
Manage	Everything in Edit, plus change ACL · delete the knowledge base

Subjects (who a permission is granted to) can be users, roles, or departments. Permissions are inherited from org-level resource policies and can be refined within the knowledge base.
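The four levels read as cumulative, each including the ones below it, so they map naturally onto an ordered enum. This sketch rests on that reading plus two more assumptions that are not the documented resolution rules. First, a knowledge-base entry for a subject replaces the inherited org-level entry for that same subject. Second, a user gets the highest level among all of their subjects. The subject strings are illustrative.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Permission(IntEnum):
    NONE = 0
    VIEW = 1    # see it exists, cannot search
    USE = 2     # can be retrieved by Agents / Workflows
    EDIT = 3    # upload / delete documents, modify settings
    MANAGE = 4  # change ACL, delete the knowledge base


def effective_permission(subjects: Iterable[str],
                         org_policy: Dict[str, Permission],
                         kb_overrides: Dict[str, Permission]) -> Permission:
    """Highest level across a user's subjects (user id, roles, departments).

    Assumption: a knowledge-base entry overrides the org-level entry
    for the same subject.
    """
    best = Permission.NONE
    for s in subjects:
        level = kb_overrides.get(s, org_policy.get(s, Permission.NONE))
        best = max(best, level)
    return best


def can_retrieve(level: Permission) -> bool:
    return level >= Permission.USE
```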

Versioning and Approval

Version tracking: each document change creates a version snapshot
Approval flow (optional): when enabled, document add/modify must be approved before entering the vector store
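Here is one way to picture how snapshots and the optional approval gate interact: every change creates a version, but with approval enabled, a version reaches the vector store only once approved. This is a minimal sketch. The `Document` class, its methods, and the rule that the previously indexed version stays live while a change awaits approval are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    approval_required: bool = False
    versions: List[str] = field(default_factory=list)  # one snapshot per change
    indexed: Optional[int] = None   # version currently in the vector store
    pending: Optional[int] = None   # version awaiting approval, if any

    def change(self, content: str) -> int:
        """Record a new snapshot; index it now unless approval is required."""
        self.versions.append(content)
        version = len(self.versions) - 1
        if self.approval_required:
            self.pending = version  # assumption: the previously indexed version stays live
        else:
            self.indexed = version
        return version

    def approve(self, version: int) -> None:
        if version != self.pending:
            raise ValueError(f"version {version} is not awaiting approval")
        self.indexed, self.pending = version, None
```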

Using It in RAG

A knowledge base can be invoked in three ways:

Method	Description
Passive retrieval	Once bound to an Agent, retrieves automatically each turn
Active query	Add a knowledge retrieval node in Chatflow / Workflow with explicit TopK / filters / sorting
Knowledge recommendations	The Workbench chat panel recommends related documents in the right rail

Three-Step Retrieval Tuning

Work through these steps in order, from top to bottom:

1 · Adjust TopK

Default 5. Try this first.

Off-topic → try 8 / 10
Answers too long and unfocused → back to 5
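TopK is simply how many of the highest-scoring chunks are passed on to the model. The minimal sketch below uses cosine similarity over precomputed vectors. The chunk IDs and vectors are made up, and the product's actual similarity metric is not specified here.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def top_k(query: List[float], chunks: Dict[str, List[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most similar (chunk_id, score) pairs, best first."""
    scored = ((cid, cosine(query, vec)) for cid, vec in chunks.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Raising `k` from 5 to 8 or 10 widens the net when relevant chunks are being missed. The trade-off is more context, some of it likely less relevant, which is why overly long answers are a cue to go back down.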

2 · Add a Reranker

Enable the Reranker in the knowledge retrieval node to re-order the initial top-K results. This significantly improves precision, at the cost of slightly higher latency.
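Reranking is a two-stage pattern: a cheap first pass fetches candidates, and a more expensive scorer then reorders them and keeps the best. This sketches the pattern only. `first_pass` and `rerank_score` are hypothetical callables. A real reranker would be a dedicated model (for example, a cross-encoder) rather than the toy word-overlap scorer used in the usage lines.

```python
from typing import Callable, List


def retrieve_with_rerank(
    query: str,
    first_pass: Callable[[str, int], List[str]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 20,
    k: int = 5,
) -> List[str]:
    """Fetch `candidates` chunks cheaply, rescore each against the query,
    and return the top `k` by reranker score."""
    pool = first_pass(query, candidates)
    # The extra latency comes from here: one scorer call per candidate.
    rescored = sorted(pool, key=lambda chunk: rerank_score(query, chunk), reverse=True)
    return rescored[:k]


# Toy usage: a fixed corpus as the first pass, word overlap as the "reranker".
def toy_first_pass(query: str, n: int) -> List[str]:
    corpus = ["refund policy for orders", "shipping times", "refund window is 30 days"]
    return corpus[:n]


def overlap_score(query: str, chunk: str) -> float:
    return len(set(query.split()) & set(chunk.split()))


print(retrieve_with_rerank("refund window", toy_first_pass, overlap_score, candidates=3, k=2))
# ['refund window is 30 days', 'refund policy for orders']
```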

3 · Adjust Chunking

Only if the first two steps haven't solved the problem:

Symptom	Adjust
Key info split across chunks	Increase chunk size to 1500–2000
Chunks too broad	Reduce to 500–800
Tables/code fragmented	Enable smart split

Usage Tracking

Knowledge base detail → Usage:

Which Agents / Workflows / Chatflows reference it
Retrieval hit rate, empty hits
User feedback (satisfied / not satisfied)
Knowledge gap detection (frequent but low-hit queries — tells you which docs to add)
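Hit rate, empty hits, and knowledge gaps can all be derived from a retrieval log. This sketch assumes the following, none of which are the platform's definitions:

- Each log entry records the query and its best retrieval score, with `None` for an empty hit.
- A hit means the best score reaches a threshold.
- A gap is a query asked at least `min_count` times that hits less than half the time.

```python
from collections import Counter
from typing import List, Optional, Tuple


def usage_stats(log: List[Tuple[str, Optional[float]]], threshold: float = 0.5,
                min_count: int = 3) -> Tuple[float, int, List[str]]:
    """Return (hit_rate, empty_hits, gaps) for (query, best_score) pairs."""
    if not log:
        return 0.0, 0, []

    def is_hit(score: Optional[float]) -> bool:
        return score is not None and score >= threshold

    hits = sum(1 for _, s in log if is_hit(s))
    empty = sum(1 for _, s in log if s is None)
    # Normalize lightly so "VPN setup" and "vpn setup " count as one query.
    asked = Counter(q.strip().lower() for q, _ in log)
    hit_by_query = Counter(q.strip().lower() for q, s in log if is_hit(s))
    gaps = sorted(q for q, n in asked.items()
                  if n >= min_count and hit_by_query[q] / n < 0.5)
    return hits / len(log), empty, gaps
```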

Data Source vs Knowledge Base

They are two separate layers of abstraction; don't conflate them:

	Data Source	Knowledge Base
Role	File storage + parsing	Retrieval service
Vectorized?	No	Yes
How to use	Read directly in Workflow / feed to KB	Retrieved by Agent / Workflow
Structured data	Suitable	Not suitable

→ Data source

Anti-Patterns

Dumping a database table to CSV and stuffing it into the knowledge base — use data source + SQL tool instead
Putting real-time data (orders, inventory) in the knowledge base — call live APIs via HTTP tool
A single huge knowledge base for everything — split by business domain for better control
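Structured data belongs behind a SQL tool rather than in a vector store. An aggregate like "open orders per region" needs every row, while retrieval returns only the TopK most similar chunks. The sketch below uses Python's built-in `sqlite3` with a made-up `orders` table standing in for a data source. It does not show how the platform's SQL tool is configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (region, status) VALUES (?, ?)",
    [("east", "open"), ("east", "open"), ("west", "open"), ("west", "closed")],
)

# An exact aggregate over all rows: something top-K retrieval over CSV chunks
# cannot guarantee, since it only returns the few most similar chunks.
rows = conn.execute(
    "SELECT region, COUNT(*) FROM orders WHERE status = ? GROUP BY region ORDER BY region",
    ("open",),
).fetchall()
print(rows)  # [('east', 2), ('west', 1)]
```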

Next Steps

Hands-on → First Agent · with knowledge base
Data ETL → Data source
Tuning → RAG fundamentals
