Evose

Knowledge Base

Multi-format ingest · Three chunking strategies · Vectorization · ACL · Retrieval tuning

The knowledge base is an Agent's factual foundation. Structure and vectorize enterprise documents so the Agent retrieves before answering.

→ Unfamiliar with RAG? Start with RAG fundamentals.

Multi-Format Ingest

Format category | Supported
Documents | PDF · Word · TXT · Markdown · PPT
Tables | CSV · Excel
Structured | JSON
Web | URL crawl · REST API
Database | Relational / non-relational

Processing Pipeline

Ingest → format detection → content parsing → smart chunking → embedding → index build → ready

You can view status, re-parse, and preview intermediate results for each step on the document detail page.

Three Chunking Strategies

Strategy | Behavior | Best for
Fixed length | Cut every N tokens with overlap | Simple fallback, general purpose
Semantic split | Split on natural boundaries (paragraph / chapter / sentence) | PDF / Markdown / policy docs
Smart split | Use an LLM to detect semantic boundaries | High-quality scenarios; slightly higher cost

Defaults are good enough

The default strategy is smart split. Don't tune chunking parameters unless you have explicit metrics showing that retrieval quality is lacking.

Where to Adjust Chunking

Knowledge base → Settings → Chunking strategy:

  • Chunk size (tokens): default 1000, range 200–4000
  • Overlap (tokens): default 200, range 0–500
  • Strategy: fixed / semantic / smart
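
To illustrate how chunk size and overlap interact under the fixed strategy, here is a minimal sketch that slides a window over a token list. The function name `chunk_fixed` and the placeholder tokens are assumptions for illustration only; the platform's actual tokenizer and implementation are not described on this page.

```python
from typing import List


def chunk_fixed(tokens: List[str], chunk_size: int = 1000, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of chunk_size; each window repeats the last
    `overlap` tokens of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Hypothetical input: placeholder strings stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(2500)]
chunks = chunk_fixed(tokens)  # uses the defaults listed above: 1000 / 200
```

With the defaults, a 2500-token document yields three chunks, and the last 200 tokens of each chunk reappear at the start of the next. A larger overlap lowers the chance that a sentence is cut in half at a boundary, at the cost of more chunks to embed.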

Vectorization

Setting | Default | Notes
Embedding model | Org default | Set the org default at Model platform · Default model configuration
Multilingual | Chinese-English mix works out of the box | Recommended: a multilingual embedding
Custom embedding | Self-hosted supported under Private | See Model deployment

Switching embedding model = re-embed the entire knowledge base

Different embeddings have incompatible vector spaces. Switching reprocesses every document — the process cannot be interrupted.

Access Control (ACL)

Knowledge base permissions have 4 levels:

Permission | What you can do
View | See it exists, but cannot search
Use | Can be retrieved by Agents / Workflows
Edit | Upload / delete documents · modify settings
Manage | + change ACL · delete the knowledge base

Subjects can be users, roles, or departments. Permissions are inherited from org-level resource policies and can be refined inside the knowledge base.
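
The sketch below shows one way a permission check over these four levels could be modeled. It rests on two assumptions that this page does not confirm: that the levels are cumulative (a higher level implies the lower ones), and that a knowledge-base-level entry replaces the inherited org-level entry for the same subject. All names (`Permission`, `effective_permission`, the subject strings) are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, Optional


class Permission(IntEnum):
    # Assumption: levels are cumulative, so a higher value implies the lower ones.
    VIEW = 1
    USE = 2
    EDIT = 3
    MANAGE = 4


def effective_permission(
    subjects: Iterable[str],
    org_policy: Dict[str, Permission],
    kb_overrides: Dict[str, Permission],
) -> Optional[Permission]:
    """Return the highest level granted to any of the given subjects.

    Assumption: an entry set inside the knowledge base replaces the
    inherited org-level entry for the same subject.
    """
    levels = []
    for subject in subjects:
        if subject in kb_overrides:
            levels.append(kb_overrides[subject])
        elif subject in org_policy:
            levels.append(org_policy[subject])
    return max(levels) if levels else None


def can(level: Optional[Permission], required: Permission) -> bool:
    """True if the resolved level meets or exceeds the required one."""
    return level is not None and level >= required


# Hypothetical subjects: a user, a role, and a department.
org_policy = {"dept:support": Permission.USE, "role:kb-admin": Permission.MANAGE}
kb_overrides = {"dept:support": Permission.VIEW, "user:alice": Permission.EDIT}

alice = effective_permission(["user:alice", "dept:support"], org_policy, kb_overrides)
bob = effective_permission(["user:bob", "dept:support"], org_policy, kb_overrides)
```

In this example the knowledge base narrows the support department from Use to View, so `bob` can see the knowledge base but not retrieve from it, while `alice` holds Edit through a user-level grant.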

Versioning and Approval

  • Version tracking: each document change creates a version snapshot
  • Approval flow (optional): when enabled, document add/modify must be approved before entering the vector store

Using It in RAG

A knowledge base can be invoked in three ways:

Method | Description
Passive retrieval | Once bound to an Agent, retrieves automatically each turn
Active query | Add a knowledge retrieval node in Chatflow / Workflow with explicit TopK / filters / sorting
Knowledge recommendations | The Workbench chat panel recommends related documents in the right rail

Three-Step Retrieval Tuning

Work through these steps in order, starting with the one that usually has the most effect:

1 · Adjust TopK

Default 5. Try this first.

  • Off-topic → try 8 / 10
  • Answers too long and unfocused → back to 5
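
To make the effect of TopK concrete, the following sketch ranks chunks by cosine similarity and keeps the best K. Cosine similarity, the three-dimensional vectors, and the chunk names are assumptions for illustration; the scoring the platform actually uses is not specified here.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best, highest first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: chunk id -> embedding.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "refund-exceptions": [0.7, 0.3, 0.1],
    "office-hours": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
narrow = top_k(query, index, k=1)
wide = top_k(query, index, k=3)
```

Here `narrow` returns only `refund-policy`, while `wide` also brings in `refund-exceptions` (useful) and `shipping-times` (noise). That is the trade-off behind the guidance above: a larger K recovers relevant chunks that ranked lower, but it also passes more loosely related text to the model.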

2 · Add a Reranker

Enable Reranker in the knowledge retrieval node to re-order the initial top-K. Significantly improves precision; latency goes up slightly.
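
The two-stage idea can be sketched as follows: take the initial candidates, re-score each one against the query with a second scoring function, and keep the best few. The `term_overlap` scorer is a deliberately simple stand-in for a real reranker model, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Re-score (text, first_stage_score) candidates with score_fn and keep the best top_n."""
    rescored = [(text, score_fn(query, text)) for text, _ in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_n]


def term_overlap(query: str, text: str) -> float:
    # Toy stand-in for a reranker model: fraction of query terms present in the text.
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


query = "refund for damaged items"
# Hypothetical first-stage results, ordered by vector similarity.
initial = [
    ("shipping times for standard orders", 0.82),
    ("refund policy for damaged items", 0.79),
    ("refund window is 30 days", 0.75),
]
final = rerank(query, initial, term_overlap, top_n=2)
```

The chunk that ranked second in the first stage moves to the top after re-scoring. The extra latency comes from running the second scorer once per candidate, which is why reranking is applied to the initial top-K rather than to the whole index.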

3 · Adjust Chunking

Only if the first two steps haven't solved the problem:

Symptom | Adjust
Key info split across chunks | Increase chunk size to 1500–2000
Chunks too broad | Reduce to 500–800
Tables/code fragmented | Enable smart split

Usage Tracking

Knowledge base detail → Usage:

  • Which Agents / Workflows / Chatflows reference it
  • Retrieval hit rate, empty hits
  • User feedback (satisfied / not satisfied)
  • Knowledge gap detection (frequent but low-hit queries — tells you which docs to add)
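
As a rough illustration of what "frequent but low-hit" means, the sketch below scans a retrieval log and flags queries that are asked often but rarely return any chunk. The log shape, the function name `find_knowledge_gaps`, and both thresholds are assumptions; the Usage page may compute this differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_knowledge_gaps(
    log: Iterable[Tuple[str, int]],
    min_count: int = 3,
    max_hit_rate: float = 0.3,
) -> List[Tuple[str, int, float]]:
    """Return (query, count, hit_rate) for queries asked often but rarely answered.

    Each log entry is (normalized_query, number_of_chunks_retrieved).
    The thresholds are hypothetical defaults, not platform values.
    """
    counts: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for query, retrieved in log:
        counts[query] += 1
        if retrieved > 0:
            hits[query] += 1
    gaps = []
    for query, count in counts.items():
        hit_rate = hits[query] / count
        if count >= min_count and hit_rate <= max_hit_rate:
            gaps.append((query, count, hit_rate))
    gaps.sort(key=lambda g: g[1], reverse=True)  # most frequent gaps first
    return gaps


# Hypothetical retrieval log.
log = [
    ("return policy", 4), ("return policy", 3), ("return policy", 5),
    ("warranty transfer", 0), ("warranty transfer", 0),
    ("warranty transfer", 0), ("warranty transfer", 1),
    ("office wifi", 0),
]
gaps = find_knowledge_gaps(log)
```

Only "warranty transfer" is flagged: it was asked four times and returned results once. "office wifi" also came back empty, but a single occurrence is below the frequency threshold, so it is not yet treated as a gap.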

Data Source vs Knowledge Base

They are two layers of abstraction; don't mix them:

Aspect | Data Source | Knowledge Base
Role | File storage + parsing | Retrieval service
Vectorized? | No | Yes
How to use | Read directly in Workflow / feed to KB | Retrieved by Agent / Workflow
Structured data | Suitable | Not suitable

Anti-Patterns

  • Dumping a database table to CSV and stuffing it into the knowledge base — use data source + SQL tool instead
  • Putting real-time data (orders, inventory) in the knowledge base — call live APIs via HTTP tool
  • A single huge knowledge base for everything — split by business domain for better control
