Data Source

File storage and parsing layer · Import methods · Relationship with the knowledge base

A data source sits upstream of the knowledge base. It owns file storage, parsing, and metadata. One data source can feed multiple knowledge bases and can also be queried directly by Workflow nodes.

Import Methods

Method	Use
Bulk upload	Drag in a batch of documents
Reference an existing data source	Link in another workspace's or org-level data source
Web crawl	Provide a URL; auto-fetch and parse
App sync (SaaS)	Sync from Notion / Feishu / DingTalk / Confluence, etc.
Manual text	Paste text directly

Supported Formats

Focused on unstructured documents:

Format	Parse behavior
PDF / Word	Text extraction + paragraph detection + image OCR (optional)
TXT / Markdown	Read directly
PPT	Per-page text + image extraction
CSV / Excel	Per-row / per-sheet parsing
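
The table above can be read as a lookup from format family to parse behavior. The following is a minimal sketch of that lookup, not the product's parser: the file extensions assigned to each family and the function name `parse_behavior_for` are assumptions made for illustration, since the table names formats rather than extensions.

```python
from pathlib import Path

# Assumed extension-to-family mapping; the table names formats, not extensions.
FORMAT_BY_EXTENSION = {
    ".pdf": "PDF / Word",
    ".doc": "PDF / Word",
    ".docx": "PDF / Word",
    ".txt": "TXT / Markdown",
    ".md": "TXT / Markdown",
    ".ppt": "PPT",
    ".pptx": "PPT",
    ".csv": "CSV / Excel",
    ".xls": "CSV / Excel",
    ".xlsx": "CSV / Excel",
}

# Parse behavior per format family, copied from the table above.
PARSE_BEHAVIOR = {
    "PDF / Word": "Text extraction + paragraph detection + image OCR (optional)",
    "TXT / Markdown": "Read directly",
    "PPT": "Per-page text + image extraction",
    "CSV / Excel": "Per-row / per-sheet parsing",
}


def parse_behavior_for(filename: str) -> str:
    """Return the documented parse behavior for a file name (hypothetical helper)."""
    extension = Path(filename).suffix.lower()
    family = FORMAT_BY_EXTENSION.get(extension)
    if family is None:
        raise ValueError(f"unsupported format: {extension or filename}")
    return PARSE_BEHAVIOR[family]
```

For example, `parse_behavior_for("FAQ.md")` returns the "Read directly" entry, while a `.json` file raises `ValueError`.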

Use a SQL tool for structured data

Database tables and API JSON responses should not go into a data source. Query them directly with an HTTP tool or a SQL tool instead.

Document Management

Each document carries full metadata:

Field	Description
Name / type / size	The file itself
Parse status	Pending / parsing / ready / failed
Source	Upload / crawl / reference / sync
Upload time / creator	Audit
Word count / token count	Capacity estimate

Operations: Re-parse · Preview parsed result · Soft delete (recoverable).
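
To make the metadata fields and the recoverable soft delete concrete, here is a minimal sketch of a document record. The class, field, and method names are hypothetical; only the field list, the four parse statuses, and the four source values come from the table above. Modeling soft delete as a timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ParseStatus(Enum):
    PENDING = "pending"
    PARSING = "parsing"
    READY = "ready"
    FAILED = "failed"


class Source(Enum):
    UPLOAD = "upload"
    CRAWL = "crawl"
    REFERENCE = "reference"
    SYNC = "sync"


@dataclass
class Document:
    # Field names are illustrative, not the product's schema.
    name: str
    file_type: str
    size_bytes: int
    source: Source
    creator: str
    uploaded_at: datetime
    parse_status: ParseStatus = ParseStatus.PENDING
    word_count: int = 0
    token_count: int = 0
    deleted_at: Optional[datetime] = None  # assumed soft-delete marker

    def soft_delete(self, now: datetime) -> None:
        """Mark the document as deleted without discarding it."""
        self.deleted_at = now

    def restore(self) -> None:
        """Undo a soft delete."""
        self.deleted_at = None

    @property
    def is_deleted(self) -> bool:
        return self.deleted_at is not None
```

Keeping the record and flagging it, rather than removing it, is what makes the delete recoverable in this sketch.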

File Groups and ACL

A data source can have file groups, each with its own ACL (view / use / edit / manage). Common pattern:

Data source: Product Documentation
├─ Public/ (visible to everyone)
│   ├─ User Manual.pdf
│   └─ FAQ.md
├─ Internal/ (visible to R&D role)
│   ├─ Design.pdf
│   └─ Roadmap.md
└─ Confidential/ (PMs only)
    └─ Sensitive Business Data.xlsx
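
The pattern above can be sketched as a per-group permission check. This is an illustration only: the role names, the `GROUP_ACL` structure, and the `can` helper are hypothetical, and treating the four permissions as ordered levels (each implying the ones below it) is an assumption the source does not state.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Assumed ordering: a higher level implies every lower one.
    VIEW = 1
    USE = 2
    EDIT = 3
    MANAGE = 4


# Hypothetical ACLs for the example tree: group -> role -> highest permission.
GROUP_ACL = {
    "Public": {"everyone": Permission.VIEW},
    "Internal": {"rd": Permission.VIEW},
    "Confidential": {"pm": Permission.VIEW},
}


def can(roles: set[str], group: str, needed: Permission) -> bool:
    """Return True if any of the user's roles grants `needed` on `group`."""
    acl = GROUP_ACL.get(group, {})
    # Every user implicitly holds the "everyone" role in this sketch.
    granted = [acl[role] for role in roles | {"everyone"} if role in acl]
    return bool(granted) and max(granted) >= needed
```

With these ACLs, a user holding only the `rd` role can view `Public` and `Internal` but not `Confidential`.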

Import Settings

When uploading, you can specify:

Setting	Description
Target group	Pick existing / create new
Chunk mode	Smart / general (only affects later vectorization)
File-type restrictions	Allowlist / denylist
Size limits	Per-file max / total quota
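
As a rough illustration of how these settings might gate an upload, the sketch below checks one file against the file-type and size restrictions. All names and the default limits are placeholders chosen for the example, not product values.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ImportSettings:
    # Hypothetical settings object; the limits below are placeholder values.
    target_group: str
    chunk_mode: str = "smart"  # "smart" or "general"
    allowed_extensions: frozenset = frozenset()  # empty means no allowlist
    denied_extensions: frozenset = frozenset()
    max_file_bytes: int = 50 * 1024 * 1024
    total_quota_bytes: int = 1024 * 1024 * 1024


def check_upload(settings: ImportSettings, filename: str,
                 size_bytes: int, used_bytes: int = 0) -> list[str]:
    """Return a list of violations; an empty list means the file may be uploaded."""
    problems = []
    extension = Path(filename).suffix.lower()
    if settings.allowed_extensions and extension not in settings.allowed_extensions:
        problems.append(f"{extension} is not on the allowlist")
    if extension in settings.denied_extensions:
        problems.append(f"{extension} is on the denylist")
    if size_bytes > settings.max_file_bytes:
        problems.append("file exceeds the per-file limit")
    if used_bytes + size_bytes > settings.total_quota_bytes:
        problems.append("upload would exceed the total quota")
    return problems
```

Returning every violation at once, rather than stopping at the first, lets a bulk upload report all rejected files in one pass.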

Data Source vs Knowledge Base

┌───────────────────────────────────────────────────┐
│  Data Source                                      │
│  Raw files + metadata + parsed result (text/table)│
└───────────────────┬──────────────────────┬────────┘
                    ↓ feeds                ↓ direct read (Workflow node)
┌────────────────────┐                 ┌─────────────────┐
│  Knowledge Base    │                 │  Workflow       │
│  Chunked + vectors │                 │  Read on demand │
└────────────────────┘                 └─────────────────┘
                    ↑ retrieval
                Agent / Chatflow

Key boundaries:

A data source does not have to enter a knowledge base (Workflow can consume directly)
The same data source can feed multiple knowledge bases (with different chunking strategies)
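
The second boundary can be illustrated with a small sketch: one parsed text, chunked two ways for two knowledge bases. Fixed-size character chunking is used here only as a stand-in, since the Smart and general chunk modes are not specified on this page; the function name, knowledge base names, and parameters are hypothetical.

```python
def chunk_text(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]


# Parsed result held by the data source; parsing happens once.
parsed_text = "abcdefghij"

# Each knowledge base applies its own strategy to the same parsed text.
knowledge_bases = {
    "kb_coarse": chunk_text(parsed_text, size=5),
    "kb_fine": chunk_text(parsed_text, size=4, overlap=2),
}
```

The point of the sketch is that the data source is parsed once, while chunking is a per-knowledge-base decision.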

Anti-Patterns

Binding a data source directly to an Agent (bind a knowledge base instead)
Uploading the same document twice (there is no automatic deduplication, so duplicates waste vector storage)
Leaving huge files unsplit (PDFs over 100 MB parse slowly and are prone to parse failures)
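
Because duplicates are not removed automatically, one option is to check a batch on the client side before uploading. The sketch below compares files by SHA-256 content hash; it is an assumed pre-upload step, not a feature of the product, and the helper names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> dict[str, str]:
    """Map each duplicate file name to the first name with identical content."""
    first_seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name, data in files.items():
        digest = content_hash(data)
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates
```

This only catches byte-identical files; the same document exported twice with different metadata would hash differently.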

Next Steps

Connect a data source to a knowledge base → Knowledge base
Read directly in a Workflow → Workflow · Data nodes
