Data Source

File storage and parsing layer · 5 import methods · Relationship with the knowledge base

A data source sits upstream of the knowledge base. It owns file storage, parsing, and metadata. One data source can feed multiple knowledge bases, and Workflow nodes can also query it directly.

5 Import Methods

Method                              Use
Bulk upload                         Drag in a batch of documents
Reference an existing data source   Link in a data source from another workspace or from the org level
Web crawl                           Provide a URL; the page is fetched and parsed automatically
App sync (SaaS)                     Sync from Notion / Feishu / DingTalk / Confluence, etc.
Manual text                         Paste text directly

Supported Formats

Focused on unstructured documents:

Format           Parse behavior
PDF / Word       Text extraction + paragraph detection + image OCR (optional)
TXT / Markdown   Read directly
PPT              Per-page text + image extraction
CSV / Excel      Per-row / per-sheet parsing
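
The format table above can be read as a lookup from file type to parse behavior. The following is a minimal sketch of that idea, not the product's implementation: the file extensions and the function name `parse_behavior` are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical extension-to-behavior map. The extensions are assumed;
# the behavior strings restate the format table.
PARSE_BEHAVIOR = {
    ".pdf": "text extraction + paragraph detection + optional image OCR",
    ".docx": "text extraction + paragraph detection + optional image OCR",
    ".txt": "read directly",
    ".md": "read directly",
    ".pptx": "per-page text + image extraction",
    ".csv": "per-row / per-sheet parsing",
    ".xlsx": "per-row / per-sheet parsing",
}


def parse_behavior(filename: str) -> str:
    """Return the parse behavior for a file, or raise if the format is not listed."""
    ext = Path(filename).suffix.lower()
    if ext not in PARSE_BEHAVIOR:
        raise ValueError(f"unsupported format: {filename}")
    return PARSE_BEHAVIOR[ext]
```

A file such as `FAQ.md` resolves to "read directly", while an unlisted type such as `dump.sql` is rejected instead of being parsed by guesswork.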

Use a SQL tool for structured data

Do not import database tables or API JSON into a data source. Query them directly with an HTTP tool or a SQL tool instead.

Document Management

Each document carries full metadata:

Field                      Description
Name / type / size         The file itself
Parse status               Pending / parsing / ready / failed
Source                     Upload / crawl / reference / sync
Upload time / creator      Audit
Word count / token count   Capacity estimate

Operations: Re-parse · Preview parsed result · Soft delete (recoverable).
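
To make the metadata fields and operations concrete, here is a sketch of how a document record might be modeled. Every class, field, and method name is hypothetical; only the status values, source values, and the idea of a recoverable delete come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ParseStatus(Enum):
    PENDING = "pending"
    PARSING = "parsing"
    READY = "ready"
    FAILED = "failed"


class Source(Enum):
    UPLOAD = "upload"
    CRAWL = "crawl"
    REFERENCE = "reference"
    SYNC = "sync"


@dataclass
class Document:
    # Hypothetical record shape; field names are assumptions.
    name: str
    file_type: str
    size_bytes: int
    source: Source
    creator: str
    uploaded_at: datetime
    parse_status: ParseStatus = ParseStatus.PENDING
    word_count: int = 0
    token_count: int = 0
    deleted_at: Optional[datetime] = None  # set by soft delete, cleared by restore

    @property
    def is_deleted(self) -> bool:
        return self.deleted_at is not None

    def soft_delete(self) -> None:
        """Mark the document as deleted without discarding it."""
        self.deleted_at = datetime.now(timezone.utc)

    def restore(self) -> None:
        """Undo a soft delete."""
        self.deleted_at = None

    def reparse(self) -> None:
        """Queue the document for parsing again."""
        self.parse_status = ParseStatus.PENDING
```

Modeling the delete as a timestamp rather than removing the record is what makes it recoverable: restoring only clears the field.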

File Groups and ACL

A data source can be divided into file groups, each with its own ACL (view / use / edit / manage). A common pattern:

Data source: Product Documentation
├─ Public/ (visible to everyone)
│   ├─ User Manual.pdf
│   └─ FAQ.md
├─ Internal/ (visible to R&D role)
│   ├─ Design.pdf
│   └─ Roadmap.md
└─ Confidential/ (PMs only)
    └─ Sensitive Business Data.xlsx
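
The group layout above can be expressed as a per-group permission table. This is only a sketch under stated assumptions: the role names (`everyone`, `rnd`, `pm`), the `ACL` structure, and the `allowed` function are hypothetical, and only the view permission is granted because that is all the example describes.

```python
from enum import Enum
from typing import Dict, Set


class Permission(Enum):
    VIEW = "view"
    USE = "use"
    EDIT = "edit"
    MANAGE = "manage"


# Hypothetical ACL table: file group -> role -> granted permissions.
ACL: Dict[str, Dict[str, Set[Permission]]] = {
    "Public": {"everyone": {Permission.VIEW}},
    "Internal": {"rnd": {Permission.VIEW}},
    "Confidential": {"pm": {Permission.VIEW}},
}


def allowed(group: str, roles: Set[str], permission: Permission) -> bool:
    """Check whether any of the caller's roles grants the permission on a group."""
    grants = ACL.get(group, {})
    # Every caller implicitly holds the "everyone" role (an assumption).
    effective_roles = roles | {"everyone"}
    return any(permission in grants.get(role, set()) for role in effective_roles)
```

With this table, a caller holding only the `rnd` role can view `Public` and `Internal` but not `Confidential`, and nobody can edit anything until an edit grant is added.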

Import Settings

When uploading, you can specify:

Setting                  Description
Target group             Pick existing / create new
Chunk mode               Smart / general (only affects later vectorization)
File-type restrictions   Allowlist / denylist
Size limits              Per-file max / total quota
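
As an illustration of how these settings might gate an upload, the sketch below checks a file against an allowlist, a per-file maximum, and a total quota. The `ImportSettings` class, the `check_upload` function, and all default values are assumptions; a denylist variant would invert the extension check.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

MB = 1024 * 1024


@dataclass(frozen=True)
class ImportSettings:
    # Hypothetical settings object; defaults are illustrative only.
    target_group: str
    chunk_mode: str = "smart"  # "smart" or "general"
    allowed_extensions: frozenset = frozenset({".pdf", ".docx", ".md", ".txt"})
    max_file_bytes: int = 100 * MB
    total_quota_bytes: int = 1024 * MB


def check_upload(
    settings: ImportSettings, filename: str, size_bytes: int, used_bytes: int
) -> List[str]:
    """Return a list of problems; an empty list means the upload passes."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in settings.allowed_extensions:
        problems.append(f"file type {ext!r} is not on the allowlist")
    if size_bytes > settings.max_file_bytes:
        problems.append("file exceeds the per-file size limit")
    if used_bytes + size_bytes > settings.total_quota_bytes:
        problems.append("upload would exceed the total quota")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a caller report all violations for a file in a single pass.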

Data Source vs Knowledge Base

┌──────────────────────────────────────────────────────┐
│  Data Source                                         │
│  Raw files + metadata + parsed result (text/table)   │
└──────────┬───────────────────────────────┬───────────┘
           ↓ feeds                         ↓ direct read (Workflow node)
┌──────────────────────┐        ┌──────────────────────┐
│  Knowledge Base      │        │  Workflow            │
│  Chunked + vectors   │        │  Read on demand      │
└──────────────────────┘        └──────────────────────┘
           ↑ retrieval
    Agent / Chatflow

Key boundaries:

  • A data source does not have to feed a knowledge base (a Workflow can read it directly)
  • The same data source can feed multiple knowledge bases (with different chunking strategies)
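
The second boundary can be illustrated with one parsed text chunked two ways. This is a sketch only: the two chunkers are simple stand-ins, not the Smart or general chunk modes, and `build_knowledge_bases` is a hypothetical helper.

```python
from typing import Callable, Dict, List


def chunk_by_paragraph(text: str) -> List[str]:
    """Split on blank lines, dropping empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_fixed(text: str, size: int = 40) -> List[str]:
    """Split into fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_knowledge_bases(
    parsed_text: str, strategies: Dict[str, Callable[[str], List[str]]]
) -> Dict[str, List[str]]:
    """Chunk one parsed result once per knowledge base, each with its own strategy."""
    return {name: chunker(parsed_text) for name, chunker in strategies.items()}


parsed = "Install the client.\n\nSign in with your workspace account."
knowledge_bases = build_knowledge_bases(
    parsed,
    {"support-kb": chunk_by_paragraph, "search-kb": chunk_fixed},
)
```

The parsed text is stored once in the data source; each knowledge base only holds its own chunks derived from it.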

Anti-Patterns

  • Binding a data source directly to an Agent (bind a knowledge base instead)
  • Uploading the same document twice (there is no automatic deduplication, so the copy wastes vector space)
  • Not splitting huge files (PDFs over 100 MB parse slowly and are prone to failure)
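
Because duplicates are not detected automatically, a client-side check before upload can help. The sketch below compares SHA-256 digests of file contents; `find_duplicates` is a hypothetical helper that runs locally and does not call any product API.

```python
import hashlib
from typing import Dict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each duplicate file name to the first file seen with identical content."""
    first_seen: Dict[str, str] = {}
    duplicates: Dict[str, str] = {}
    for name, data in files.items():
        digest = content_digest(data)
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates
```

This only catches byte-identical files. Two exports of the same document with different metadata would hash differently and still need a manual check.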
