RAG Fundamentals
What RAG is · Why you need it · How to tune · When not to use RAG
RAG (Retrieval Augmented Generation) is how Evose's knowledge base works. Understanding it will spare you most of the detours when building and tuning agents.
What It Is
An LLM only knows the world in its training data. To make it answer your company's questions, you have two paths:
| Path | Description | Cost |
|---|---|---|
| Fine-tuning | Continue training the LLM on company data | High (data prep + GPU + maintenance) |
| RAG | At inference time, retrieve relevant chunks from the knowledge base and inject them into the prompt | Low (just feed documents to the vector store) |
For the vast majority of enterprise scenarios, RAG is enough.
RAG's Three-Step Pipeline
1. Index: chunk the documents, embed each chunk, and store the vectors.
2. Retrieve: embed the user's question and fetch the most similar chunks.
3. Generate: inject the retrieved chunks into the prompt and let the LLM answer.
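A minimal end-to-end sketch of the three steps. The `embed()` and `llm()` stubs are hypothetical stand-ins for whatever embedding model and LLM your setup uses; they are not Evose APIs.

```python
import numpy as np

# Hypothetical stand-ins for your configured models -- not Evose APIs.
def embed(text: str) -> np.ndarray: ...
def llm(prompt: str) -> str: ...

# 1. Index: embed every chunk once, up front.
chunks = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]
index = np.stack([embed(c) for c in chunks])

def answer(question: str, k: int = 5) -> str:
    # 2. Retrieve: embed the question, score every chunk by inner product.
    q = embed(question)
    scores = index @ q
    top = [chunks[i] for i in np.argsort(scores)[::-1][:k]]
    # 3. Generate: inject the retrieved chunks into the prompt.
    context = "\n\n".join(top)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```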
Key Concepts
| Concept | One-liner |
|---|---|
| Chunking | Cut documents into ~500–2000 token segments; each segment is embedded independently |
| Embedding | Use an embedding model to convert text into vectors |
| Similarity retrieval | Use cosine/inner product to find the most semantically similar top-K chunks (sketch after this table) |
| Reranking | Use a more precise reranker to re-order the top-K |
| TopK | Number of chunks returned by retrieval (default 5; commonly 5–10) |
| Prompt template | The format used to insert retrieval results into the prompt |
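To make the "Similarity retrieval" and "TopK" rows concrete, here is a self-contained sketch of cosine top-K over precomputed chunk embeddings. It uses plain NumPy; a real knowledge base delegates this to a vector store, but the math is the same.

```python
import numpy as np

def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5) -> list[int]:
    """Indices of the k chunks most similar to the query, best first.

    query_vec:  shape (d,)   -- embedding of the user question
    chunk_vecs: shape (n, d) -- one embedding per chunk
    """
    # Cosine similarity = inner product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                        # shape (n,)
    return list(np.argsort(scores)[::-1][:k])
```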
Three Chunking Strategies in Evose's Knowledge Base
| Strategy | Behavior | Best for |
|---|---|---|
| Fixed length | Cut every N tokens with overlap (sketch below) | General-purpose fallback, simple documents |
| Semantic split | Split along natural boundaries (paragraph / chapter / sentence) | Structured documents (PDF / Markdown) |
| Smart split | Use an LLM to detect semantic boundaries while preserving completeness | High-quality scenarios; slightly higher cost |
→ Knowledge base · Chunking in detail
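For intuition about the fixed-length strategy, a sliding-window sketch is below. The sizes are in tokens, and the defaults shown are illustrative; they are not necessarily Evose's.

```python
def fixed_length_chunks(tokens: list[str], size: int = 1000, overlap: int = 100) -> list[list[str]]:
    """Cut a token sequence into windows of `size` tokens.

    Consecutive windows share `overlap` tokens, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    A production splitter would also merge a too-small trailing window.
    """
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]
```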
Tuning Map
Inaccurate answers? Walk through this map:
| Symptom | Likely cause | Fix |
|---|---|---|
| Off-topic | Retrieval missed | Increase TopK · use a better embedding model · add a reranker |
| Info exists in KB but answer omits it | Chunks too small, key info scattered | Larger chunk size · enable semantic chunking |
| Fabrication (hallucination) | LLM is improvising beyond the retrieved context | Strengthen the prompt: "If something isn't in the KB, say you don't know" (template sketch below) |
| Slow | Embedding or vector-search latency | Faster embedding model · smaller TopK · cache common questions |
| Cross-section Q&A (compare/summarize) fails | RAG fetches a few chunks; not great at cross-document synthesis | Use a Workflow for multi-step retrieval + synthesis |
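One way to phrase the anti-hallucination guardrail from the table. The wording is an illustration, not Evose's built-in template:

```python
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the answer is not in the context, reply "I don't know" -- do not guess.

Context:
{context}

Question: {question}
"""

def build_prompt(question: str, chunks: list[str]) -> str:
    # Number the chunks so the model can say which one it relied on.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```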
When Not to Use RAG
RAG is not a silver bullet. Switch tools in these cases:
| Your scenario | Better approach |
|---|---|
| Need real-time data (inventory, price, order status) | Use a Tool to call the live API; don't freeze data into the KB |
| Need precise math/computation | Use LLM + Code tool, not document retrieval |
| Need full-text comparison (find every diff between two contracts) | RAG only fetches some chunks — coverage isn't guaranteed. Use a Workflow to iterate the full text and compare |
| Very few documents (< 5) | Stuff the full text into the prompt and skip the RAG machinery (budget check below) |
| Structured data queries (DB tables) | Use Data source + Workflow + SQL tool — not RAG |
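A rough way to decide the "very few documents" case: estimate whether everything fits in the model's context window, and skip retrieval if it does. The 4-characters-per-token ratio and the 60% headroom are rules of thumb, not Evose settings:

```python
def should_use_rag(documents: list[str], context_window: int = 128_000) -> bool:
    """True if the corpus is too big to stuff into the prompt directly."""
    est_tokens = sum(len(d) for d in documents) // 4   # ~4 chars/token (rough)
    return est_tokens > int(context_window * 0.6)      # leave room for Q&A
```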
Three Common Pitfalls
Next Steps
- Build a knowledge base hands-on → Create a knowledge base · First Agent · with knowledge base
- See retrieval end-to-end → Observability · Trace
- Learn embedding / reranking models → Model platform