RAG Fundamentals

What RAG is · Why you need it · How to tune · When not to use RAG

RAG (Retrieval-Augmented Generation) is how Evose's knowledge base answers questions. Understanding how it works will spare you most of the detours when you tune answer quality.

What It Is

An LLM only knows what was in its training data. To make it answer your company's questions, you have two paths:

| Path | Description | Cost |
| --- | --- | --- |
| Fine-tuning | Continue training the LLM on company data | High (data prep + GPU + maintenance) |
| RAG | At inference time, retrieve relevant chunks from the knowledge base and inject them into the prompt | Low (just feed documents to the vector store) |

For the vast majority of enterprise scenarios, RAG is enough.

RAG's Three-Step Pipeline

1. Ingest (one-time)
   Documents → parse → chunk → embed → store in vector DB

2. Retrieve (per query)
   Question → embed → similarity search → top-K chunks

3. Generate (per query)
   Prompt template + top-K chunks + user question → LLM → answer
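
A minimal sketch of all three steps, assuming nothing about Evose's internals: `embed_text` is a toy hashed bag-of-words embedding and the store is a plain Python list, stand-ins for a real embedding model and vector DB so the example runs on its own.

```python
import math
import re
from collections import Counter

DIM = 256  # toy embedding dimension

def embed_text(text: str) -> list[float]:
    """Stand-in embedding: hashed bag-of-words, L2-normalized.
    A real pipeline calls an embedding model here instead."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[hash(token) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(doc: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-length chunking with overlap (characters here; tokens in practice)."""
    step = size - overlap
    return [doc[i:i + size] for i in range(0, max(len(doc) - overlap, 1), step)]

# 1. Ingest (one-time): parse -> chunk -> embed -> store
documents = ["Evose supports fixed-length, semantic, and smart chunking. ..."]
store = [(c, embed_text(c)) for doc in documents for c in chunk(doc)]

# 2. Retrieve (per query): embed the question, cosine top-K
def retrieve(question: str, top_k: int = 5) -> list[str]:
    q = embed_text(question)
    scored = [(sum(a * b for a, b in zip(q, v)), c) for c, v in store]
    return [c for _, c in sorted(scored, reverse=True)[:top_k]]

# 3. Generate (per query): prompt template + top-K chunks + question -> LLM
question = "Which chunking strategies does Evose support?"
context = "\n---\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)  # send this to your LLM of choice
```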

Key Concepts

| Concept | One-liner |
| --- | --- |
| Chunking | Cut documents into ~500–2000 token segments; each segment is embedded independently |
| Embedding | Use an embedding model to convert text into vectors |
| Similarity retrieval | Use cosine/inner-product similarity to find the top-K most semantically similar chunks |
| Reranking | Use a more precise reranker to re-order the top-K |
| TopK | Number of chunks returned by retrieval (default 5; commonly 5–10) |
| Prompt template | The format used to insert retrieval results into the prompt |
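
Reranking is the one concept the pipeline sketch above skips. A minimal sketch of where it sits, with `rerank_score` as a hypothetical stand-in for a real cross-encoder model:

```python
def rerank_score(question: str, chunk: str) -> float:
    """Hypothetical stand-in for a cross-encoder reranker.
    Here: crude word-overlap ratio, just so the example runs."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / (len(q_words) or 1)

def rerank(question: str, candidates: list[str], keep: int = 5) -> list[str]:
    # Retrieve a generous candidate set first (e.g. top 20 by vector similarity),
    # then keep only the best `keep` chunks according to the reranker.
    scored = sorted(candidates, key=lambda c: rerank_score(question, c), reverse=True)
    return scored[:keep]
```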

Three Chunking Strategies in Evose's Knowledge Base

| Strategy | Behavior | Best for |
| --- | --- | --- |
| Fixed length | Cut every N tokens with overlap | General-purpose fallback, simple documents |
| Semantic split | Split along natural boundaries (paragraph / chapter / sentence) | Structured documents (PDF / Markdown) |
| Smart split | Use an LLM to detect semantic boundaries while preserving completeness | High-quality scenarios; slightly higher cost |

See Knowledge base · Chunking in detail for the full chunking reference.
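
Fixed-length chunking already appears in the pipeline sketch above; a rough illustration of semantic splitting along paragraph boundaries might look like the sketch below (using word count as a token-count approximation is a simplification):

```python
def semantic_split(doc: str, max_tokens: int = 1000) -> list[str]:
    """Split on blank lines (paragraph boundaries), then pack paragraphs
    into chunks that stay under max_tokens (approximated by word count)."""
    chunks, current, current_len = [], [], 0
    for para in doc.split("\n\n"):
        n = len(para.split())
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```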

Tuning Map

Inaccurate answers? Walk through this map:

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Off-topic answers | Retrieval missed the right chunks | Increase TopK · use a better embedding model · add a reranker |
| Info exists in the KB but the answer omits it | Chunks too small, key info scattered across chunks | Larger chunk size · enable semantic chunking |
| Fabrication (hallucination) | The LLM is freelancing beyond the retrieved content | Strengthen the prompt: "If something isn't in the KB, say you don't know" |
| Slow responses | Model or retrieval latency (or both) | Faster embedding model · smaller TopK · cache common questions |
| Cross-section Q&A (compare/summarize) fails | RAG fetches only a few chunks and is weak at cross-document synthesis | Use a Workflow for multi-step retrieval + synthesis |
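
The hallucination fix is worth spelling out. A prompt template along these lines keeps the model anchored to retrieved content; the wording is illustrative, not Evose's built-in template, and `context` / `question` come from the retrieval sketch earlier:

```python
PROMPT_TEMPLATE = """You are a support assistant for our product.
Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly:
"I don't know based on the current knowledge base."

Context:
{context}

Question: {question}
"""

prompt = PROMPT_TEMPLATE.format(context=context, question=question)
```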

When Not to Use RAG

RAG is not a silver bullet. Switch tools in these cases:

| Your scenario | Better approach |
| --- | --- |
| Need real-time data (inventory, price, order status) | Use a Tool to call the live API; don't freeze the data into the KB |
| Need precise math/computation | Use LLM + Code tool, not document retrieval |
| Need full-text comparison (find every diff between two contracts) | RAG only fetches some chunks, so coverage isn't guaranteed; use a Workflow to iterate over the full text and compare |
| Very few documents (< 5) | Stuff the full text into the prompt; skip the RAG complexity |
| Structured data queries (DB tables) | Use Data source + Workflow + SQL tool, not RAG |
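
For the "very few documents" row, the alternative really is that simple. A sketch, assuming the documents fit comfortably in the model's context window:

```python
def build_prompt_without_rag(documents: list[str], question: str) -> str:
    """With fewer than ~5 short documents, skip retrieval entirely
    and hand the model everything."""
    full_text = "\n\n====\n\n".join(documents)
    return (
        "Answer the question using the documents below.\n\n"
        f"Documents:\n{full_text}\n\n"
        f"Question: {question}"
    )
```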

Three Common Pitfalls

Pitfall 1: Dump every document into the knowledge base

Retrieval quality = min(chunk quality, embedding quality): the weakest link sets the ceiling. Garbage in, garbage out. Clean, deduplicate, and normalize documents first.

Pitfall 2: Blindly crank TopK to 50

Retrieving more chunks means a longer LLM context and more hallucination, because the model has to pick the signal out of more noise. Start with 5 and cap it at 8–10.

Pitfall 3: Ship without evaluation

Build a Workflow that runs 50 real questions and have humans label whether the RAG answer is correct. It's the cheapest insurance for quality.
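
A minimal offline version of that evaluation loop, assuming an `answer(question)` callable that wraps your RAG bot (the function name and CSV layout are illustrative):

```python
import csv

def run_eval(questions: list[str], answer, out_path: str = "rag_eval.csv") -> None:
    """Run every test question through the bot and dump a CSV
    for humans to fill in the `correct?` column."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer", "correct?"])
        for q in questions:
            writer.writerow([q, answer(q), ""])

# run_eval(real_questions, answer=my_rag_bot)  # then label the CSV by hand
```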
