Retrieval-Augmented Generation (RAG)

RAG, or Retrieval-Augmented Generation, is an AI framework that enhances the accuracy and relevance of large language model (LLM) responses by integrating information retrieval from external knowledge sources before generating text.

A REST API that fetches user data from a database and inserts it into a prompt that is then passed to an LLM can be considered a basic, simplistic form of RAG. However, it lacks the sophistication of semantic search and relies on exact-match, structured queries.
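The "structured lookup" form of RAG described above can be sketched as follows. This is a minimal illustration, not a real service: the in-memory dict stands in for a database, and all names (`USERS`, `fetch_user`, `build_prompt`) are hypothetical.

```python
# Minimal sketch of basic RAG via exact-match lookup: fetch a record
# from a "database" and inline it into the prompt sent to an LLM.
# The dict is a stand-in for a real DB; all names are illustrative.

USERS = {42: {"name": "Alice", "plan": "Pro", "region": "Quebec"}}

def fetch_user(user_id):
    """Exact-match structured query -- no semantic search involved."""
    return USERS[user_id]

def build_prompt(user_id, question):
    """Inject the retrieved record into the prompt as context."""
    user = fetch_user(user_id)
    return (
        f"User profile: name={user['name']}, plan={user['plan']}, "
        f"region={user['region']}.\n"
        f"Question: {question}"
    )

prompt = build_prompt(42, "What features are included in my plan?")
# `prompt` would then be passed to the LLM for generation.
```

Note that retrieval here only works when the query maps cleanly to a key; there is no notion of semantic similarity, which is what the full RAG pipeline below adds.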

RAG is a hybrid approach that combines information retrieval with text generation: relevant documents are retrieved first, then injected into the prompt so the LLM can ground its answer in them.

RAG Pipeline (Simplified)

  1. Preprocess Data
    • Split documents into chunks (e.g., 500 words)
    • Generate embeddings for each chunk
    • Store embeddings in a vector database
  2. At Query Time
    • Embed the user’s question
    • Use vector search to retrieve top-k similar chunks
    • Inject those chunks into the prompt, e.g.: Based on the following documents: [doc1], [doc2], ... Answer: "How do I register for sales tax in Quebec?"
  3. Send to LLM and get answer
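The pipeline above can be sketched end to end in plain Python. This is a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the `index` list stands in for a vector database, and all names and sample documents are made up for the example.

```python
import math
import re
from collections import Counter

def chunk(text, size=50):
    """1a. Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """1b. Toy embedding: a bag-of-words Counter (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1c. "Store embeddings in a vector database" -- here, just a list.
documents = [
    "To register for sales tax in Quebec, file an application with Revenu Quebec.",
    "Ontario businesses register for HST through the Canada Revenue Agency.",
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(question, k=2):
    """2. Query time: embed the question, return the top-k similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# 3. Build the prompt that would be sent to the LLM.
question = "How do I register for sales tax in Quebec?"
context = "\n".join(f"- {c}" for c in retrieve(question))
prompt = f"Based on the following documents:\n{context}\nAnswer: {question}"
```

A production system would swap `embed` for a real embedding model and `index` for a vector database, but the retrieve-then-inject shape of the code stays the same.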

✅ Why RAG is Useful

| Problem LLMs Have | How RAG Helps |
| --- | --- |
| Hallucinations | Grounds answers in real facts |
| Limited context window | Retrieves only relevant info |
| No access to custom data | Injects private/company data |
| Outdated model knowledge | Real-time retrieval from fresh sources |