
Retrieval-Augmented Generation (RAG)

Instead of stuffing a whole document into the prompt (which runs into token limits, lowers accuracy, and raises cost), **chunk** the document and retrieve only the chunks most relevant to each query.

type: concept · status: active · tags: rag · retrieval · embeddings · search

Key points

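  • Chunking: size-based (fixed-size pieces, with overlap so context carries across chunk boundaries), structure-based (split on headers; best for markdown), or semantic (split by meaning). No single strategy is best for every document.
  • Embeddings: an embedding model turns text into a vector of numbers that captures its meaning; semantic search matches a query to chunks by comparing their vectors rather than their exact words (the course recommends Voyage AI for embeddings).
  • Full flow (7 steps): chunk → embed → normalize → store in vector DB → embed query → similarity search → assemble prompt. Similarity is measured with cosine similarity: the closer the score is to 1, the more similar the chunk is to the query.
  • Hybrid search: run semantic search and BM25 (lexical, keyword-based) search in parallel, then merge the two ranked result lists with Reciprocal Rank Fusion (RRF).
  • Reranking: after retrieval, an LLM reorders the candidate chunks by relevance to the query; having it return document IDs instead of repeating the chunk text keeps its output short and cheap.
  • Contextual retrieval: before embedding, prepend a short LLM-generated context that situates each chunk within the whole document, so a chunk keeps its document context when it is retrieved on its own.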
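The first two chunking strategies from the list can be sketched in a few lines. This is a minimal illustration, not the course's code: `chunk_by_size` and `chunk_by_headers` are hypothetical names, sizes are counted in characters (real pipelines often count tokens), and the header rule assumes markdown `#` headings.

```python
import re


def chunk_by_size(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Size-based chunking: fixed-size character windows that overlap.

    Each window starts `chunk_size - overlap` characters after the previous one,
    so the last `overlap` characters of a chunk reappear at the start of the next.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_by_headers(markdown: str) -> list[str]:
    """Structure-based chunking: start a new chunk at every markdown heading line."""
    # Zero-width split just before any line that begins with 1 to 6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [s.strip() for s in sections if s.strip()]
```

For example, `chunk_by_size("abcdefghij", chunk_size=4, overlap=2)` gives `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the one before it, which is the overlap that preserves context across boundaries.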
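The 7-step flow can be traced end to end with a toy setup. This is a sketch under loud assumptions: `embed` is a hypothetical stand-in that hashes words into buckets (it captures word overlap, not meaning, and is not the Voyage AI API), and a plain Python list plays the role of the vector DB. After normalizing every vector to unit length, cosine similarity is just a dot product.

```python
import math
import re
import zlib

DIM = 256  # toy vector size (assumption); real embedding models use far more dimensions


def embed(text: str) -> list[float]:
    """HYPOTHETICAL stand-in for an embedding model such as Voyage AI.

    Hashes each lowercase word into one of DIM buckets, so it only reflects
    shared words, not meaning. Replace with a real embedding call in practice.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    return vec


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit length, so cosine similarity reduces to a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Steps 2-4: embed and normalize each chunk; a list stands in for the vector DB."""
    return [(chunk, normalize(embed(chunk))) for chunk in chunks]


def search(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
    """Steps 5-6: embed the query the same way, then rank chunks by cosine similarity."""
    q = normalize(embed(query))
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def assemble_prompt(query: str, chunks: list[str]) -> str:
    """Step 7: put only the retrieved chunks into the prompt."""
    context = "\n\n".join(f"<chunk>\n{c}\n</chunk>" for c in chunks)
    return f"Answer the question using this context:\n{context}\n\nQuestion: {query}"
```

Step 1 (chunking) happens before `build_index`. The key design point is that the query must go through the same `embed` and `normalize` steps as the chunks; otherwise the dot product is not a cosine similarity.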
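Hybrid search needs a lexical ranker and a way to merge rankings. Below is a minimal sketch, assuming the textbook BM25 formula with common defaults `k1=1.5`, `b=0.75` and the non-negative "+1" IDF variant, plus Reciprocal Rank Fusion with the commonly used constant `k=60`. Both function names are hypothetical; production systems would use a search library instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices ranked by BM25 (lexical) score, best first."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # average document length
    df = Counter(term for t in tokenized for term in set(t))  # docs containing each term
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append((score, i))
    return [i for _, i in sorted(scores, key=lambda s: s[0], reverse=True)]


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of doc IDs: each list adds 1 / (k + rank) per doc (rank from 1)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]
```

To combine the two searches, pass both rankings of the same documents, e.g. `reciprocal_rank_fusion([semantic_ranking, bm25_rank(docs, query)])`, where `semantic_ranking` is a list of doc indices from the embedding search. RRF only looks at ranks, so the two very different score scales never need to be compared directly.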
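The "use doc IDs" tip for reranking might look like this. It is a sketch, not the course's prompt: the prompt wording, the `<doc id="…">` tags, and both function names are assumptions, and the actual LLM call is left out (the model's reply is just a string passed to the parser).

```python
import re


def build_rerank_prompt(query: str, chunks: dict[str, str]) -> str:
    """Ask the model to rank candidate chunks by relevance, replying with IDs only."""
    listing = "\n".join(f'<doc id="{doc_id}">\n{text}\n</doc>' for doc_id, text in chunks.items())
    return (
        f"Query: {query}\n\n{listing}\n\n"
        "Return the IDs of the documents most relevant to the query, "
        "most relevant first, as a comma-separated list. Output IDs only."
    )


def parse_rerank_reply(reply: str, chunks: dict[str, str]) -> list[str]:
    """Map the model's ID list back to chunk text, skipping unknown or repeated IDs."""
    ordered, seen = [], set()
    for doc_id in re.findall(r"[\w-]+", reply):
        if doc_id in chunks and doc_id not in seen:
            seen.add(doc_id)
            ordered.append(chunks[doc_id])
    return ordered
```

The efficiency gain comes from the reply: a few short IDs such as `d2, d3` instead of the model copying every chunk back out, and the parser tolerates IDs the model invents or repeats.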
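Contextual retrieval adds one LLM call per chunk before embedding. A minimal sketch, assuming a hypothetical `generate` callable that sends a prompt to an LLM and returns its text reply; the prompt wording is also an assumption.

```python
from typing import Callable


def contextualize_chunks(document: str, chunks: list[str], generate: Callable[[str], str]) -> list[str]:
    """Prepend a short, model-written situating context to each chunk.

    `generate` is HYPOTHETICAL: any function that sends a prompt to an LLM
    and returns the reply text. Plug in a real client call in practice.
    """
    contextualized = []
    for chunk in chunks:
        prompt = (
            f"<document>\n{document}\n</document>\n"
            f"Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
            "Write a short context that situates this chunk within the overall "
            "document, to improve search retrieval. Answer with the context only."
        )
        context = generate(prompt).strip()
        # This combined text, not the bare chunk, is what would then be embedded.
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized
```

The original chunk text is kept intact after the context, so whatever is shown to the model at answer time still contains the exact source wording.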

Sources

  • 2026-06-28-claude-course
Compiled from wiki/study/claude-course/Retrieval-Augmented-Generation.md · git is the source of truth