
Retrieval-Augmented Generation (RAG)

Instead of stuffing a whole document into the prompt (token limits, worse accuracy, higher cost), **chunk** the document and retrieve only the most relevant chunks per query.

type: concept · status: active · tags: rag · retrieval · embeddings · search

Key points

Chunking: size-based (+overlap to keep context), structure-based (split on headers — best for markdown), or semantic. No universal best.
Embeddings: map text to numeric vectors that capture meaning; semantic search matches query↔chunk by vector similarity rather than exact wording (course recommends Voyage AI).
Full flow (7 steps): chunk → embed → normalize → store in vector DB → embed query → similarity search → assemble prompt. Similarity is measured with cosine similarity (closer to 1 = more similar).
Hybrid search: run semantic + BM25 (lexical) in parallel; merge via Reciprocal Rank Fusion.
Reranking: an LLM reorders the retrieved candidates by relevance (have it return doc IDs rather than full chunk text, for efficiency).
Contextual retrieval: prepend LLM-generated situating context to each chunk before embedding so chunks don't lose document context.
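
A minimal sketch of three of the mechanics above: size-based chunking with overlap, cosine similarity, and Reciprocal Rank Fusion. It uses only the standard library. The function names, the character-based chunk sizes, and the RRF constant `k = 60` (a commonly used default) are assumptions for illustration, not taken from the course. Real embeddings would come from an embedding model; the vectors below are toy values.

```python
import math


def chunk_by_size(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most `size`,
    where consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs.
    Each list contributes 1 / (k + rank) per doc, with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    print(chunk_by_size("abcdefghij", size=4, overlap=2))
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    semantic = ["c2", "c1", "c3"]  # hypothetical semantic-search ranking
    lexical = ["c1", "c4", "c2"]   # hypothetical BM25 ranking
    print(reciprocal_rank_fusion([semantic, lexical]))
```

In the hybrid-search case, the two input lists would be the doc-ID rankings from the semantic and BM25 searches. Fusing on rank rather than raw score avoids having to put the two scoring scales on a common footing.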

Related

Anthropic-API-Basics · Model-Context-Protocol

Sources

2026-06-28-claude-course

Compiled from wiki/study/claude-course/Retrieval-Augmented-Generation.md · git is the source of truth
