Key points
- Chunking: size-based (+overlap to keep context), structure-based (split on headers — best for markdown), or semantic. No universal best.
- Embeddings: turn text into numeric vectors that capture meaning; semantic search matches query↔chunk by meaning rather than exact wording (course recommends Voyage AI).
- Full flow (7 steps): chunk → embed → normalize → store in vector DB → embed query → similarity search → assemble prompt. Results are ranked by cosine similarity (closer to 1 = more similar); on normalized vectors it reduces to a dot product.
- Hybrid search: run semantic + BM25 (lexical) in parallel; merge via Reciprocal Rank Fusion.
- Reranking: an LLM reorders the retrieved candidates by relevance; for efficiency, have it return doc IDs rather than repeating the chunk text.
- Contextual retrieval: prepend LLM-generated situating context to each chunk before embedding so chunks don't lose document context.
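
Sketch for the chunking bullet: a minimal illustration of size-based chunking with overlap and structure-based splitting on markdown headers. The function names, the character-based sizes, and the default values are assumptions for illustration, not the course's code.

```python
import re


def chunk_by_size(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_by_headers(markdown: str) -> list[str]:
    """Split markdown so that each chunk starts at a header line (# to ######)."""
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    return [part.strip() for part in parts if part.strip()]
```

Overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is long enough.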
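
Sketch for the normalize → store → similarity search steps: the vectors below are hand-made toy values standing in for real embeddings (no embedding API is called), and the in-memory dict stands in for a vector DB. All names are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; closer to 1 means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search(query_vec: list[float], store: dict[str, list[float]], k: int = 2):
    """Return the k most similar (chunk_id, score) pairs.

    `store` is assumed to hold already-normalized vectors, so the
    dot product with the normalized query equals cosine similarity.
    """
    q = normalize(query_vec)
    scored = [(sum(x * y for x, y in zip(q, vec)), cid) for cid, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(cid, score) for score, cid in scored[:k]]


# Toy "embeddings" (assumed values, not produced by a real model).
TOY_VECTORS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "taxes": [0.0, 0.2, 0.9],
}
store = {cid: normalize(vec) for cid, vec in TOY_VECTORS.items()}
```

Calling `search([1.0, 0.0, 0.0], store)` ranks `cats` and `dogs` ahead of `taxes`, because their vectors point in nearly the same direction as the query.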
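
Sketch for the hybrid search bullet: Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) in every ranking where it appears and sums those scores. The constant `k = 60` is a commonly used default and is an assumption here, as are the function name and the alphabetical tie-break.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # e.g. from vector search
lexical_hits = ["b", "d", "a"]   # e.g. from BM25
merged = reciprocal_rank_fusion([semantic_hits, lexical_hits])
```

Because only ranks are used, the two searches do not need comparable score scales; a document that ranks well in both lists rises to the top.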
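
Sketch for the contextual retrieval bullet: each chunk gets a short situating context prepended before it is embedded. The `generate_context` callable is a hypothetical stand-in for the LLM call, and `fake_context` is a stub used only to make the example runnable.

```python
from typing import Callable


def contextualize_chunks(
    document: str,
    chunks: list[str],
    generate_context: Callable[[str, str], str],
) -> list[str]:
    """Prepend generated context to each chunk; the result is what gets embedded."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(document, chunk).strip()
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def fake_context(document: str, chunk: str) -> str:
    """Stub in place of an LLM: situates the chunk using the document's first line."""
    lines = document.splitlines()
    title = lines[0].lstrip("# ").strip() if lines else "untitled"
    return f"This chunk is from the document titled '{title}'."
```

In a real pipeline the stub would be replaced by a model call that sees the whole document plus the chunk and returns a sentence or two of context.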
Related
Sources
- 2026-06-28-claude-course
Compiled from
wiki/study/claude-course/Retrieval-Augmented-Generation.md · git is the source of truth