# Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation combines a retrieval system with a generative language model, so that responses are grounded in the content of a specific corpus rather than relying only on what the model learned during training. It is the dominant pattern for building question-answering and assistant systems over private or up-to-date knowledge bases, and sits on top of [[Large Language Model (LLM)]]s.

## How it works

1. **Retrieval**: the system retrieves relevant documents or passages from a corpus based on the input query. This is typically done with dense vector search, where both the query and documents are embedded into a high-dimensional space so semantically similar content can be found.
2. **Augmentation**: the retrieved documents are combined with the original query to form a richer input for the generator.
3. **Generation**: a generative model takes the augmented input and produces a response informed by the retrieved content, not just its pre-trained knowledge.

## Benefits

- **Grounding**: answers can cite specific sources from the corpus, which reduces hallucination.
- **Freshness**: the corpus can be updated without retraining the model.
- **Scalability**: a single model can serve many different domains by swapping the underlying corpus.

## Applications

- Customer support over internal knowledge bases
- Research assistants that synthesise information from papers
- Code assistants that retrieve from a specific codebase

# References
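The three-step pipeline described in this note can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: the bag-of-words `embed` stands in for a learned dense encoder, the three-document corpus is invented for the example, and the generation step is stubbed out because it would require an actual LLM call.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words term-frequency vector. A real RAG
    # system would use a learned dense encoder here instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Step 1, Retrieval: rank documents by similarity to the query.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def augment(query: str, docs: list[str]) -> str:
    # Step 2, Augmentation: combine the retrieved passages with the
    # original query into a single prompt for the generator.
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

# Hypothetical corpus, for illustration only.
corpus = [
    "RAG grounds model answers in a retrieved corpus.",
    "Dense vector search embeds queries and documents in the same space.",
    "The retrieval corpus can be updated without retraining the model.",
]

query = "How does vector search find documents?"
prompt = augment(query, retrieve(query, corpus, k=1))
# Step 3, Generation: `prompt` would now be sent to an LLM; omitted here.
print(prompt)
```

Swapping in a real embedding model and vector index changes only `embed` and `retrieve`; the augmentation and generation steps keep the same shape.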