# Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation combines a retrieval system with a generative language model, so that responses are grounded in the content of a specific corpus rather than relying only on what the model learned during training. It is the dominant pattern for building question-answering and assistant systems over private or up-to-date knowledge bases, and sits on top of [[Large Language Model (LLM)]]s.
## How it works
1. **Retrieval**: the system retrieves relevant documents or passages from a corpus based on the input query. This is typically done with dense vector search, where both the query and documents are embedded into a high-dimensional space so semantically similar content can be found.
2. **Augmentation**: the retrieved documents are combined with the original query to form a richer input for the generator.
3. **Generation**: a generative model takes the augmented input and produces a response informed by the retrieved content, not just its pre-trained knowledge.
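The three steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production implementation: the bag-of-words `embed` function stands in for a trained embedding model, the toy `CORPUS` is invented for the example, and `generate` is a stub where a real system would call an LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-count vector. Real systems
    # use a trained dense embedding model; this keeps the sketch runnable.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical knowledge base for the example.
CORPUS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support is available by email on weekdays from 9am to 5pm.",
    "Premium accounts include priority support and extended storage.",
]

def retrieve(query, corpus, k=1):
    # Step 1 (Retrieval): rank documents by similarity to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def augment(query, docs):
    # Step 2 (Augmentation): combine retrieved passages with the query.
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt):
    # Step 3 (Generation): a real system would call an LLM here;
    # this stub just labels the prompt so the pipeline runs end to end.
    return f"[LLM response grounded in]\n{prompt}"

query = "What is the refund policy for returns?"
docs = retrieve(query, CORPUS)
answer = generate(augment(query, docs))
```

With the toy corpus, the query about refunds retrieves the refund-policy passage rather than the support-hours one, and the augmented prompt carries that passage into the generation step, which is the grounding effect described above.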
## Benefits
- **Grounding**: answers can cite specific sources from the corpus, which reduces hallucination.
- **Freshness**: the corpus can be updated without retraining the model.
- **Scalability**: a single model can serve many different domains by swapping the underlying corpus.
## Applications
- Customer support over internal knowledge bases
- Research assistants that synthesise information from papers
- Code assistants that retrieve from a specific codebase