Retrieval Augmented Generation (RAG) combines a retrieval system with a generative language model, so that responses are grounded in the content of a specific corpus rather than relying only on what the model learned during training. It is the dominant pattern for building question-answering and assistant systems over private or frequently updated knowledge bases, and is typically built on top of a large language model (LLM).
How it works
- Retrieval: the system retrieves relevant documents or passages from a corpus based on the input query. This is typically done with dense vector search, where both the query and documents are embedded into a high-dimensional space so semantically similar content can be found.
- Augmentation: the retrieved documents are combined with the original query to form a richer input for the generator.
- Generation: a generative model takes the augmented input and produces a response informed by the retrieved content, not just its pre-trained knowledge.
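The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `embed` function is a bag-of-words stand-in for a real embedding model, the corpus is three invented sentences, and the generation step is represented only by building the augmented prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; in practice these would be chunks of a knowledge base.
CORPUS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support is available by email 24 hours a day.",
    "Premium plans include priority phone support.",
]

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Retrieval: rank documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query: str, docs: list[str]) -> str:
    # Augmentation: combine retrieved passages with the query into one prompt.
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Generation would pass this prompt to an LLM; here we just print it.
query = "How many days do I have to return a purchase?"
print(augment(query, retrieve(query)))
```

A production system would replace `embed` with a trained embedding model, store vectors in an approximate nearest-neighbour index, and send the augmented prompt to a generative model, but the data flow is the same.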
Benefits
- Grounding: answers can cite specific sources from the corpus, which reduces hallucination.
- Freshness: the corpus can be updated without retraining the model.
- Scalability: a single model can serve many different domains by swapping the underlying corpus.
Applications
- Customer support over internal knowledge bases
- Research assistants that synthesise information from papers
- Code assistants that retrieve from a specific codebase