Retrieval-Augmented Generation (RAG) combines a retrieval system with a generative language model, so that responses are grounded in the content of a specific corpus rather than relying only on what the model learned during training. It is the dominant pattern for building question-answering and assistant systems over private or up-to-date knowledge bases, and is typically built around a large language model (LLM).

How it works

  1. Retrieval: the system retrieves relevant documents or passages from a corpus based on the input query. This is typically done with dense vector search, where both the query and documents are embedded into a high-dimensional space so semantically similar content can be found.
  2. Augmentation: the retrieved documents are combined with the original query to form a richer input for the generator.
  3. Generation: a generative model takes the augmented input and produces a response informed by the retrieved content, not just its pre-trained knowledge.
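The three steps above can be sketched end to end. This is a minimal toy illustration, not a production implementation: it uses a bag-of-words count vector with cosine similarity in place of a learned dense embedding model, and it stops at the augmented prompt, which in a real system would be passed to the generative model. All function and variable names here (`embed`, `retrieve`, `augment`, `corpus`) are illustrative choices, not a standard API.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a word-count vector with punctuation stripped.
    # A real RAG system would use a learned dense embedding model.
    return Counter(w.strip(".,?!") for w in text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Step 1 (Retrieval): rank documents by similarity to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query, passages):
    # Step 2 (Augmentation): combine retrieved passages with the query
    # into a richer input for the generator.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The warranty covers parts and labour for two years.",
    "Returns are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
]

query = "How long is the warranty?"
prompt = augment(query, retrieve(query, corpus, k=1))
# Step 3 (Generation) would pass `prompt` to an LLM; here we just print it.
print(prompt)
```

The key design point is that the generator never sees the whole corpus, only the top-k passages the retriever selects, so corpus size is bounded by the index rather than the model's context window.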

Benefits

  • Grounding: answers can cite specific sources from the corpus, which reduces hallucination.
  • Freshness: the corpus can be updated without retraining the model.
  • Scalability: a single model can serve many different domains by swapping the underlying corpus.
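The freshness and scalability points follow from where the knowledge lives: in an index of documents, not in model weights. A minimal sketch (the `index` structure and helper names are assumptions for illustration; a real index would store embedding vectors alongside the text):

```python
# The corpus lives in an index, separate from the model. Adding or
# removing a document only touches this index; the generator's
# weights are never retrained.
index = []  # list of (doc_id, text); a real system would also store vectors

def add_document(doc_id, text):
    # Embedding and indexing the new text is the only work required.
    index.append((doc_id, text))

def remove_document(doc_id):
    # Stale knowledge is removed by deleting its index entries.
    index[:] = [(i, t) for (i, t) in index if i != doc_id]

add_document("faq-1", "Returns are accepted within 30 days.")
add_document("faq-2", "The warranty lasts two years.")
remove_document("faq-1")   # outdated policy dropped without retraining
print([i for i, _ in index])
```

Swapping the entire corpus for a different domain is the same operation at scale: point the retriever at a different index and the same model serves the new domain.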

Applications

  • Customer support over internal knowledge bases
  • Research assistants that synthesise information from papers
  • Code assistants that retrieve from a specific codebase
