Advanced RAG Architecture

Understanding Self-RAG

Self-Reflective Retrieval-Augmented Generation (Self-RAG) is a framework that trains a single language model (LM) to decide when retrieval is needed and to critique both the retrieved passages and its own generations.

How Self-RAG Works

Unlike traditional RAG, which retrieves and generates unconditionally, Self-RAG introduces a critique loop to keep the output relevant and accurate. The loop runs in four steps, sketched in code after the list below.

1. Retrieve?

The model first decides if retrieval is actually necessary for the query.

2. Retrieve

If needed, relevant documents are retrieved from the knowledge base.

3. Critique

The model evaluates retrieved docs for relevance and its own generation for support.

4. Refine

If quality is low, the model refines its answer to stay faithful to the retrieved context.
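
To make the loop concrete, here is a minimal Python sketch of the four steps. The `SelfRAGModel` interface and the `retrieve` callable are hypothetical stand-ins for the trained model's reflection-token predictions and the knowledge-base lookup, and the 0.8 support threshold is an illustrative assumption: the actual model emits discrete critique tokens rather than float scores.

```python
from typing import Callable, Protocol, Sequence


class SelfRAGModel(Protocol):
    # Hypothetical interface standing in for the LM's reflection-token
    # predictions; this is not a real library API.
    def needs_retrieval(self, query: str) -> bool: ...
    def is_relevant(self, query: str, doc: str) -> bool: ...
    def generate(self, query: str, context: Sequence[str]) -> str: ...
    def support_score(self, draft: str, context: Sequence[str]) -> float: ...
    def refine(self, draft: str, context: Sequence[str]) -> str: ...


def self_rag_answer(
    query: str,
    model: SelfRAGModel,
    retrieve: Callable[[str], list[str]],  # hypothetical knowledge-base lookup
    support_threshold: float = 0.8,        # assumed cutoff, not from the paper
) -> str:
    # Step 1 (Retrieve?): the model decides if retrieval is needed at all.
    if not model.needs_retrieval(query):
        return model.generate(query, context=[])

    # Step 2 (Retrieve): pull candidate documents from the knowledge base.
    docs = retrieve(query)

    # Step 3 (Critique): keep only documents the model judges relevant,
    # then score how well they support a draft answer.
    relevant = [d for d in docs if model.is_relevant(query, d)]
    draft = model.generate(query, context=relevant)
    support = model.support_score(draft, relevant)

    # Step 4 (Refine): if support is weak, rewrite the draft to stay
    # faithful to the retrieved context.
    if support < support_threshold:
        draft = model.refine(draft, context=relevant)
    return draft
```

The point of the sketch is the control flow: retrieval, critique, and refinement only run when the model's own judgments call for them, rather than on every query.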

Deep Evaluation Metrics

Self-RAG doesn't just generate text; it also emits critique tokens that grade its own output along three dimensions, giving fine-grained control over the generation process. A small data-structure sketch follows the three dimensions below.

Relevance

Are the retrieved documents actually relevant to the query?

Support (Faithfulness)

Is the answer fully supported by the retrieved context?

Usefulness

Is the response helpful and complete for the user?
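
A small record type keeps these three judgments together. The sketch below is an assumption about representation: the field names and the [0, 1] float scale are illustrative, since Self-RAG expresses these judgments as discrete reflection tokens rather than numeric scores.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    """One critique of a draft answer. The [0, 1] float scores are an
    illustrative assumption; Self-RAG emits discrete reflection tokens."""
    relevance: float   # are the retrieved documents relevant to the query?
    support: float     # is the answer backed by the retrieved context?
    usefulness: float  # is the response helpful and complete for the user?

    def needs_refinement(self, support_floor: float = 0.8) -> bool:
        # Refine whenever the draft asserts more than the context supports.
        return self.support < support_floor
```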

Live Critique Example

Query

"What is the revenue growth?"

Critique Output

Is Relevant?    Yes (0.98)
Is Supported?   Partial (0.65)
Is Useful?      Yes (0.85)

Action: Refine response to remove unsupported claims.
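
Critique output like this maps directly to a next action. The sketch below reproduces the example's decision; the thresholds are illustrative assumptions, not values taken from Self-RAG.

```python
def choose_action(relevance: float, support: float, usefulness: float) -> str:
    """Map critique scores to a next step (thresholds are assumptions)."""
    if relevance < 0.5:
        return "re-retrieve: documents do not match the query"
    if support < 0.8:
        return "refine: remove or re-ground unsupported claims"
    if usefulness < 0.5:
        return "regenerate: grounded but not helpful"
    return "accept"


# The example above: relevant (0.98), partially supported (0.65), useful (0.85).
print(choose_action(0.98, 0.65, 0.85))
# -> "refine: remove or re-ground unsupported claims"
```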