Self-Reflective Retrieval-Augmented Generation (Self-RAG) is a framework that trains a single language model (LM) to critique its own retrieval and generation quality.
Unlike traditional RAG, which retrieves for every query and generates from whatever comes back, Self-RAG introduces a critique loop that decides when to retrieve and checks the output against the retrieved evidence.
The loop has four steps (sketched in code below):
1. Decide: the model first judges whether retrieval is actually necessary for the query.
2. Retrieve: if it is, relevant documents are fetched from the knowledge base.
3. Critique: the model evaluates each retrieved document for relevance and its own generation for support by that document.
4. Refine: if the critique scores are low, the model revises its answer so that it stays faithful to the retrieved evidence.
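Here is a minimal Python sketch of that loop. The names `needs_retrieval`, `retrieve`, and `generate` are illustrative stand-ins for the trained model and its retriever, not Self-RAG's actual API, and the critique is reduced to three fields.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Critique:
    relevant: bool    # is the retrieved passage relevant to the query?
    supported: bool   # is the draft answer grounded in that passage?
    useful: float     # overall helpfulness score, used to rank candidates

def needs_retrieval(query: str) -> bool:
    """Stand-in for the model's retrieve / no-retrieve decision."""
    return "revenue" in query.lower()  # placeholder heuristic

def retrieve(query: str, k: int = 3) -> list[str]:
    """Stand-in for a retriever over the knowledge base."""
    return [f"passage {i} about {query!r}" for i in range(k)]

def generate(query: str, passage: str | None = None) -> tuple[str, Critique]:
    """Stand-in for the LM producing a draft answer plus its own critique."""
    answer = f"draft answer to {query!r}" + (f" citing {passage!r}" if passage else "")
    grounded = passage is not None
    return answer, Critique(relevant=grounded, supported=grounded, useful=0.8)

def self_rag(query: str) -> str:
    # Step 1: decide whether retrieval is needed at all.
    if not needs_retrieval(query):
        answer, _ = generate(query)
        return answer
    # Steps 2-3: retrieve, then generate and critique one candidate per passage.
    candidates = []
    for passage in retrieve(query):
        answer, critique = generate(query, passage)
        if critique.relevant and critique.supported:
            candidates.append((critique.useful, answer))
    # Step 4: if no candidate survives the critique, fall back to an ungrounded
    # draft (the full method would instead re-retrieve or revise the answer).
    if not candidates:
        answer, _ = generate(query)
        return answer
    # Otherwise keep the most useful well-supported candidate.
    return max(candidates)[1]

print(self_rag("What is the revenue growth?"))
```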
Self-RAG doesn't just generate text; it also emits critique tokens that evaluate its own output along three dimensions, which allows fine-grained control over the generation process:
- Relevance: are the retrieved documents actually relevant to the query?
- Support: is the answer fully supported by the retrieved context?
- Usefulness: is the response helpful and complete for the user?
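One way to make these dimensions concrete is as small vocabularies of reflection tokens the model can emit. The spellings below are modeled on the Self-RAG paper's token descriptions but should be read as illustrative, not an exact vocabulary.

```python
# Reflection-token groups for the three critique dimensions (assumed spellings).
REFLECTION_TOKENS = {
    "relevance": ["[Relevant]", "[Irrelevant]"],
    "support":   ["[Fully supported]", "[Partially supported]",
                  "[No support / Contradictory]"],
    "utility":   [f"[Utility:{i}]" for i in range(1, 6)],  # 1 = poor .. 5 = fully helpful
}
```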
For example, given the query "What is the revenue growth?", the model returns its answer together with a critique output covering these three dimensions.
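A small sketch of reading such a critique output, assuming the model interleaves bracketed reflection tokens with its answer text; the example segment below is invented for illustration, not real model output or real figures.

```python
import re

def parse_critique(output: str) -> dict:
    """Pull the three critique judgments out of a generated segment."""
    utility = re.search(r"\[Utility:(\d)\]", output)
    return {
        "relevant": "[Relevant]" in output,
        "supported": "[Fully supported]" in output,
        "utility": int(utility.group(1)) if utility else None,
    }

# Hypothetical output for the query above (made-up numbers, illustration only).
segment = "[Relevant] Revenue grew 12% year over year. [Fully supported] [Utility:5]"
print(parse_critique(segment))  # {'relevant': True, 'supported': True, 'utility': 5}
```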