Self-RAG and Corrective RAG — The Agent Evaluates Its Own Retrieval
Implement Self-RAG reflection tokens and CRAG quality-based fallback. Build retry/fallback logic with LangGraph conditional edges.

In Part 1, we solved "where to search" with Query Routing. But what if the retrieved documents are useless? The biggest problem with Naive RAG is that it never evaluates retrieval quality. It throws search results directly to the LLM and hopes the LLM will somehow produce a good answer. In this post, we cover two key patterns where the Agent evaluates its own retrieval results and switches to a different strategy when quality is low.
Series: Part 1: Query Routing | Part 2 (this post) | Part 3: Production Pipeline
The Problem After Routing
Let's say Query Routing worked perfectly and selected the right data source. Even so, three failure modes remain.
1. Documents are completely irrelevant — You asked about "how to use conditional edges in LangGraph," but the retrieved documents are about "LCEL chaining in LangChain." The keywords are similar so vector similarity is high, but the documents are practically useless for generating an answer. The LLM combines such documents to produce plausible hallucinations.
2. Documents are only partially relevant — Out of 10 documents, only 3 are relevant and the other 7 are noise. When relevant information is buried in noise, the LLM generates unfocused answers. A per-document relevance check, sketched just after this list, catches exactly this kind of noise.
3. Documents contradict each other — On "GIL changes in Python 3.12," one source claims "the GIL has been removed" while another says "optional disabling is now possible." If you feed both into the context without judgment, you get contradictory answers.
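Concretely, the first two failure modes can be caught with a per-document relevance grader that runs before anything reaches the generator. Here is a minimal LLM-as-judge sketch in the LangChain style; the model choice, prompt wording, and the `GradeDocuments` / `filter_relevant` names are illustrative assumptions rather than code from this series.

```python
# A minimal LLM-as-judge relevance grader (LangChain style). Model name,
# prompt wording, and the GradeDocuments / filter_relevant names are
# illustrative assumptions, not code from this post.
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class GradeDocuments(BaseModel):
    """Binary relevance verdict for one retrieved document."""
    binary_score: str = Field(
        description="'yes' if the document is relevant to the question, otherwise 'no'"
    )

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any structured-output-capable model
grader = ChatPromptTemplate.from_messages([
    ("system", "Grade whether the retrieved document is relevant to the user question. "
               "Answer 'yes' or 'no'."),
    ("human", "Document:\n{document}\n\nQuestion: {question}"),
]) | llm.with_structured_output(GradeDocuments)

def filter_relevant(question: str, documents: list[str]) -> list[str]:
    """Drop documents the grader marks irrelevant (failure modes 1 and 2)."""
    return [
        doc for doc in documents
        if grader.invoke({"document": doc, "question": question}).binary_score == "yes"
    ]
```

Self-RAG bakes this kind of judgment into the model itself via reflection tokens, while CRAG keeps it as an external evaluator; the sketch above is closer to the CRAG flavor.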
These failure modes are exactly why the Faithfulness score measured in RAG Evaluation comes out low. Without controlling retrieval quality, answer quality will be inconsistent no matter how good the LLM is.
To solve these problems, we need to insert an evaluation step between retrieval and generation. This is the core idea behind Self-RAG and CRAG.
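Before diving into the two papers, here is what that evaluation step looks like as control flow. The sketch below is a minimal LangGraph skeleton of the retrieve → grade → generate loop with a CRAG-style web-search fallback on a conditional edge; the node bodies are stubs, and the state shape (`RAGState`) and routing-function name are assumptions for illustration.

```python
# Minimal LangGraph skeleton: grade retrieval, then branch. Node bodies are
# stubs; RAGState and route_after_grading are illustrative names.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: RAGState) -> dict:
    # ... vector search against your index ...
    return {"documents": ["..."]}

def grade(state: RAGState) -> dict:
    # ... e.g., filter_relevant() from the grader sketch above ...
    return {"documents": state["documents"]}

def web_search(state: RAGState) -> dict:
    # CRAG-style fallback: replace weak context with fresh web results
    return {"documents": state["documents"]}

def generate(state: RAGState) -> dict:
    # ... answer from the graded context ...
    return {"answer": "..."}

def route_after_grading(state: RAGState) -> str:
    # Conditional edge: fall back when no relevant documents survived grading
    return "generate" if state["documents"] else "web_search"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("grade", grade)
graph.add_node("web_search", web_search)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges("grade", route_after_grading,
                            {"generate": "generate", "web_search": "web_search"})
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)
app = graph.compile()
```

The conditional edge after grading is where the agent "switches to a different strategy": the graph goes straight to generation only when the graded context is usable, and detours through a corrective step when it is not.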