Self-RAG and Corrective RAG — The Agent Evaluates Its Own Retrieval
Implement Self-RAG reflection tokens and CRAG quality-based fallback. Build retry/fallback logic with LangGraph conditional edges.

In Part 1, we solved "where to search" with Query Routing. But what if the retrieved documents are useless? The biggest problem with Naive RAG is that it never evaluates retrieval quality. It throws search results directly to the LLM and hopes the LLM will somehow produce a good answer. In this post, we cover two key patterns where the Agent evaluates its own retrieval results and switches to a different strategy when quality is low.
Series: Part 1: Query Routing | Part 2 (this post) | Part 3: Production Pipeline
The Problem After Routing
Let's say Query Routing worked perfectly and selected the right data source. Even so, three failure modes remain.
1. Documents are completely irrelevant — You asked about "how to use conditional edges in LangGraph," but the retrieved documents are about "LCEL chaining in LangChain." The keywords are similar so vector similarity is high, but the documents are practically useless for generating an answer. The LLM combines such documents to produce plausible hallucinations.
2. Documents are only partially relevant — Out of 10 documents, only 3 are relevant and the other 7 are noise. When relevant information is buried in noise, the LLM generates unfocused answers. A per-document relevance check, sketched just after this list, catches exactly this kind of noise.
3. Documents contradict each other — On "GIL changes in Python 3.12," one source claims "the GIL has been removed" while another says "optional disabling is now possible." If you feed both into the context without judgment, you get contradictory answers.
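Concretely, the first two failure modes can be caught with a per-document relevance grader that runs before anything reaches the generator. Here is a minimal LLM-as-judge sketch in the LangChain style; the model choice, prompt wording, and the `GradeDocuments` / `filter_relevant` names are illustrative assumptions rather than code from this series.

```python
# A minimal LLM-as-judge relevance grader (LangChain style). Model name,
# prompt wording, and the GradeDocuments / filter_relevant names are
# illustrative assumptions, not code from this post.
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class GradeDocuments(BaseModel):
    """Binary relevance verdict for one retrieved document."""
    binary_score: str = Field(
        description="'yes' if the document is relevant to the question, otherwise 'no'"
    )

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any structured-output-capable model
grader = ChatPromptTemplate.from_messages([
    ("system", "Grade whether the retrieved document is relevant to the user question. "
               "Answer 'yes' or 'no'."),
    ("human", "Document:\n{document}\n\nQuestion: {question}"),
]) | llm.with_structured_output(GradeDocuments)

def filter_relevant(question: str, documents: list[str]) -> list[str]:
    """Drop documents the grader marks irrelevant (failure modes 1 and 2)."""
    return [
        doc for doc in documents
        if grader.invoke({"document": doc, "question": question}).binary_score == "yes"
    ]
```

Self-RAG bakes this kind of judgment into the model itself via reflection tokens, while CRAG keeps it as an external evaluator; the sketch above is closer to the CRAG flavor.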
These failure modes are exactly why the Faithfulness score measured in RAG Evaluation comes out low. Without controlling retrieval quality, answer quality will be inconsistent no matter how good the LLM is.
To solve these problems, we need to insert an evaluation step between retrieval and generation. This is the core idea behind Self-RAG and CRAG.
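Before diving into the two papers, here is what that evaluation step looks like as control flow. The sketch below is a minimal LangGraph skeleton of the retrieve → grade → generate loop with a CRAG-style web-search fallback on a conditional edge; the node bodies are stubs, and the state shape (`RAGState`) and routing-function name are assumptions for illustration.

```python
# Minimal LangGraph skeleton: grade retrieval, then branch. Node bodies are
# stubs; RAGState and route_after_grading are illustrative names.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: RAGState) -> dict:
    # ... vector search against your index ...
    return {"documents": ["..."]}

def grade(state: RAGState) -> dict:
    # ... e.g., filter_relevant() from the grader sketch above ...
    return {"documents": state["documents"]}

def web_search(state: RAGState) -> dict:
    # CRAG-style fallback: replace weak context with fresh web results
    return {"documents": state["documents"]}

def generate(state: RAGState) -> dict:
    # ... answer from the graded context ...
    return {"answer": "..."}

def route_after_grading(state: RAGState) -> str:
    # Conditional edge: fall back when no relevant documents survived grading
    return "generate" if state["documents"] else "web_search"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("grade", grade)
graph.add_node("web_search", web_search)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges("grade", route_after_grading,
                            {"generate": "generate", "web_search": "web_search"})
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)
app = graph.compile()
```

The conditional edge after grading is where the agent "switches to a different strategy": the graph goes straight to generation only when the graded context is usable, and detours through a corrective step when it is not.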