Overview
Building Ledger Lens taught me a lot about the practical challenges of production RAG systems. In this post I'll walk through the architecture decisions and the lessons learned along the way.
Why Multi-Agent?
A single LLM call isn't enough for complex financial queries. You need specialized agents for retrieval, reasoning, and validation. LangGraph makes it straightforward to wire these together as a directed graph where each node has a clear responsibility.
The graph looks roughly like this:
retrieval → reasoning → validation → response
                ↑            │
                └────────────┘
                retry if flagged
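The control flow above can be sketched framework-agnostically. This is a minimal illustration, not the actual Ledger Lens code: the node functions, state keys (`docs`, `response`, `needs_retry`), and retry limit are all placeholders.

```python
# Framework-agnostic sketch of the graph above; node bodies and state
# keys are illustrative stand-ins, not the real Ledger Lens agents.

def retrieval(state: dict) -> dict:
    state["docs"] = ["Q3 revenue was $4.2M"]  # stand-in for a Pinecone query
    return state

def reasoning(state: dict) -> dict:
    state["response"] = f"Answer based on: {state['docs'][0]}"
    return state

def validation(state: dict) -> dict:
    state["needs_retry"] = False  # stand-in for the cross-encoder check
    return state

def run_pipeline(query: str, max_retries: int = 2) -> dict:
    state = retrieval({"query": query})
    for _ in range(max_retries + 1):
        # Retry edge: loop back to reasoning while validation flags the draft.
        state = validation(reasoning(state))
        if not state["needs_retry"]:
            break
    return state
```

In the real system each node is a LangGraph node and the retry edge is a conditional edge, but the loop above captures the same shape.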
The Architecture
The system consists of three core agents:
- Retrieval Agent — queries Pinecone for semantically relevant financial documents
- Reasoning Agent — synthesizes retrieved context into structured answers
- Validation Agent — runs hallucination detection before returning results
Each agent is a LangGraph node. State flows between them as a typed dict, which keeps things debuggable.
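The shared state might look like the following. The exact field names here are my illustration, not necessarily the ones Ledger Lens uses:

```python
from typing import TypedDict

# Illustrative shape of the state dict passed between agents;
# field names are assumptions, not the production schema.
class AgentState(TypedDict, total=False):
    query: str          # the user's question
    docs: list[str]     # chunks returned by the retrieval agent
    response: str       # draft answer from the reasoning agent
    needs_retry: bool   # set by the validation agent when a claim is flagged
```

Because a `TypedDict` is a plain dict at runtime, every intermediate state is trivially printable, which is what makes the flow debuggable.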
Hallucination Detection
This was the hardest part. We use a cross-encoder to score the faithfulness of each claim against the retrieved sources. Anything below a confidence threshold gets flagged and either regenerated or returned with a disclaimer.
def validate_response(state: AgentState) -> AgentState:
    # CrossEncoder.predict expects (text_a, text_b) pairs; here we score
    # the draft response against each retrieved source document.
    # (Using state["docs"] for the retrieved chunks is an assumption.)
    scores = cross_encoder.predict([
        (doc, state["response"]) for doc in state["docs"]
    ])
    # Flag for regeneration if no source supports the response strongly enough.
    if max(scores) < CONFIDENCE_THRESHOLD:
        state["needs_retry"] = True
    return state
Lessons Learned
- Chunk size matters enormously for financial data — smaller chunks (256 tokens) outperformed larger ones
- Hybrid search (dense + sparse) significantly improves recall on numerical queries
- Always validate before returning — users trust financial data implicitly
- LangGraph's checkpointing is invaluable for debugging long agent chains
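On the hybrid-search point: one common way to merge dense and sparse result lists is reciprocal rank fusion (RRF). The sketch below illustrates the idea; it is not necessarily the fusion Ledger Lens uses, and the document IDs are made up.

```python
# Reciprocal rank fusion: each list contributes 1/(k + rank) per document,
# so documents ranked high by both retrievers rise to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]  # semantic (embedding) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # keyword (e.g. BM25) ranking
fused = rrf([dense, sparse])  # doc_b ranks high in both lists and wins
```

The sparse leg is what rescues numerical queries: exact figures like "4.2M" rarely embed distinctively, but they match keyword search exactly.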