Overview
Building Ledger Lens taught me a lot about the practical challenges of production RAG systems. In this post I'll walk through the architecture decisions and the lessons learned along the way.
Why Multi-Agent?
A single LLM call isn't enough for complex financial queries. You need specialized agents for retrieval, reasoning, and validation. LangGraph makes it straightforward to wire these together as a directed graph where each node has a clear responsibility.
The graph looks roughly like this:
retrieval → reasoning → validation → response
                ↑            │
                └────────────┘
                retry if flagged
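The control flow above can be sketched framework-agnostically. This is a minimal illustration, not the actual Ledger Lens code: the node functions, state keys (`docs`, `response`, `needs_retry`), and retry limit are all placeholders.

```python
# Framework-agnostic sketch of the graph above; node bodies and state
# keys are illustrative stand-ins, not the real Ledger Lens agents.

def retrieval(state: dict) -> dict:
    state["docs"] = ["Q3 revenue was $4.2M"]  # stand-in for a Pinecone query
    return state

def reasoning(state: dict) -> dict:
    state["response"] = f"Answer based on: {state['docs'][0]}"
    return state

def validation(state: dict) -> dict:
    state["needs_retry"] = False  # stand-in for the cross-encoder check
    return state

def run_pipeline(query: str, max_retries: int = 2) -> dict:
    state = retrieval({"query": query})
    for _ in range(max_retries + 1):
        # Retry edge: loop back to reasoning while validation flags the draft.
        state = validation(reasoning(state))
        if not state["needs_retry"]:
            break
    return state
```

In the real system each node is a LangGraph node and the retry edge is a conditional edge, but the loop above captures the same shape.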
The Architecture
The system consists of three core agents:
- Retrieval Agent — queries Pinecone for semantically relevant financial documents
- Reasoning Agent — synthesizes retrieved context into structured answers
- Validation Agent — runs hallucination detection before returning results
Each agent is a LangGraph node. State flows between them as a typed dict, which keeps things debuggable.
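The shared state might look like the following. The exact field names here are my illustration, not necessarily the ones Ledger Lens uses:

```python
from typing import TypedDict

# Illustrative shape of the state dict passed between agents;
# field names are assumptions, not the production schema.
class AgentState(TypedDict, total=False):
    query: str          # the user's question
    docs: list[str]     # chunks returned by the retrieval agent
    response: str       # draft answer from the reasoning agent
    needs_retry: bool   # set by the validation agent when a claim is flagged
```

Because a `TypedDict` is a plain dict at runtime, every intermediate state is trivially printable, which is what makes the flow debuggable.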
Hallucination Detection
This was the hardest part. We use a cross-encoder to score the faithfulness of each claim against the retrieved sources. Anything below a confidence threshold gets flagged and either regenerated or returned with a disclaimer.
def validate_response(state: AgentState) -> AgentState:
    # CrossEncoder.predict expects (text_a, text_b) pairs; here we score
    # the draft response against each retrieved source document.
    # (Using state["docs"] for the retrieved chunks is an assumption.)
    scores = cross_encoder.predict([
        (doc, state["response"]) for doc in state["docs"]
    ])
    # Flag for regeneration if no source supports the response strongly enough.
    if max(scores) < CONFIDENCE_THRESHOLD:
        state["needs_retry"] = True
    return state
Lessons Learned
- Chunk size matters enormously for financial data — smaller chunks (256 tokens) outperformed larger ones
- Hybrid search (dense + sparse) significantly improves recall on numerical queries
- Always validate before returning — users trust financial data implicitly
- LangGraph's checkpointing is invaluable for debugging long agent chains
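On the hybrid-search point: one common way to merge dense and sparse result lists is reciprocal rank fusion (RRF). The sketch below illustrates the idea; it is not necessarily the fusion Ledger Lens uses, and the document IDs are made up.

```python
# Reciprocal rank fusion: each list contributes 1/(k + rank) per document,
# so documents ranked high by both retrievers rise to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]  # semantic (embedding) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # keyword (e.g. BM25) ranking
fused = rrf([dense, sparse])  # doc_b ranks high in both lists and wins
```

The sparse leg is what rescues numerical queries: exact figures like "4.2M" rarely embed distinctively, but they match keyword search exactly.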