PolicyRAG — auditable QA over SEC filings

The problem

LLMs answer questions about SEC filings fluently, and sometimes wrongly. For analysts and compliance teams, a confident wrong answer is worse than no answer, so the system has to show receipts.

Approach

PolicyRAG is a full RAG pipeline behind a chat interface. Documents are chunked and indexed in ChromaDB, retrieved candidates pass through a reranker, and generation is constrained to cite the numbered context passages it used. Every answer is then scored on three axes: faithfulness (NLI-based scoring of the answer's claims against the retrieved context), citation validity, and context relevance. The scores are shown in the UI next to the answer instead of being hidden in logs. The backend is FastAPI with SSE streaming and per-request JWT verification through Supabase auth; the LLM provider is swappable behind a single interface.
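
To make the citation-validity axis concrete, here is a minimal sketch of one way such a check could work. It is not PolicyRAG's actual implementation: the bracketed marker format ([1], [2], ...), the names CitationReport and check_citations, and the rule that an answer with no citations scores 0.0 are all assumptions made for illustration. It only verifies that each cited number refers to a passage that was actually retrieved; whether the passage supports the claim is left to the faithfulness axis.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed marker format: bracketed passage numbers such as [1] or [12].
# The write-up only says answers cite "numbered context passages".
CITATION_RE = re.compile(r"\[(\d+)\]")


@dataclass
class CitationReport:
    cited: list[int]    # distinct passage numbers, in order of first appearance
    invalid: list[int]  # cited numbers that match no retrieved passage
    validity: float     # share of cited numbers that are valid


def check_citations(answer: str, num_passages: int) -> CitationReport:
    """Check that every [n] in the answer points at a retrieved passage (1-based)."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    cited = list(dict.fromkeys(int(n) for n in CITATION_RE.findall(answer)))
    invalid = [n for n in cited if not 1 <= n <= num_passages]
    # Assumption: an answer that cites nothing gets 0.0 rather than a free pass.
    validity = (len(cited) - len(invalid)) / len(cited) if cited else 0.0
    return CitationReport(cited=cited, invalid=invalid, validity=validity)


if __name__ == "__main__":
    answer = "Revenue rose 12% [1]. Risk factors are unchanged [4][1]."
    print(check_citations(answer, num_passages=3))
```

Scoring an uncited answer as 0.0 is a deliberate choice in this sketch: for an auditable system, an answer with no receipts should not look better than one with a bad citation.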

Results

[Add your evaluation numbers: faithfulness score distribution, share of answers with fully valid citations, retrieval hit rate on a held-out question set.]

What broke

[The honest section — chunking strategy? NLI false positives on numeric claims? citation drift across providers?]