Quantifying Retrieval Quality in GraphRAG: A Schema-Agnostic Approach
While large language models (LLMs) have achieved significant success in natural language tasks, their tendency to hallucinate remains a critical challenge. Retrieval-augmented generation (RAG) addresses this issue by grounding models in external data; however, standard vector-based RAG systems often fail on highly interconnected datasets. GraphRAG has emerged as a superior alternative in this setting because it models the relational topology of the data, yet evaluating GraphRAG systems remains challenging. Current benchmarks predominantly focus on the final LLM-generated output and frequently overlook the structural accuracy of the underlying retrieval process. In this paper, we propose a novel schema-agnostic framework for the automated generation of synthetic evaluation datasets from knowledge graphs (KGs). Unlike previous approaches, our framework establishes a rigorous, deterministic ground truth to quantify retriever performance across nine distinct query categories, including multi-hop and aggregation tasks. We demonstrate the utility of this benchmark by applying it to a biochemical KG and evaluating four diverse retrieval architectures. Our results indicate that agentic, LLM-driven retrievers provide the highest recall and reasoning capacity, effectively navigating complex topologies where other methods struggle. This work provides a robust, scalable methodology for performance tracking, shifting the evaluation of GraphRAG toward a more topologically precise standard.
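To make the idea of scoring a retriever against a deterministic ground truth concrete, here is a minimal sketch. It assumes each benchmark query carries a category label and a fixed set of expected graph element IDs, and it uses plain set-based precision and recall. The abstract does not specify the exact metrics, so the function names, the sample format, and the node IDs below are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict


def precision_recall(retrieved, expected):
    """Set-based precision and recall of retrieved IDs against ground-truth IDs."""
    retrieved, expected = set(retrieved), set(expected)
    hits = len(retrieved & expected)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


def recall_by_category(samples):
    """Average recall per query category.

    Each sample is a tuple (category, retrieved_ids, expected_ids).
    The tuple layout is a hypothetical format chosen for this sketch.
    """
    per_category = defaultdict(list)
    for category, retrieved, expected in samples:
        _, recall = precision_recall(retrieved, expected)
        per_category[category].append(recall)
    return {c: sum(v) / len(v) for c, v in per_category.items()}


# Hypothetical retriever outputs for three synthetic queries.
samples = [
    ("multi-hop", ["n1", "n2"], ["n1", "n2", "n3", "n4"]),
    ("multi-hop", ["n5"], ["n5"]),
    ("aggregation", ["n7", "n8"], ["n9"]),
]
scores = recall_by_category(samples)
print(scores)
```

Because the expected set is derived from the graph rather than from a model's judgement, the same retriever output always receives the same score, which is what makes tracking performance across retrieval architectures straightforward.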
Thibaud Vanmechelen, Alexandre Achten, Zaineb Gabsi, Sabri Skhiri, Quantifying Retrieval Quality in GraphRAG: A Schema-Agnostic Approach, in Proceedings of the Knowledge Graphs and Large Language Models Workshop 2026 (KG-LLM@LREC26), May 2026.