Evidence-ledger draft

Evidence-Ledger Synthesis of Retrieval-Augmented Generation — Claim ledger

CSV-backed claim ledger tying paper claims to paper IDs and evidence status.

| paper_id | claim | claim_status | evidence_status | source_depth | source_quote | page_or_section | taxonomy_fit | audit_status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2510.22344v1 | We introduce a novel agentic RAG architecture centered on an Iterative Refinement loop. | preliminary-linked | has evidence row | full-text | On HotpotQA, it achieves an F1-score of 0.453—an absolute improvement of 8.3 points over the strongest iterative baseline—establishing a new state-of-the-art for this class of methods on these benchmarks. | Abstract | in-scope: LLM extractor confirmed direction match | needs work; filled but source-depth unclear; report=audit_report.md |
| 2502.01113v3 | We introduce a graph foundation model for retrieval augmented generation (GFM-RAG), powered by a novel query-dependent GNN to enable efficient multi-hop retrieval within a single step. | supported | has evidence row | full-text | This supports the opinion that GPT-4o-mini generally outperforms GPT-3.5-turbo in constructing high quality KG-index, which is crucial for the graph-enhanced retrieval. | 1 | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
| 2210.15133v1 | The proposed ROM enables term importance information to help language model pre-training thus achieving better performance on multiple passage retrieval benchmarks. | supported | has evidence row | full-text | However, the language model trained by the random masking strategy is flawed. 3.3 Retrieval Oriented Masking As mentioned above, term importance is instructive for passage retrieval. | 4.4 Evaluation Results | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
| 2002.08909v1 | We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). | preliminary-linked | has evidence row | full-text | we find that we outperform all previous methods by a significant margin (4-16% absolute accuracy) | Abstract | in-scope: LLM extractor confirmed direction match | needs work; filled but source-depth unclear; report=audit_report.md |
| 2505.18906v2 | This paper presents the first systematic mapping study of Federated RAG, covering literature published between 2020 and 2025. | supported | has evidence row | full-text | +12.7% QA accuracy (59.8 → 72.5) using SGX | Extended Resources and Comparative Synthesis | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
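
Because the ledger is CSV-backed, its rows can be checked mechanically. The sketch below is one possible way to load such a file and flag rows whose fields appear to disagree with each other, for example a `source_depth` of `full-text` alongside an `audit_status` that says the source depth is unclear. The column names come from the ledger header; the function names (`load_ledger`, `flag_rows`), the comma-delimited layout, and both consistency rules are assumptions made for illustration, not checks the ledger itself defines.

```python
import csv
import io

# Column names copied from the ledger header.
COLUMNS = [
    "paper_id", "claim", "claim_status", "evidence_status", "source_depth",
    "source_quote", "page_or_section", "taxonomy_fit", "audit_status",
]


def load_ledger(text):
    """Parse CSV text into a list of dicts; reject an unexpected header."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def flag_rows(rows):
    """Return (paper_id, reason) pairs for rows that look inconsistent.

    Both rules are hypothetical examples, not part of the ledger's own audit.
    """
    flags = []
    for row in rows:
        depth_unclear = "source-depth unclear" in row["audit_status"]
        if row["source_depth"] == "full-text" and depth_unclear:
            flags.append((row["paper_id"], "source_depth conflicts with audit_status"))
        if row["claim_status"] == "supported" and not row["source_quote"].strip():
            flags.append((row["paper_id"], "supported claim has no source quote"))
    return flags


# Two rows taken from the ledger; the comma-delimited layout is assumed.
SAMPLE = "\n".join([
    ",".join(COLUMNS),
    '2002.08909v1,"We demonstrate the effectiveness of Retrieval-Augmented '
    'Language Model pre-training (REALM) by fine-tuning on the challenging task '
    'of Open-domain Question Answering (Open-QA).",preliminary-linked,'
    'has evidence row,full-text,"we find that we outperform all previous methods '
    'by a significant margin (4-16% absolute accuracy)",Abstract,'
    'in-scope: LLM extractor confirmed direction match,'
    'needs work; filled but source-depth unclear; report=audit_report.md',
    '2505.18906v2,"This paper presents the first systematic mapping study of '
    'Federated RAG, covering literature published between 2020 and 2025.",'
    'supported,has evidence row,full-text,'
    '"+12.7% QA accuracy (59.8 → 72.5) using SGX",'
    'Extended Resources and Comparative Synthesis,'
    'in-scope: LLM extractor confirmed direction match,'
    'needs work; full-text verified; report=audit_report.md',
])

rows = load_ledger(SAMPLE)
flags = flag_rows(rows)
for paper_id, reason in flags:
    print(f"{paper_id}: {reason}")
```

On these two rows, only 2002.08909v1 is flagged, since its `source_depth` reads `full-text` while its `audit_status` reports the source depth as unclear. Whether that combination is a genuine inconsistency or an intended intermediate state is a question for the ledger's maintainers.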