Evidence-ledger draft

Open Problems in During Inference: An Evidence-Ledger Investigation — Claim ledger

CSV-backed claim ledger tying paper claims to paper IDs and evidence status.

paper_id,claim,claim_status,evidence_status,source_depth,source_quote,page_or_section,taxonomy_fit,audit_status
"2309.05605v3","We propose a lightweight memory injection method that can be employed to correct a multi-hop reasoning failure during inference.","preliminary-linked","has evidence row","full-text","by employing our method to inject the memory of 'The Great Barrier Reef' into the multi-hop prompt 'The largest coral reef system in the world is located off the coast of. . . ' during inference, we increase the probability of the next token 'Australia' by 189%; refer to Fig. 3 for details.","Abstract","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
"2511.11834v1","VC effectively reflects performance degradation without requiring labeled data.","preliminary-linked","has evidence row","full-text","Our results reveal a strong negative correlation between classification accuracy and log(VC) (correlation ρ < −0.90 in most cases), suggesting that VC effectively reflects performance degradation without requiring labeled data.","Abstract","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
"1811.10649v1","We model the analog noise of neuromorphic circuits as additive and multiplicative Gaussian noise.","supported","has evidence row","full-text","the accuracy has been further increased to (99.5%, 89.1%, 89.6%) for the three datasets when noise power equals the signal power.","18","in-scope: LLM extractor confirmed direction match","needs work; full-text verified; report=audit_report.md"
"1807.06555v1","One of our contributions is to apply the noise injection method during both training and inference of RNNs to realize that the noisy computation problem in neuromorphic computing can be largely mitigated by this method.","preliminary-linked","has evidence row","full-text","Experiments on the MNIST dataset reveal that with the presence of noise during computation and for all test RNN architectures, including LSTMs and vanilla RNNs, validation accuracy can be improved from (12.5%, 10.5%, 15%) to over (98%, 92%, 94%), respectively.","9","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
"2403.02181v3","AdaInfer can achieve an average of 17.8% pruning ratio, and up to 43% on sentiment tasks, with nearly no performance drop (<1%)","preliminary-linked","has evidence row","full-text","Observation 1. Not all layers of LLMs are necessary during inference: Early Stopping works.","3.2","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
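
A ledger with these nine columns can be loaded and sanity-checked with Python's standard csv module. The following is a minimal sketch, not the project's actual tooling: the function names (load_ledger, parse_audit_status, depth_conflicts) are hypothetical, and it assumes that audit_status is a semicolon-separated field whose first part is the overall state and whose key=value parts are references, which is inferred from the rows above rather than documented. The two sample rows are copied from the ledger.

```python
import csv
import io

# Column order as given in the ledger header.
COLUMNS = [
    "paper_id", "claim", "claim_status", "evidence_status", "source_depth",
    "source_quote", "page_or_section", "taxonomy_fit", "audit_status",
]

# Two rows copied from the ledger, written as quoted CSV.
SAMPLE = '''\
paper_id,claim,claim_status,evidence_status,source_depth,source_quote,page_or_section,taxonomy_fit,audit_status
"2511.11834v1","VC effectively reflects performance degradation without requiring labeled data.","preliminary-linked","has evidence row","full-text","Our results reveal a strong negative correlation between classification accuracy and log(VC) (correlation ρ < −0.90 in most cases), suggesting that VC effectively reflects performance degradation without requiring labeled data.","Abstract","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
"2403.02181v3","AdaInfer can achieve an average of 17.8% pruning ratio, and up to 43% on sentiment tasks, with nearly no performance drop (<1%)","preliminary-linked","has evidence row","full-text","Observation 1. Not all layers of LLMs are necessary during inference: Early Stopping works.","3.2","in-scope: LLM extractor confirmed direction match","needs work; filled but source-depth unclear; report=audit_report.md"
'''


def load_ledger(text):
    """Parse ledger CSV text into a list of dicts, checking the header."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def parse_audit_status(value):
    """Split 'state; note; key=value' into its parts (assumed layout)."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    state, rest = parts[0], parts[1:]
    notes = [p for p in rest if "=" not in p]
    refs = dict(p.split("=", 1) for p in rest if "=" in p)
    return {"state": state, "notes": notes, "refs": refs}


def depth_conflicts(rows):
    """IDs of rows marked full-text whose audit note calls the depth unclear."""
    return [
        r["paper_id"]
        for r in rows
        if r["source_depth"] == "full-text"
        and "source-depth unclear" in r["audit_status"]
    ]


if __name__ == "__main__":
    rows = load_ledger(SAMPLE)
    print(parse_audit_status(rows[0]["audit_status"]))
    print(depth_conflicts(rows))
```

A check like depth_conflicts surfaces the rows where source_depth says full-text while the audit note still reads "filled but source-depth unclear", which applies to every row in this draft except 1811.10649v1.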