Evidence-ledger draft

Automated Evidence-Ledger Production of Research Papers — Claim ledger

CSV-backed claim ledger tying paper claims to paper IDs and evidence status.

paper_id,claim,claim_status,evidence_status,source_depth,source_quote,page_or_section,taxonomy_fit,audit_status
2405.18781v2,"We establish that with pure self-attention, the exponential convergence of tokens to a common representation holds for a broad class of attention masks.",supported,has evidence row,full-text,"(26) 19 C.1.1 Exponential convergence rate Having established the convergence in (26), we then establish the exponential convergence rate.",4.1,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
2510.17189v1,"We propose Efficient log2 quantized Softmax (E2Softmax), a hardware-friendly softmax algorithm based on log2 quantization of exponent function.",supported,has evidence row,full-text,"In comparison to state-of-the-art custom hardware, SOLE provides 2.82x and 3.32x area-efficiency improvements and 3.04x and 3.86x energy-efficiency improvements for Softmax and LayerNorm, respectively II.",Abstract,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
2403.20284v1,"LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks.",supported,has evidence row,full-text,"We find that output LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks.",3.1 Fine-tuning results,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
2305.02582v2,"LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it.",supported,has evidence row,full-text,"With projection, the model converges faster compared to the model without projection, which required 3x more steps.",4.1,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
2509.21042v4,"LayerNorm induces recency bias in Transformer decoders without positional encoding.",supported,has evidence row,full-text,"Compared to row 3, the recency bias observed in row 4 is less pronounced, as reflected by lower RP values (RP = 0.5931 for d = 16 and RP = 0.5457 for d = 64 in layer 2).",3.2 LayerNorm,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
2405.04134v1,"We have derived an alternative expression (Eq. (6)) for LayerNorm that makes these features more evident.",supported,has evidence row,full-text,"CONCLUSION We have investigated LayerNorm as a composition of simpler functions—projection, scaling, and then affine transformation—and have derived an alternative expression (Eq.",3,in-scope: LLM extractor confirmed direction match,pass; full-text verified; report=audit_report.md
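
A ledger like this can be sanity-checked with a few lines of Python. The sketch below is illustrative only: it assumes the ledger is stored as standard CSV with the nine column names from the header row, and that a passing audit_status begins with "pass", as in the rows above. The function names load_ledger and rows_needing_review are hypothetical, and the second sample row (example-0001, with its status values) is invented purely to show a row being flagged.

```python
import csv
import io

# Column names taken from the ledger header row.
EXPECTED_COLUMNS = [
    "paper_id", "claim", "claim_status", "evidence_status", "source_depth",
    "source_quote", "page_or_section", "taxonomy_fit", "audit_status",
]


def load_ledger(csv_text):
    """Parse ledger CSV text into a list of dict rows, checking the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def rows_needing_review(rows):
    """Return paper IDs whose row is not both supported and audit-passed.

    Assumption: a passing audit_status starts with "pass".
    """
    return [
        row["paper_id"]
        for row in rows
        if row["claim_status"] != "supported"
        or not row["audit_status"].startswith("pass")
    ]


# First row is copied from the ledger; the second row is hypothetical.
SAMPLE = (
    "paper_id,claim,claim_status,evidence_status,source_depth,"
    "source_quote,page_or_section,taxonomy_fit,audit_status\n"
    '2305.02582v2,"LayerNorm is crucial to the expressivity of the '
    'multi-head attention layer that follows it.",supported,'
    'has evidence row,full-text,"With projection, the model converges '
    "faster compared to the model without projection, which required 3x "
    'more steps.",4.1,in-scope: LLM extractor confirmed direction match,'
    "pass; full-text verified; report=audit_report.md\n"
    "example-0001,Placeholder claim,unsupported,no evidence row,"
    "abstract-only,,,out-of-scope,fail\n"
)

rows = load_ledger(SAMPLE)
print(rows_needing_review(rows))  # prints ['example-0001']
```

Checking the header before reading any rows means a column that was renamed or dropped fails loudly rather than silently producing empty fields.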