Automated Evidence-Ledger Production of Research Papers

paper_id	claim	claim_status	evidence_status	source_depth	source_quote	page_or_section	taxonomy_fit	audit_status
2405.18781v2	We establish that with pure self-attention, the exponential convergence of tokens to a common representation holds for a broad class of attention masks.	supported	has evidence row	full-text	C.1.1 Exponential convergence rate Having established the convergence in (26), we then establish the exponential convergence rate.	4.1	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md
2510.17189v1	We propose Efficient log2 quantized Softmax (E2Softmax), a hardware-friendly softmax algorithm based on log2 quantization of exponent function.	supported	has evidence row	full-text	In comparison to state-of-the-art custom hardware, SOLE provides 2.82x and 3.32x area-efficiency improvements and 3.04x and 3.86x energy-efficiency improvements for Softmax and LayerNorm, respectively.	Abstract	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md
2403.20284v1	LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks.	supported	has evidence row	full-text	We find that output LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks.	3.1 Fine-tuning results	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md
2305.02582v2	LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it.	supported	has evidence row	full-text	With projection, the model converges faster compared to the model without projection, which required 3x more steps.	4.1	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md
2509.21042v4	LayerNorm induces recency bias in Transformer decoders without positional encoding.	supported	has evidence row	full-text	Compared to row 3, the recency bias observed in row 4 is less pronounced, as reflected by lower RP values (RP = 0.5931 for d = 16 and RP = 0.5457 for d = 64 in layer 2).	3.2 LayerNorm	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md
2405.04134v1	We have derived an alternative expression (Eq. (6)) for LayerNorm that makes these features more evident.	supported	has evidence row	full-text	CONCLUSION We have investigated LayerNorm as a composition of simpler functions—projection, scaling, and then affine transformation—and have derived an alternative expression (Eq.	3	in-scope: LLM extractor confirmed direction match	pass; full-text verified; report=audit_report.md