| 2405.18781v2 | We establish that with pure self-attention, the exponential convergence of tokens to a common representation holds for a broad class of attention masks. | supported | has evidence row | full-text | Having established the convergence in (26), we then establish the exponential convergence rate. | 4.1 | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |
| 2510.17189v1 | We propose Efficient log2 quantized Softmax (E2Softmax), a hardware-friendly softmax algorithm based on log2 quantization of exponent function. | supported | has evidence row | full-text | In comparison to state-of-the-art custom hardware, SOLE provides 2.82x and 3.32x area-efficiency improvements and 3.04x and 3.86x energy-efficiency improvements for Softmax and LayerNorm, respectively. | Abstract | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |
| 2403.20284v1 | LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks. | supported | has evidence row | full-text | We find that output LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks. | 3.1 Fine-tuning results | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |
| 2305.02582v2 | LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it. | supported | has evidence row | full-text | With projection, the model converges faster compared to the model without projection, which required 3x more steps. | 4.1 | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |
| 2509.21042v4 | LayerNorm induces recency bias in Transformer decoders without positional encoding. | supported | has evidence row | full-text | Compared to row 3, the recency bias observed in row 4 is less pronounced, as reflected by lower RP values (RP = 0.5931 for d = 16 and RP = 0.5457 for d = 64 in layer 2). | 3.2 LayerNorm | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |
| 2405.04134v1 | We have derived an alternative expression (Eq. (6)) for LayerNorm that makes these features more evident. | supported | has evidence row | full-text | CONCLUSION We have investigated LayerNorm as a composition of simpler functions (projection, scaling, and then affine transformation) and have derived an alternative expression (Eq. | 3 | in-scope: LLM extractor confirmed direction match | pass; full-text verified; report=audit_report.md |