Evidence-ledger draft

Evidence-Ledger Synthesis of FFN and Mixture-of-Experts in LLMs — Claim ledger

CSV-backed claim ledger tying paper claims to paper IDs and evidence status.

| paper_id | claim | claim_status | evidence_status | source_depth | source_quote | page_or_section | taxonomy_fit | audit_status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2503.23007v1 | Existing Sparse Mixture of Experts (SMoE) models provide the same input to the top K-Experts in a TopK setting. | supported | has evidence row | full-text | S2MoE consistently outperforms both baselines, regardless of backbone size or the number of experts activated, demonstrating its potential to scale up effectively in large language models. 4.3 Fine-tuning Result Pre-training weights. | p1 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2510.16411v1 | Inspired by this observation, we revise the graphical model of MoE in Figure 1A to incorporate relationships between experts and propose the novel SymphonySMoE, which leverages expert-to-expert interactions to enhance its token routing. | supported | has evidence row | full-text | The improvements hold for both high-performing tasks such as QNLI (94.93 vs. 94.62) and SST2 (96.22 vs. 95.64), as well as more challenging ones such as WNLI (66.20 vs. 61.97). | p4 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2509.10025v1 | 3 Method 3.1 SMoE-VAE Architecture Our approach combines Variational Autoencoders with Sparse Mixture of Experts to enable interpretable analysis of expert specialization patterns. | supported | has evidence row | full-text | Unsupervised peaks around 7 experts and outperforms the supervised baseline constrained to 5 experts. | p4 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2204.09179v3 | 6 Conclusion In this work, we point out the representation collapse issue in sparse mixture-of-experts (SMoE) models, and propose a routing algorithm that estimates the routing scores on a low-dimensional hypersphere. | supported | has evidence row | full-text | Experimental results show that our model consistently outperforms the baseline SMoE models in terms of both language modeling and fine-tuning performance. | p10 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2410.14574v1 | We then propose to integrate heavy-ball momentum into the dynamics of SMoE, which results in the Momentum Sparse Mixture-of-Experts (MomentumSMoE). | supported | has evidence row | full-text | We observe that across all 15 corruption types, except for motion blur, Robust MomentumV-MoE outperforms the baseline V-MoE, with as high as a 6.5% increase in top-1 accuracy and 8 mCE decrease on fog corruption. | p2 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
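
Because the ledger is CSV-backed, its rows can be checked mechanically before an audit pass. The following is a minimal sketch, not the tooling that produced this ledger: the function names `to_csv`, `load_ledger`, and `check_row` are hypothetical, the column order is assumed to match the header above, and the rule that every `paper_id` is a versioned arXiv identifier is an assumption drawn from the rows shown.

```python
import csv
import io
import re

# Column names copied from the ledger header; the order is an assumption.
COLUMNS = [
    "paper_id", "claim", "claim_status", "evidence_status", "source_depth",
    "source_quote", "page_or_section", "taxonomy_fit", "audit_status",
]

# Assumption: every paper_id is a new-style arXiv identifier with a version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}v\d+$")

# One row taken from the ledger above, used only as sample input.
EXAMPLE_ROW = [
    "2410.14574v1",
    "We then propose to integrate heavy-ball momentum into the dynamics of "
    "SMoE, which results in the Momentum Sparse Mixture-of-Experts (MomentumSMoE).",
    "supported",
    "has evidence row",
    "full-text",
    "We observe that across all 15 corruption types, except for motion blur, "
    "Robust MomentumV-MoE outperforms the baseline V-MoE, with as high as a "
    "6.5% increase in top-1 accuracy and 8 mCE decrease on fog corruption.",
    "p2",
    "in-scope: taxonomy category match",
    "pass; full-text verified; report=audit_report.md",
]


def to_csv(rows):
    """Serialize rows with proper quoting so commas inside quotes survive."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def load_ledger(text):
    """Parse CSV text into a list of dicts, one per claim row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def check_row(row):
    """Return a list of problems found in one ledger row (empty if none)."""
    problems = []
    if None in row:
        # DictReader stores surplus fields under the key None.
        problems.append("row has more fields than the header")
    if not ARXIV_ID.match(row.get("paper_id") or ""):
        problems.append("paper_id is not an arXiv-style identifier")
    for name in COLUMNS:
        if not (row.get(name) or "").strip():
            problems.append(f"{name} is empty")
    return problems


if __name__ == "__main__":
    for row in load_ledger(to_csv([EXAMPLE_ROW])):
        print(row["paper_id"], check_row(row) or "ok")
```

A check of this kind flags rows where a section heading has run into the identifier, since a value such as an ID followed by heading text no longer matches the assumed pattern.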