Evidence-Ledger Synthesis of FFN and Mixture-of-Experts in LLMs

paper_id	claim	claim_status	evidence_status	source_depth	source_quote	page_or_section	taxonomy_fit	audit_status
2503.23007v1	Existing Sparse Mixture of Experts (SMoE) models provide the same input to the top-K experts in a TopK setting.	supported	has evidence row	full-text	S2MoE consistently outperforms both baselines, regardless of backbone size or the number of experts activated, demonstrating its potential to scale up effectively in large language models.	p1	in-scope: taxonomy category match	pass; full-text verified; report=audit_report.md
2510.16411v1	Inspired by this observation, we revise the graphical model of MoE in Figure 1A to incorporate relationships between experts and propose the novel SymphonySMoE, which leverages expert-to-expert interactions to enhance its token routing.	supported	has evidence row	full-text	The improvements hold for both high-performing tasks such as QNLI (94.93 vs. 94.62) and SST2 (96.22 vs. 95.64), as well as more challenging ones such as WNLI (66.20 vs. 61.97).	p4	in-scope: taxonomy category match	pass; full-text verified; report=audit_report.md
2509.10025v1	Our approach (SMoE-VAE) combines Variational Autoencoders with Sparse Mixture of Experts to enable interpretable analysis of expert specialization patterns.	supported	has evidence row	full-text	Unsupervised peaks around 7 experts and outperforms the supervised baseline constrained to 5 experts.	p4	in-scope: taxonomy category match	pass; full-text verified; report=audit_report.md
2204.09179v3	In this work, we point out the representation collapse issue in sparse mixture-of-experts (SMoE) models, and propose a routing algorithm that estimates the routing scores on a low-dimensional hypersphere.	supported	has evidence row	full-text	Experimental results show that our model consistently outperforms the baseline SMoE models in terms of both language modeling and fine-tuning performance.	p10	in-scope: taxonomy category match	pass; full-text verified; report=audit_report.md
2410.14574v1	We then propose to integrate heavy-ball momentum into the dynamics of SMoE, which results in the Momentum Sparse Mixture-of-Experts (MomentumSMoE).	supported	has evidence row	full-text	We observe that across all 15 corruption types, except for motion blur, Robust MomentumV-MoE outperforms the baseline V-MoE, with as high as a 6.5% increase in top-1 accuracy and 8 mCE decrease on fog corruption.	p2	in-scope: taxonomy category match	pass; full-text verified; report=audit_report.md
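Two of the ledger rows rely on standard Top-K routing. Row 2503.23007v1 notes that every selected expert receives the same input, and row 2204.09179v3 computes routing scores on a low-dimensional hypersphere. The sketch below illustrates both ideas but does not reproduce either paper's implementation. It assumes a generic dot-product router. For the hypersphere case, it assumes a cosine router that projects the token to a lower dimension, L2-normalizes that projection and each expert embedding, and divides by a temperature `tau`. The helper names, the toy `tanh` experts, and the dimensions are all hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matvec(W, x):
    """W is a list of rows; returns the vector W @ x."""
    return [dot(row, x) for row in W]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def unit(v):
    """L2-normalize v so it lies on the unit hypersphere."""
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]

def linear_scores(x, gate_rows):
    """Conventional router (assumed): one dot-product logit per expert."""
    return [dot(g, x) for g in gate_rows]

def hypersphere_scores(x, W_proj, expert_emb, tau):
    """Cosine router (assumed form): project x to a low dimension,
    normalize it and each expert embedding, then scale by 1/tau."""
    h = unit(matvec(W_proj, x))
    return [dot(h, unit(e)) / tau for e in expert_emb]

def topk_moe(x, scores, expert_weights, k):
    """Send x to the k highest-scoring experts. Each selected expert sees
    the same input x, and the outputs are mixed by softmax gates
    renormalized over only the selected experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = [math.tanh(v) for v in matvec(expert_weights[i], x)]  # toy expert
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top, gates

def rand(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d, d_low, n_experts, k = 8, 4, 6, 2
x = [random.gauss(0, 1) for _ in range(d)]
experts = [rand(d, d) for _ in range(n_experts)]

# Dot-product routing versus cosine routing on a d_low-dimensional sphere.
out, top, gates = topk_moe(x, linear_scores(x, rand(n_experts, d)), experts, k)
cos = hypersphere_scores(x, rand(d_low, d), rand(n_experts, d_low), tau=0.5)
out2, top2, gates2 = topk_moe(x, cos, experts, k)
```

A design point worth noting: cosine scores are bounded by `1/tau` in magnitude, while dot-product logits grow with the norms of the token and gate vectors. In this sketch, `tau` is therefore the only parameter that controls how sharp the routing is.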
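Row 2410.14574v1 describes integrating heavy-ball momentum into the dynamics of SMoE. The sketch below is a minimal illustration, not the paper's implementation. It assumes that a stack of residual SMoE layers, x_{l+1} = x_l + f_l(x_l), can be read as gradient-descent steps where -f_l(x) plays the role of the gradient. Under that reading, heavy-ball momentum keeps a velocity `p` across layers. The exact update rule, the step size `gamma`, the momentum coefficient `mu`, and the toy `tanh` layers are all assumptions made for illustration.

```python
import math

def residual_stack(x, layers):
    """Plain residual stack: x_{l+1} = x_l + f_l(x_l)."""
    for f in layers:
        x = [a + b for a, b in zip(x, f(x))]
    return x

def momentum_stack(x, layers, mu, gamma):
    """Heavy-ball variant (assumed form), treating -f_l(x) as the gradient:
    p_{l+1} = mu * p_l - f_l(x_l);  x_{l+1} = x_l - gamma * p_{l+1}."""
    p = [0.0] * len(x)
    for f in layers:
        fx = f(x)
        p = [mu * pi - fi for pi, fi in zip(p, fx)]   # update the velocity
        x = [xi - gamma * pi for xi, pi in zip(x, p)]  # step against it
    return x

def make_layer(scale):
    """Hypothetical stand-in for an SMoE layer output f_l(x)."""
    return lambda x: [math.tanh(scale * v) for v in x]

layers = [make_layer(s) for s in (0.5, -0.3, 0.8, -0.1)]
x0 = [1.0, -2.0, 0.5]

plain = residual_stack(x0, layers)
no_momentum = momentum_stack(x0, layers, mu=0.0, gamma=1.0)
heavy = momentum_stack(x0, layers, mu=0.7, gamma=1.0)
```

Setting `mu=0` and `gamma=1` recovers the plain residual stack exactly. With `mu > 0`, each layer's step also carries a decayed sum of the earlier layers' outputs, which is the heavy-ball effect.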