| 2503.23007v1 | Existing Sparse Mixture of Experts (SMoE) models provide the same input to the top K-Experts in a TopK setting. | supported | has evidence row | full-text | S2MoE consistently outperforms both baselines, regardless of backbone size or the number of experts activated, demonstrating its potential to scale up effectively in large language models. | p1 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2510.16411v1 | Inspired by this observation, we revise the graphical model of MoE in Figure 1A to incorporate relationships between experts and propose the novel SymphonySMoE, which leverages expert-to-expert interactions to enhance its token routing. | supported | has evidence row | full-text | The improvements hold for both high-performing tasks such as QNLI (94.93 vs. 94.62) and SST2 (96.22 vs. 95.64), as well as more challenging ones such as WNLI (66.20 vs. 61.97). | p4 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2509.10025v1 | Our approach combines Variational Autoencoders with Sparse Mixture of Experts to enable interpretable analysis of expert specialization patterns. | supported | has evidence row | full-text | Unsupervised peaks around 7 experts and outperforms the supervised baseline constrained to 5 experts. | p4 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2204.09179v3 | In this work, we point out the representation collapse issue in sparse mixture-of-experts (SMoE) models, and propose a routing algorithm that estimates the routing scores on a low-dimensional hypersphere. | supported | has evidence row | full-text | Experimental results show that our model consistently outperforms the baseline SMoE models in terms of both language modeling and fine-tuning performance. | p10 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |
| 2410.14574v1 | We then propose to integrate heavy-ball momentum into the dynamics of SMoE, which results in the Momentum Sparse Mixture-of-Experts (MomentumSMoE). | supported | has evidence row | full-text | We observe that across all 15 corruption types, except for motion blur, Robust MomentumV-MoE outperforms the baseline V-MoE, with as high as a 6.5% increase in top-1 accuracy and 8 mCE decrease on fog corruption. | p2 | in-scope: taxonomy category match | pass; full-text verified; report=audit_report.md |