GigaMoE: Sparsity-Guided Mixture of Experts for Efficient Gigapixel Object Detection

Authors

  • Xiang Li: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China
  • Wenxi Li: KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China; Zhuoxi Lab, Hangzhou, China
  • Yuetong Wang: Xiuzhong College, Tsinghua University, Beijing, China
  • Chenyang Lyu: AI Business, Alibaba International Digital Commerce
  • Haozhe Lin: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
  • Guiguang Ding: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; School of Software, Tsinghua University, Beijing, China
  • Yuchen Guo: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i21.38810

Abstract

Object detection in High-Resolution Wide (HRW) shots, or gigapixel images, presents unique challenges due to extreme object sparsity and vast scale variations. State-of-the-art methods like SparseFormer have pioneered sparse processing by selectively focusing on important regions, yet they apply a uniform computational model to all selected regions, overlooking differences in their intrinsic complexity. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce GigaMoE, a novel backbone architecture that brings adaptive computation to this domain by replacing the standard Feed-Forward Networks (FFNs) with a Mixture-of-Experts (MoE) module. Our architecture first employs a shared expert to provide a robust feature baseline for all selected regions. Upon this foundation, our core innovation, a novel Sparsity-Guided Routing mechanism, repurposes importance scores from the sparse backbone to provide a "computational bonus," dynamically engaging a variable number of specialized experts based on content complexity. The entire system is trained efficiently via a loss-free load-balancing technique, eliminating the need for auxiliary losses. Extensive experiments show that GigaMoE sets a new state of the art on the PANDA benchmark, improving detection accuracy by 1.1% over SparseFormer while reducing the computational cost (FLOPs) by 32.3%.
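
The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the routing formula or the balancing rule, so everything below is an assumption made for illustration. In particular, the linear mapping from importance score to expert count (experts_per_region), the parameters k_min and k_max, and the bias-based balancing update (update_bias, modeled on the general idea of adjusting a per-expert selection bias instead of adding an auxiliary loss) are hypothetical names and choices.

```python
import math


def experts_per_region(importance, k_min=1, k_max=4):
    """Map an importance score in [0, 1] to a number of routed experts.

    Assumed rule (not from the paper): a higher score earns a larger
    "computational bonus", i.e. more experts.
    """
    s = min(max(importance, 0.0), 1.0)
    return k_min + int(round(s * (k_max - k_min)))


def route(logits, importance, bias, k_min=1, k_max=4):
    """Select experts for one region and return {expert_index: gate_weight}.

    The balancing bias only influences which experts are selected; the gate
    weights are a softmax over the unbiased logits of the selected experts.
    """
    k = experts_per_region(importance, k_min, k_max)
    ranked = sorted(range(len(logits)),
                    key=lambda e: logits[e] + bias[e], reverse=True)
    chosen = ranked[:k]
    m = max(logits[e] for e in chosen)  # subtract max for numerical stability
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def update_bias(bias, loads, step=0.01):
    """Raise the bias of underloaded experts and lower it for overloaded ones.

    Assumed stand-in for loss-free balancing: no auxiliary loss term is used.
    """
    mean_load = sum(loads) / len(loads)
    return [b + step * ((mean_load > l) - (mean_load < l))
            for b, l in zip(bias, loads)]


def moe_output(x, shared_expert, routed_experts, gates):
    """Shared expert output plus the gate-weighted sum of the chosen experts."""
    return shared_expert(x) + sum(w * routed_experts[e](x)
                                  for e, w in gates.items())


# Toy usage with scalar "features" and four hypothetical experts.
experts = [lambda x, c=c: c * x for c in (1.0, 2.0, 3.0, 4.0)]
bias = [0.0, 0.0, 0.0, 0.0]
gates = route([2.0, 1.0, 0.0, -1.0], importance=0.9, bias=bias)
y = moe_output(1.5, lambda x: x, experts, gates)
```

In this sketch a low-importance region is handled by the shared expert plus a single routed expert, while a high-importance region engages up to k_max routed experts, which is one way to read "variable number of specialized experts based on content complexity."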

Published

2026-03-14

How to Cite

Li, X., Li, W., Wang, Y., Lyu, C., Lin, H., Ding, G., & Guo, Y. (2026). GigaMoE: Sparsity-Guided Mixture of Experts for Efficient Gigapixel Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17553–17561. https://doi.org/10.1609/aaai.v40i21.38810

Section

AAAI Technical Track on Humans and AI