GigaMoE: Sparsity-Guided Mixture of Experts for Efficient Gigapixel Object Detection

Xiang Li; Wenxi Li; Yuetong Wang; Chenyang Lyu; Haozhe Lin; Guiguang Ding; Yuchen Guo

doi:10.1609/aaai.v40i21.38810

Authors

Xiang Li: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China
Wenxi Li: KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China; Zhuoxi Lab, Hangzhou, China
Yuetong Wang: Xiuzhong College, Tsinghua University, Beijing, China
Chenyang Lyu: AI Business, Alibaba International Digital Commerce
Haozhe Lin: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
Guiguang Ding: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; School of Software, Tsinghua University, Beijing, China
Yuchen Guo: Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China

Abstract

Object detection in High-Resolution Wide (HRW) shots, or gigapixel images, presents unique challenges due to extreme object sparsity and vast scale variations. State-of-the-art methods such as SparseFormer pioneered sparse processing by selectively focusing on important regions, yet they apply the same computational model to every selected region, overlooking differences in the regions' intrinsic complexity. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce GigaMoE, a novel backbone architecture that introduces adaptive computation to this domain by replacing the standard Feed-Forward Networks (FFNs) with a Mixture-of-Experts (MoE) module. Our architecture first employs a shared expert to provide a robust feature baseline for all selected regions. On top of this foundation, our core innovation, a Sparsity-Guided Routing mechanism, repurposes the importance scores from the sparse backbone to provide a "computational bonus," dynamically engaging a variable number of specialized experts according to content complexity. The entire system is trained efficiently via a loss-free load-balancing technique, eliminating the need for cumbersome auxiliary losses. Extensive experiments show that GigaMoE sets a new state of the art on the PANDA benchmark, improving detection accuracy by 1.1% over SparseFormer while simultaneously reducing computational cost (FLOPs) by 32.3%.
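The abstract names three mechanisms: a shared expert applied to every selected region, a routing rule that grants more routed experts to regions with higher importance scores, and load balancing without an auxiliary loss. The pure-Python sketch below illustrates how these pieces could fit together; it is not the authors' implementation. All names (`SparsityGuidedRouter`, `moe_ffn`, `base_k`, `max_bonus`, `bias_lr`) are hypothetical, the linear mapping from an importance score in [0, 1] to extra experts is an assumption, and the balancing step assumes a sign-based per-expert bias update of the kind used in auxiliary-loss-free balancing schemes (e.g. DeepSeek-V3). The paper's exact rules may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class SparsityGuidedRouter:
    """Hypothetical router: importance score -> variable number of routed experts."""
    num_experts: int          # number of routed (specialized) experts
    base_k: int = 1           # experts every selected region gets (assumed)
    max_bonus: int = 2        # extra experts for the most important regions (assumed)
    bias_lr: float = 0.01     # step size of the balancing bias (assumed)
    bias: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.bias:
            self.bias = [0.0] * self.num_experts

    def num_active(self, importance: float) -> int:
        # "Computational bonus": assumed linear map from importance in [0, 1].
        importance = min(max(importance, 0.0), 1.0)
        return min(self.base_k + round(importance * self.max_bonus), self.num_experts)

    def route(self, affinities: Vector, importance: float) -> Dict[int, float]:
        k = self.num_active(importance)
        # The balancing bias influences *which* experts are picked...
        biased = [a + b for a, b in zip(affinities, self.bias)]
        chosen = sorted(range(self.num_experts), key=lambda i: biased[i], reverse=True)[:k]
        # ...but gate weights use the unbiased affinities (softmax over chosen experts).
        m = max(affinities[i] for i in chosen)
        exps = {i: math.exp(affinities[i] - m) for i in chosen}
        z = sum(exps.values())
        return {i: e / z for i, e in exps.items()}

    def update_bias(self, batch_routes: List[Dict[int, float]]) -> List[int]:
        # Loss-free balancing (assumed form): raise the bias of under-loaded
        # experts and lower it for over-loaded ones; no gradient, no extra loss.
        load = [0] * self.num_experts
        for gates in batch_routes:
            for i in gates:
                load[i] += 1
        mean = sum(load) / self.num_experts
        for i in range(self.num_experts):
            if load[i] < mean:
                self.bias[i] += self.bias_lr
            elif load[i] > mean:
                self.bias[i] -= self.bias_lr
        return load


def moe_ffn(x: Vector, shared_expert: Callable[[Vector], Vector],
            experts: List[Callable[[Vector], Vector]], router: SparsityGuidedRouter,
            affinities: Vector, importance: float):
    """Shared expert gives the baseline; routed experts add a gated term on top."""
    gates = router.route(affinities, importance)
    out = shared_expert(x)
    for i, g in gates.items():
        out = [o + g * y for o, y in zip(out, experts[i](x))]
    return out, gates


if __name__ == "__main__":
    def shared(x: Vector) -> Vector:
        return list(x)  # toy shared expert: identity

    toy_experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    router = SparsityGuidedRouter(num_experts=4)
    # A low-importance region gets only base_k = 1 routed expert.
    out, gates = moe_ffn([1.0, 1.0], shared, toy_experts, router,
                         [0.1, 0.9, 0.3, 0.2], importance=0.0)
    print(out, gates)  # [3.0, 3.0] {1: 1.0}
```

Keeping the balancing bias out of the gate weights means it only changes which experts are selected, not how their outputs are mixed; this is what lets bias-based schemes balance expert load without adding an auxiliary loss term to the training objective.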
