Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

Authors

  • Rongyu Zhang (Nanjing University; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University)
  • Yulin Luo (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University)
  • Jiaming Liu (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University)
  • Huanrui Yang (University of California, Berkeley)
  • Zhen Dong (University of California, Berkeley)
  • Denis Gudovskiy (Panasonic)
  • Tomoyuki Okuno (Panasonic)
  • Yohei Nakata (Panasonic)
  • Kurt Keutzer (University of California, Berkeley)
  • Yuan Du (Nanjing University)
  • Shanghang Zhang (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University)

DOI:

https://doi.org/10.1609/aaai.v38i15.29622

Keywords:

ML: Learning on the Edge & Model Compression, CV: Applications, CV: Low Level & Physics-based Vision, CV: Representation Learning for Vision

Abstract

The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning, including low-level upstream tasks such as the concurrent removal of multiple adverse weather effects. However, the conventional MoE architecture with parallel Feed-Forward Network (FFN) experts incurs significant parameter and computational overheads that hinder efficient deployment. In addition, the naive linear MoE router is suboptimal at assigning task-specific features to multiple experts, which limits further scalability. In this work, we propose an efficient MoE architecture with weight sharing across the experts. Inspired by the idea of linear feature modulation (FM), our architecture implicitly instantiates multiple experts via learnable activation modulations on a single shared expert block. The proposed Feature Modulated Expert (FME) serves as a building block for the novel Mixture-of-Feature-Modulation-Experts (MoFME) architecture, which can scale up the number of experts with low overhead. We further propose an Uncertainty-aware Router (UaR) that assigns task-specific features to different FM modules with well-calibrated weights. This enables MoFME to effectively learn diverse expert functions for multiple tasks. Experiments on the multi-deweather task show that MoFME outperforms the state of the art in image restoration quality by 0.1-0.2 dB while saving more than 74% of the parameters and 20% of the inference time of the conventional MoE counterpart. Experiments on downstream segmentation and classification tasks further demonstrate that MoFME generalizes to real open-world applications.
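The abstract describes two components: a Feature Modulated Expert that reuses one shared FFN for all experts via learnable scale/shift modulation, and an uncertainty-aware router that weights the modulated experts. The PyTorch snippet below is a minimal illustrative sketch of that idea only, assuming FiLM-style (gamma, beta) modulation of the shared hidden activation and approximating the uncertainty-aware routing with averaged Monte-Carlo dropout passes; the class name MoFMEBlock, its parameters, and the routing details are hypothetical and not taken from the authors' released code.

```python
# Hypothetical sketch of the MoFME idea from the abstract (not the authors' implementation).
# One shared FFN "expert" is modulated per expert with FiLM-style scale/shift parameters,
# and a router produces per-expert mixture weights; the uncertainty-aware routing is
# approximated here with averaged Monte-Carlo dropout passes (an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoFMEBlock(nn.Module):
    def __init__(self, dim: int, hidden: int, num_experts: int, mc_samples: int = 4):
        super().__init__()
        # A single shared FFN whose weights are reused by all implicit experts.
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        # Per-expert FiLM parameters: scale (gamma) and shift (beta) on the hidden activation.
        self.gamma = nn.Parameter(torch.ones(num_experts, hidden))
        self.beta = nn.Parameter(torch.zeros(num_experts, hidden))
        # Router with dropout so repeated stochastic passes can be averaged.
        self.router = nn.Sequential(nn.Linear(dim, num_experts), nn.Dropout(p=0.1))
        self.mc_samples = mc_samples

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Routing statistics use the mean token feature.
        pooled = x.mean(dim=1)                                          # (B, D)
        # Average several dropout-perturbed routing passes (MC-dropout assumption).
        logits = torch.stack(
            [self.router(pooled) for _ in range(self.mc_samples)], dim=0
        ).mean(dim=0)                                                   # (B, E)
        weights = F.softmax(logits, dim=-1)                             # (B, E)

        h = F.gelu(self.fc1(x))                                         # shared activation (B, T, H)
        # Implicit experts: modulate the shared activation with each expert's gamma/beta.
        h_mod = h.unsqueeze(1) * self.gamma[None, :, None, :] + self.beta[None, :, None, :]
        expert_out = self.fc2(h_mod)                                    # (B, E, T, D)
        # Combine expert outputs with the router weights.
        return torch.einsum("be,betd->btd", weights, expert_out)
```

For example, MoFMEBlock(dim=64, hidden=256, num_experts=8) shares one FFN across eight implicit experts, adding only the per-expert gamma/beta vectors, which is the kind of weight sharing behind the parameter savings claimed in the abstract relative to eight parallel FFN experts.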

Published

2024-03-24

How to Cite

Zhang, R., Luo, Y., Liu, J., Yang, H., Dong, Z., Gudovskiy, D., Okuno, T., Nakata, Y., Keutzer, K., Du, Y., & Zhang, S. (2024). Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16812-16820. https://doi.org/10.1609/aaai.v38i15.29622

Issue

Vol. 38 No. 15 (2024)

Section

AAAI Technical Track on Machine Learning VI