Reconcile Gradient Modulation for Harmony Multimodal Learning

Authors

  • Xiyuan Gao (Tianjin University; Xiong'an National Innovation Center; Xiong'an Guochuang Lantian Technology Co., Ltd.)
  • Bing Cao (Tianjin University; Haihe Lab of ITAI)
  • Baoquan Gong (Tianjin University)
  • Pengfei Zhu (Tianjin University; Xiong'an National Innovation Center; Xiong'an Guochuang Lantian Technology Co., Ltd.)

DOI:

https://doi.org/10.1609/aaai.v40i25.39267

Abstract

Multimodal learning frequently faces two coupled challenges: modality imbalance, where dominant modalities suppress others during training, and modality conflict, where opposing gradient directions hinder optimization. Existing methods typically address these issues in isolation, yet the two are intrinsically correlated and most fundamentally reflected in the gradient space: severe imbalance may obscure conflicts, while suppressing conflict may homogenize features and worsen imbalance, degrading fusion performance. To address both jointly, we propose Reconcile Gradient Modulation (RGM), a unified framework that adaptively adjusts gradient magnitude and direction for harmonious multimodal learning. The core of RGM is SynOrth Grad, which minimizes Dirichlet energy to perform minimal gradient surgery. It strengthens cooperative synergy when modalities are aligned and enforces orthogonality to preserve uniqueness when they conflict, thus promoting stable and balanced learning. To guide this modulation, we propose Cumulative Gradient Energy (CGE) as a convergence-guaranteed measure of modality-wise progress, and construct a Balance-nonConflict Plane (BCP) for real-time diagnosis and control of training dynamics. Experiments on diverse benchmarks validate the effectiveness and generalizability of RGM, which consistently outperforms counterparts designed to handle multimodal imbalance or conflict independently.
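
The notion of conflicting gradient directions and of enforcing orthogonality under conflict can be illustrated with a minimal sketch. This is not SynOrth Grad, whose Dirichlet-energy formulation is not given in the abstract; it swaps in a simpler, generic orthogonal projection in the style of PCGrad. The function names (project_out, reconcile) and the two-modality setup are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_out(g, ref, eps=1e-12):
    """Remove from g its component along ref (orthogonal projection).

    Assumption: a PCGrad-style projection, used here only to illustrate
    'enforcing orthogonality'; it is not the paper's SynOrth Grad.
    """
    return g - (g @ ref) / (ref @ ref + eps) * ref

def reconcile(g_a, g_b):
    """Return adjusted gradients for two modalities.

    Conflict is taken to mean a negative inner product (opposing
    directions). In that case each gradient is made orthogonal to the
    other's original direction; aligned gradients are left unchanged.
    """
    if g_a @ g_b < 0:
        return project_out(g_a, g_b), project_out(g_b, g_a)
    return g_a, g_b

# Hypothetical per-modality gradients on shared parameters.
g_audio = np.array([1.0, 0.0])
g_video = np.array([-1.0, 1.0])   # opposes g_audio along the first axis
new_audio, new_video = reconcile(g_audio, g_video)
```

After the call, each adjusted gradient has zero inner product with the other modality's original gradient, so an update along one no longer undoes progress along the other. The sketch handles direction only; magnitude balancing, which the abstract attributes to CGE and the BCP, is not modeled here.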

Published

2026-03-14

How to Cite

Gao, X., Cao, B., Gong, B., & Zhu, P. (2026). Reconcile Gradient Modulation for Harmony Multimodal Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(25), 21225–21233. https://doi.org/10.1609/aaai.v40i25.39267

Issue

Vol. 40 No. 25 (2026)

Section

AAAI Technical Track on Machine Learning II