MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains

Authors

  • Leyan Xue (School of Artificial Intelligence, Tianjin University)
  • Changqing Zhang (School of Artificial Intelligence, Tianjin University)
  • Kecheng Xue (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications)
  • Xiaohong Liu (Institute of Medical Artificial Intelligence, South China Hospital, Medical School, Shenzhen University)
  • Guangyu Wang (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications)
  • Zongbo Han (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications)

DOI:

https://doi.org/10.1609/aaai.v40i32.39963

Abstract

Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.

Published

2026-03-14

How to Cite

Xue, L., Zhang, C., Xue, K., Liu, X., Wang, G., & Han, Z. (2026). MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27450–27458. https://doi.org/10.1609/aaai.v40i32.39963

Section

AAAI Technical Track on Machine Learning IX