Multimodal Graph Neural Architecture Search under Distribution Shifts

Authors

  • Jie Cai, Department of Computer Science and Technology, Tsinghua University
  • Xin Wang, Department of Computer Science and Technology, Tsinghua University; Beijing National Research Center for Information Science and Technology, Tsinghua University
  • Haoyang Li, Department of Computer Science and Technology, Tsinghua University
  • Ziwei Zhang, Department of Computer Science and Technology, Tsinghua University
  • Wenwu Zhu, Department of Computer Science and Technology, Tsinghua University; Beijing National Research Center for Information Science and Technology, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i8.28663

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, ML: Auto ML and Hyperparameter Tuning, ML: Graph-based Machine Learning

Abstract

Multimodal graph neural architecture search (MGNAS) has shown great success in automatically designing the optimal multimodal graph neural network (MGNN) architecture by leveraging multimodal representations, cross-modal information, and graph structure in one unified framework. However, existing MGNAS methods fail to handle the distribution shifts that naturally exist in multimodal graph data, since the searched architectures inevitably capture spurious statistical correlations under distribution shifts. To solve this problem, we propose a novel Out-of-distribution Generalized Multimodal Graph Neural Architecture Search (OMG-NAS) method, which optimizes the MGNN architecture with respect to its performance on decorrelated OOD data. Specifically, we propose a multimodal graph representation decorrelation strategy that encourages the searched MGNN model to output representations free of spurious correlations by iteratively optimizing the feature weights and the controller. In addition, we propose a global sample weight estimator that shares the optimal sample weights learned from existing architectures, enabling candidate MGNN architectures to effectively estimate sample weights and generate decorrelated multimodal graph representations that concentrate on the truly predictive relations between invariant features and ground-truth labels. Extensive experiments on real-world multimodal graph datasets demonstrate the superiority of our proposed method over SOTA baselines.
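The decorrelation strategy described above rests on a common idea in stable learning: reweight training samples so that statistical dependence between features is suppressed, leaving the predictor to rely on truly invariant relations. The following is a minimal illustrative sketch of that idea only, not the authors' OMG-NAS implementation: it learns per-sample weights that shrink the off-diagonal entries of the weighted feature covariance via finite-difference gradient descent. All function names, the objective, and the optimizer are assumptions for illustration.

```python
import numpy as np


def decorrelation_loss(theta, feats):
    """Sum of squared off-diagonal entries of the weighted feature
    covariance matrix (a simple, illustrative decorrelation objective).
    `theta` are unconstrained logits; weights = softmax(theta)."""
    w = np.exp(theta - theta.max())
    w = w / w.sum()                               # positive, sums to 1
    mean = (w[:, None] * feats).sum(axis=0)
    centered = feats - mean
    cov = (w[:, None] * centered).T @ centered    # weighted covariance
    off_diag = cov - np.diag(np.diag(cov))
    return (off_diag ** 2).sum()


def learn_sample_weights(feats, steps=200, lr=0.1, eps=1e-4):
    """Estimate decorrelating sample weights with forward-difference
    gradient descent (purely illustrative; a real system would use
    autodiff and a task-aware objective)."""
    n = feats.shape[0]
    theta = np.zeros(n)                           # start from uniform weights
    for _ in range(steps):
        base = decorrelation_loss(theta, feats)
        grad = np.zeros(n)
        for i in range(n):
            bumped = theta.copy()
            bumped[i] += eps
            grad[i] = (decorrelation_loss(bumped, feats) - base) / eps
        theta -= lr * grad
    w = np.exp(theta - theta.max())
    return w / w.sum()
```

In a NAS setting such as the one the abstract describes, weights like these would be estimated per candidate architecture; the paper's global sample weight estimator amortizes that cost by sharing weights learned from previously evaluated architectures.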

Published

2024-03-24

How to Cite

Cai, J., Wang, X., Li, H., Zhang, Z., & Zhu, W. (2024). Multimodal Graph Neural Architecture Search under Distribution Shifts. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8227-8235. https://doi.org/10.1609/aaai.v38i8.28663

Section

AAAI Technical Track on Data Mining & Knowledge Management