Multi-Head Modularization to Leverage Generalization Capability in Multi-Modal Networks

Jun-Tae Lee; Hyunsin Park; Sungrack Yun; Simyung Chang

doi:10.1609/aaai.v36i7.20698

Authors

Jun-Tae Lee, Qualcomm AI Research
Hyunsin Park, Qualcomm AI Research
Sungrack Yun, Qualcomm AI Research
Simyung Chang, Qualcomm Korea YH

DOI:

https://doi.org/10.1609/aaai.v36i7.20698

Keywords:

Machine Learning (ML)

Abstract

Leveraging the rich information in multiple modalities is crucial in many tasks. Existing works have focused on designing multi-modal networks with decent multi-modal fusion modules. Instead, we focus on improving the generalization capability of multi-modal networks, especially of the fusion module. Viewing multi-modal data as different projections of information, we first observe that a bad projection can cause poor generalization behavior in multi-modal networks. Then, motivated by the low sensitivity of well-generalized networks to perturbation, we propose a novel multi-modal training method, multi-head modularization (MHM). We modularize a multi-modal network as a series of uni-modal embedding, multi-modal embedding, and task-specific head modules. For training, we exploit multiple head modules learned with different datasets and swap them with each other. This makes the multi-modal embedding module robust to all the heads, which have different generalization behaviors. In the testing phase, we select one of the head modules so as not to increase the computational cost. Owing to the perturbation of head modules during training, the deployed network, although it includes only the one selected head, generalizes better than a network simply learned end-to-end. We verify the effectiveness of MHM on various multi-modal tasks. Using state-of-the-art methods as baselines, we show notable performance gains for all of them.
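The modular structure described in the abstract can be pictured with a small, structure-only sketch. It is not the authors' implementation: the abstract does not specify how heads are swapped, so the rotation schedule below is an assumption, and every name (`unimodal_embed`, `multimodal_embed`, `make_head`, `swap_schedule`, the toy inputs) is hypothetical. No training happens here; the code only shows the three-stage pipeline, one possible way to pair heads with data subsets across rounds, and deployment with a single head.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Head = Callable[[Vector], float]


def unimodal_embed(x: Sequence[float], scale: float) -> Vector:
    """Toy stand-in for a uni-modal embedding module (hypothetical)."""
    return [scale * v for v in x]


def multimodal_embed(a: Vector, b: Vector) -> Vector:
    """Toy stand-in for the shared multi-modal (fusion) embedding module."""
    return [u + v for u, v in zip(a, b)]


def make_head(bias: float) -> Head:
    """Toy stand-in for a task-specific head; heads differ only by a bias."""
    def head(z: Vector) -> float:
        return sum(z) + bias
    return head


def swap_schedule(num_heads: int, num_rounds: int) -> List[List[int]]:
    """ASSUMED schedule: in round r, data subset s is paired with head (s + r) % num_heads.

    The abstract only says heads are swapped; this rotation is one simple choice.
    """
    return [[(s + r) % num_heads for s in range(num_heads)]
            for r in range(num_rounds)]


def forward(modality_a: Sequence[float], modality_b: Sequence[float], head: Head) -> float:
    """Uni-modal embeddings -> multi-modal embedding -> one head."""
    z = multimodal_embed(unimodal_embed(modality_a, 1.0),
                         unimodal_embed(modality_b, 0.5))
    return head(z)


heads = [make_head(b) for b in (0.0, 1.0, 2.0)]
schedule = swap_schedule(len(heads), num_rounds=3)

# At test time a single head is kept, so inference cost matches a one-head network.
deployed_head = heads[0]
prediction = forward([1.0, 2.0], [2.0, 4.0], deployed_head)
```

In a real setting each round would update the shared embedding module through whichever head the schedule assigns, which is what exposes it to heads with different generalization behaviors.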
