MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

Zhen Xu; David R. So; Andrew M. Dai

doi:10.1609/aaai.v35i12.17260

Authors

Zhen Xu, Google Research
David R. So, Google Research
Andrew M. Dai, Google Research

Keywords:

Multimodal Learning, Healthcare, Medicine & Wellness

Abstract

One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features -- all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representations should be fused together is a difficult problem, which is often addressed by handcrafted modeling and intuition. In this work, we extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across multimodal fusion strategies and modality-specific architectures for the first time. We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable computation costs. In addition, MUFASA produces architectures that outperform Transformer and Evolved Transformer. Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks. Studying our top architecture in depth, we provide empirical evidence that MUFASA's improvements are derived from its ability to both customize modeling for each modality and find effective fusion strategies.
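The top-5 recall figure quoted in the abstract can be illustrated with a minimal sketch. It assumes a single-label setting, in which each example has one true code and counts as a hit when that code is among the k highest-scoring classes. That definition, the function name `top_k_recall`, and the toy data are assumptions made for illustration, not details taken from the paper.

```python
def top_k_recall(scores, labels, k=5):
    """Fraction of examples whose true label is among the k top-scoring classes.

    scores: list of per-example score lists, one score per class.
    labels: list of true class indices, one per example.
    Assumes one true label per example (hypothetical simplification).
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data: 3 examples, 6 classes (values invented for illustration only).
example_scores = [
    [0.10, 0.50, 0.20, 0.90, 0.00, 0.30],
    [0.70, 0.10, 0.60, 0.05, 0.20, 0.30],
    [0.20, 0.30, 0.10, 0.15, 0.25, 0.05],
]
example_labels = [1, 5, 4]

if __name__ == "__main__":
    print(top_k_recall(example_scores, example_labels, k=2))
```

With k equal to the number of classes the metric is trivially 1.0, so it is only informative when k is small relative to the size of the label space.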
