Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
DOI:
https://doi.org/10.1609/aaai.v39i12.33337
Abstract
Existing efforts to boost multimodal fusion for 3D anomaly detection (3D-AD) primarily concentrate on devising more effective multimodal fusion strategies. However, little attention has been devoted to analyzing the role of multimodal fusion architecture (topology) design in contributing to 3D-AD. In this paper, we aim to bridge this gap and present a systematic study of the impact of multimodal fusion architecture design on 3D-AD. This work considers multimodal fusion architecture design at the intra-module fusion level, i.e., independent modality-specific modules involving early, middle, or late multimodal features with specific fusion operations, and also at the inter-module fusion level, i.e., the strategies to fuse those modules. In both cases, we first derive insights by theoretically and experimentally exploring how architectural designs influence 3D-AD. Then, we extend the SOTA neural architecture search (NAS) paradigm and propose 3D-ADNAS to simultaneously search across multimodal fusion strategies and modality-specific modules for the first time. Extensive experiments show that 3D-ADNAS obtains consistent improvements in 3D-AD across various model capacities in terms of accuracy, frame rate, and memory usage, and that it exhibits great potential in dealing with few-shot 3D-AD tasks.
Published
2025-04-11
How to Cite
Long, K., Xie, G., Ma, L., Liu, J., & Lu, Z. (2025). Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 39(12), 12273–12281. https://doi.org/10.1609/aaai.v39i12.33337
Section
AAAI Technical Track on Data Mining & Knowledge Management II