AerialFusion: Co-Motion-Driven Unified Registration and Fusion on Multi-modal Data Streams from Aerial View

Junhui Qiu; Xiang Xiang; Hongyun Wang; Jiaqi Gui

doi:10.1609/aaai.v40i10.37810

Authors

Junhui Qiu, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Xiang Xiang, School of Computer Science and Technology, Huazhong University of Science and Technology; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Hongyun Wang, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Jiaqi Gui, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i10.37810

Abstract

Registration and fusion of aerial multi-modal visual streams can generate more comprehensive scene representations for UAV cross-modal perception. However, two challenges remain: joint spatiotemporal representation learning from dynamic backgrounds and moving targets is inherently difficult, and there is a critical shortage of large-scale, well-annotated multi-modal visual stream benchmarks for UAV platforms. In this paper, we propose AerialFusion, a co-motion-driven unified framework for registering and fusing UAV visual streams. It fully mines modality-invariant common features through motion awareness, enabling spatiotemporally coherent registration and fusion. Specifically, AerialFusion comprises 1) a Skewed Motion Distribution Field Co-Motion-Driven Image Registration, 2) a Co-Motion Generative Fusion, and 3) a Streams-based Unified Learning. Furthermore, we introduce EUM3D, a registration and fusion benchmark for UAV cross-modal perception. This benchmark contains 60 synchronized visible-infrared visual streams, totaling 122k spatially and temporally aligned pairs, most of which were captured in low-light scenes. EUM3D also provides pixel-level alignment guarantees via perspective-transform ground truth. Extensive experiments show that, in aerial sequence scenarios, AerialFusion surpasses existing fusion methods, which focus on single images and static backgrounds, addressing spatiotemporal mismatches while suppressing cross-modal interference.
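
To make the notion of perspective-transform ground truth concrete, the following is a minimal sketch of how a 3x3 perspective transform maps pixel coordinates, and how an estimated transform could be compared against a ground-truth one. It is an illustration under assumptions, not the evaluation protocol of the paper: the function names (`warp_points`, `mean_alignment_error`), the corner-distance measure, the image size, and all matrix values are hypothetical and are not taken from AerialFusion or EUM3D.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def warp_points(H: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map (x, y) pixel coordinates through a 3x3 perspective transform H."""
    warped = []
    for x, y in points:
        # Lift to homogeneous coordinates (x, y, 1) and multiply by H.
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under H")
        # Divide by the projective scale to return to pixel coordinates.
        warped.append((xh / w, yh / w))
    return warped


def mean_alignment_error(
    H_est: Matrix3, H_gt: Matrix3, points: Sequence[Point]
) -> float:
    """Average Euclidean distance (in pixels) between the same points
    warped by an estimated transform and by the ground-truth transform."""
    est = warp_points(H_est, points)
    gt = warp_points(H_gt, points)
    dists = [
        ((ex - gx) ** 2 + (ey - gy) ** 2) ** 0.5
        for (ex, ey), (gx, gy) in zip(est, gt)
    ]
    return sum(dists) / len(dists)


# Hypothetical values for illustration only (not taken from EUM3D).
H_GT = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]   # pure translation
H_EST = [[1.0, 0.0, 4.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]  # 1 px off in x
CORNERS = [(0.0, 0.0), (639.0, 0.0), (639.0, 511.0), (0.0, 511.0)]

if __name__ == "__main__":
    print(warp_points(H_GT, CORNERS[:1]))
    print(mean_alignment_error(H_EST, H_GT, CORNERS))
```

Measuring the error on the image corners is one common way to score a registration estimate against a known transform; whether the benchmark uses this or another measure is not stated in the abstract.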
