Motion-adaptive Transformer for Event-based Image Deblurring

Senyan Xu; Zhijing Sun; Mingchen Zhong; Chengzhi Cao; Yidi Liu; Xueyang Fu; Yan Chen

doi:10.1609/aaai.v39i9.32967

Authors

Senyan Xu University of Science and Technology of China
Zhijing Sun University of Science and Technology of China
Mingchen Zhong University of Science and Technology of China
Chengzhi Cao University of Science and Technology of China
Yidi Liu University of Science and Technology of China
Xueyang Fu University of Science and Technology of China
Yan Chen University of Science and Technology of China

Abstract

Event cameras, which capture pixel-level brightness changes asynchronously, provide rich motion information that is often lost during the exposure of traditional frame-based cameras, thereby offering a fresh perspective on motion deblurring. Although current approaches incorporate event intensity, they neglect essential spatial motion information. Unlike CNN architectures, Transformers excel at modeling long-range dependencies, but they struggle to establish relevant non-local connections in sparse events and fail to highlight significant interactions in dense images. To address these limitations, we introduce a Motion-Adaptive Transformer network (MAT) that uses spatial motion information to forge robust global connections. The core design is an Adaptive Motion Mask Predictor (AMMP) that identifies key motion regions; the predicted mask guides the Motion-Sparse Attention (MSA) to eliminate irrelevant event tokens and enables the Motion-Aware Attention (MAA) to focus on relevant ones, thereby enhancing long-range dependency modeling. Additionally, we design a Cross-Modal Intensity Gating mechanism that efficiently merges intensity data across modalities while minimizing parameter use. The learnable Expansion-Controlled Spatial Gating further optimizes the transmission of event features. Comprehensive experiments confirm that our approach sets a new state of the art in image deblurring, surpassing previous methods by up to 0.60 dB on the GoPro dataset and 1.04 dB on the HS-ERGB dataset, and achieving an average improvement of 0.52 dB across two real-world datasets.
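To make the mask-guided attention idea concrete, here is a minimal Python sketch under loudly stated assumptions. The abstract does not say how AMMP, MSA, or MAA are implemented, so every name below (`predict_motion_mask`, `motion_sparse_attention`, `motion_aware_attention`, `keep_ratio`, `boost`) is hypothetical. The mask predictor is a hand-crafted proxy (keep the tokens with the largest event activity), not the paper's learned predictor; MSA is approximated as attention in which non-motion keys get a logit of -inf, and MAA as attention with an additive bonus on motion keys.

```python
import math
from typing import List, Sequence

Vector = List[float]


def predict_motion_mask(tokens: Sequence[Vector], keep_ratio: float = 0.5) -> List[bool]:
    """Hypothetical stand-in for AMMP (assumption, not the paper's method):
    mark the tokens with the largest L1 event activity as motion tokens."""
    scores = [sum(abs(x) for x in t) for t in tokens]
    k = max(1, int(round(keep_ratio * len(tokens))))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [i in keep for i in range(len(tokens))]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf logits map to exactly 0.0 weight."""
    m = max(logits)  # finite as long as at least one key is unmasked
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, key_bias) -> List[Vector]:
    """Scaled dot-product attention with a per-key additive bias."""
    d = len(queries[0])
    out = []
    for q in queries:
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + b
            for k, b in zip(keys, key_bias)
        ]
        w = softmax(logits)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def motion_sparse_attention(queries, keys, values, mask) -> List[Vector]:
    """MSA-style sketch (assumption): non-motion keys are removed via -inf bias."""
    bias = [0.0 if m else -math.inf for m in mask]
    return attention(queries, keys, values, bias)


def motion_aware_attention(queries, keys, values, mask, boost: float = 1.0) -> List[Vector]:
    """MAA-style sketch (assumption): every key is kept, motion keys get a bonus."""
    bias = [boost if m else 0.0 for m in mask]
    return attention(queries, keys, values, bias)


if __name__ == "__main__":
    # Four toy 2-D event tokens; tokens 1 and 3 carry the most event activity.
    events = [[0.0, 0.0], [2.0, -1.0], [0.1, 0.0], [-3.0, 0.5]]
    mask = predict_motion_mask(events, keep_ratio=0.5)
    print(mask)
    # One-hot values make each output row equal to its attention weights.
    values = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(motion_sparse_attention(events, events, values, mask))
```

Running the script prints `[False, True, False, True]` for the mask. Because the values are one-hot, each row of the MSA output is the attention distribution itself, and columns 0 and 2 are exactly zero. The -inf bias reflects the "eliminate irrelevant tokens" behavior, whereas the additive bonus keeps all tokens but shifts attention toward motion regions. A real network would learn the mask and operate on batched tensors.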
