Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution

Wenqian Dong; Yang Xu; Jiahui Qu; Shaoxiong Hou

doi:10.1609/aaai.v38i2.27923

Authors

Wenqian Dong, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
Yang Xu, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
Jiahui Qu, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
Shaoxiong Hou, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China

DOI:

https://doi.org/10.1609/aaai.v38i2.27923

Keywords:

CV: Computational Photography, Image & Video Synthesis, CV: Other Foundations of Computer Vision

Abstract

Hyperspectral image super-resolution (HSI-SR) aims to improve the spatial resolution of hyperspectral images (HSIs). Existing fusion-based SR methods perform well but have two limitations: 1) they assume that the auxiliary image providing spatial information is strictly registered with the HSI, yet fine registration is difficult because of differences in shooting platforms and viewpoints and the influence of atmospheric turbulence; 2) most are based on convolutional neural networks (CNNs), which are effective for local features but cannot exploit global features. To this end, we propose a multi-modal cross-scale deformable transformer network (M2DTN) for unregistered HSI-SR. Specifically, we formulate a spectrum-preserving, spatial-guided registration-SR unified model (SSRU) from the perspective of realistic degradation scenarios. Based on SSRU, we propose a multi-modal registration deformable module (MMRD) that aligns features across modalities by means of a deformation field. To efficiently utilize the information unique to each modality, we design a multi-scale feature transformer (MSFT) that emphasizes spatial-spectral features at different scales. In addition, we propose a cross-scale feature aggregation module (CSFA) that accurately reconstructs the HSI by aggregating feature information across scales. Experiments show that M2DTN outperforms state-of-the-art HSI-SR methods. Code is available at https://github.com/Jiahuiqu/M2DTN.
