Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution

Authors

  • Wenqian Dong, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
  • Yang Xu, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
  • Jiahui Qu, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
  • Shaoxiong Hou, State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China

DOI:

https://doi.org/10.1609/aaai.v38i2.27923

Keywords:

CV: Computational Photography, Image & Video Synthesis, CV: Other Foundations of Computer Vision

Abstract

Hyperspectral image super-resolution (HSI-SR) aims to improve the spatial resolution of hyperspectral images (HSIs). Existing fusion-based SR methods perform well but have two problems: 1) they assume that the auxiliary image providing spatial information is strictly registered with the HSI, yet fine registration is difficult because of differences in shooting platforms and viewpoints and the influence of atmospheric turbulence; 2) most are based on convolutional neural networks (CNNs), which are effective for local features but cannot exploit global features. To this end, we propose a multi-modal cross-scale deformable transformer network (M2DTN) for unregistered HSI-SR. Specifically, we formulate a spectrum-preserving, spatial-guided registration-SR unified model (SSRU) from the viewpoint of realistic degradation scenarios. Based on SSRU, we propose a multi-modal registration deformable module (MMRD) that aligns features between different modalities using a deformation field. To efficiently utilize the unique information of each modality, we design a multi-scale feature transformer (MSFT) that emphasizes spatial-spectral features at different scales. In addition, we propose a cross-scale feature aggregation module (CSFA) that accurately reconstructs the HSI by aggregating feature information at different scales. Experiments show that M2DTN outperforms state-of-the-art HSI-SR methods. Code is available at https://github.com/Jiahuiqu/M2DTN.
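
The idea of aligning features with a deformation field can be illustrated with a minimal sketch. This is not the authors' MMRD: in the paper the field is produced by the network, whereas here it is a fixed, hand-written offset, and the names `bilinear_sample`, `warp`, `flow_y` and `flow_x` are hypothetical. The sketch only shows the generic backward-warping step, in which each output position reads the misaligned feature map at its own coordinates plus a per-pixel offset.

```python
from typing import List

Grid = List[List[float]]


def bilinear_sample(feature: Grid, y: float, x: float) -> float:
    """Sample a 2-D feature map at a fractional (y, x), clamping to the border."""
    h, w = len(feature), len(feature[0])
    # Clamp the sampling position so out-of-range offsets reuse border values.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1.0 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1.0 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1.0 - wy) * top + wy * bottom


def warp(feature: Grid, flow_y: Grid, flow_x: Grid) -> Grid:
    """Backward warp: out[i][j] samples `feature` at (i + flow_y[i][j], j + flow_x[i][j])."""
    h, w = len(feature), len(feature[0])
    return [
        [bilinear_sample(feature, i + flow_y[i][j], j + flow_x[i][j]) for j in range(w)]
        for i in range(h)
    ]


# Toy data (assumed for illustration): the auxiliary feature map is the
# reference shifted one pixel to the right, with the left border repeated.
reference = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
misaligned = [[row[max(j - 1, 0)] for j in range(4)] for row in reference]

# A constant deformation field of +1 pixel in x undoes the shift.
flow_y = [[0.0] * 4 for _ in range(2)]
flow_x = [[1.0] * 4 for _ in range(2)]
aligned = warp(misaligned, flow_y, flow_x)
```

In this toy case the first three columns of `aligned` match `reference`; the last column cannot be recovered because the shift pushed that content out of the frame. A learned field would vary per pixel and per scale rather than being a single constant offset.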

Published

2024-03-24

How to Cite

Dong, W., Xu, Y., Qu, J., & Hou, S. (2024). Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1573-1581. https://doi.org/10.1609/aaai.v38i2.27923

Issue

Section

AAAI Technical Track on Computer Vision I