ParaFormer: Parallel Attention Transformer for Efficient Feature Matching

Authors

  • Xiaoyong Lu Southeast University
  • Yaping Yan Southeast University
  • Bin Kang Nanjing University of Posts and Telecommunications
  • Songlin Du Southeast University

DOI:

https://doi.org/10.1609/aaai.v37i2.25275

Keywords:

CV: 3D Computer Vision

Abstract

Heavy computation is a bottleneck limiting the application of deep-learning-based feature matching algorithms in many real-time scenarios. However, existing lightweight networks optimized for Euclidean data cannot address classical feature matching tasks, since sparse keypoint-based descriptors are expected to be matched. This paper tackles this problem and proposes two concepts: 1) a novel parallel attention model named ParaFormer and 2) a graph-based U-Net architecture with attentional pooling. First, ParaFormer fuses features and keypoint positions through the concept of amplitude and phase, and integrates self- and cross-attention in a parallel manner, achieving a win-win in terms of accuracy and efficiency. Second, with the U-Net architecture and the proposed attentional pooling, the ParaFormer-U variant significantly reduces computational complexity and minimizes the performance loss caused by downsampling. Extensive experiments on various applications, including homography estimation, pose estimation, and image matching, demonstrate that ParaFormer achieves state-of-the-art performance while maintaining high efficiency. The efficient ParaFormer-U variant achieves comparable performance with less than 50% of the FLOPs of existing attention-based models.
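To make the two core ideas in the abstract concrete, the sketch below is a rough, hypothetical PyTorch illustration (not the authors' released code): a wave-style fusion that treats descriptors as amplitude and keypoint coordinates as phase, and a block that computes self- and cross-attention in parallel and merges them. All module names, the cosine fusion rule, and the concatenation-based merge are assumptions made for illustration only.

import torch
import torch.nn as nn


class WavePositionFusion(nn.Module):
    """Fuse descriptors (amplitude) with keypoint positions (phase).

    This is an assumed formulation: amplitude * cos(phase), i.e. the real
    part of a wave-like representation amplitude * exp(i * phase).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.amp = nn.Linear(dim, dim)   # amplitude from descriptors
        self.phase = nn.Linear(2, dim)   # phase from (x, y) keypoint positions

    def forward(self, desc: torch.Tensor, kpts: torch.Tensor) -> torch.Tensor:
        # desc: (B, N, dim) descriptors, kpts: (B, N, 2) normalized positions
        amplitude = self.amp(desc)
        phase = self.phase(kpts)
        return amplitude * torch.cos(phase)


class ParallelAttentionBlock(nn.Module):
    """Run self- and cross-attention in parallel and merge the results."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: (B, N, dim) features of the two images; returns updated x
        s, _ = self.self_attn(x, x, x)    # intra-image context
        c, _ = self.cross_attn(x, y, y)   # inter-image context
        return x + self.merge(torch.cat([s, c], dim=-1))


if __name__ == "__main__":
    B, N, D = 1, 128, 256
    fuse = WavePositionFusion(D)
    block = ParallelAttentionBlock(D)
    xa = fuse(torch.randn(B, N, D), torch.rand(B, N, 2))
    xb = fuse(torch.randn(B, N, D), torch.rand(B, N, 2))
    print(block(xa, xb).shape)  # torch.Size([1, 128, 256])

Because the self- and cross-attention branches do not depend on each other's output, they can be evaluated concurrently, which is the efficiency motivation behind the parallel design described in the abstract.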

Published

2023-06-26

How to Cite

Lu, X., Yan, Y., Kang, B., & Du, S. (2023). ParaFormer: Parallel Attention Transformer for Efficient Feature Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1853-1860. https://doi.org/10.1609/aaai.v37i2.25275

Section

AAAI Technical Track on Computer Vision II