ConvMatch: Rethinking Network Design for Two-View Correspondence Learning
DOI: https://doi.org/10.1609/aaai.v37i3.25456
Keywords: CV: Motion & Tracking, CV: 3D Computer Vision, CV: Image and Video Retrieval
Abstract
The multilayer perceptron (MLP) has been widely used in two-view correspondence learning, since only unordered correspondences are provided, and it effectively extracts deep features from individual correspondences. However, its lack of context information limits performance, and hence many extra, complex blocks have been designed in follow-up studies to capture such information. In this paper, from a novel perspective, we design a correspondence learning network called ConvMatch that, for the first time, can leverage a convolutional neural network (CNN) as the backbone to capture better context, thus avoiding the complex design of extra blocks. Specifically, observing that sparse motion vectors and a dense motion field can be converted into each other by interpolation and sampling, we regularize the putative motion vectors by implicitly estimating a dense motion field, then rectify the errors caused by outliers in local areas with a CNN, and finally obtain correct motion vectors from the rectified motion field. Extensive experiments reveal that ConvMatch with a simple CNN backbone consistently outperforms state-of-the-art methods, including MLP-based ones, on relative pose estimation and homography estimation, and shows promising generalization to different datasets and descriptors. Our code is publicly available at https://github.com/SuhZhang/ConvMatch.
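The core observation of the abstract — that sparse motion vectors and a dense motion field can be converted into each other by interpolation and sampling — can be illustrated with a minimal numeric sketch. This is not the paper's learned pipeline: the inverse-distance-weighted interpolation and the nearest-neighbour sampling below are hand-picked stand-ins for ConvMatch's implicit field estimation and its CNN rectification, and all function names and the grid size are invented for illustration.

```python
import numpy as np

def interpolate_to_field(kpts, motions, grid_size=8, eps=1e-6):
    """Rasterize sparse motion vectors onto a dense grid_size x grid_size
    motion field via inverse-distance weighting (a simple stand-in for the
    paper's implicit dense-field estimation). kpts lie in [0, 1]^2."""
    ys, xs = np.meshgrid(np.linspace(0, 1, grid_size),
                         np.linspace(0, 1, grid_size), indexing="ij")
    centers = np.stack([xs, ys], axis=-1).reshape(-1, 2)          # (G*G, 2)
    d2 = ((centers[:, None, :] - kpts[None, :, :]) ** 2).sum(-1)  # (G*G, N)
    w = 1.0 / (d2 + eps)                                          # closer points weigh more
    w /= w.sum(axis=1, keepdims=True)
    field = w @ motions                                           # (G*G, 2)
    return field.reshape(grid_size, grid_size, 2)

def sample_from_field(field, kpts):
    """Sample the dense field back at the sparse keypoint locations
    (nearest-neighbour for brevity; bilinear sampling would be the
    differentiable choice in a learned pipeline)."""
    g = field.shape[0]
    ix = np.clip(np.round(kpts[:, 0] * (g - 1)).astype(int), 0, g - 1)
    iy = np.clip(np.round(kpts[:, 1] * (g - 1)).astype(int), 0, g - 1)
    return field[iy, ix]

# Toy example: four inliers moving right, one outlier moving left.
kpts = np.array([[0.2, 0.2], [0.8, 0.2], [0.2, 0.8], [0.8, 0.8], [0.5, 0.5]])
motions = np.array([[1.0, 0.0]] * 4 + [[-5.0, 0.0]])
field = interpolate_to_field(kpts, motions)       # sparse -> dense
resampled = sample_from_field(field, kpts)        # dense -> sparse
```

In ConvMatch the rectification step between the two conversions is learned: a CNN operates on the dense field, where outlier-corrupted cells are surrounded by consistent neighbours, before the motion vectors are sampled back out.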
Published: 2023-06-26
How to Cite
Zhang, S., & Ma, J. (2023). ConvMatch: Rethinking Network Design for Two-View Correspondence Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3472-3479. https://doi.org/10.1609/aaai.v37i3.25456
Section: AAAI Technical Track on Computer Vision III