Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

Authors

  • Changwei Wang, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science
  • Shunpeng Chen, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Yukun Song, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Rongtao Xu, MAIS, Institute of Automation, Chinese Academy of Sciences
  • Zherui Zhang, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Jiguang Zhang, MAIS, Institute of Automation, Chinese Academy of Sciences
  • Haoran Yang, Tongji University
  • Yu Zhang, Tongji University
  • Kexue Fu, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science
  • Shide Du, Fuzhou University
  • Zhiwei Xu, Shandong University
  • Longxiang Gao, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science
  • Li Guo, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Shibiao Xu, School of Artificial Intelligence, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v39i7.32811

Abstract

Visual Place Recognition (VPR) aims to predict the location of a query image by referencing a database of geotagged images. In VPR, a small number of discriminative local regions in an image often carry most of the useful information, while mundane background regions contribute little or even cause perceptual aliasing because they overlap easily. However, existing methods neither model these discriminative regions precisely nor exploit them fully. In addition, the lack of pixel-level correspondence supervision in VPR datasets hinders further improvement of local feature matching in the re-ranking stage. In this paper, we propose the Focus on Local (FoL) approach, which improves image retrieval and re-ranking in VPR simultaneously by mining and exploiting reliable discriminative local regions in images and by introducing pseudo-correlation supervision. First, we design two losses, the Extraction-Aggregation Spatial Alignment Loss (SAL) and the Foreground-Background Contrast Enhancement Loss (CEL), to explicitly model reliable discriminative local regions and use them to guide both the generation of global representations and efficient re-ranking. Second, we introduce a weakly supervised local feature training strategy based on pseudo-correspondences obtained from aggregating global features, which alleviates the lack of ground-truth local correspondences for the VPR task. Third, we propose a re-ranking pipeline that uses discriminative region guidance to be both efficient and precise. Finally, experimental results show that FoL achieves state-of-the-art results on multiple VPR benchmarks in both the image retrieval and re-ranking stages, and that it also significantly outperforms existing two-stage VPR methods in computational efficiency.
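
For readers unfamiliar with two-stage VPR, the retrieve-then-re-rank structure mentioned in the abstract can be illustrated with a generic sketch. This is not the FoL implementation: the function names, the toy descriptors, the per-feature saliency weights, and both thresholds are hypothetical, and mutual nearest-neighbor counting is used here only as a simple stand-in for the paper's region-guided matching.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_global, db_globals, k):
    """Stage 1: rank database images by global-descriptor similarity, keep top k."""
    scores = [(cosine(query_global, g), i) for i, g in enumerate(db_globals)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, index breaks ties
    return [i for _, i in scores[:k]]

def keep_salient(descs, weights, thresh):
    """Keep only local descriptors whose (hypothetical) saliency weight passes thresh."""
    return [d for d, w in zip(descs, weights) if w >= thresh]

def mutual_nn_matches(a, b, min_sim=0.8):
    """Count mutual nearest-neighbor pairs between a and b with similarity >= min_sim."""
    if not a or not b:
        return 0
    sim = [[cosine(x, y) for y in b] for x in a]
    count = 0
    for i, row in enumerate(sim):
        j = max(range(len(b)), key=lambda col: row[col])       # best match in b for a[i]
        i_back = max(range(len(a)), key=lambda r: sim[r][j])   # best match in a for b[j]
        if i_back == i and row[j] >= min_sim:
            count += 1
    return count

def rerank(q_locals, q_weights, candidates, db_locals, db_weights, thresh=0.5):
    """Stage 2: reorder candidates by match count over salient local features only."""
    q = keep_salient(q_locals, q_weights, thresh)
    scored = []
    for rank, idx in enumerate(candidates):
        c = keep_salient(db_locals[idx], db_weights[idx], thresh)
        # Sort by more matches first; the stage-1 rank breaks ties.
        scored.append((-mutual_nn_matches(q, c), rank, idx))
    scored.sort()
    return [idx for _, _, idx in scored]

# Toy data (all values invented for illustration).
db_globals = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
db_locals = [
    [[0, 0, 1], [0, 0, 1]],   # image 0: only background-like features
    [[1, 0, 0], [0, 1, 0]],   # image 1: shares the query's salient features
    [[0, 0, 1]],              # image 2
]
db_weights = [[0.9, 0.9], [0.9, 0.9], [0.9]]

query_global = [1.0, 0.0]
query_locals = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
query_weights = [0.9, 0.8, 0.1]  # the third feature is treated as background

candidates = retrieve(query_global, db_globals, k=2)
reranked = rerank(query_locals, query_weights, candidates, db_locals, db_weights)
print(candidates, reranked)
```

In this toy case the global stage places image 0 first, and the local stage promotes image 1 because only image 1 matches the query's salient features. Restricting matching to a small salient subset also reduces the number of pairwise comparisons, which is one plausible reading of why region guidance can make re-ranking cheaper.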

Published

2025-04-11

How to Cite

Wang, C., Chen, S., Song, Y., Xu, R., Zhang, Z., Zhang, J., … Xu, S. (2025). Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7536–7544. https://doi.org/10.1609/aaai.v39i7.32811

Section

AAAI Technical Track on Computer Vision VI