Debiased Multiplex Tokenizer for Efficient Map-Free Visual Relocalization

Wenshuai Wang; Hong Liu; Shengquan Li; Peifeng Jiang; Runwei Ding

doi:10.1609/aaai.v40i12.37982

Authors

Wenshuai Wang, Peking University; Pengcheng Laboratory
Hong Liu, Peking University
Shengquan Li, Pengcheng Laboratory
Peifeng Jiang, Peking University
Runwei Ding, Pengcheng Laboratory

DOI: https://doi.org/10.1609/aaai.v40i12.37982

Abstract

Image-based feature representation plays a critical role in visual localization, enabling robots to estimate their position and orientation in GPS-denied environments. However, this task is often undermined by significant variations in camera viewpoints and scene appearances. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm due to its compatibility with lightweight deployment and privacy isolation on mobile devices. In this paper, we propose the Debiased Multiplex Tokenizer (DeMT) as a novel method for versatile and efficient MFVR. Specifically, DeMT performs relative pose regression through an integrated framework built upon a pretrained vision Mamba encoder, comprising three key modules: first, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions; second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and causal pointer attribution; third, Geometry-Informed Pose Regression empowers multi-layer perceptrons with a gating mechanism and spectral normalization to support both pair-wise and multi-view modes. Extensive evaluations across nine public datasets demonstrate that DeMT substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.
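
To make the third module more concrete, the following is a minimal NumPy sketch of a gated multi-layer perceptron head whose weights are spectrally normalized. It is an illustration under stated assumptions, not the authors' implementation: the function names, the tanh/sigmoid gate, the layer sizes, and the 7-value output (unit quaternion plus translation) are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def spectral_normalize(w, n_iters=100, eps=1e-12):
    """Rescale w so its largest singular value is approximately 1.

    Uses power iteration to estimate the top singular value.
    """
    rng = np.random.default_rng(0)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ w @ v  # estimate of the largest singular value
    return w / sigma


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_pose_head(feat, w_val, w_gate, w_out):
    """Hypothetical gated MLP head: feature vector -> (quaternion, translation).

    The gate (sigmoid branch) modulates the value branch (tanh) elementwise.
    The output parameterization is an assumption made for illustration.
    """
    hidden = np.tanh(w_val @ feat) * sigmoid(w_gate @ feat)
    out = w_out @ hidden                       # 7 values: 4 rotation + 3 translation
    quat = out[:4] / np.linalg.norm(out[:4])   # normalize to a unit quaternion
    trans = out[4:]
    return quat, trans


# Assumed sizes, chosen only for the example.
rng = np.random.default_rng(42)
dim, hidden_dim = 16, 32
w_val = spectral_normalize(rng.standard_normal((hidden_dim, dim)))
w_gate = spectral_normalize(rng.standard_normal((hidden_dim, dim)))
w_out = spectral_normalize(rng.standard_normal((7, hidden_dim)))

feat = rng.standard_normal(dim)  # stand-in for a fused image-pair feature
quat, trans = gated_pose_head(feat, w_val, w_gate, w_out)
```

Normalizing each weight matrix by its largest singular value bounds how much each linear map can stretch its input. This is the usual motivation for spectral normalization in regression heads, although the abstract does not state why DeMT adopts it.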
