Debiased Multiplex Tokenizer for Efficient Map-Free Visual Relocalization
DOI:
https://doi.org/10.1609/aaai.v40i12.37982
Abstract
Image-based feature representation plays a critical role in visual localization, enabling robots to estimate their position and orientation in GPS-denied environments. However, this task is often undermined by significant variations in camera viewpoints and scene appearances. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm due to its compatibility with lightweight deployment and privacy isolation on mobile devices. In this paper, we propose the Debiased Multiplex Tokenizer (DeMT) as a novel method for versatile and efficient MFVR. Specifically, DeMT performs relative pose regression through an integrated framework built upon a pretrained vision Mamba encoder, comprising three key modules: first, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions; second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and causal pointer attribution; third, Geometry-Informed Pose Regression empowers multi-layer perceptrons with a gating mechanism and spectral normalization to support both pair-wise and multi-view modes. Extensive evaluations across nine public datasets demonstrate that DeMT substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.
Published
2026-03-14
How to Cite
Wang, W., Liu, H., Li, S., Jiang, P., & Ding, R. (2026). Debiased Multiplex Tokenizer for Efficient Map-Free Visual Relocalization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10145–10153. https://doi.org/10.1609/aaai.v40i12.37982
Section
AAAI Technical Track on Computer Vision IX