MHED-SLAM: Multi-Scale Hybrid Encoding-Based Decoupled SLAM

Dengfang Feng; Wenyang Qin; Zhongchen Shi; Wei Chen; Yanhui Duan; Liang Xie; Erwei Yin

doi:10.1609/aaai.v40i22.38887

Authors

Dengfang Feng: School of Systems Science and Engineering, Sun Yat-sen University, China
Wenyang Qin: School of Systems Science and Engineering, Sun Yat-sen University, China
Zhongchen Shi: Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
Wei Chen: Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
Yanhui Duan: School of Systems Science and Engineering, Sun Yat-sen University, China
Liang Xie: Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
Erwei Yin: Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China

Abstract

Neural Radiance Fields (NeRF)-based Visual Simultaneous Localization and Mapping (SLAM) systems achieve accurate scene geometry modeling and robust camera tracking by leveraging neural representations. Existing methods typically rely on multi-resolution hash encoding with truncated signed distance fields (TSDF) to achieve high frame rates. However, unavoidable hash collisions can introduce artifacts, and multi-view color inconsistencies in indoor scenes can cause shape-radiance ambiguity, degrading both geometric quality and tracking accuracy. To address these issues, we propose a novel Multi-scale Hybrid Encoding-based Decoupled SLAM (MHED-SLAM). First, to mitigate hash collisions and reduce the number of learnable parameters, we fuse a coarse-scale hash tri-plane with a fine-scale hash grid within a single latent volume. Second, to enable precise geometric reconstruction and camera tracking, we decouple reconstruction from rendering, learning a TSDF field for reconstruction and a separate density field for rendering. Third, we devise a Symmetric Kullback-Leibler (SKL) strategy based on ray termination distributions, which aligns the probability distributions derived from the TSDF and density fields so that the two fields converge synchronously. Extensive experiments demonstrate that our approach surpasses state-of-the-art (SOTA) methods in tracking and reconstruction accuracy while running at a faster frame rate of 20 Hz with fewer parameters.
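The first contribution, fusing a coarse hash tri-plane with a fine hash grid, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give resolutions, table sizes, interpolation scheme, or how the two encodings are fused, so all of these are assumptions here. The sketch borrows the Instant-NGP-style spatial hash (XOR of coordinates multiplied by large primes, modulo the table size), queries one coarse tri-plane level (three hashed 2D planes, bilinearly interpolated and summed) and one fine 3D hash-grid level (trilinearly interpolated), and fuses them by simple concatenation. The helper names `spatial_hash`, `interp_hashed`, and `hybrid_encode` are hypothetical.

```python
import numpy as np

# Illustrative sketch, not the MHED-SLAM implementation: the hash function,
# resolutions, table sizes, and concatenation-based fusion are all assumptions.

# Instant-NGP-style spatial-hash primes (one per dimension).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def spatial_hash(coords, table_size):
    """Map integer grid coordinates of shape (N, d) to indices in [0, table_size)."""
    coords = coords.astype(np.uint64)
    h = np.zeros(coords.shape[:-1], dtype=np.uint64)
    for i in range(coords.shape[-1]):
        h ^= coords[..., i] * PRIMES[i]  # uint64 array arithmetic wraps on overflow
    return (h % np.uint64(table_size)).astype(np.int64)


def interp_hashed(x, table, res):
    """Multilinear interpolation of hashed features at points x in [0, 1]^d.

    x: (N, d) query points; table: (T, F) learnable features; res: grid resolution.
    """
    n, d = x.shape
    pos = x * res
    base = np.floor(pos).astype(np.int64)
    frac = pos - base
    out = np.zeros((n, table.shape[1]))
    for corner in range(2 ** d):  # visit the 2^d vertices of each cell
        offset = np.array([(corner >> k) & 1 for k in range(d)])
        weight = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
        idx = spatial_hash(base + offset, table.shape[0])
        out += weight[:, None] * table[idx]
    return out


def hybrid_encode(x, planes, grid, coarse_res, fine_res):
    """Concatenate coarse tri-plane features with fine hash-grid features."""
    # Coarse scale: project onto the xy, xz, yz planes and sum bilinear features.
    coarse = sum(interp_hashed(x[:, axes], plane, coarse_res)
                 for axes, plane in zip(([0, 1], [0, 2], [1, 2]), planes))
    # Fine scale: trilinear interpolation in a hashed 3D grid.
    fine = interp_hashed(x, grid, fine_res)
    return np.concatenate([coarse, fine], axis=1)


rng = np.random.default_rng(0)
T, F = 2 ** 12, 2
planes = [rng.normal(size=(T, F)) for _ in range(3)]  # coarse tri-plane tables
grid = rng.normal(size=(T, F))                        # fine hash-grid table
pts = rng.random((5, 3))                              # points in the unit cube
feats = hybrid_encode(pts, planes, grid, coarse_res=16, fine_res=128)
print(feats.shape)  # (5, 4)
```

One intuition for the parameter saving: at resolution R a dense tri-plane has 3R^2 cells while a dense 3D grid has R^3, so factoring the coarse scale into planes keeps it compact; the abstract does not state whether this is the authors' reasoning.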
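The second and third contributions, a decoupled TSDF and density field aligned through a Symmetric Kullback-Leibler term on ray termination distributions, can be sketched per ray as follows. The density-based distribution uses the standard volume-rendering weights w_i = T_i(1 - exp(-σ_i δ_i)). The abstract does not say how a ray termination distribution is derived from the TSDF, so this sketch assumes the bell-shaped weight sigmoid(d/tr)·sigmoid(-d/tr) used in earlier neural RGB-D surface reconstruction work, normalized along the ray. The symmetric KL is taken as KL(p‖q) + KL(q‖p), although some formulations halve it. The function names, the truncation `trunc`, and the smoothing `eps` are hypothetical, and how MHED-SLAM weights or back-propagates this term is not shown.

```python
import numpy as np

# Illustrative sketch, not the MHED-SLAM implementation: the TSDF-to-weight
# mapping, truncation value, and SKL scaling are assumptions.


def sigmoid(x):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def density_termination(sigma, t):
    """Volume-rendering weights w_i = T_i * alpha_i for rays of shape (R, N).

    alpha_i = 1 - exp(-sigma_i * delta_i); T_i = prod_{j<i} (1 - alpha_j).
    The last interval is treated as open-ended (1e10), as in common NeRF code.
    """
    far = np.full((t.shape[0], 1), 1e10)
    delta = np.concatenate([t[:, 1:] - t[:, :-1], far], axis=1)
    alpha = 1.0 - np.exp(-sigma * delta)
    ones = np.ones((t.shape[0], 1))
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[:, :-1]], axis=1), axis=1)
    return trans * alpha


def tsdf_termination(sdf, trunc):
    """Assumed bell-shaped weights that peak where the TSDF crosses zero."""
    return sigmoid(sdf / trunc) * sigmoid(-sdf / trunc)


def normalize(w, eps=1e-10):
    """Turn per-sample weights into a distribution along each ray."""
    return w / (w.sum(axis=1, keepdims=True) + eps)


def symmetric_kl(p, q, eps=1e-10):
    """KL(p||q) + KL(q||p) per ray, i.e. sum_i (p_i - q_i) * log(p_i / q_i)."""
    p, q = p + eps, q + eps  # smoothing keeps the logarithm finite
    return np.sum((p - q) * np.log(p / q), axis=1)


# One ray with 64 samples and a surface at depth 1.0.
t = np.linspace(0.0, 2.0, 64)[None, :]
sdf = 1.0 - t                                    # positive in front of the surface
sigma = 50.0 * np.exp(-((t - 1.0) / 0.05) ** 2)  # density bump near the surface
p = normalize(tsdf_termination(sdf, trunc=0.1))
q = normalize(density_termination(sigma, t))
loss = symmetric_kl(p, q)  # shape (1,); zero only when p and q coincide
print(loss)
```

Because each term (p_i - q_i)·log(p_i / q_i) is non-negative, minimizing this quantity pulls the two distributions toward each other from both directions, which is one way to read "synchronous convergence" of the two fields.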
