MHED-SLAM: Multi-Scale Hybrid Encoding-Based Decoupled SLAM

Authors

  • Dengfang Feng, School of Systems Science and Engineering, Sun Yat-sen University, China
  • Wenyang Qin, School of Systems Science and Engineering, Sun Yat-sen University, China
  • Zhongchen Shi, Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
  • Wei Chen, Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
  • Yanhui Duan, School of Systems Science and Engineering, Sun Yat-sen University, China
  • Liang Xie, Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China
  • Erwei Yin, Defense Innovation Institute, Academy of Military Sciences (AMS), China; Intelligent Game and Decision Laboratory, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), China

DOI:

https://doi.org/10.1609/aaai.v40i22.38887

Abstract

Neural Radiance Fields (NeRF)-based Visual Simultaneous Localization and Mapping (SLAM) achieves superior scene geometric modeling and robust camera tracking by leveraging neural representations. Existing methods typically rely on multi-resolution hash encoding with truncated signed distance fields (TSDF) to achieve high frame rates. However, unavoidable hash collisions can lead to artifacts, and multi-view color inconsistencies in indoor scenes can result in shape-radiance ambiguity, adversely affecting geometric quality and tracking accuracy. To address these issues, we propose a novel Multi-scale Hybrid Encoding-based Decoupled SLAM (MHED-SLAM). First, to mitigate the adverse effects of hash collisions and reduce the number of learnable parameters, we fuse a coarse-scale hash tri-plane with a fine-scale hash grid within a single latent volume. Second, to enable precise geometric reconstruction and camera tracking, we decouple the reconstruction and rendering processes, independently learning a TSDF field for reconstruction and a density field for rendering. Third, we devise a Symmetric Kullback-Leibler (SKL) strategy based on ray termination distributions that aligns the probability distributions derived from the TSDF and density fields so that the two fields converge synchronously. Extensive experimental evaluations demonstrate that our approach surpasses state-of-the-art (SOTA) methods, running at a faster frame rate of 20 Hz with fewer parameters while achieving higher tracking and reconstruction accuracy.
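
As a rough illustration of the third point, the sketch below computes a symmetric KL divergence between two ray termination distributions sampled along a single ray. It is not the authors' implementation: the function names, the sample values, and the second distribution `q` (standing in for a TSDF-derived distribution, whose actual construction the abstract does not specify) are all assumptions. The density-side weights use the standard volume-rendering formula.

```python
import numpy as np

def termination_weights(sigma, delta):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i: probability that the ray passes every sample before i.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return alpha * trans

def normalize(w, eps=1e-8):
    """Turn non-negative weights into a strictly positive probability vector."""
    w = w + eps
    return w / w.sum()

def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p) for two discrete distributions on the same samples."""
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Hypothetical densities and step sizes for five samples along one ray.
sigma = np.array([0.0, 0.1, 5.0, 20.0, 1.0])
delta = np.full(5, 0.1)
p = normalize(termination_weights(sigma, delta))

# Hypothetical stand-in for a TSDF-derived termination distribution.
q = normalize(np.array([0.01, 0.05, 0.40, 0.50, 0.04]))

loss = symmetric_kl(p, q)
```

Because the divergence is symmetric, neither field is treated as the fixed target, which fits the abstract's stated goal of having both fields converge together.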

Published

2026-03-14

How to Cite

Feng, D., Qin, W., Shi, Z., Chen, W., Duan, Y., Xie, L., & Yin, E. (2026). MHED-SLAM: Multi-Scale Hybrid Encoding-Based Decoupled SLAM. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18243–18252. https://doi.org/10.1609/aaai.v40i22.38887

Section

AAAI Technical Track on Intelligent Robotics