xMHashSeg: Cross-modal Hash Learning for Training-free Unsupervised LiDAR Semantic Segmentation

Authors

  • Jialong Zhang Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China
  • Yachao Zhang School of Informatics, Xiamen University, Xiamen, China
  • Yao Wu School of Informatics, Xiamen University, Xiamen, China Fuzhou University, Fuzhou, China
  • Jiangming Shi Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China
  • Fangyong Wang Hanjiang National Laboratory
  • Yanyun Qu Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China School of Informatics, Xiamen University, Xiamen, China

DOI:

https://doi.org/10.1609/aaai.v40i15.38248

Abstract

3D semantic segmentation serves as a fundamental component in many applications, such as autonomous driving and medical image analysis. Although recent methods have advanced the field, adapting them to new environments or object categories without extensive retraining remains a significant challenge. To address this, we introduce xMHashSeg, a novel training-free cross-modal LiDAR semantic segmentation framework. xMHashSeg leverages foundation models and a non-parametric network to extract features from 2D images and 3D point clouds, then integrates these features through hash learning. Specifically, we develop point-SANN, a novel self-adaption non-parametric network that extracts robust 3D features from raw point clouds, while 2D features are extracted directly with the foundation model DINOv2. To reconcile inconsistencies across modalities, we introduce a Hash Code Learning Module that projects all information into a common hash space, learning a consistent hash code that enhances feature integration. Additionally, depth maps are utilized as an intermediary form between 2D and 3D data to facilitate convergence during hash code learning. Experimental results on various multi-modality datasets demonstrate that xMHashSeg outperforms zero-shot learning approaches and achieves performance close to that of unsupervised domain adaptation and test-time adaptation methods, without requiring any annotations or additional training.
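As a rough illustration of the idea of projecting different modalities into a common hash space, the sketch below binarizes randomly projected 2D and 3D features into codes in {-1, +1} and measures their per-point bit agreement. All names, dimensions, and the random-projection scheme are illustrative assumptions, not the paper's actual Hash Code Learning Module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-point features (dims are assumptions, not from
# the paper): 2D features, e.g. from DINOv2, and 3D point-cloud features.
n_points, d2, d3, n_bits = 5, 768, 128, 64
feat_2d = rng.standard_normal((n_points, d2))
feat_3d = rng.standard_normal((n_points, d3))

# Project each modality into a shared n_bits-dimensional space, then
# binarize with sign() to obtain hash codes in {-1, +1}^n_bits.
W2 = rng.standard_normal((d2, n_bits))
W3 = rng.standard_normal((d3, n_bits))
code_2d = np.sign(feat_2d @ W2)
code_3d = np.sign(feat_3d @ W3)

# Cross-modal consistency: fraction of agreeing bits per point
# (1 minus the normalized Hamming distance). A hash-learning scheme
# would adjust the projections or codes to push agreement toward 1.
agreement = (code_2d == code_3d).mean(axis=1)
print(agreement.shape)
```

With random projections the agreement hovers near chance; the point of learning a consistent hash code is to make corresponding 2D and 3D points agree on most bits so their features can be fused reliably.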

Published

2026-03-14

How to Cite

Zhang, J., Zhang, Y., Wu, Y., Shi, J., Wang, F., & Qu, Y. (2026). xMHashSeg: Cross-modal Hash Learning for Training-free Unsupervised LiDAR Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12538–12546. https://doi.org/10.1609/aaai.v40i15.38248

Section

AAAI Technical Track on Computer Vision XII