xMHashSeg: Cross-modal Hash Learning for Training-free Unsupervised LiDAR Semantic Segmentation

Authors

  • Jialong Zhang Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China
  • Yachao Zhang School of Informatics, Xiamen University, Xiamen, China
  • Yao Wu School of Informatics, Xiamen University, Xiamen, China Fuzhou University, Fuzhou, China
  • Jiangming Shi Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China
  • Fangyong Wang Hanjiang National Laboratory
  • Yanyun Qu Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China School of Informatics, Xiamen University, Xiamen, China

DOI:

https://doi.org/10.1609/aaai.v40i15.38248

Abstract

3D semantic segmentation serves as a fundamental component in many applications, such as autonomous driving and medical image analysis. Although recent methods have advanced the field, adapting them to new environments or object categories without extensive retraining remains a significant challenge. To address this, we introduce xMHashSeg, a novel training-free cross-modal LiDAR semantic segmentation framework. xMHashSeg leverages foundation models and a non-parametric network to extract features from 2D images and 3D point clouds, then integrates these features through hash learning. Specifically, we develop point-SANN, a novel self-adaption non-parametric network that extracts robust 3D features from raw point clouds, while 2D features are extracted directly with the foundation model DINOv2. To reconcile inconsistencies across modalities, we introduce a Hash Code Learning Module that projects all information into a common hash space, learning a consistent hash code that enhances feature integration. Additionally, depth maps are utilized as an intermediary form between 2D and 3D data to facilitate convergence during hash code learning. Experimental results on various multi-modality datasets demonstrate that xMHashSeg outperforms zero-shot learning approaches and achieves performance close to that of unsupervised domain adaptation and test-time adaptation methods, without requiring any annotations or additional training.
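As a rough illustration of the idea of projecting different modalities into a common hash space, the sketch below binarizes randomly projected 2D and 3D features into codes in {-1, +1} and measures their per-point bit agreement. All names, dimensions, and the random-projection scheme are illustrative assumptions, not the paper's actual Hash Code Learning Module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-point features (dims are assumptions, not from
# the paper): 2D features, e.g. from DINOv2, and 3D point-cloud features.
n_points, d2, d3, n_bits = 5, 768, 128, 64
feat_2d = rng.standard_normal((n_points, d2))
feat_3d = rng.standard_normal((n_points, d3))

# Project each modality into a shared n_bits-dimensional space, then
# binarize with sign() to obtain hash codes in {-1, +1}^n_bits.
W2 = rng.standard_normal((d2, n_bits))
W3 = rng.standard_normal((d3, n_bits))
code_2d = np.sign(feat_2d @ W2)
code_3d = np.sign(feat_3d @ W3)

# Cross-modal consistency: fraction of agreeing bits per point
# (1 minus the normalized Hamming distance). A hash-learning scheme
# would adjust the projections or codes to push agreement toward 1.
agreement = (code_2d == code_3d).mean(axis=1)
print(agreement.shape)
```

With random projections the agreement hovers near chance; the point of learning a consistent hash code is to make corresponding 2D and 3D points agree on most bits so their features can be fused reliably.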

Published

2026-03-14

How to Cite

Zhang, J., Zhang, Y., Wu, Y., Shi, J., Wang, F., & Qu, Y. (2026). xMHashSeg: Cross-modal Hash Learning for Training-free Unsupervised LiDAR Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12538–12546. https://doi.org/10.1609/aaai.v40i15.38248

Section

AAAI Technical Track on Computer Vision XII