Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation
DOI:
https://doi.org/10.1609/aaai.v37i3.25400
Keywords:
CV: 3D Computer Vision, CV: Applications, CV: Multi-modal Vision, CV: Segmentation, ML: Transfer, Domain Adaptation, Multi-Task Learning
Abstract
Domain adaptation for 3D point clouds has attracted considerable interest because it can reduce the time-consuming labeling of 3D data. A recent work, xMUDA, applied multi-modal data to domain adaptation for 3D semantic segmentation by having the 2D and 3D modalities mimic each other's predictions, and outperformed previous single-modality methods that use only point clouds. Building on it, we propose a novel cross-modal contrastive learning scheme to further improve adaptation. By imposing constraints derived from the correspondences between 2D pixel features and 3D point features, our method not only facilitates interaction between the two modalities but also strengthens feature representations in both the labeled source domain and the unlabeled target domain. To make fuller use of 2D context information during cross-modal learning, we also introduce a neighborhood feature aggregation module that enhances pixel features. The module uses neighborhood attention to aggregate nearby pixels in the 2D image, which alleviates the mismatch between the two modalities that arises from projecting a relatively sparse point cloud onto dense image pixels. We evaluate our method on three unsupervised domain adaptation scenarios: country-to-country, day-to-night, and dataset-to-dataset. Experimental results show that our approach outperforms existing methods.
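The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss or the attention formulation, so the sketch assumes a symmetric InfoNCE loss over paired pixel/point features and a dot-product attention over a square pixel window. The function names, the `temperature` and `radius` parameters, and the feature shapes are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_info_nce(feat_2d, feat_3d, temperature=0.07):
    """Assumed symmetric InfoNCE loss between paired 2D and 3D features.

    feat_2d[i] and feat_3d[i] (both of shape (N, D)) are taken to describe the
    same point-pixel correspondence; every other row serves as a negative.
    """
    z2d = l2_normalize(feat_2d)
    z3d = l2_normalize(feat_3d)
    logits = z2d @ z3d.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(logits))
    loss_2d_to_3d = -log_softmax(logits)[idx, idx].mean()
    loss_3d_to_2d = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_2d_to_3d + loss_3d_to_2d)


def aggregate_neighborhood(image_feat, pixels, radius=1):
    """Assumed neighborhood attention: re-weight a window around each pixel.

    image_feat: (H, W, D) feature map; pixels: (N, 2) integer (row, col)
    locations where 3D points project into the image. Returns (N, D).
    """
    H, W, D = image_feat.shape
    out = np.empty((len(pixels), D))
    for i, (r, c) in enumerate(pixels):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, H)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, W)
        neigh = image_feat[r0:r1, c0:c1].reshape(-1, D)
        # Attention scores: scaled dot product with the center pixel feature.
        scores = neigh @ image_feat[r, c] / np.sqrt(D)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ neigh
    return out
```

In this sketch, the aggregated pixel features would be passed as `feat_2d` to `cross_modal_info_nce` together with the 3D features of the points that project to those pixels. Because the loss needs only correspondences and no labels, it could in principle be computed on both the source and the target domain, which matches the abstract's description at a high level.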
Published
2023-06-26
How to Cite
Xing, B., Ying, X., Wang, R., Yang, J., & Chen, T. (2023). Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2974-2982. https://doi.org/10.1609/aaai.v37i3.25400
Section
AAAI Technical Track on Computer Vision III