ECO-3D: Equivariant Contrastive Learning for Pre-training on Perturbed 3D Point Cloud

Authors

  • Ruibin Wang, Peking University
  • Xianghua Ying, Peking University
  • Bowei Xing, Peking University
  • Jinfa Yang, Peking University

DOI:

https://doi.org/10.1609/aaai.v37i2.25361

Keywords:

CV: 3D Computer Vision, CV: Object Detection & Categorization, CV: Representation Learning for Vision, ML: Unsupervised & Self-Supervised Learning

Abstract

In this work, we investigate contrastive learning on perturbed point clouds and find that the contrasting process may widen the domain gap caused by random perturbations, causing the pre-trained network to fail to generalize to test data. To address this, we propose the Equivariant COntrastive framework (ECO-3D), which closes the domain gap before contrasting, further introduces the equivariance property, and enables networks to be pre-trained under more perturbation types while still obtaining meaningful features. Specifically, to close the domain gap, a pre-trained VAE converts each perturbed point cloud into a less-perturbed point embedding lying in a similar domain and a separate perturbation embedding. Contrastive pairs are then generated by mixing a point embedding with different perturbation embeddings. Moreover, to pursue equivariance, a vector quantizer is adopted during VAE training, discretizing the perturbation embedding into one-hot tokens that serve as perturbation labels. By correctly predicting these perturbation labels from the perturbed point cloud, the learned features are encouraged to be equivariant. Experiments on synthesized and real-world perturbed datasets show that ECO-3D outperforms most existing pre-training strategies across various downstream tasks, achieving state-of-the-art performance under many perturbation types.
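
The following PyTorch sketch illustrates the pair-generation idea the abstract describes: an encoding is split into a point (content) embedding and a perturbation embedding, the perturbation embedding is discretized by a vector quantizer into tokens that can act as perturbation labels, and contrastive pairs are formed by mixing one point embedding with different perturbation embeddings. All module names, dimensions, and the concatenation-based mixing are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of ECO-3D-style pair generation; names and shapes are assumptions.
import torch
import torch.nn as nn


class PerturbationVQ(nn.Module):
    """Snap a perturbation embedding to its nearest codebook entry,
    yielding a discrete token usable as a perturbation label."""

    def __init__(self, num_codes=16, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z_pert):
        # z_pert: (B, dim); distances to each code: (B, num_codes)
        d = torch.cdist(z_pert, self.codebook.weight)
        tokens = d.argmin(dim=1)                    # discrete perturbation labels
        z_q = self.codebook(tokens)
        # straight-through estimator so gradients reach the encoder
        z_q = z_pert + (z_q - z_pert).detach()
        return z_q, tokens


class DisentanglingEncoder(nn.Module):
    """Toy encoder that splits its global feature into a point (content)
    embedding and a perturbation embedding."""

    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2 * dim)
        )
        self.dim = dim

    def forward(self, pts):
        # pts: (B, N, 3) -> per-point features, max-pooled to a global vector
        feat = self.backbone(pts).max(dim=1).values  # (B, 2*dim)
        z_point, z_pert = feat.split(self.dim, dim=1)
        return z_point, z_pert


def make_contrastive_pair(z_point, z_pert_a, z_pert_b):
    """Mix one point embedding with two different perturbation embeddings,
    producing a positive pair that shares content but differs in perturbation."""
    view_a = torch.cat([z_point, z_pert_a], dim=1)
    view_b = torch.cat([z_point, z_pert_b], dim=1)
    return view_a, view_b


if __name__ == "__main__":
    enc, vq = DisentanglingEncoder(), PerturbationVQ()
    clouds = torch.randn(4, 1024, 3)                 # a batch of perturbed point clouds
    z_point, z_pert = enc(clouds)
    z_q, tokens = vq(z_pert)                         # tokens act as perturbation labels
    # pair each sample's content with the perturbation of another sample
    view_a, view_b = make_contrastive_pair(z_point, z_q, z_q.roll(1, dims=0))
    print(view_a.shape, view_b.shape, tokens)
```

In this reading, the predicted tokens give the classification target that encourages equivariance, while the mixed views supply positive pairs whose domains have already been aligned before contrasting.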

Published

2023-06-26

How to Cite

Wang, R., Ying, X., Xing, B., & Yang, J. (2023). ECO-3D: Equivariant Contrastive Learning for Pre-training on Perturbed 3D Point Cloud. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2626-2634. https://doi.org/10.1609/aaai.v37i2.25361

Section

AAAI Technical Track on Computer Vision II