Point2Real: Bridging the Gap between Point Cloud and Realistic Image for Open-World 3D Recognition

Hanxuan Li; Bin Fu; Ruiping Wang; Xilin Chen

doi:10.1609/aaai.v38i4.28088

Authors

Hanxuan Li: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Bin Fu: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Ruiping Wang: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xilin Chen: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i4.28088

Keywords:

CV: 3D Computer Vision, CV: Large Vision Models, CV: Object Detection & Categorization

Abstract

Recognition in open-world scenarios is an important and challenging problem, and Vision-Language Pre-training paradigms have greatly advanced it in the 2D domain. This has inspired growing interest in bringing 2D pre-trained models, such as CLIP, into the 3D domain to improve point cloud understanding. Because discrete 3D point clouds differ substantially from real-world 2D images, reducing the domain gap is crucial. Some recent works project point clouds onto a 2D plane to enable 3D zero-shot recognition without training. However, this simple projection yields an unclear or even distorted geometric structure, limiting the potential of 2D pre-trained models in 3D. To address the domain gap, we propose Point2Real, a training-free framework that uses realistic rendering to automatically transform the 3D point cloud domain into the Vision-Language domain. Specifically, Point2Real first narrows the gap in shape with a shape recovery module, which uses an iterative ball-pivoting algorithm to convert point clouds into meshes. To simulate photo-realistic images, a set of refined candidate textures is applied during rendering, and the CLIP confidence is used to select the most suitable one. Moreover, to tackle the viewpoint challenge, a heuristic multi-view adapter aggregates features across views, exploiting the depth surface as an effective indicator of view-specific discriminability for recognition. We conduct experiments on the ModelNet10, ModelNet40, and ScanObjectNN datasets, and the results demonstrate that Point2Real outperforms other approaches in zero-shot and few-shot tasks by a large margin.
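
The two selection steps named in the abstract (choosing a texture by CLIP confidence, and aggregating predictions across views) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_texture` and `aggregate_views` are hypothetical, "confidence" is assumed here to mean the peak softmax probability over class logits, and `view_scores` stands in for whatever depth-based discriminability score the paper actually computes. The logits are plain lists of numbers rather than real CLIP outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_texture(logits_per_texture):
    """Pick the candidate texture whose rendering is classified most confidently.

    Assumption: confidence = highest class probability for that rendering.
    Returns (index of the chosen texture, its confidence).
    """
    confidences = [max(softmax(logits)) for logits in logits_per_texture]
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best, confidences[best]


def aggregate_views(probs_per_view, view_scores):
    """Weighted average of per-view class probabilities.

    Assumption: view_scores are non-negative numbers, one per view,
    standing in for a depth-derived measure of how informative the view is.
    """
    total = sum(view_scores)
    if total <= 0:
        raise ValueError("view_scores must have a positive sum")
    weights = [s / total for s in view_scores]
    n_classes = len(probs_per_view[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, probs_per_view))
        for c in range(n_classes)
    ]


# Toy usage: two candidate textures, then two views of a two-class problem.
best_texture, confidence = select_texture([[1.0, 1.0, 1.0], [5.0, 0.0, 0.0]])
fused = aggregate_views([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

In this toy run the second texture is chosen because its logits are sharply peaked, and the first view dominates the fused prediction because it carries three times the weight of the second.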
