Point2Real: Bridging the Gap between Point Cloud and Realistic Image for Open-World 3D Recognition
DOI:
https://doi.org/10.1609/aaai.v38i4.28088
Keywords:
CV: 3D Computer Vision, CV: Large Vision Models, CV: Object Detection & Categorization
Abstract
Recognition in open-world scenarios is an important and challenging field, in which Vision-Language Pre-training paradigms have greatly impacted the 2D domain. This has inspired growing interest in introducing 2D pre-trained models, such as CLIP, into the 3D domain to improve point cloud understanding. Because discrete 3D point clouds differ substantially from real-world 2D images, reducing the domain gap is crucial. Some recent works project point clouds onto a 2D plane to enable 3D zero-shot capabilities without training. However, this simplistic approach yields an unclear or even distorted geometric structure, limiting the potential of 2D pre-trained models in 3D. To address the domain gap, we propose Point2Real, a training-free framework that uses realistic rendering to automatically transform the 3D point cloud domain into the Vision-Language domain. Specifically, Point2Real first narrows the gap in shape with a shape recovery module that devises an iterative ball-pivoting algorithm to convert point clouds into meshes. To simulate photo-realistic images, a set of refined candidate textures is applied for rendering, and the CLIP confidence is used to select the most suitable one. Moreover, to tackle the viewpoint challenge, a heuristic multi-view adapter is implemented for feature aggregation, which exploits the depth surface as an effective indicator of view-specific discriminability for recognition. We conduct experiments on the ModelNet10, ModelNet40, and ScanObjectNN datasets, and the results demonstrate that Point2Real outperforms other approaches in zero-shot and few-shot tasks by a large margin.
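The texture selection and multi-view aggregation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the CLIP confidence or the depth-surface indicator is computed, so everything below is an assumption made for illustration: confidence is taken as the peak softmax probability over classes, the view weight is taken as the share of foreground (non-zero) depth pixels, and the function names `select_texture` and `aggregate_views` are hypothetical. Plain lists of numbers stand in for CLIP logits so that the example runs without any model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_texture(logits_per_texture):
    """Return the index of the candidate texture with the highest confidence.

    Assumption: confidence = peak class probability of the rendered image.
    """
    confidences = [max(softmax(logits)) for logits in logits_per_texture]
    return max(range(len(confidences)), key=confidences.__getitem__)


def aggregate_views(view_logits, depth_maps):
    """Combine per-view class logits into one score vector.

    Assumption: each view is weighted by its number of foreground pixels
    (depth > 0), a stand-in for the depth-surface indicator.
    """
    areas = [sum(1 for row in depth for v in row if v > 0) for depth in depth_maps]
    total = sum(areas)
    if total == 0:
        # No visible surface in any view: fall back to uniform weights.
        weights = [1.0 / len(areas)] * len(areas)
    else:
        weights = [a / total for a in areas]
    n_classes = len(view_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, view_logits))
        for c in range(n_classes)
    ]


# Hypothetical logits for three candidate textures over three classes.
best_texture = select_texture([[1.0, 1.0, 1.0], [5.0, 0.0, 0.0], [2.0, 1.0, 0.0]])

# Two views, two classes; the first view shows three times more surface.
fused = aggregate_views(
    [[1.0, 0.0], [0.0, 1.0]],
    [[[1, 1], [1, 0]], [[0, 0], [0, 1]]],
)
```

In this toy setting the second texture is chosen because its class distribution is the most peaked, and the fused scores lean toward the view that exposes more of the object. The actual framework may define both quantities differently.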
Published
2024-03-24
How to Cite
Li, H., Fu, B., Wang, R., & Chen, X. (2024). Point2Real: Bridging the Gap between Point Cloud and Realistic Image for Open-World 3D Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3055–3063. https://doi.org/10.1609/aaai.v38i4.28088
Issue
Vol. 38 No. 4 (2024)
Section
AAAI Technical Track on Computer Vision III