Point Deformable Network with Enhanced Normal Embedding for Point Cloud Analysis

Authors

  • Xingyilang Yin, Xidian University
  • Xi Yang, Xidian University
  • Liangchen Liu, Xidian University
  • Nannan Wang, Xidian University
  • Xinbo Gao, Chongqing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v38i7.28497

Keywords:

CV: 3D Computer Vision, CV: Scene Analysis & Understanding, CV: Segmentation

Abstract

Recently, MLP-based methods have shown strong performance in point cloud analysis. Simple MLP architectures can learn geometric features within local point groups, yet they fail to model long-range dependencies directly. In this paper, we propose the Point Deformable Network (PDNet), a concise MLP-based network that captures long-range relations with strong representation ability. Specifically, we put forward the Point Deformable Aggregation Module (PDAM) to improve representation capability in both long-range dependency modeling and adaptive aggregation among points. For each query point, PDAM aggregates information from deformable reference points rather than from points in a limited local area. The deformable reference points are generated in a data-dependent manner and are initialized according to the input point positions. Additional offsets and modulation scalars are learned from the whole set of point features, and the offsets shift the deformable reference points toward the regions of interest. We also suggest estimating normal vectors for point clouds and applying Enhanced Normal Embedding (ENE) to the geometric extractors to improve the representation ability of each single point. Extensive experiments and ablation studies on various benchmarks demonstrate the effectiveness and superiority of PDNet.
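
To make the aggregation idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one shared set of reference points, strided initialization from the input order, a max-pooled global feature as the summary of the whole point features, inverse-distance interpolation to read features at the shifted positions, and a single shared linear layer with max pooling as the aggregator. The names `init_params`, `deformable_aggregate`, `w_off`, `w_mod`, `w_agg`, `num_ref`, `k_interp`, and `max_shift` are all hypothetical and do not come from the paper.

```python
import numpy as np


def init_params(c_in, c_out, num_ref=8, seed=0):
    """Random weights for the sketch (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "w_off": 0.1 * rng.standard_normal((c_in, num_ref * 3)),  # offset head
        "w_mod": 0.1 * rng.standard_normal((c_in, num_ref)),      # modulation head
        "w_agg": 0.1 * rng.standard_normal((c_in + 3, c_out)),    # shared linear layer
    }


def deformable_aggregate(points, feats, params, num_ref=8, k_interp=3, max_shift=0.5):
    """points: (N, 3), feats: (N, C) -> aggregated features (N, C_out)."""
    n, c = feats.shape

    # Initialize reference points from the input positions
    # (assumption: an even stride over the input order).
    idx = np.linspace(0, n - 1, num_ref).astype(int)
    ref = points[idx]                                           # (K, 3)

    # Predict offsets and modulation scalars from a global summary
    # of all point features (assumption: channel-wise max pooling).
    g = feats.max(axis=0)                                       # (C,)
    offsets = max_shift * np.tanh(g @ params["w_off"]).reshape(num_ref, 3)
    mod = 1.0 / (1.0 + np.exp(-(g @ params["w_mod"])))          # (K,), values in (0, 1)
    ref = ref + offsets                                         # shifted reference points

    # Read features at the shifted positions by inverse-distance
    # interpolation over the k nearest input points (assumption).
    d = np.linalg.norm(ref[:, None, :] - points[None, :, :], axis=-1)  # (K, N)
    nn = np.argsort(d, axis=1)[:, :k_interp]                    # (K, k)
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + 1e-8)
    w /= w.sum(axis=1, keepdims=True)
    ref_feats = (w[..., None] * feats[nn]).sum(axis=1)          # (K, C)
    ref_feats = mod[:, None] * ref_feats                        # apply modulation

    # For each query point: shared linear layer + ReLU on
    # [reference feature, relative position], then max over references.
    rel = ref[None, :, :] - points[:, None, :]                  # (N, K, 3)
    x = np.concatenate(
        [np.broadcast_to(ref_feats, (n, num_ref, c)), rel], axis=-1
    )                                                           # (N, K, C + 3)
    h = np.maximum(x @ params["w_agg"], 0.0)                    # (N, K, C_out)
    return h.max(axis=1)                                        # (N, C_out)


rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, (128, 3))
fts = rng.standard_normal((128, 16))
out = deformable_aggregate(pts, fts, init_params(16, 32))
print(out.shape)  # (128, 32)
```

Because every query point reads from the same shifted reference set, each output feature can depend on points anywhere in the cloud, which is the long-range behavior the abstract contrasts with purely local grouping. In a trained network the three weight matrices would be learned end to end; here they are random and serve only to illustrate the data flow.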

Published

2024-03-24

How to Cite

Yin, X., Yang, X., Liu, L., Wang, N., & Gao, X. (2024). Point Deformable Network with Enhanced Normal Embedding for Point Cloud Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6738-6746. https://doi.org/10.1609/aaai.v38i7.28497

Issue

Section

AAAI Technical Track on Computer Vision VI