Improved MLP Point Cloud Processing with High-Dimensional Positional Encoding

Authors

  • Yanmei Zou, Hunan University
  • Hongshan Yu, Hunan University
  • Zhengeng Yang, Hunan Normal University
  • Zechuan Li, Hunan University
  • Naveed Akhtar, The University of Melbourne

DOI:

https://doi.org/10.1609/aaai.v38i7.28625

Keywords:

CV: 3D Computer Vision, CV: Scene Analysis & Understanding, CV: Segmentation

Abstract

Multi-Layer Perceptron (MLP) models are the bedrock of contemporary point cloud processing. However, their complex network architectures obscure the source of their strength. We first develop an “abstraction and refinement” (ABS-REF) view for the neural modeling of point clouds. This view elucidates that, whereas early models focused on the ABS stage, more recent techniques devise sophisticated REF stages to attain a performance advantage in point cloud processing. We then borrow the concept of “positional encoding” from the transformer literature and propose a High-dimensional Positional Encoding (HPE) module, which can be readily deployed in MLP-based architectures. We leverage our module to develop a suite of HPENet models, which are MLP networks that follow the ABS-REF paradigm, albeit with a sophisticated HPE-based REF stage. The developed technique is extensively evaluated on 3D object classification, object part segmentation, semantic segmentation, and object detection. We establish new state-of-the-art results of 87.6 mAcc on ScanObjectNN for object classification, 85.5 class mIoU on ShapeNetPart for object part segmentation, and 72.7 and 78.7 mIoU on the Area-5 and 6-fold experiments with S3DIS for semantic segmentation. The source code for this work is available at https://github.com/zouyanmei/HPENet.
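
For readers unfamiliar with the positional encoding idea that the abstract borrows from the transformer literature, the following is a minimal sketch of the standard sinusoidal encoding applied to 3D point offsets. It is not the HPE module from the paper, whose design is not described on this page. The function names, the number of frequencies, and the base constant of 10000 are assumptions chosen for illustration only.

```python
import math


def sinusoidal_encoding(point, num_freqs=4, base=10000.0):
    """Lift a 3D coordinate to a 3 * 2 * num_freqs dimensional vector.

    Hypothetical helper: this is the generic transformer-style encoding,
    not the authors' HPE module.
    """
    features = []
    for coord in point:
        for k in range(num_freqs):
            # Geometric frequency schedule, as in the transformer formulation.
            freq = 1.0 / (base ** (k / num_freqs))
            features.append(math.sin(coord * freq))
            features.append(math.cos(coord * freq))
    return features


def encode_neighbors(center, neighbors, num_freqs=4):
    """Encode each neighbor's offset relative to a center point."""
    encoded = []
    for p in neighbors:
        offset = [a - b for a, b in zip(p, center)]
        encoded.append(sinusoidal_encoding(offset, num_freqs))
    return encoded


center = (0.0, 0.0, 0.0)
neighbors = [(0.1, 0.0, -0.2), (0.0, 0.0, 0.0)]
codes = encode_neighbors(center, neighbors)
print(len(codes), len(codes[0]))  # 2 24
```

Each 3D offset becomes a 24-dimensional vector here (3 coordinates, 4 frequencies, one sine and one cosine each). In an MLP-based pipeline such a vector would typically be combined with per-point features, but how HPENet does this should be checked against the paper and the linked source code.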

Published

2024-03-24

How to Cite

Zou, Y., Yu, H., Yang, Z., Li, Z., & Akhtar, N. (2024). Improved MLP Point Cloud Processing with High-Dimensional Positional Encoding. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7891-7899. https://doi.org/10.1609/aaai.v38i7.28625

Section

AAAI Technical Track on Computer Vision VI