PointMC: Multi-view Consistent Encoding and Center-Global Feature Fusion for Point Clouds Understanding

Authors

  • Xinxing Yu, Faculty of Innovation Engineering, Macau University of Science and Technology
  • Ajian Liu, Faculty of Innovation Engineering, Macau University of Science and Technology; MAIS, Institute of Automation, Chinese Academy of Sciences
  • Sunyuan Qiang, Southwest Institute of Technical Physics
  • Yuzhong Wang, Faculty of Innovation Engineering, Macau University of Science and Technology
  • Hui Ma, Faculty of Innovation Engineering, Macau University of Science and Technology; School of Computing and Information Technology, Great Bay University
  • Yanyan Liang, Faculty of Innovation Engineering, Macau University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i14.38207

Abstract

Point cloud tasks have recently benefited from Mamba-based architectures, which leverage state space modeling to achieve strong performance. Previous studies have primarily focused on network design while overlooking the importance of position encoding and relying on coarse-grained geometric feature aggregation. The former leads to semantic ambiguity due to inconsistent spatial relationships, while the latter results in geometric feature dispersion by overlooking fine-grained local geometric details. To tackle these problems, we propose a novel framework, PointMC, comprising Multi-view Consistent Learnable Position Encoding (MCLPE) and Center-Global Feature Fusion (CGFF), which provide semantically coherent positional guidance across patches and enable fine-grained geometric structure aggregation within each patch. Specifically, the MCLPE module, inspired by a spatial structure modeling mechanism guided by physical constraints, leverages multi-view virtual reconstruction and a learnable strategy to dynamically constrain spatial relationships along patch boundaries, thereby enhancing semantic consistency and representational clarity across inter-patch regions. Furthermore, to address the lack of local structural information within each patch, the CGFF module employs a dual-guidance mechanism based on center and global structures to promote the aggregation of local geometric features. Extensive experiments on multiple benchmark datasets validate the effectiveness of PointMC, which consistently outperforms existing state-of-the-art methods and demonstrates superior capability in capturing both inter-patch semantic consistency and intra-patch geometric details.

Published

2026-03-14

How to Cite

Yu, X., Liu, A., Qiang, S., Wang, Y., Ma, H., & Liang, Y. (2026). PointMC: Multi-view Consistent Encoding and Center-Global Feature Fusion for Point Clouds Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 12169–12177. https://doi.org/10.1609/aaai.v40i14.38207

Section

AAAI Technical Track on Computer Vision XI