PointChain: Learning Generalizable Point Cloud Representations via Structural Chain Modeling
DOI:
https://doi.org/10.1609/aaai.v40i12.37961
Abstract
Recent advances in point cloud analysis have increasingly leveraged large-scale unlabeled data through self-supervised representation learning. Autoregressive models based on next-token prediction have shown strong performance, but they usually model point clouds as linear sequences, ignoring their inherent spatial structure. To address this limitation, we propose PointChain, a novel autoregressive paradigm inspired by human perception mechanisms and designed to better align with the structural properties of point clouds. Specifically, we introduce structural chain encoding, which models the understanding process as a global-to-local structural chain inference, preserving spatial relationships throughout the prediction sequence. During pre-training, we design two auxiliary tasks: a next-scale prediction task that encourages cross-scale reasoning, and a scale-level contrastive learning task that promotes semantic consistency across scales. These components guide the model to learn more discriminative and generalizable point cloud representations. Experiments on multiple benchmarks, using both Transformer and Mamba backbones, validate the effectiveness of our approach. PointChain achieves state-of-the-art performance on several downstream tasks, including 93.75% accuracy on the hardest split of ScanObjectNN without a voting strategy.
Published
2026-03-14
How to Cite
Wang, L., Wang, C., Li, Q., & Zhang, T. (2026). PointChain: Learning Generalizable Point Cloud Representations via Structural Chain Modeling. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 9957-9965. https://doi.org/10.1609/aaai.v40i12.37961
Issue
Section
AAAI Technical Track on Computer Vision IX
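Illustrative sketch
The abstract describes a global-to-local ordering of a point cloud across scales. The snippet below is a minimal sketch of one way such a coarse-to-fine chain could be built. It assumes farthest point sampling as the subsampling step, which the abstract does not confirm, and the helper names `farthest_point_sampling` and `build_scale_chain` are hypothetical. It is not the authors' implementation.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices that are mutually far apart."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Next pick: the point farthest from everything chosen so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen

def build_scale_chain(points, scale_sizes):
    """Hypothetical helper: index lists ordered from coarsest to finest.

    Prefixes of a single farthest-point ordering are nested, so each
    finer scale contains every point of the coarser scales before it.
    """
    order = farthest_point_sampling(points, max(scale_sizes))
    return [order[:k] for k in sorted(scale_sizes)]

# Toy input (assumption): eight collinear points standing in for a cloud.
points = [(float(i), 0.0, 0.0) for i in range(8)]
chain = build_scale_chain(points, [2, 4, 8])
for scale in chain:
    print(len(scale), scale)
```

Under these assumptions, a next-scale prediction objective would condition on the earlier, coarser entries of `chain` to predict the next, finer one. The model and loss are outside the scope of this sketch.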