Modify Self-Attention via Skeleton Decomposition for Effective Point Cloud Transformer

Authors

  • Jiayi Han Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China, Zhangjiang Fudan International Innovation Center.
  • Longbin Zeng Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China, Zhangjiang Fudan International Innovation Center.
  • Liang Du Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China, Zhangjiang Fudan International Innovation Center. Interactive Entertainment Group, Tencent Inc., China.
  • Xiaoqing Ye Baidu Inc., China.
  • Weiyang Ding Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China, Zhangjiang Fudan International Innovation Center.
  • Jianfeng Feng Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China, Zhangjiang Fudan International Innovation Center.

DOI:

https://doi.org/10.1609/aaai.v36i1.19962

Keywords:

Computer Vision (CV)

Abstract

Although transformers have made considerable progress in recent years, their large number of parameters, quadratic computational complexity, and memory cost on long sequences make them hard to train and deploy, especially in edge computing settings. Consequently, many works have sought to improve the computational and memory efficiency of the original transformer architecture. Most of them, however, restrict the attention context to trade performance for cost, relying on prior knowledge of regularly ordered data. An efficient feature extractor for point clouds is therefore essential, given their irregularity and large number of points. In this paper, we propose a novel skeleton decomposition-based self-attention (SD-SA) that imposes no limit on sequence length and scales favorably to long sequences. Exploiting the numerical low-rank nature of self-attention, we approximate it with a skeleton decomposition while preserving its effectiveness. We evaluate the proposed method on point cloud classification, segmentation, and detection on the ModelNet40, ShapeNet, and KITTI datasets, respectively. Our approach significantly improves the efficiency of the point cloud transformer and, at comparable performance, runs faster than other efficient transformers on point cloud tasks.
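To make the idea concrete, below is a minimal NumPy sketch of approximating the attention matrix A = softmax(QK^T / sqrt(d)) with a skeleton (CUR-style) decomposition A ≈ C U^+ R built from a small set of landmark queries and keys. This is an illustrative sketch only, not the authors' implementation: the uniform landmark sampling, the `num_landmarks` parameter, and the function names are assumptions made for the example.

```python
# Illustrative sketch (assumptions noted above): CUR-style approximation of
# softmax attention using a small set of uniformly sampled landmarks.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def skeleton_attention(Q, K, V, num_landmarks=16, seed=0):
    """Approximate softmax(Q K^T / sqrt(d)) V via a skeleton (CUR) factorization."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=min(num_landmarks, n), replace=False)
    Q_s, K_s = Q[idx], K[idx]                    # landmark queries / keys (m x d)

    C = softmax(Q @ K_s.T * scale)               # n x m: column skeleton of A
    U = softmax(Q_s @ K_s.T * scale)             # m x m: intersection block
    R = softmax(Q_s @ K.T * scale)               # m x n: row skeleton of A

    # A ~= C U^+ R, so the attention output is C U^+ (R V);
    # the factors need only O(n*m) memory instead of the full n x n matrix.
    return C @ (np.linalg.pinv(U) @ (R @ V))

# Quick comparison against exact attention on a toy input.
rng = np.random.default_rng(1)
n, d = 256, 32
Q, K, V = rng.normal(size=(3, n, d))
exact = softmax(Q @ K.T / np.sqrt(d)) @ V
approx = skeleton_attention(Q, K, V, num_landmarks=32)
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Because the factors C, U, and R are tall, small, and wide respectively, the quadratic attention matrix is never materialized, which is what makes the approximation attractive for long point cloud sequences.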

Published

2022-06-28

How to Cite

Han, J., Zeng, L., Du, L., Ye, X., Ding, W., & Feng, J. (2022). Modify Self-Attention via Skeleton Decomposition for Effective Point Cloud Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 808-816. https://doi.org/10.1609/aaai.v36i1.19962

Section

AAAI Technical Track on Computer Vision I