Context-Aware Transformer for 3D Point Cloud Automatic Annotation


  • Xiaoyan Qian The University of Hong Kong
  • Chang Liu The University of Hong Kong
  • Xiaojuan Qi The University of Hong Kong
  • Siew-Chong Tan The University of Hong Kong
  • Edmund Lam The University of Hong Kong
  • Ngai Wong The University of Hong Kong



Keywords: CV: 3D Computer Vision, CV: Object Detection & Categorization, ML: Classification and Regression


3D automatic annotation has received increased attention because manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature correlation that is particularly informative for hard samples in 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler that generates precise 3D box annotations from 2D boxes and is trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points, and the latter extracts feature relations among different objects, thus boosting scene-level understanding. With the local and global encoders, CAT can generate high-quality 3D box annotations with a streamlined workflow, allowing it to outperform existing state-of-the-art methods by up to 1.79% 3D AP on the hard task of the KITTI test set.
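The distinction between attention along the sequence dimension and attention along the batch dimension can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a feature tensor of shape (objects, points, channels), uses single-head attention with identity query/key/value projections, and the function names `intra_object_attention` and `inter_object_attention` are hypothetical. The actual CAT encoder presumably uses learned projections and further layers that the abstract does not describe.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head scaled dot-product self-attention over the
    # second-to-last axis of x, shape (..., L, D).
    # Assumption: identity Q/K/V projections, for illustration only.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)


def intra_object_attention(feats):
    # feats: (B, N, D) = (objects, points per object, channels).
    # Attention runs over the N points inside each object (local).
    return self_attention(feats)


def inter_object_attention(feats):
    # Swap to (N, B, D) so attention runs over the B objects (global),
    # then swap back to (B, N, D).
    swapped = np.swapaxes(feats, 0, 1)
    return np.swapaxes(self_attention(swapped), 0, 1)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16, 8))      # 4 objects, 16 points, 8 channels
local = intra_object_attention(feats)    # each object attends to itself only
context = inter_object_attention(local)  # objects exchange information
```

In this sketch, perturbing one object's points leaves the intra-object output of every other object unchanged, whereas the inter-object output changes for all objects; that cross-object mixing is the scene-level context the abstract refers to.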




How to Cite

Qian, X., Liu, C., Qi, X., Tan, S.-C., Lam, E., & Wong, N. (2023). Context-Aware Transformer for 3D Point Cloud Automatic Annotation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2082-2090.



AAAI Technical Track on Computer Vision II