Context-Aware Transformer for 3D Point Cloud Automatic Annotation
DOI:
https://doi.org/10.1609/aaai.v37i2.25301
Keywords:
CV: 3D Computer Vision, CV: Object Detection & Categorization, ML: Classification and Regression
Abstract
3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature correlation that is particularly informative for hard samples in 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler to generate precise 3D box annotations from 2D boxes, trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points and the latter extracts feature relations among different objects, thus boosting scene-level understanding. Via local and global encoders, CAT can generate high-quality 3D box annotations with a streamlined workflow, allowing it to outperform existing state-of-the-art methods by up to 1.79% 3D AP on the hard task of the KITTI test set.
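The difference between attending along the sequence dimension and along the batch dimension can be illustrated with a minimal sketch. This is not the authors' implementation: it uses single-head scaled dot-product attention with identity query/key/value projections (no learned weights), and the function names, the nested-list layout `batch[b][n]`, and the choice to pair points across objects by their index `n` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: identity Q/K/V projections, single head, no learned weights.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens))
             for j in range(dim)]
        )
    return attended


def intra_object_attention(batch):
    """Local step: attend along the sequence (point) dimension.

    batch[b][n] is the feature vector of point n in object b; each object
    is processed independently of the others.
    """
    return [self_attention(points) for points in batch]


def inter_object_attention(batch):
    """Global step: attend along the batch (object) dimension.

    Assumption: the n-th point feature of every object forms one sequence,
    so features mix across objects but not across point indices.
    """
    num_objects, num_points = len(batch), len(batch[0])
    out = [[None] * num_points for _ in range(num_objects)]
    for n in range(num_points):
        across_objects = [batch[b][n] for b in range(num_objects)]
        attended = self_attention(across_objects)
        for b in range(num_objects):
            out[b][n] = attended[b]
    return out


# Hypothetical toy input: 2 objects, 3 points each, 2 features per point.
toy_batch = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
]
local_features = intra_object_attention(toy_batch)
global_features = inter_object_attention(local_features)
```

Both steps preserve the input shape, so they can be stacked. In this sketch the local step never mixes information between objects, while the global step is the only place where one object's features can influence another's.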
Published
2023-06-26
How to Cite
Qian, X., Liu, C., Qi, X., Tan, S.-C., Lam, E., & Wong, N. (2023). Context-Aware Transformer for 3D Point Cloud Automatic Annotation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2082-2090. https://doi.org/10.1609/aaai.v37i2.25301
Issue
Section
AAAI Technical Track on Computer Vision II