CatFormer: Category-Level 6D Object Pose Estimation with Transformer
DOI:
https://doi.org/10.1609/aaai.v38i7.28505
Keywords:
CV: 3D Computer Vision, CV: Applications
Abstract
Although category-level object pose estimation has progressed significantly in recent years, there is still considerable room for improvement. In this paper, we propose CatFormer, a novel transformer-based category-level 6D pose estimation method designed to improve pose estimation accuracy. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation parts, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module that adjusts the fused features and establishes geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer by training and testing it on the CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as to object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capability of CatFormer. Our code and the supplemental materials are available at https://github.com/BIT-robot-group/CatFormer.
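The abstract gives no equations, so the following is only a minimal sketch of the generic shape-prior-deformation pipeline that methods of this kind build on, not CatFormer's implementation (see the linked repository for that). It assumes that (1) a per-point deformation field is added to a category prior point cloud, (2) a row-softmaxed correspondence matrix maps each observed point onto the deformed prior to predict canonical (NOCS-style) coordinates, (3) repeated deformation is modelled as a loop of residual updates, and (4) the final pose and scale are recovered with an Umeyama similarity fit. The names `deform_and_correspond`, `recurrent_refine`, `step_fn` and `umeyama` are hypothetical, and the learned transformer and graph modules are replaced by plain arrays or a callable.

```python
import numpy as np


def deform_and_correspond(prior, deltas, logits):
    """Deform a prior and map observed points onto it (assumed formulation).

    prior:  (M, 3) category prior point cloud (hypothetical input)
    deltas: (M, 3) per-point deformation offsets
    logits: (N, M) unnormalised observed-to-prior correspondence scores
    Returns (N, 3) predicted canonical coordinates for the observed points.
    """
    deformed = prior + deltas
    # Numerically stable row-wise softmax: each observed point gets a
    # probability distribution over the deformed prior points.
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return a @ deformed


def recurrent_refine(prior, step_fn, iters=3):
    """Deform the prior several times; step_fn stands in for a learned module."""
    shape = prior.copy()
    for _ in range(iters):
        shape = shape + step_fn(shape)  # residual update of the point cloud
    return shape


def umeyama(src, dst):
    """Least-squares similarity transform so that dst ~= s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)              # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)      # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In such a pipeline, `src` would be the predicted canonical coordinates and `dst` the observed camera-frame points, so `(s, R, t)` gives size, rotation and translation. Real systems typically wrap the fit in an outlier-robust scheme, which is omitted here.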
Published
2024-03-24
How to Cite
Yu, S., Zhai, D.-H., & Xia, Y. (2024). CatFormer: Category-Level 6D Object Pose Estimation with Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6808-6816. https://doi.org/10.1609/aaai.v38i7.28505
Issue
Vol. 38 No. 7 (2024)
Section
AAAI Technical Track on Computer Vision VI