CatFormer: Category-Level 6D Object Pose Estimation with Transformer

Authors

  • Sheng Yu, School of Automation, Beijing Institute of Technology, Beijing, China
  • Di-Hua Zhai, School of Automation, Beijing Institute of Technology, Beijing, China; Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing, China
  • Yuanqing Xia, School of Automation, Beijing Institute of Technology, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i7.28505

Keywords:

CV: 3D Computer Vision, CV: Applications

Abstract

Although there has been significant progress in category-level object pose estimation in recent years, there is still considerable room for improvement. In this paper, we propose a novel transformer-based category-level 6D pose estimation method called CatFormer to improve the accuracy of pose estimation. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation parts, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module to adjust the fused features and establish geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer's performance by training and testing it on the CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend the usage of CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as to object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capabilities of CatFormer. Our code and the supplemental materials are available at https://github.com/BIT-robot-group/CatFormer.
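
The recurrent refinement idea in the abstract, a prior point cloud deformed several times toward what is observed in the scene, can be illustrated with a toy sketch. This is not the CatFormer implementation: the learned transformer-based deformation module is replaced here by a hypothetical nearest-neighbour rule, and every name below (`deform_step`, `recurrent_refine`, `one_sided_chamfer`, the `step` parameter, the sample points) is an assumption made for illustration only.

```python
import math


def nearest(p, cloud):
    """Return the point in `cloud` closest to `p` (Euclidean distance)."""
    return min(cloud, key=lambda q: math.dist(p, q))


def deform_step(prior, observed, step=0.5):
    """Hypothetical stand-in for a learned deformation module.

    Moves each prior point a fraction `step` of the way toward its
    nearest observed point. CatFormer instead predicts the deformation
    from fused features; this rule only mimics the data flow.
    """
    deformed = []
    for p in prior:
        q = nearest(p, observed)
        deformed.append(tuple(pi + step * (qi - pi) for pi, qi in zip(p, q)))
    return deformed


def one_sided_chamfer(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    return sum(math.dist(p, nearest(p, b)) for p in a) / len(a)


def recurrent_refine(prior, observed, iterations=3, step=0.5):
    """Apply the deformation step repeatedly, recording the error each time."""
    current = list(prior)
    history = [one_sided_chamfer(current, observed)]
    for _ in range(iterations):
        current = deform_step(current, observed, step)
        history.append(one_sided_chamfer(current, observed))
    return current, history


# Toy category prior and a slightly different observed instance (made-up data).
prior = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
observed = [(0.1, 0.0, 0.0), (1.4, 0.0, 0.0), (0.0, 0.8, 0.0), (0.0, 0.0, 1.3)]

refined, history = recurrent_refine(prior, observed, iterations=3)
for i, err in enumerate(history):
    print(f"iteration {i}: mean nearest-point distance = {err:.5f}")
```

In this toy setting each pass reduces the gap between the prior and the observation, which is the intuition behind deforming the prior more than once rather than in a single shot.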

Published

2024-03-24

How to Cite

Yu, S., Zhai, D.-H., & Xia, Y. (2024). CatFormer: Category-Level 6D Object Pose Estimation with Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6808-6816. https://doi.org/10.1609/aaai.v38i7.28505

Issue

Section

AAAI Technical Track on Computer Vision VI