R^2-Art: Category-Level Articulation Pose Estimation from Single RGB Image via Cascade Render Strategy

Authors

  • Li Zhang Hefei Institute of Physical Science, Chinese Academy of Sciences, China; University of Science and Technology of China, Hefei, China; Astribot, Shenzhen, China
  • Haonan Jiang Zhejiang University of Technology, Zhejiang, China
  • Yukang Huo China Agricultural University, Beijing, China
  • Yan Zhong School of Mathematical Sciences, Peking University, Beijing, China
  • Jianan Wang Astribot, Shenzhen, China
  • Xue Wang Hefei Institute of Physical Science, Chinese Academy of Sciences, China
  • Rujing Wang Hefei Institute of Physical Science, Chinese Academy of Sciences, China
  • Liu Liu Hefei University of Technology, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v39i9.33083

Abstract

Human life is filled with articulated objects. Previous work on category-level articulated object pose estimation relies on costly 3D point clouds or RGB-D images. In this paper, we aim to estimate category-level articulation poses from a single RGB image. We propose R^2-Art, a novel category-level Articulation pose estimation framework that takes a single RGB image as input and uses a cascade Render strategy. Given an RGB image, R^2-Art estimates the per-part 6D pose of the articulated object. Specifically, we design parallel regression branches tailored to generate the camera-to-root translation and rotation. Using the predicted joint states, we transform and deform a point cloud (PC) prior with a joint-centric modeling approach. For further refinement, we propose a cascade render strategy that projects the deformed 3D prior onto the 2D mask. Extensive experiments on datasets ranging from synthetic data to real-world scenarios demonstrate the superior performance and robustness of R^2-Art. We believe this work has potential applications in many fields, including robotics, embodied intelligence, and augmented reality.
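
The abstract gives no equations, so the following is only an illustrative sketch of the general idea behind moving a part by a joint state and projecting 3D points onto a 2D mask. It is not the authors' implementation. Everything in it is an assumption made for illustration: a revolute joint, a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a simple coverage score, and the hypothetical function names `rotate_about_axis`, `project`, and `mask_coverage`.

```python
import math

def rotate_about_axis(point, pivot, axis, angle):
    """Rotate a 3D point about a joint axis passing through `pivot`
    (Rodrigues' rotation formula). Assumes a revolute joint."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit joint axis
    v = [p - q for p, q in zip(point, pivot)]     # point relative to pivot
    c, s = math.cos(angle), math.sin(angle)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]           # k x v
    dot = sum(ki * vi for ki, vi in zip(k, v))    # k . v
    return [v[i] * c + cross[i] * s + k[i] * dot * (1 - c) + pivot[i]
            for i in range(3)]

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def mask_coverage(points, mask, fx, fy, cx, cy):
    """Fraction of projected points that land on foreground pixels of a
    binary mask given as a list of rows (mask[row][col])."""
    h, w = len(mask), len(mask[0])
    hits = 0
    for p in points:
        u, v = project(p, fx, fy, cx, cy)
        col, row = int(round(u)), int(round(v))
        if 0 <= row < h and 0 <= col < w and mask[row][col]:
            hits += 1
    return hits / len(points)

# Toy example: one point on a movable part, one foreground pixel in the mask.
mask = [[0] * 5 for _ in range(5)]
mask[3][2] = 1
closed = [1.0, 0.0, 2.0]
opened = rotate_about_axis(closed, pivot=[0.0, 0.0, 2.0],
                           axis=[0.0, 0.0, 1.0], angle=math.pi / 2)
score_closed = mask_coverage([closed], mask, 2.0, 2.0, 2.0, 2.0)
score_opened = mask_coverage([opened], mask, 2.0, 2.0, 2.0, 2.0)
```

In this toy setup the rotated point falls on the mask and the unrotated one does not, which shows how agreement between a projected prior and an observed mask could serve as a refinement signal for a joint state.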

Published

2025-04-11

How to Cite

Zhang, L., Jiang, H., Huo, Y., Zhong, Y., Wang, J., Wang, X., … Liu, L. (2025). R^2-Art: Category-Level Articulation Pose Estimation from Single RGB Image via Cascade Render Strategy. Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), 9985–9993. https://doi.org/10.1609/aaai.v39i9.33083

Section

AAAI Technical Track on Computer Vision VIII