R^2-Art: Category-Level Articulation Pose Estimation from Single RGB Image via Cascade Render Strategy

Li Zhang; Haonan Jiang; Yukang Huo; Yan Zhong; Jianan Wang; Xue Wang; Rujing Wang; Liu Liu

doi:10.1609/aaai.v39i9.33083

Authors

Li Zhang: Hefei Institute of Physical Science, Chinese Academy of Sciences, China; University of Science and Technology of China, Hefei, China; Astribot, Shenzhen, China
Haonan Jiang: Zhejiang University of Technology, Zhejiang, China
Yukang Huo: China Agricultural University, Beijing, China
Yan Zhong: School of Mathematical Sciences, Peking University, Beijing, China
Jianan Wang: Astribot, Shenzhen, China
Xue Wang: Hefei Institute of Physical Science, Chinese Academy of Sciences, China
Rujing Wang: Hefei Institute of Physical Science, Chinese Academy of Sciences, China
Liu Liu: Hefei University of Technology, Hefei, China


Abstract

Human life is filled with articulated objects. Previous works on estimating the pose of category-level articulated objects rely on 3D point clouds or RGB-D images, which are costly to acquire. In this paper, our goal is to estimate category-level articulation poses from a single RGB image. To this end, we propose R^2-Art, a novel category-level Articulation pose estimation framework that operates on a single RGB image and employs a cascade Render strategy. Given an RGB image as input, R^2-Art estimates the 6D pose of each part of the articulated object. Specifically, we design parallel regression branches tailored to predict the camera-to-root translation and rotation. Using the predicted joint states, we transform and deform a point-cloud (PC) prior with a joint-centric modeling approach. For further refinement, we propose a cascade render strategy that projects the deformed 3D prior onto the 2D mask. Extensive experiments on various datasets, ranging from synthetic data to real-world scenarios, validate R^2-Art and demonstrate its superior performance and robustness. We believe this work has the potential to be applied in many fields, including robotics, embodied intelligence, and augmented reality.
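The abstract compresses several geometric steps into a sentence each. Below is a minimal NumPy sketch, under our own assumptions, of two generic building blocks such a pipeline relies on. The first composes a child part's camera pose from the root pose and a revolute joint state. The second projects 3D points through a pinhole camera and compares the resulting silhouette with a 2D mask via IoU. All function names (`rodrigues`, `child_part_pose`, `project`, `silhouette`, `mask_iou`), the revolute-joint parameterization, and the point-splatting "renderer" are illustrative assumptions, not the R^2-Art implementation. The paper's prior deformation and cascade refinement are not reproduced here.

```python
import numpy as np

# Illustrative sketch only: hypothetical helpers, not the R^2-Art code.

def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about the (normalized) `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    S = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(angle) * S + (1.0 - np.cos(angle)) * (S @ S)

def child_part_pose(R_root, t_root, joint_axis, joint_pivot, joint_angle):
    """Camera-from-child pose for an assumed revolute joint in root coordinates.

    The child rotates by `joint_angle` about `joint_axis` through `joint_pivot`
    (both expressed in the root frame); the root pose is then applied.
    """
    R_j = rodrigues(joint_axis, joint_angle)
    p = np.asarray(joint_pivot, dtype=float)
    t_j = p - R_j @ p  # rotate about the pivot instead of the origin
    return R_root @ R_j, R_root @ t_j + t_root

def project(points_cam, K):
    """Pinhole projection of Nx3 camera-frame points (z > 0) to Nx2 pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def silhouette(points_cam, K, height, width):
    """Crude binary mask: mark the pixel that each projected point lands in."""
    points_cam = points_cam[points_cam[:, 2] > 0]  # keep points in front of the camera
    mask = np.zeros((height, width), dtype=bool)
    uv = np.round(project(points_cam, K)).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    mask[uv[inside, 1], uv[inside, 0]] = True  # row index = v, column index = u
    return mask

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0
```

Splatting points into pixels keeps the example short. Gradient-based refinement against a mask would instead require a differentiable renderer, and this sketch does not attempt that.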
