Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild


  • Yueying Kao Samsung Research China - Beijing (SRC-B)
  • Weiming Li Samsung Research China - Beijing (SRC-B)
  • Qiang Wang Samsung Research China - Beijing (SRC-B)
  • Zhouchen Lin Peking University
  • Wooshik Kim Samsung Advanced Institute of Technology (SAIT)
  • Sunghoon Hong Samsung Advanced Institute of Technology (SAIT)



Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful cues for pose estimation. However, existing methods rely on real depth images to extract depth features, which limits their applicability. In this paper, we aim to extract both RGB and depth features from a single RGB image, using synthetic RGB-depth image pairs, for object pose estimation. Specifically, we propose a deep convolutional neural network with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained on synthetic paired data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns feature distributions from synthetic to real data. Compared to existing methods, our method needs no real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset across all metrics, which substantiates the superiority of our method and the above modules.
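To make the test-time data flow concrete, here is a minimal sketch of the idea the abstract describes: a depth-oriented embedding is predicted from the RGB branch alone, so no real depth image is needed at inference. All layer sizes, names, and the random linear weights below are illustrative assumptions, not the paper's actual architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes; the abstract does not specify them.
RGB_DIM, EMB_DIM, POSE_BINS = 128, 64, 24

# Random weights standing in for trained layers (illustration only).
W_rgb = rng.standard_normal((RGB_DIM, EMB_DIM)) * 0.01        # RGB feature extractor
W_dep = rng.standard_normal((EMB_DIM, EMB_DIM)) * 0.01        # RGB-to-Depth Embedding module
W_pose = rng.standard_normal((2 * EMB_DIM, POSE_BINS)) * 0.01  # pose classifier head

def forward(rgb_feat):
    """Map an RGB feature vector to viewpoint logits.

    The depth feature is hallucinated from the RGB feature, having been
    supervised with synthetic RGB-depth pairs during training; at test
    time only the RGB image is required.
    """
    f_rgb = np.tanh(rgb_feat @ W_rgb)       # RGB feature
    f_dep = np.tanh(f_rgb @ W_dep)          # predicted depth-oriented feature
    fused = np.concatenate([f_rgb, f_dep])  # fuse both cues
    return fused @ W_pose                   # viewpoint logits

logits = forward(rng.standard_normal(RGB_DIM))
print(logits.shape)  # (24,)
```

The Synthetic-Real Adaptation module (not shown) would additionally align the distribution of these features between synthetic training images and real test images, e.g. via an adversarial domain loss during training.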




How to Cite

Kao, Y., Li, W., Wang, Q., Lin, Z., Kim, W., & Hong, S. (2020). Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 11221-11228.



AAAI Technical Track: Vision