A Cascaded Inception of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation

Authors

  • Wentao Liu Tsinghua University; SenseTime Group Limited
  • Jie Chen SenseTime Group Limited
  • Cheng Li SenseTime Group Limited
  • Chen Qian SenseTime Group Limited
  • Xiao Chu The Chinese University of Hong Kong
  • Xiaolin Hu Tsinghua University

Keywords:

Inception of Inception, Cascade Joint Network

Abstract

Accurate keypoint localization in human pose estimation requires diverse features: high-level features capture contextual dependencies, while low-level features support detailed refinement of joints. The relative importance of the two varies from case to case, and how to use them efficiently remains an open problem. Existing methods are limited in preserving low-level features, in adaptively adjusting the importance of different feature levels, and in modeling the human perception process. This paper presents three novel techniques, introduced step by step, for using different levels of features efficiently in human pose estimation. First, an inception of inception (IOI) block is designed to emphasize low-level features. Second, an attention mechanism is proposed to adjust the importance of individual levels according to context. Third, a cascaded network is proposed to localize the joints sequentially, enforcing message passing from joints of stand-alone parts such as the head and torso to remote joints such as the wrist or ankle. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both the MPII and LSP benchmarks.
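
The idea of adjusting the importance of individual feature levels can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how the attention weights are computed or applied, so the softmax over per-level scores, the plain weighted sum, and the names `softmax` and `fuse_levels` are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Fuse per-level feature vectors with attention weights.

    Assumption (not from the paper): each level gets one scalar score,
    the scores are normalized with a softmax, and the fused feature is
    the weighted sum of the per-level vectors.
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature level")
    width = len(features[0])
    if any(len(f) != width for f in features):
        raise ValueError("all feature levels must have the same length")
    weights = softmax(scores)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(width)
    ]


# Hypothetical toy inputs: one low-level and one high-level feature vector.
low_level = [1.0, 0.0, 2.0]
high_level = [0.0, 4.0, 2.0]

# Equal scores give each level the same weight, so the result is the mean.
fused_equal = fuse_levels([low_level, high_level], scores=[0.0, 0.0])

# A much larger score for the low level makes it dominate the fusion.
fused_low = fuse_levels([low_level, high_level], scores=[10.0, 0.0])
```

In this sketch the scores are fixed by hand; in a learned model they would be predicted from the input, which is what would let the weighting change from case to case.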

Published

2018-04-27

How to Cite

Liu, W., Chen, J., Li, C., Qian, C., Chu, X., & Hu, X. (2018). A Cascaded Inception of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/12334