A Cascaded Inception of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation

Wentao Liu; Jie Chen; Cheng Li; Chen Qian; Xiao Chu; Xiaolin Hu

doi:10.1609/aaai.v32i1.12334

Authors

Wentao Liu, Tsinghua University; SenseTime Group Limited
Jie Chen, SenseTime Group Limited
Cheng Li, SenseTime Group Limited
Chen Qian, SenseTime Group Limited
Xiao Chu, The Chinese University of Hong Kong
Xiaolin Hu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v32i1.12334

Keywords:

Inception of Inception, Cascade Joint Network

Abstract

Accurate keypoint localization for human pose estimation requires diverse features: high-level features for contextual dependencies and low-level features for detailed refinement of joints. However, the relative importance of the two varies from case to case, and how to use these features efficiently remains an open problem. Existing methods are limited in preserving low-level features, adaptively adjusting the importance of different feature levels, and modeling the human perception process. This paper presents three novel techniques, introduced step by step, to efficiently utilize different levels of features for human pose estimation. First, an inception of inception (IOI) block is designed to emphasize low-level features. Second, an attention mechanism is proposed to adjust the importance of individual levels according to the context. Third, a cascaded network is proposed to localize the joints sequentially, enforcing message passing from joints of stand-alone parts such as the head and torso to remote joints such as the wrist or ankle. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both the MPII and LSP benchmarks.
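
The abstract does not say how the attention weights are computed, so the following is only a minimal sketch of the general idea of attention-modulated fusion across feature levels, not the authors' implementation. It assumes each level has already been reduced to a feature vector of the same length, and it scores each level by its mean activation scaled by a hypothetical learned scalar (`gate_weights`). The function names `softmax` and `fuse_levels` are illustrative and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(levels, gate_weights):
    """Fuse equal-length feature vectors from several levels.

    levels:       list of K feature vectors, one per feature level.
    gate_weights: list of K scalars (hypothetical learned parameters);
                  a level's score is its mean activation times its weight.
    Returns (fused_vector, attention_weights).
    """
    if len(levels) != len(gate_weights):
        raise ValueError("need one gate weight per level")
    n = len(levels[0])
    if any(len(f) != n for f in levels):
        raise ValueError("all levels must have the same length")

    # One scalar score per level, normalized so the weights sum to 1.
    scores = [w * (sum(f) / n) for f, w in zip(levels, gate_weights)]
    attn = softmax(scores)

    # Attention-weighted sum of the levels, element by element.
    fused = [sum(a * f[i] for a, f in zip(attn, levels)) for i in range(n)]
    return fused, attn


# Toy example: a "low-level" and a "high-level" feature vector.
low = [0.2, 0.8, 0.4, 0.6]
high = [1.0, 1.0, 1.0, 1.0]
fused, attn = fuse_levels([low, high], gate_weights=[1.0, 1.0])
```

In this toy setup the level with the larger mean activation receives the larger weight. A real network would more likely predict the weights from the input with learned layers and apply them per channel or per spatial location; the abstract leaves that design open.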
