Context-Guided Adaptive Network for Efficient Human Pose Estimation
Keywords: Biometrics, Face, Gesture & Pose
Abstract
Although recent work has achieved great progress in human pose estimation (HPE), most methods are limited in either inference speed or accuracy. In this paper, we propose a fast and accurate end-to-end HPE method that is specifically designed to overcome the jitter box, defective box, and ambiguous box problems commonly encountered by box-based methods such as Mask R-CNN. Concretely, 1) we propose the ROIGuider to aggregate box instance features from all feature levels under the guidance of global context instance information. Further, 2) the proposed Center Line Branch is equipped with a Dichotomy Extended Area algorithm to adaptively expand each instance box area, and an Ambiguity Alleviation strategy to eliminate duplicated keypoints. Finally, 3) to achieve efficient multi-scale feature fusion and real-time inference, we design a novel Trapezoidal Network (TNet) backbone. On the COCO dataset, our method achieves 68.1 AP at 25.4 fps and outperforms Mask R-CNN by 8.9 AP at a similar speed. Its competitive performance against state-of-the-art models on the HPE and person instance segmentation tasks shows the promise of the proposed method. The source code will be made available at https://github.com/zlcnup/CGANet.
How to Cite
Zhao, L., Wen, J., Wang, P., & Zheng, N. (2021). Context-Guided Adaptive Network for Efficient Human Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3492-3499. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16463
AAAI Technical Track on Computer Vision III