RSGNet: Relation based Skeleton Graph Network for Crowded Scenes Pose Estimation

Authors

  • Yan Dai, University of Electronic Science and Technology of China
  • Xuanhan Wang, University of Electronic Science and Technology of China
  • Lianli Gao, University of Electronic Science and Technology of China
  • Jingkuan Song, University of Electronic Science and Technology of China; Key Laboratory of Artificial Intelligence, Ministry of Education
  • Heng Tao Shen, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v35i2.16206

Keywords:

Biometrics, Face, Gesture & Pose, (Deep) Neural Network Algorithms

Abstract

Despite recent progress in multi-person pose estimation, existing solutions still struggle in "crowded scenes", where RGB images capture complex real-world scenes with heavily overlapping people, severe occlusions, and diverse postures. In this work, we focus on two main problems: 1) how to design an effective pipeline for pose estimation in crowded scenes; and 2) how to equip this pipeline with relation modeling to resolve interference. To tackle these problems, we propose a new pipeline named Relation based Skeleton Graph Network (RSGNet). Unlike existing works that directly predict joints-of-target by labeling joints-of-interference as false positives, we first encourage all joints to be predicted. A Target-aware Relation Parser (TRP) is then designed to model the relations among all predicted joints, resulting in a target-aware encoding. This new pipeline largely relieves the confusion of the joint estimation model when it sees identical joints with entirely different labels (e.g., the same hand appearing in two bounding boxes). Furthermore, we introduce a Skeleton Graph Machine (SGM) to model skeleton-based commonsense knowledge, aiming to estimate the target pose under the constraint of human body structure. Such a skeleton-based constraint helps to deal with the challenges of crowded scenes from a reasoning perspective. Experiments on pose estimation benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
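
To make the idea of a skeleton-based structural constraint more concrete, the following is a minimal sketch of one round of message passing over a toy skeleton graph. It is not the paper's SGM: the joint names, the edge list `SKELETON_EDGES`, the function `propagate`, and the `self_weight` mixing rule are all assumptions chosen for illustration. It only shows how a joint's feature can be refined using the joints it is connected to in the body structure.

```python
# Illustrative sketch only: a toy skeleton and a simple averaging update.
# None of these names or rules come from the paper; they are assumptions.

# Hypothetical upper-body skeleton, given as undirected edges between joints.
SKELETON_EDGES = [
    ("head", "neck"),
    ("neck", "l_shoulder"),
    ("neck", "r_shoulder"),
    ("l_shoulder", "l_elbow"),
    ("r_shoulder", "r_elbow"),
]


def build_neighbors(edges):
    """Turn an undirected edge list into a joint -> list-of-neighbors map."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    return neighbors


def propagate(features, edges, self_weight=0.5):
    """Run one round of message passing over the skeleton.

    Each joint's feature vector is mixed with the mean feature vector of
    its skeleton neighbors. Joints without neighbors are left unchanged.
    """
    neighbors = build_neighbors(edges)
    updated = {}
    for joint, vec in features.items():
        nbrs = [features[n] for n in neighbors.get(joint, []) if n in features]
        if not nbrs:
            updated[joint] = list(vec)
            continue
        # Element-wise mean over the neighboring joints' feature vectors.
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        updated[joint] = [
            self_weight * v + (1.0 - self_weight) * m
            for v, m in zip(vec, mean)
        ]
    return updated


if __name__ == "__main__":
    # One-dimensional "confidence" features; the neck is weak (e.g. occluded).
    feats = {
        "head": [1.0],
        "neck": [0.0],
        "l_shoulder": [1.0],
        "r_shoulder": [1.0],
        "l_elbow": [0.0],
        "r_elbow": [0.0],
    }
    print(propagate(feats, SKELETON_EDGES))
```

In this toy setting, a joint with a weak feature (such as an occluded neck) is pulled toward the evidence of its connected joints, which is the general intuition behind constraining pose estimates with body structure. A learned model would replace the fixed averaging with trainable weights.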

Published

2021-05-18

How to Cite

Dai, Y., Wang, X., Gao, L., Song, J., & Shen, H. T. (2021). RSGNet: Relation based Skeleton Graph Network for Crowded Scenes Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1193-1200. https://doi.org/10.1609/aaai.v35i2.16206

Section

AAAI Technical Track on Computer Vision I