DiffusionPose: Markov-Optimized Diffusion Model for Human Pose Estimation

Zhigang Wang; Zhenguang Liu; Shaojing Fan; Sifan Wu; Yingying Jiao

doi:10.1609/aaai.v40i12.38012

Authors

Zhigang Wang, The State Key Laboratory of Blockchain and Data Security, Zhejiang University
Zhenguang Liu, The State Key Laboratory of Blockchain and Data Security, Zhejiang University; Shandong Rendui Network Co., Ltd.; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Shaojing Fan, Department of Electrical and Computer Engineering, National University of Singapore
Sifan Wu, College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Yingying Jiao, College of Computer Science and Technology, Zhejiang University of Technology

Abstract

Video-based human pose estimation has long been a nontrivial task due to its dynamic nature and challenging detection scenarios such as occlusion and defocus. Inspired by the success of diffusion models, researchers have applied them to video pose estimation, outperforming traditional joint detection methods. However, existing diffusion-based methods still suffer from slow convergence and unstable pose generation. To tackle these issues, we propose DiffusionPose, a novel framework for video pose estimation that integrates diffusion models with three optimization strategies: (1) we combine the emerging Mamba architecture with Transformers to balance global and local spatio-temporal modeling; (2) we integrate Markov Random Fields into the reverse diffusion process to improve the denoising of pose heatmaps, particularly for occluded joints, whose generation is often confused; and (3) we mathematically formulate a Markov objective that supervises the heatmap denoising process, enabling the model to generate anatomically plausible skeletons. Our method achieves state-of-the-art performance on three large-scale benchmark datasets. It is also notably robust in challenging video scenarios: on the Challenging-PoseTrack dataset, it improves accuracy on the ankle, the most difficult joint, by 16.9% over the previous best diffusion-based method.
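The abstract does not give the form of the Markov objective, so the following is only a generic illustration of the idea behind points (2) and (3): scoring a set of joint heatmaps with a Markov Random Field energy over a skeleton graph, where unary terms reward confident heatmap peaks and pairwise terms penalize bones whose lengths deviate from reference values. The three-joint chain `EDGES`, the reference lengths, the quadratic bone-length potential, and the helper names `heatmap_to_coords`, `mrf_energy`, and `one_hot_maps` are all hypothetical choices for this sketch, not the authors' formulation.

```python
import numpy as np

# Hypothetical kinematic chain for illustration only (hip -> knee -> ankle).
# The paper's actual skeleton graph and potentials are not specified here.
EDGES = [(0, 1), (1, 2)]


def heatmap_to_coords(heatmaps: np.ndarray, eps: float = 1e-8):
    """Map (J, H, W) non-negative heatmaps to (J, 2) expected (x, y) coordinates.

    Each heatmap is normalized into a probability map; returns the expected
    pixel location under each map together with the normalized maps.
    """
    probs = np.clip(heatmaps, 0.0, None)
    probs = probs / (probs.sum(axis=(1, 2), keepdims=True) + eps)
    H, W = heatmaps.shape[1:]
    ys, xs = np.mgrid[0:H, 0:W]  # row (y) and column (x) index grids
    x = (probs * xs).sum(axis=(1, 2))
    y = (probs * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1), probs


def mrf_energy(heatmaps, edges, ref_lengths, pair_weight=1.0, eps=1e-8):
    """Hypothetical MRF energy:
    E = sum_j -log p_j(x_j) + pair_weight * sum_(i,j) (|x_i - x_j| - L_ij)^2
    """
    coords, probs = heatmap_to_coords(heatmaps, eps)
    H, W = heatmaps.shape[1:]
    # Unary terms: negative log probability of each joint at its nearest pixel.
    unary = 0.0
    for j, (x, y) in enumerate(coords):
        xi = int(np.clip(np.rint(x), 0, W - 1))
        yi = int(np.clip(np.rint(y), 0, H - 1))
        unary += -np.log(probs[j, yi, xi] + eps)
    # Pairwise terms: quadratic penalty on bone-length deviation per skeleton edge.
    pairwise = 0.0
    for (i, j), ref in zip(edges, ref_lengths):
        length = np.linalg.norm(coords[i] - coords[j])
        pairwise += (length - ref) ** 2
    return float(unary + pair_weight * pairwise)


def one_hot_maps(points, H=32, W=32):
    """Build (J, H, W) heatmaps with a single unit peak at each (x, y) point."""
    maps = np.zeros((len(points), H, W))
    for j, (x, y) in enumerate(points):
        maps[j, y, x] = 1.0
    return maps


# Usage: a skeleton with both bones at the reference length scores near zero,
# while a stretched lower leg (knee -> ankle) incurs a large pairwise penalty.
plausible = one_hot_maps([(10, 5), (10, 15), (10, 25)])
implausible = one_hot_maps([(10, 5), (10, 15), (25, 28)])
print(mrf_energy(plausible, EDGES, [10.0, 10.0]))
print(mrf_energy(implausible, EDGES, [10.0, 10.0]))
```

In a diffusion pipeline, an energy of this kind could in principle be evaluated on intermediate denoised heatmaps to favor anatomically consistent skeletons, but how DiffusionPose actually couples its Markov terms with the reverse process is defined in the paper, not here.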
