Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis

Authors

  • Jingyu Gong School of Computer Science and Technology, East China Normal University, Shanghai, China Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, China Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai, China
  • Chong Zhang School of Computer Science and Technology, East China Normal University, Shanghai, China
  • Fengqi Liu School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
  • Ke Fan School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
  • Qianyu Zhou College of Computer Science and Technology, Jilin University, Jilin, China
  • Xin Tan School of Computer Science and Technology, East China Normal University, Shanghai, China Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, China
  • Zhizhong Zhang School of Computer Science and Technology, East China Normal University, Shanghai, China Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai, China
  • Yuan Xie School of Computer Science and Technology, East China Normal University, Shanghai, China Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, China

DOI:

https://doi.org/10.1609/aaai.v40i6.42422

Abstract

Scene-aware motion synthesis has attracted wide research interest recently due to its numerous applications. Prevailing methods rely heavily on paired motion-scene data, yet models trained on only a few specific scenes generalize poorly to diverse ones. Thus, we propose a unified framework, termed Diffusion Implicit Policy (DIP), for scene-aware motion synthesis, in which paired motion-scene data are no longer necessary. In this paper, we disentangle human-scene interaction from motion synthesis during training, and then introduce an interaction-based implicit policy into motion diffusion during inference. Synthesized motion is derived through iterative diffusion denoising and implicit policy optimization, so that motion naturalness and interaction plausibility are maintained simultaneously. For long-term motion synthesis, we introduce motion blending in the joint rotation power space. The proposed method is evaluated on scenes synthesized with ShapeNet furniture, as well as real scenes from PROX and Replica. Results show that our framework achieves better motion naturalness and interaction plausibility than cutting-edge methods. These results also indicate the feasibility of applying DIP to motion synthesis in more general tasks and more versatile scenes.
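The abstract's central idea, interleaving diffusion denoising (for motion naturalness) with implicit-policy optimization (for interaction plausibility), can be sketched in a minimal, hypothetical form. The names `eps_model`, `interaction_cost`, and the toy noise schedule below are illustrative assumptions, not the authors' actual implementation; finite differences stand in for autodiff, and a flat vector stands in for a motion sequence.

```python
# Hypothetical sketch of the DIP-style sampling loop: alternate a
# deterministic denoising step with a gradient nudge that lowers a
# scene-interaction cost. All names and schedules here are illustrative.
import math
import random

def denoise_step(x, t, eps_model, alpha):
    """One DDIM-like step: predict noise, estimate x0, step to level t-1."""
    eps = eps_model(x, t)
    x0_hat = [(xi - math.sqrt(1 - alpha[t]) * ei) / math.sqrt(alpha[t])
              for xi, ei in zip(x, eps)]
    if t == 0:
        return x0_hat
    return [math.sqrt(alpha[t - 1]) * x0i + math.sqrt(1 - alpha[t - 1]) * ei
            for x0i, ei in zip(x0_hat, eps)]

def policy_guidance(x, interaction_cost, lr=0.1, h=1e-4):
    """Implicit-policy optimization stand-in: descend a finite-difference
    gradient of the interaction cost (e.g. penetration / contact terms)."""
    base = interaction_cost(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        grad.append((interaction_cost(xp) - base) / h)
    return [xi - lr * gi for xi, gi in zip(x, grad)]

def sample(eps_model, interaction_cost, dim=4, steps=8, seed=0):
    """Iterate denoising and policy optimization from pure noise."""
    rng = random.Random(seed)
    alpha = [(i + 1) / (steps + 1) for i in range(steps)]  # toy schedule
    x = [rng.gauss(0, 1) for _ in range(dim)]
    for t in reversed(range(steps)):
        x = denoise_step(x, t, eps_model, alpha)   # keep motion natural
        x = policy_guidance(x, interaction_cost)   # keep interaction plausible
    return x
```

Because the interaction objective is only ever applied as a guidance term at inference time, the diffusion model itself never needs paired motion-scene data, which is the decoupling the abstract describes.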

Published

2026-03-14

How to Cite

Gong, J., Zhang, C., Liu, F., Fan, K., Zhou, Q., Tan, X., … Xie, Y. (2026). Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4257–4265. https://doi.org/10.1609/aaai.v40i6.42422

Section

AAAI Technical Track on Computer Vision III