Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma; Yingqing He; Xiaodong Cun; Xintao Wang; Siran Chen; Xiu Li; Qifeng Chen

doi:10.1609/aaai.v38i5.28206

Authors

Yue Ma, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Yingqing He, The Hong Kong University of Science and Technology, Hong Kong
Xiaodong Cun, Tencent AI Lab, Shenzhen, China
Xintao Wang, Tencent AI Lab, Shenzhen, China
Siran Chen, Shenzhen Institute of Advanced Technology, Chinese Academy of Science, Shenzhen, China
Xiu Li, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qifeng Chen, The Hong Kong University of Science and Technology, Hong Kong

Keywords:

CV: Computational Photography, Image & Video Synthesis, CV: Language and Vision

Abstract

Generating text-editable and pose-controllable character videos is in pressing demand for creating diverse digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and a pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used, for controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we fine-tune the motion of the above network on a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method successfully generates continuously pose-controllable character videos while keeping the editing and concept composition ability of the pre-trained T2I model. The code and models are available at https://follow-your-pose.github.io/.
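
The role of zero initialization in the first stage can be illustrated with a small sketch. It assumes, as is common for this kind of design but not stated in the abstract, that the encoded pose is added as a residual to the features of the pre-trained T2I model; under that assumption, a zero-initialized convolution leaves those features unchanged at the start of training. The code uses a 1x1 convolution over plain Python lists, and every name in it (conv1x1, zero_init, inject_pose) is hypothetical rather than taken from the released code.

```python
# Hypothetical sketch: a zero-initialized 1x1 convolution as a residual pose branch.
# Assumption: pose features are ADDED to the pre-trained features (not confirmed by the abstract).

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to x, a list of pixels (each a list of input channels).

    weight[o][i] maps input channel i to output channel o; bias[o] is added per output channel.
    """
    return [
        [sum(w * v for w, v in zip(row, pixel)) + b for row, b in zip(weight, bias)]
        for pixel in x
    ]


def zero_init(out_channels, in_channels):
    """Return an all-zero weight matrix and bias vector."""
    weight = [[0.0] * in_channels for _ in range(out_channels)]
    bias = [0.0] * out_channels
    return weight, bias


def inject_pose(features, pose, weight, bias):
    """Add the encoded pose to the pre-trained features, pixel by pixel."""
    pose_residual = conv1x1(pose, weight, bias)
    return [
        [f + r for f, r in zip(feat_pixel, res_pixel)]
        for feat_pixel, res_pixel in zip(features, pose_residual)
    ]


# Two pixels with two feature channels each (stand-in for pre-trained T2I features).
features = [[0.5, -1.0], [2.0, 0.25]]
# Two pixels with three keypoint channels each (stand-in for a pose map).
pose = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

weight, bias = zero_init(out_channels=2, in_channels=3)

# At initialization the pose branch contributes nothing.
out_at_init = inject_pose(features, pose, weight, bias)

# Once a weight becomes non-zero (as training would do), the pose starts to steer the features.
weight[0][0] = 1.0
out_after_update = inject_pose(features, pose, weight, bias)
```

Here out_at_init equals features exactly, while out_after_update differs only where the first keypoint channel is active. In this reading, the pose branch starts as a no-op and gradually gains influence during training.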
