VIDM: Video Implicit Diffusion Models

Authors

  • Kangfu Mei Johns Hopkins University
  • Vishal Patel Johns Hopkins University

DOI:

https://doi.org/10.1609/aaai.v37i8.26094

Keywords:

ML: Deep Generative Models & Autoencoders

Abstract

Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit conditioning manner, i.e., one can sample plausible video motions according to the latent features of the frames. We improve the quality of the generated videos by proposing multiple strategies, such as sampling space truncation, a robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms state-of-the-art generative adversarial network (GAN)-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.
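The abstract names "positional group normalization" but does not spell out its form; the paper itself should be consulted for the exact layer. As a rough, hedged sketch of what such a layer might look like, the snippet below modulates a standard group normalization with a learned embedding of each frame's temporal position. The class name `PositionalGroupNorm`, the `max_frames` cap, and the embedding-based modulation are all illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class PositionalGroupNorm(nn.Module):
    """Hypothetical sketch: group normalization whose affine scale/shift are
    driven by an embedding of the frame's temporal position. Illustrative
    only; not the implementation from the VIDM paper."""

    def __init__(self, num_groups: int, num_channels: int, max_frames: int = 64):
        super().__init__()
        # Plain group norm without its own affine parameters.
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        # Per-frame-index scale and shift, learned jointly with the network.
        self.pos_scale = nn.Embedding(max_frames, num_channels)
        self.pos_shift = nn.Embedding(max_frames, num_channels)
        nn.init.ones_(self.pos_scale.weight)
        nn.init.zeros_(self.pos_shift.weight)

    def forward(self, x: torch.Tensor, frame_idx: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) features of one frame; frame_idx: (B,) temporal index.
        h = self.norm(x)
        scale = self.pos_scale(frame_idx)[:, :, None, None]  # (B, C, 1, 1)
        shift = self.pos_shift(frame_idx)[:, :, None, None]
        return h * scale + shift


if __name__ == "__main__":
    layer = PositionalGroupNorm(num_groups=8, num_channels=64)
    feats = torch.randn(4, 64, 32, 32)   # a batch of per-frame features
    idx = torch.tensor([0, 3, 7, 15])    # each frame's position in its clip
    print(layer(feats, idx).shape)       # torch.Size([4, 64, 32, 32])
```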

Published

2023-06-26

How to Cite

Mei, K., & Patel, V. (2023). VIDM: Video Implicit Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9117-9125. https://doi.org/10.1609/aaai.v37i8.26094

Section

AAAI Technical Track on Machine Learning III