Latent Knowledge-Guided Video Diffusion for Scientific Phenomena Generation from a Single Initial Frame

Qinglong Cao; Xirui Li; Ding Wang; Chao Ma; Yuntian Chen; Xiaokang Yang

doi:10.1609/aaai.v40i4.37250

Authors

Qinglong Cao, Shanghai Jiao Tong University, Shanghai; Eastern Institute of Technology, Ningbo
Xirui Li, Shanghai Jiao Tong University, Shanghai
Ding Wang, Shanghai Jiao Tong University, Shanghai; Eastern Institute of Technology, Ningbo
Chao Ma, Shanghai Jiao Tong University, Shanghai
Yuntian Chen, Eastern Institute of Technology, Ningbo
Xiaokang Yang, Shanghai Jiao Tong University, Shanghai

Abstract

Video diffusion models have achieved impressive results in natural scene generation, yet they struggle to generalize to scientific phenomena such as fluid simulations and meteorological processes, where the underlying dynamics are governed by scientific laws. These tasks pose unique challenges, including severe domain gaps, limited training data, and the lack of descriptive language annotations. To address these challenges, we extract latent knowledge of scientific phenomena and propose a new framework that teaches video diffusion models to generate scientific phenomena from a single initial frame. Specifically, static knowledge is extracted with pre-trained masked autoencoders, while dynamic knowledge is derived from a pre-trained optical flow prediction model. Then, exploiting the aligned spatial relations between the CLIP vision and language encoders, the visual embeddings of scientific phenomena, guided by this latent knowledge, are projected into pseudo-language prompt embeddings in both the spatial and frequency domains. By incorporating these prompts and fine-tuning the video diffusion model, we enable the generation of videos that better adhere to scientific laws. Extensive experiments on both computational fluid dynamics simulations and real-world typhoon observations demonstrate the effectiveness of our approach, which achieves superior fidelity and consistency across diverse scientific scenarios.
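
The projection step described in the abstract can be illustrated with a small, purely hypothetical sketch. It is not the authors' implementation: the function names (`linear`, `dft_magnitude`, `pseudo_prompt`), the choice to concatenate the features, the use of a magnitude spectrum for the frequency branch, and the random vectors standing in for the CLIP visual embedding, the masked-autoencoder (static) features, and the optical-flow (dynamic) features are all assumptions made only to show the data flow from knowledge-guided visual features to two pseudo-prompt tokens.

```python
import cmath
import random


def linear(x, weight):
    """Dense projection: `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def dft_magnitude(x):
    """Magnitude spectrum of a real vector via a naive discrete Fourier transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned projection parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def pseudo_prompt(visual, static_k, dynamic_k, w_spatial, w_freq):
    """Project knowledge-guided visual features into two pseudo-prompt tokens.

    Assumption: guidance is modelled as plain concatenation, and the
    frequency-domain branch projects the magnitude spectrum of the same vector.
    """
    guided = visual + static_k + dynamic_k            # list concatenation
    spatial_token = linear(guided, w_spatial)         # spatial-domain branch
    freq_token = linear(dft_magnitude(guided), w_freq)  # frequency-domain branch
    return [spatial_token, freq_token]


def demo(prompt_dim=4, seed=0):
    """Build stand-in features and return the resulting pseudo-prompt tokens."""
    rng = random.Random(seed)
    visual = [rng.random() for _ in range(8)]     # stand-in for a CLIP visual embedding
    static_k = [rng.random() for _ in range(4)]   # stand-in for masked-autoencoder features
    dynamic_k = [rng.random() for _ in range(4)]  # stand-in for optical-flow features
    in_dim = len(visual) + len(static_k) + len(dynamic_k)
    w_spatial = random_matrix(prompt_dim, in_dim, rng)
    w_freq = random_matrix(prompt_dim, in_dim, rng)
    return pseudo_prompt(visual, static_k, dynamic_k, w_spatial, w_freq)
```

In a setup like the one the abstract describes, tokens of this kind would take the place of text-encoder outputs as conditioning for the fine-tuned video diffusion model. The dimensions and the two-token layout here are illustrative only.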
