DFDNet: Disentangling and Filtering Dynamics for Enhanced Video Prediction

Authors

  • Lianqiang Gan (School of Aeronautics and Astronautics, University of Electronic Science and Technology of China)
  • Junyu Lai (School of Aeronautics and Astronautics, University of Electronic Science and Technology of China; Aircraft Swarm Intelligent Sensing and Cooperative Control Key Laboratory of Sichuan Province)
  • Jingze Ju (School of Aeronautics and Astronautics, University of Electronic Science and Technology of China)
  • Lianli Gao (School of Computer Science and Engineering, University of Electronic Science and Technology of China)
  • Yi Bin (School of Computer Science and Technology, Tongji University)

DOI:

https://doi.org/10.1609/aaai.v39i3.32314

Abstract

Videos inherently contain complex temporal dynamics across various spatial directions, and these dynamics are often entangled in ways that obscure effective dynamic extraction. Previous studies typically process spatiotemporal video features without disentangling them, which hampers their ability to extract dynamic information. In addition, dynamic extraction is disrupted by transient high-dynamic information in video sequences, e.g., noise or flicker, a problem that has received limited attention in the literature. To tackle these problems, this paper proposes the Disentangling and Filtering Dynamics Network (DFDNet). First, to disentangle the interwoven dynamics, DFDNet decomposes the spatially encoded video sequences into lower-dimensional sequences. Second, a learnable threshold filter is proposed to eliminate the transient high-dynamic information. Third, the model incorporates an MLP to extract temporal dependencies from the disentangled and filtered sequences. DFDNet demonstrates competitive performance across four datasets, including both low- and high-resolution videos. Specifically, on the low-resolution Moving MNIST dataset, DFDNet achieves a 19% improvement in MSE over the previous state-of-the-art model. On the high-resolution SJTU4K dataset, it outperforms the previous state-of-the-art model by 10% on the LPIPS metric with similar inference time.
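The abstract describes the three stages (decompose, filter, MLP over time) only at a high level, so the following is a minimal NumPy sketch under assumptions of our own, not the authors' implementation. It assumes the decomposition simply flattens encoded features of shape (T, C, H, W) into independent 1-D temporal sequences, and it stands in for the learnable threshold filter with a soft low-pass mask over temporal frequency bins whose cutoff `tau` would be learned in a real model. The function names `disentangle`, `threshold_filter` and `temporal_mlp`, all shapes, and the random untrained weights are hypothetical.

```python
import numpy as np


def disentangle(z: np.ndarray) -> np.ndarray:
    """Hypothetical decomposition: flatten encoded features of shape
    (T, C, H, W) into C*H*W independent 1-D temporal sequences, shape (N, T)."""
    return z.reshape(z.shape[0], -1).T


def threshold_filter(x: np.ndarray, tau: float, sharpness: float = 10.0) -> np.ndarray:
    """Stand-in for the learnable threshold filter (an assumption, not the
    paper's definition): a soft low-pass mask over temporal frequency bins.
    In a trained model `tau` would be a learnable parameter; here it is a float."""
    t = x.shape[-1]
    spec = np.fft.rfft(x, axis=-1)                 # (N, T//2 + 1), complex
    bins = np.arange(spec.shape[-1])               # frequency-bin indices 0..T//2
    # Numerically stable sigmoid: ~1 for bins below tau, ~0 above it.
    mask = 0.5 * (1.0 + np.tanh(0.5 * sharpness * (tau - bins)))
    return np.fft.irfft(spec * mask, n=t, axis=-1)


def temporal_mlp(x, w1, b1, w2, b2):
    """Two-layer MLP applied along time: maps T input steps to T_out steps."""
    h = np.maximum(x @ w1 + b1, 0.0)               # ReLU hidden layer, (N, hidden)
    return h @ w2 + b2                             # (N, T_out)


# Toy run with random, untrained weights (all sizes are illustrative).
rng = np.random.default_rng(0)
T, C, H, W, T_out, hidden = 10, 2, 4, 4, 10, 32
z = rng.standard_normal((T, C, H, W))             # stand-in for encoded frames

seqs = disentangle(z)                              # (32, 10)
filtered = threshold_filter(seqs, tau=2.0)
w1 = 0.1 * rng.standard_normal((T, hidden))
b1 = np.zeros(hidden)
w2 = 0.1 * rng.standard_normal((hidden, T_out))
b2 = np.zeros(T_out)
pred = temporal_mlp(filtered, w1, b1, w2, b2)      # (32, 10)
out = pred.T.reshape(T_out, C, H, W)               # back to (T_out, C, H, W)

# A sign-alternating flicker sits entirely in the highest frequency bin,
# so the filter removes it and leaves the constant level of 1.0.
flicker = 1.0 + 0.5 * (-1.0) ** np.arange(T)
print(np.round(threshold_filter(flicker[None, :], tau=2.0), 6))
```

A soft sigmoid mask is used instead of a hard cutoff so that, in a trainable version, gradients could flow to `tau`; the abstract does not say whether DFDNet's filter operates in the frequency domain, so this choice is illustrative only.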

Published

2025-04-11

How to Cite

Gan, L., Lai, J., Ju, J., Gao, L., & Bin, Y. (2025). DFDNet: Disentangling and Filtering Dynamics for Enhanced Video Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3059–3067. https://doi.org/10.1609/aaai.v39i3.32314

Issue

Vol. 39 No. 3

Section

AAAI Technical Track on Computer Vision II