SwiftVideo: A Unified Framework for Few-Step Video Generation Through Trajectory-Distribution Alignment

Yanxiao Sun; Jiafu Wu; Yun Cao; Chengming Xu; Yabiao Wang; Weijian Cao; Donghao Luo; Chengjie Wang; Yanwei Fu

doi:10.1609/aaai.v40i11.37881

Authors

Yanxiao Sun, Fudan University
Jiafu Wu, Tencent Youtu Lab
Yun Cao, Tencent Youtu Lab
Chengming Xu, Tencent Youtu Lab
Yabiao Wang, Tencent Youtu Lab
Weijian Cao, Tencent Youtu Lab
Donghao Luo, Tencent Youtu Lab
Chengjie Wang, Tencent Youtu Lab
Yanwei Fu, Fudan University; Shanghai Innovation Institute

Abstract

Diffusion-based and flow-based models have achieved significant progress in video synthesis, but they require many iterative sampling steps, which incurs substantial computational overhead. Many distillation methods have been developed to accelerate video generation models, but those based solely on either trajectory preservation or distribution matching often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach first introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. We then propose a dual-perspective alignment that encompasses distribution alignment between synthetic and real data and trajectory alignment across different inference steps. Our method maintains high-quality video generation while substantially reducing the number of inference steps. Quantitative evaluations on the OpenVid-1M benchmark demonstrate that our method significantly outperforms existing approaches in few-step video generation.
