SwiftVideo: A Unified Framework for Few-Step Video Generation Through Trajectory-Distribution Alignment
DOI:
https://doi.org/10.1609/aaai.v40i11.37881
Abstract
Diffusion-based and flow-based models have achieved significant progress in video synthesis but require multiple iterative sampling steps, incurring substantial computational overhead. While many distillation methods based solely on trajectory preservation or distribution matching have been developed to accelerate video generation models, these approaches often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. We then propose a dual-perspective alignment that combines distribution alignment between synthetic and real data with trajectory alignment across different inference steps. Our method maintains high-quality video generation while substantially reducing the number of inference steps. Quantitative evaluations on the OpenVid-1M benchmark demonstrate that our method significantly outperforms existing approaches in few-step video generation.
Published
2026-03-14
How to Cite
Sun, Y., Wu, J., Cao, Y., Xu, C., Wang, Y., Cao, W., Luo, D., Wang, C., & Fu, Y. (2026). SwiftVideo: A Unified Framework for Few-Step Video Generation Through Trajectory-Distribution Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9233-9241. https://doi.org/10.1609/aaai.v40i11.37881
Section
AAAI Technical Track on Computer Vision VIII
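The abstract describes a unified objective that pairs a trajectory-preserving (consistency distillation) term with a distribution-matching term. A minimal toy sketch of how two such terms might be combined is below; every function name, signature, and the weighting factor here are illustrative assumptions for exposition, not details taken from the paper.

```python
# Hypothetical illustration of a combined trajectory + distribution objective.
# All names and the weight `lam` are assumptions, not SwiftVideo's actual code.

def consistency_loss(student_pred, teacher_pred):
    # Trajectory preservation: penalize deviation of the student's few-step
    # prediction from the teacher's point on the ODE trajectory (mean squared error).
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / len(student_pred)

def distribution_loss(fake_score, real_score):
    # Distribution alignment: a stand-in for a score- or adversarial-style term
    # that pulls statistics of generated samples toward those of real data.
    return sum((f - r) ** 2 for f, r in zip(fake_score, real_score)) / len(fake_score)

def combined_loss(student_pred, teacher_pred, fake_score, real_score, lam=0.5):
    # Unified objective: trajectory term plus a weighted distribution term.
    return (consistency_loss(student_pred, teacher_pred)
            + lam * distribution_loss(fake_score, real_score))
```

The sketch only conveys the structure of the objective: each perspective contributes its own loss, and a scalar weight balances trajectory fidelity against distributional realism.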