PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

Sijie Wang; Qiang Wang; Shaohuai Shi

doi:10.1609/aaai.v40i12.37976

Authors

Sijie Wang, Harbin Institute of Technology, Shenzhen
Qiang Wang, Harbin Institute of Technology, Shenzhen
Shaohuai Shi, Harbin Institute of Technology, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v40i12.37976

Abstract

Video generation has been advancing rapidly, and diffusion transformer (DiT)-based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference and high memory consumption. In this paper, we propose PipeDiT, a pipelining framework that accelerates video generation through three main innovations. First, we design a pipelining algorithm for sequence parallelism (SP), named PipeSP, that pipelines the computation of latent generation with communication among multiple GPUs, thus reducing inference latency. Second, we propose DeDiVAE, which decouples the diffusion module and the VAE module into two GPU groups whose executions can also be pipelined, reducing both memory consumption and inference latency. Third, to better utilize the GPU resources in the VAE group, we propose an attention co-processing (Aco) method that further reduces the overall video generation latency. We integrate PipeDiT into both OpenSoraPlan and HunyuanVideo, two state-of-the-art open-source video generation frameworks, and conduct extensive experiments on two 8-GPU systems. Experimental results show that, under many common resolution and timestep configurations, PipeDiT achieves 1.06× to 4.02× speedups over OpenSoraPlan and HunyuanVideo.
