T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates
DOI:
https://doi.org/10.1609/aaai.v40i12.38014
Abstract
Recent advances in video generation have given rise to an emerging paradigm of generative video coding for Ultra-Low Bitrate (ULB) scenarios that leverages powerful generative priors. However, most existing methods are limited by domain specificity (e.g., facial or human videos) or by excessive dependence on high-level text guidance, which often fails to capture fine-grained motion details and leads to unrealistic or incoherent reconstructions. To address these challenges, we propose Trajectory-Guided Generative Video Coding (dubbed T-GVC), a novel framework that bridges low-level motion tracking with high-level semantic understanding. T-GVC features a semantic-aware sparse motion sampling pipeline that extracts pixel-wise motion as sparse trajectory points according to their semantic importance, significantly reducing the bitrate while preserving critical temporal semantic information. In addition, by integrating trajectory-aligned loss constraints into the diffusion process, we introduce a training-free guidance mechanism in latent space that ensures physically plausible motion patterns without sacrificing the inherent capabilities of generative models. Experimental results demonstrate that T-GVC outperforms both traditional and neural video codecs under ULB conditions. Further experiments confirm that our framework achieves more precise motion control than existing text-guided methods, paving the way for a new direction in generative video coding guided by geometric motion modeling.
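The training-free guidance described in the abstract can be pictured as a classifier-guidance-style correction applied at each denoising step: the frozen diffusion model's prediction is scored against the transmitted sparse trajectories, and the latent is nudged along the gradient of that trajectory-alignment loss. The sketch below is a minimal PyTorch illustration under assumed interfaces; `denoiser`, `trajectory_pts`, `target_disp`, and `guidance_scale` are hypothetical names and shapes, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def trajectory_guided_correction(latent, t, denoiser, trajectory_pts,
                                 target_disp, guidance_scale=1.0):
    """Sketch of a training-free, trajectory-aligned guidance correction.

    latent         : noisy video latent, shape (B, C, T, H, W)
    denoiser       : frozen video diffusion model predicting the clean latent
    trajectory_pts : LongTensor (N, 3) of sparse (frame, y, x) points,
                     with frame indices in [0, T-2]
    target_disp    : target latent-space displacements at those points,
                     shape (B, C, N), decoded from the transmitted trajectories
    All names and shapes are illustrative assumptions; the diffusion
    scheduler step itself is omitted here.
    """
    latent = latent.detach().requires_grad_(True)

    # Predicted clean latent from the frozen generative model.
    x0_pred = denoiser(latent, t)

    # Trajectory-aligned loss: latent motion at the sampled trajectory
    # points should match the transmitted sparse trajectories.
    frames, ys, xs = trajectory_pts.unbind(-1)
    pred_disp = x0_pred[:, :, frames + 1, ys, xs] - x0_pred[:, :, frames, ys, xs]
    loss = F.mse_loss(pred_disp, target_disp)

    # Training-free guidance: nudge the latent along the gradient that
    # reduces trajectory misalignment; no model weights are updated.
    grad = torch.autograd.grad(loss, latent)[0]
    return (latent - guidance_scale * grad).detach()
```

In an actual codec, a correction of this kind would be interleaved with the scheduler's denoising updates, with the sparse trajectories acting as the decoded motion side-information.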
Published
2026-03-14
How to Cite
Wang, Z., Man, H., Li, W., Wang, X., Fan, X., & Zhao, D. (2026). T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10430–10438. https://doi.org/10.1609/aaai.v40i12.38014
Issue
Vol. 40 No. 12 (2026)
Section
AAAI Technical Track on Computer Vision IX