T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates

Authors

  • Zhitao Wang Harbin Institute of Technology
  • Hengyu Man Harbin Institute of Technology
  • Wenrui Li Harbin Institute of Technology
  • Xingtao Wang Harbin Institute of Technology
  • Xiaopeng Fan Harbin Institute of Technology; Peng Cheng Laboratory; Harbin Institute of Technology Suzhou Research Institute
  • Debin Zhao Harbin Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i12.38014

Abstract

Recent advances in video generation have given rise to generative video coding, an emerging paradigm that leverages powerful generative priors for Ultra-Low Bitrate (ULB) scenarios. However, most existing methods are either domain-specific (e.g., limited to facial or human videos) or depend excessively on high-level text guidance. As a result, they tend to capture fine-grained motion details inadequately, leading to unrealistic or incoherent reconstructions. To address these challenges, we propose Trajectory-Guided Generative Video Coding (dubbed T-GVC), a novel framework that bridges low-level motion tracking with high-level semantic understanding. T-GVC features a semantic-aware sparse motion sampling pipeline that extracts pixel-wise motion as sparse trajectory points based on their semantic importance, significantly reducing the bitrate while preserving critical temporal semantic information. In addition, by integrating trajectory-aligned loss constraints into the diffusion process, we introduce a training-free guidance mechanism in latent space that ensures physically plausible motion patterns without sacrificing the inherent capabilities of the generative model. Experimental results demonstrate that T-GVC outperforms both traditional and neural video codecs under ULB conditions. Additional experiments confirm that our framework achieves more precise motion control than existing text-guided methods, paving the way for a new direction in generative video coding guided by geometric motion modeling.

Published

2026-03-14

How to Cite

Wang, Z., Man, H., Li, W., Wang, X., Fan, X., & Zhao, D. (2026). T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10430–10438. https://doi.org/10.1609/aaai.v40i12.38014

Section

AAAI Technical Track on Computer Vision IX