Promptus: Can Prompt Streaming Replace Video Streaming

Authors

  • Jiangkai Wu Wangxuan Institute of Computer Technology, Peking University
  • Liming Liu Wangxuan Institute of Computer Technology, Peking University
  • Yunpeng Tan Wangxuan Institute of Computer Technology, Peking University
  • Junlin Hao Wangxuan Institute of Computer Technology, Peking University
  • Liang Zhang Huawei Technologies Ltd.
  • Xinggong Zhang Wangxuan Institute of Computer Technology, Peking University

DOI:

https://doi.org/10.1609/aaai.v40i13.38040

Abstract

With the exponential growth of video traffic, traditional video streaming systems are approaching their limits in communication capacity. To further reduce bitrate while maintaining quality, we propose Promptus, a disruptive semantic communication system that streams prompts instead of videos. Promptus represents the real-world video as a series of "prompts" for delivery and employs Stable Diffusion to regenerate the video at the receiver. To ensure that the generated video is pixel-aligned with the original video, a gradient descent-based prompt fitting framework is proposed. Further, a low-rank decomposition-based bitrate control algorithm is introduced to achieve adaptive bitrate. For inter-frame compression, an interpolation-aware fitting algorithm is proposed. Evaluations across various video genres demonstrate that, compared to H.265, Promptus can achieve more than a 4x bandwidth reduction while preserving the same perceptual quality. At extremely low bitrates, Promptus improves perceptual quality by 0.139 and 0.118 (in LPIPS) compared to VAE and H.265, respectively, and reduces the ratio of severely distorted frames by 89.3% and 91.7%, respectively. Our work opens up a new paradigm for efficient video communication.
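
The link between low-rank decomposition and bitrate mentioned in the abstract can be illustrated with simple arithmetic: a rank-r factorization of an m x n matrix stores r(m + n) values instead of mn, so the rank acts as a bitrate knob. The sketch below is not the authors' algorithm. The 77 x 768 prompt shape (the text-embedding size used by Stable Diffusion 1.x), the 8-bit quantization, the rate of 3 prompts per second, and all function names are assumptions chosen for illustration.

```python
def low_rank_param_count(rows, cols, rank):
    """Values stored by a factorization U (rows x rank) times V (rank x cols)."""
    return rank * (rows + cols)


def bitrate_kbps(rows, cols, rank, bits_per_value, prompts_per_second):
    """Bitrate in kbps when each prompt is sent as quantized low-rank factors."""
    bits_per_prompt = low_rank_param_count(rows, cols, rank) * bits_per_value
    return bits_per_prompt * prompts_per_second / 1000.0


def max_rank_for_budget(rows, cols, bits_per_value, prompts_per_second, budget_kbps):
    """Largest rank whose bitrate fits the budget, or 0 if even rank 1 does not fit."""
    best = 0
    for rank in range(1, min(rows, cols) + 1):
        rate = bitrate_kbps(rows, cols, rank, bits_per_value, prompts_per_second)
        if rate > budget_kbps:
            break  # bitrate grows monotonically with rank
        best = rank
    return best


# Hypothetical settings (assumptions, not taken from the paper):
ROWS, COLS = 77, 768       # Stable Diffusion 1.x text-embedding shape
BITS_PER_VALUE = 8         # assumed quantization depth
PROMPTS_PER_SECOND = 3     # assumed rate of transmitted (key) prompts

full_values = ROWS * COLS
rank8_values = low_rank_param_count(ROWS, COLS, 8)
rank8_rate = bitrate_kbps(ROWS, COLS, 8, BITS_PER_VALUE, PROMPTS_PER_SECOND)
chosen_rank = max_rank_for_budget(ROWS, COLS, BITS_PER_VALUE, PROMPTS_PER_SECOND, 200.0)
```

Under these assumed settings, lowering the rank shrinks the payload linearly, and a sender could choose the highest rank that fits the currently available bandwidth. Sending fewer prompts per second and interpolating between them at the receiver would reduce the rate further.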

Published

2026-03-14

How to Cite

Wu, J., Liu, L., Tan, Y., Hao, J., Zhang, L., & Zhang, X. (2026). Promptus: Can Prompt Streaming Replace Video Streaming. Proceedings of the AAAI Conference on Artificial Intelligence, 40(13), 10664-10672. https://doi.org/10.1609/aaai.v40i13.38040

Issue

Section

AAAI Technical Track on Computer Vision X