Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame

Authors

  • Luyang Tang (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China)
  • Jiayu Yang (Pengcheng Laboratory, China)
  • Rui Peng (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China)
  • Yongqi Zhai (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China)
  • Shihe Shen (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China)
  • Ronggang Wang (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China)

DOI:

https://doi.org/10.1609/aaai.v39i7.32780

Abstract

The success of 3D Gaussian Splatting (3DGS) in static scenes has inspired numerous attempts to construct Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos. Despite advancements in current techniques, simultaneously achieving photo-realistic view synthesis results, fast on-the-fly training, real-time rendering, and low storage costs remains a formidable problem. To address these challenges, we propose the first Gaussian-based streamable FVV intelligent compression framework named iFVC. Specifically, we utilize an anchor-based Gaussian representation to model the scene. To achieve on-the-fly training, we propose a Binary Transformation Cache (BTC) to model the dynamic changes between adjacent timesteps, which not only ensures compactness but also supports precise bit rate estimation. Furthermore, we carefully design a high-resolution transformation tri-plane assisted by a saliency grid as our BTC, allowing for accurate dynamic capture. The entire pipeline is regarded as a joint optimization of rate and distortion to achieve optimal compression performance. Experiments on widely used datasets demonstrate the state-of-the-art performance of our framework in both synthesis quality and efficiency, i.e., achieving per-frame training in 13 seconds with a storage cost of 0.1 MB and real-time rendering at 120 FPS.
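
The joint rate-distortion optimization mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: the function names, the per-symbol probabilities, and the trade-off weight `lam` are all hypothetical, and the rate term is the standard cross-entropy estimate for binary symbols, which is one common way a binary representation can yield a bit rate estimate.

```python
import math


def estimated_bits(symbols, probs):
    """Cross-entropy estimate of the bits needed to code binary symbols.

    symbols: iterable of 0/1 values (a hypothetical binarized cache).
    probs:   predicted probability that each symbol equals 1 (assumed given).
    """
    eps = 1e-9  # keep probabilities away from 0 and 1 so log2 stays finite
    total = 0.0
    for bit, p in zip(symbols, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log2(p) if bit == 1 else -math.log2(1.0 - p)
    return total


def rd_loss(distortion, bits, lam):
    """Generic rate-distortion objective: distortion + lam * rate."""
    return distortion + lam * bits


if __name__ == "__main__":
    symbols = [1, 0, 1, 1, 0, 0, 1, 0]
    uniform = [0.5] * len(symbols)  # uninformative model: 1 bit per symbol
    bits = estimated_bits(symbols, uniform)
    print("estimated bits:", bits)
    print("loss:", rd_loss(distortion=0.02, bits=bits, lam=0.001))
```

A larger `lam` favors smaller storage at the expense of reconstruction quality, and a probability model that predicts the symbols well lowers the estimated rate below one bit per symbol.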

Published

2025-04-11

How to Cite

Tang, L., Yang, J., Peng, R., Zhai, Y., Shen, S., & Wang, R. (2025). Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7257–7265. https://doi.org/10.1609/aaai.v39i7.32780

Issue

Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39 No. 7 (2025)

Section

AAAI Technical Track on Computer Vision VI