Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame

Luyang Tang; Jiayu Yang; Rui Peng; Yongqi Zhai; Shihe Shen; Ronggang Wang

doi:10.1609/aaai.v39i7.32780

Authors

Luyang Tang: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China
Jiayu Yang: Pengcheng Laboratory, China
Rui Peng: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China
Yongqi Zhai: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China
Shihe Shen: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China
Ronggang Wang: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China; Pengcheng Laboratory, China

Abstract

The success of 3D Gaussian Splatting (3DGS) in static scenes has inspired numerous attempts to construct Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos. Despite advancements in current techniques, simultaneously achieving photo-realistic view synthesis results, fast on-the-fly training, real-time rendering, and low storage costs remains a formidable problem. To address these challenges, we propose the first Gaussian-based streamable FVV intelligent compression framework named iFVC. Specifically, we utilize an anchor-based Gaussian representation to model the scene. To achieve on-the-fly training, we propose a Binary Transformation Cache (BTC) to model the dynamic changes between adjacent timesteps, which not only ensures compactness but also supports precise bit rate estimation. Furthermore, we carefully design a high-resolution transformation tri-plane assisted by a saliency grid as our BTC, allowing for accurate dynamic capture. The entire pipeline is regarded as a joint optimization of rate and distortion to achieve optimal compression performance. Experiments on widely used datasets demonstrate the state-of-the-art performance of our framework in both synthesis quality and efficiency, i.e., achieving per-frame training in 13 seconds with a storage cost of 0.1 MB and real-time rendering at 120 FPS.
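
The abstract's pairing of binary values with "precise bit rate estimation" and a joint rate-distortion objective can be illustrated with a minimal sketch. It is not the iFVC implementation: the function names, the per-symbol probabilities, and the weight `lam` are hypothetical, and the standard Shannon estimate is assumed as the rate model. The idea shown is only that binary symbols with known probabilities have a directly computable coding cost, which can then be added to a distortion term.

```python
import math


def estimated_bits(symbols, probs):
    """Shannon estimate of the bits needed to entropy-code binary symbols.

    symbols[i] is 0 or 1; probs[i] is an assumed model probability
    that symbols[i] equals 1 (hypothetical, for illustration only).
    """
    eps = 1e-9  # keep probabilities away from 0 and 1 so log2 stays finite
    total = 0.0
    for s, p in zip(symbols, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log2(p) if s == 1 else -math.log2(1.0 - p)
    return total


def rd_loss(distortion, bits, lam):
    """Generic rate-distortion objective: distortion plus weighted rate."""
    return distortion + lam * bits


# Four binary symbols with an uninformative model cost one bit each.
bits = estimated_bits([1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
loss = rd_loss(distortion=2.0, bits=bits, lam=0.5)
```

A sharper probability model lowers the estimated bits for symbols it predicts well, so minimizing such an objective trades reconstruction quality against storage.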
