Learned Video Compression via Joint Spatial-Temporal Correlation Exploration


  • Haojie Liu Nanjing University
  • Han Shen Horizon Robotics
  • Lichao Huang Horizon Robotics
  • Ming Lu Nanjing University
  • Tong Chen Nanjing University
  • Zhan Ma Nanjing University




Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency, and efficient representation of temporal information plays a key role in video coding. In this paper, we propose to exploit temporal correlation using both first-order optical flow and second-order flow prediction. We suggest a one-stage learning approach that encapsulates flow as quantized features derived from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors to exploit second-order correlations. The joint priors are embedded in autoregressive spatial neighbors, co-located hyper elements, and temporal neighbors aggregated recurrently using a ConvLSTM. We evaluate our approach in the low-delay scenario against High Efficiency Video Coding (H.265/HEVC), H.264/AVC, and another learned video compression method, following common test settings. Our work offers state-of-the-art performance, with consistent gains across all popular test sequences.
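The core idea of conditioning the entropy model on joint spatial-temporal priors can be illustrated with a toy NumPy sketch. This is not the paper's implementation: all function names and weights are hypothetical, the ConvLSTM is replaced by a simple previous-frame predictor, the hyperprior by a block-mean map, and the autoregressive spatial context by a causal neighbor average. The sketch only shows how fusing these three priors into a predicted mean lowers the estimated bit cost of quantized features under a discretized Gaussian model.

```python
import numpy as np
from math import erf

# Hypothetical illustration; shapes, weights, and predictors are stand-ins,
# not the architecture from the paper.

def gaussian_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))

def estimate_bits(y, mu, sigma):
    """Bits for integer symbols y under a discretized Gaussian N(mu, sigma)."""
    p = gaussian_cdf((y + 0.5 - mu) / sigma) - gaussian_cdf((y - 0.5 - mu) / sigma)
    return float(-np.log2(np.clip(p, 1e-9, 1.0)).sum())

def spatial_context(y):
    """Autoregressive spatial prior: causal mean of left and top neighbors."""
    left = np.roll(y, 1, axis=1); left[:, 0] = 0.0
    top = np.roll(y, 1, axis=0); top[0, :] = 0.0
    return 0.5 * (left + top)

def hyper_prior(y, factor=4):
    """Co-located hyper elements: block-mean downsample, nearest upsample."""
    h, w = y.shape
    blocks = y.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.kron(blocks, np.ones((factor, factor)))

def joint_prior_bits(y, y_prev, alpha=(0.4, 0.3, 0.3), sigma=1.0):
    """Fuse spatial, hyper, and temporal priors into a predicted mean
    (a ConvLSTM would play the temporal role in a learned model)."""
    mu = alpha[0] * spatial_context(y) + alpha[1] * hyper_prior(y) + alpha[2] * y_prev
    return estimate_bits(y, mu, sigma)

# Two correlated "frames" of quantized features.
rng = np.random.default_rng(0)
base = np.round(np.cumsum(rng.normal(0.0, 0.3, (16, 16)), axis=1))
prev = base
curr = np.round(base + rng.normal(0.0, 0.2, (16, 16)))

bits_joint = joint_prior_bits(curr, prev)
bits_none = estimate_bits(curr, np.zeros_like(curr), 1.0)
print(f"joint prior: {bits_joint:.1f} bits, no prior: {bits_none:.1f} bits")
```

Because the three priors each track the current features closely, the fused mean leaves a small residual, so the joint-prior estimate spends fewer bits than coding the features with a fixed zero-mean model.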




How to Cite

Liu, H., Shen, H., Huang, L., Lu, M., Chen, T., & Ma, Z. (2020). Learned Video Compression via Joint Spatial-Temporal Correlation Exploration. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 11580-11587. https://doi.org/10.1609/aaai.v34i07.6825



AAAI Technical Track: Vision