Two-Stage Octave Residual Network for End-to-End Image Compression

Fangdong Chen; Yumeng Xu; Li Wang

doi:10.1609/aaai.v36i4.20308

Authors

Fangdong Chen, Hikvision Research Institute
Yumeng Xu, Hikvision Research Institute
Li Wang, Hikvision Research Institute

DOI:

https://doi.org/10.1609/aaai.v36i4.20308

Keywords:

Data Mining & Knowledge Management (DMKM), Computer Vision (CV)

Abstract

Octave Convolution (OctConv) is a generic convolutional unit that has achieved good performance in many computer vision tasks. Recent studies have also shown its potential in end-to-end image compression. However, given the characteristics of the image compression task, current OctConv designs may limit the performance of a compression network, because the sampling operations used for inter-frequency communication lose spatial information. In addition, current architectures do not exploit the correlation between the multi-frequency latents produced by OctConv. To address these problems, we propose a novel Two-stage Octave Residual (ToRes) block, which strips the sampling operation from OctConv to better preserve useful information. Moreover, we design a context transfer module to capture the redundancy between the multi-frequency latents. The results show that both the ToRes block and the context transfer module improve Rate-Distortion performance, and that combining the two strategies allows our model to achieve state-of-the-art performance and outperform the latest compression standard, Versatile Video Coding (VVC), in terms of both PSNR and MS-SSIM.
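
The spatial information loss attributed above to inter-frequency sampling can be illustrated with a small sketch. It assumes the resampling pair from the original OctConv formulation (2x2 average pooling going from high to low frequency, nearest-neighbour upsampling going back); compression-oriented variants may use learned strided convolutions instead. The function names and the toy patch are hypothetical, and this is not the authors' code or the ToRes block itself.

```python
# Illustrative sketch only (assumed setup, not the paper's implementation):
# a round trip through 2x2 average pooling and nearest-neighbour upsampling,
# showing that detail finer than the 2x2 grid is not recovered.

def avg_pool_2x2(x):
    """Downsample a 2D list (even height and width) by averaging 2x2 blocks."""
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def upsample_nearest_2x(x):
    """Upsample a 2D list by repeating every value over a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def max_abs_error(a, b):
    """Largest absolute element-wise difference between two 2D lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical 4x4 patch with a vertical edge that is not aligned to the 2x2 grid.
edge_patch = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
edge_roundtrip = upsample_nearest_2x(avg_pool_2x2(edge_patch))

# A constant patch has no fine detail, so the round trip is lossless.
flat_patch = [[2.0] * 4 for _ in range(4)]
flat_roundtrip = upsample_nearest_2x(avg_pool_2x2(flat_patch))

print("edge patch error:", max_abs_error(edge_patch, edge_roundtrip))
print("flat patch error:", max_abs_error(flat_patch, flat_roundtrip))
```

In this toy case the edge is blurred across the pooled block while the constant patch survives unchanged, which is the kind of loss that removing the sampling step from the inter-frequency path is meant to avoid.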
