Two-Stage Octave Residual Network for End-to-End Image Compression


  • Fangdong Chen Hikvision Research Institute
  • Yumeng Xu Hikvision Research Institute
  • Li Wang Hikvision Research Institute



Data Mining & Knowledge Management (DMKM), Computer Vision (CV)


Octave Convolution (OctConv) is a generic convolutional unit that has achieved good performance in many computer vision tasks. Recent studies have also shown its potential in end-to-end image compression. However, given the characteristics of the image compression task, current OctConv-based designs may limit the performance of the compression network, because the sampling operations used for inter-frequency communication lose spatial information. In addition, current architectures do not exploit the correlation between the multi-frequency latents produced by OctConv. To address these problems, we propose a novel Two-stage Octave Residual (ToRes) block, which strips the sampling operation from OctConv to strengthen its ability to preserve useful information. Moreover, we design a context transfer module to capture the redundancy between the multi-frequency latents. The results show that both the ToRes block and the context transfer module improve Rate-Distortion performance, and that combining the two strategies lets our model achieve state-of-the-art performance and outperform the latest compression standard, Versatile Video Coding (VVC), in terms of both PSNR and MS-SSIM.
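The loss of spatial information attributed to sampling can be illustrated with a toy example. The sketch below is not the ToRes block or the paper's network. It assumes 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling, which the abstract does not specify, and all function names are hypothetical. It only shows that a down-then-up round trip reconstructs a smooth patch exactly but erases a high-frequency pattern.

```python
# Toy illustration (assumed operators, hypothetical names): how a
# downsample/upsample round trip discards high-frequency spatial detail.

def avg_pool_2x2(x):
    """Downsample a 2D list by averaging non-overlapping 2x2 blocks."""
    h, w = len(x), len(x[0])
    return [
        [
            (x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample_nearest_2x(x):
    """Upsample a 2D list by repeating every value over a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def round_trip_error(x):
    """Mean squared error between x and its down-then-up reconstruction."""
    y = upsample_nearest_2x(avg_pool_2x2(x))
    n = len(x) * len(x[0])
    return sum((a - b) ** 2 for ra, rb in zip(x, y) for a, b in zip(ra, rb)) / n


# A constant patch survives the round trip; a checkerboard does not.
smooth = [[1.0] * 4 for _ in range(4)]
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]

print("smooth patch MSE: ", round_trip_error(smooth))
print("checkerboard MSE: ", round_trip_error(checker))
```

Under these assumptions the constant patch has zero reconstruction error, while every 2x2 block of the checkerboard collapses to its mean, so the fine pattern cannot be recovered. This is the kind of loss that motivates removing the sampling step.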




How to Cite

Chen, F., Xu, Y., & Wang, L. (2022). Two-Stage Octave Residual Network for End-to-End Image Compression. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 3922-3929.



AAAI Technical Track on Data Mining and Knowledge Management