End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy

Authors

  • Huiming Zheng, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; Peng Cheng Laboratory, China
  • Wei Gao, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; Peng Cheng Laboratory, China

DOI:

https://doi.org/10.1609/aaai.v38i7.28588

Keywords:

CV: Low Level & Physics-based Vision, CV: Multi-modal Vision

Abstract

As a form of 3D data, RGB-D images are widely used in object tracking, 3D reconstruction, remote sensing mapping, and other tasks, and their importance in computer vision continues to grow. However, existing learning-based image compression methods usually process RGB images and depth images separately, so they cannot fully exploit the redundant information between the two modalities, which limits further improvement in rate-distortion performance. To overcome this limitation, we propose a learning-based dual-branch RGB-D image compression framework. In contrast to the traditional RGB-domain compression scheme, a YUV-domain compression scheme is presented for spatial redundancy removal. In addition, Intra-Modality Attention (IMA) and Cross-Modality Attention (CMA) are introduced for modal redundancy removal. To benefit from cross-modal prior information, a Context Prediction Module (CPM) and a Context Fusion Module (CFM) are introduced into the conditional entropy model, making the context probability prediction more accurate. Experimental results demonstrate that our method outperforms existing image compression methods on two RGB-D image datasets. Compared with BPG, the proposed framework achieves up to 15% bit rate savings for RGB images.
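
The YUV-domain idea mentioned in the abstract can be illustrated with a minimal sketch of a color-space conversion. The abstract does not state which YUV variant the framework uses, so the BT.601 luma weights and classic U/V scale factors below are an assumption, and the function names `rgb_to_yuv` and `yuv_to_rgb` are hypothetical. The sketch only shows how luminance is separated from chrominance before coding; it is not the authors' implementation.

```python
# Assumption: BT.601 luma weights with classic analog U/V scaling.
# The paper may use a different YUV variant; names here are illustrative only.

def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to YUV."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue-difference chrominance
    v = 0.877 * (r - y)                    # red-difference chrominance
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the same three equations for R, G, B."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


if __name__ == "__main__":
    pixel = (0.2, 0.6, 0.9)
    yuv = rgb_to_yuv(*pixel)
    print("YUV:", yuv)
    print("Recovered RGB:", yuv_to_rgb(*yuv))
```

For a gray pixel (equal R, G, and B), both chrominance values are approximately zero under this conversion, which suggests why separating luminance from chrominance can expose redundancy across color channels.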

Published

2024-03-24

How to Cite

Zheng, H., & Gao, W. (2024). End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7562-7570. https://doi.org/10.1609/aaai.v38i7.28588

Issue

Vol. 38 No. 7

Section

AAAI Technical Track on Computer Vision VI