CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

Authors

  • Xinjie Zhang The Hong Kong University of Science and Technology, SenseTime Research
  • Shenyuan Gao The Hong Kong University of Science and Technology
  • Zhening Liu The Hong Kong University of Science and Technology
  • Jiawei Shao The Hong Kong University of Science and Technology, Institute of Artificial Intelligence (TeleAI), China Telecom
  • Xingtong Ge Sensetime Research
  • Dailan He The Chinese University of Hong Kong
  • Tongda Xu Institute for AI Industry Research (AIR), Tsinghua University
  • Yan Wang Institute for AI Industry Research (AIR), Tsinghua University
  • Jun Zhang The Hong Kong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v39i10.33111

Abstract

Existing learning-based stereo image codecs pair sophisticated transforms with simple entropy models, borrowed from single-image codecs, to encode latent representations. However, these entropy models struggle to capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework named CAMSIC. CAMSIC transforms each image into a latent representation independently and employs a powerful decoder-free Transformer entropy model that captures both spatial and disparity dependencies through a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM enables efficient bidirectional interaction between prior information and estimated tokens, which naturally removes the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets, Cityscapes and InStereo2K, with fast encoding and decoding.
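To make the idea of a masked-image-modeling entropy model concrete, here is a toy 1-D sketch under loudly stated assumptions. It is not CAMSIC's Transformer: the learned entropy model is replaced by a hand-written stand-in (the hypothetical `predict_mean`) that mixes already-decoded spatial neighbors with a token from the other view, offset by a hypothetical constant integer `disparity`. The interleaved decoding schedule (`masked_schedule`) and the unit-bin discretized Gaussian used to count bits are common choices in learned compression, assumed here rather than taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bits_for_symbol(y: int, mu: float, sigma: float) -> float:
    """Bits to code integer y under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-9))  # floor avoids log(0) for far-off predictions


def masked_schedule(n: int, num_steps: int) -> List[List[int]]:
    """Split token indices into interleaved groups; group s is decoded at step s."""
    return [[i for i in range(n) if i % num_steps == s] for s in range(num_steps)]


def predict_mean(i: int, decoded: Dict[int, int], reference: Sequence[int],
                 disparity: int) -> float:
    """Hypothetical stand-in for the learned entropy model (assumption, not CAMSIC)."""
    n = len(reference)
    # Prior from the other view: token at i + disparity, clamped to the valid range.
    prior = reference[min(max(i + disparity, 0), n - 1)]
    # Spatial context: immediate neighbors that are already decoded (visible).
    neighbors = [decoded[j] for j in (i - 1, i + 1) if j in decoded]
    if not neighbors:
        return float(prior)
    return 0.5 * (prior + sum(neighbors) / len(neighbors))


def estimate_rate(target: Sequence[int], reference: Sequence[int],
                  num_steps: int = 4, sigma: float = 1.0, disparity: int = 0) -> float:
    """Total estimated bits to code `target` given an already-decoded `reference` view."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    decoded: Dict[int, int] = {}
    total = 0.0
    for group in masked_schedule(len(target), num_steps):
        # Every masked token in a group is predicted from the same visible set,
        # so the group could be processed in parallel, as in masked-token decoding.
        means = {i: predict_mean(i, decoded, reference, disparity) for i in group}
        for i in group:
            total += bits_for_symbol(target[i], means[i], sigma)
            decoded[i] = target[i]  # token becomes visible for later steps
    return total


if __name__ == "__main__":
    left = list(range(8))
    right = [0, 0] + left[:-2]  # toy "other view": left shifted two positions rightward
    print(f"aligned prior:    {estimate_rate(left, right, disparity=2):.2f} bits")
    print(f"misaligned prior: {estimate_rate(left, right, disparity=0):.2f} bits")
```

In this toy, a correctly aligned reference view lowers the estimated rate, which is the intuition behind conditioning the entropy model on the other view. A learned model would replace both the fixed `sigma` and the simple averaging rule.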

Published

2025-04-11

How to Cite

Zhang, X., Gao, S., Liu, Z., Shao, J., Ge, X., He, D., … Zhang, J. (2025). CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression. Proceedings of the AAAI Conference on Artificial Intelligence, 39(10), 10239–10247. https://doi.org/10.1609/aaai.v39i10.33111

Issue

Vol. 39 No. 10

Section

AAAI Technical Track on Computer Vision IX