SSCL: Adversarially Guided Image Compression via Semantic and Spectral Consistency Learning

Authors

  • Wei Jiang, Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
  • Yongqi Zhai, Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
  • Jiayu Yang, Pengcheng Laboratory
  • Bohao Feng, Alibaba Cloud Computing
  • Wenqiang Wang, Alibaba Cloud Computing
  • Bo Huang, Alibaba Cloud Computing
  • Lin Ding, Alibaba Cloud Computing
  • Ronggang Wang, Shenzhen Graduate School, Peking University; Pengcheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i17.38515

Abstract

Perceptual image compression has recently gained increasing attention, as it aims to reconstruct visually realistic images using generative models. Most existing methods adopt patch-based generative adversarial networks (PatchGAN) for one-step image generation, where adversarial training helps the decoder learn the distribution of natural images. However, this strategy is often coarse-grained, as it focuses mainly on patch-level consistency and overlooks global structural and semantic details. To address this limitation, we propose a simple yet effective Semantic and Spectral Consistency Learning (SSCL) strategy, which complements existing patch-based approaches for more accurate distribution alignment. For semantic consistency, we leverage semantic vision models to extract semantic features. The semantic discriminator, aware of the specific semantics of each image, provides more adaptive and precise feedback. This guides the encoder to retain meaningful information and helps the decoder synthesize detailed textures, without requiring explicit semantic transmission or additional modules. For spectral consistency, we introduce a frequency discriminator that focuses on high-frequency components, helping to reduce artifacts based on spectral priors. Experiments show that SSCL outperforms existing perceptual codecs in terms of visual quality. Compared to MS-ILLM, SSCL achieves 45% to 60% bit-rate savings on CLIC2020 and Kodak datasets, measured by FID and DISTS.
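
The abstract does not say how the frequency discriminator isolates high-frequency components. As a rough illustration only, the sketch below assumes a hard radial mask in the discrete Fourier domain: bins near DC are zeroed and the remainder is transformed back, giving a residual that a discriminator could inspect. The names `dft2` and `high_frequency_part` and the `cutoff` parameter are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft2(x, inverse=False):
    """Naive 2D DFT of a list of lists; only suitable for tiny illustrative inputs."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for m in range(h):
                for n in range(w):
                    angle = sign * 2 * math.pi * (u * m / h + v * n / w)
                    acc += x[m][n] * cmath.exp(1j * angle)
            # The inverse transform carries the 1 / (h * w) normalisation.
            out[u][v] = acc / (h * w) if inverse else acc
    return out


def high_frequency_part(img, cutoff):
    """Zero every frequency bin within `cutoff` of DC (assumed mask) and invert."""
    h, w = len(img), len(img[0])
    spec = dft2(img)
    for u in range(h):
        for v in range(w):
            # Wrap indices so that distance is measured from the DC bin.
            fu = min(u, h - u)
            fv = min(v, w - v)
            if math.hypot(fu, fv) <= cutoff:
                spec[u][v] = 0j
    return [[c.real for c in row] for row in dft2(spec, inverse=True)]


# A flat patch has no high-frequency content; a checkerboard is almost all of it.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
flat_hf = high_frequency_part(flat, cutoff=1.0)
checker_hf = high_frequency_part(checker, cutoff=1.0)
```

In this toy case the flat patch maps to an all-zero residual, while the checkerboard keeps its alternating pattern with the mean removed. A practical pipeline would use a fast FFT routine and possibly a soft or learned mask; the hard cutoff is just the simplest choice for illustration.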

Published

2026-03-14

How to Cite

Jiang, W., Zhai, Y., Yang, J., Feng, B., Wang, W., Huang, B., … Wang, R. (2026). SSCL: Adversarially Guided Image Compression via Semantic and Spectral Consistency Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(17), 14937–14945. https://doi.org/10.1609/aaai.v40i17.38515

Section

AAAI Technical Track on Data Mining & Knowledge Management I