Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single Image Denoising

Authors

  • Huaqiu Li, Tsinghua University
  • Wang Zhang, Tsinghua University
  • Xiaowan Hu, Tsinghua University
  • Tao Jiang, Tsinghua University
  • Zikang Chen, Tsinghua University
  • Haoqian Wang, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i5.32500

Abstract

Many image denoising studies build supervised models on paired datasets, which are expensive and time-consuming to collect. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pair sampling, which lose pixel information and destroy fine structural detail, significantly limiting their effectiveness. In this paper, we introduce Prompt-SID, a prompt-learning-based single image denoising framework that emphasizes the preservation of structural details. The framework is trained in a self-supervised manner on downsampled image pairs. It captures original-scale image information through structural encoding and integrates the resulting prompt into the denoiser. To achieve this, we propose a structural representation generation model based on the latent diffusion process and design a structural attention module within the transformer-based denoiser architecture to decode the prompt. Additionally, we introduce a scale replay training mechanism that mitigates the scale gap between images of different resolutions. We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, which demonstrate the effectiveness of Prompt-SID.
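
For readers unfamiliar with sub-image pair sampling, the following is a minimal sketch of the general idea, not the sampler used in Prompt-SID (the abstract does not specify it). It assumes a single-channel image stored as a list of rows and a hypothetical helper, `sample_subimage_pair`, that picks two different pixels from every non-overlapping 2x2 cell. It uses plain Python lists only to stay dependency-free.

```python
import random


def sample_subimage_pair(image, rng):
    """Split an H x W image (list of rows) into two half-resolution sub-images.

    Illustrative assumption: for every non-overlapping 2x2 cell, two different
    pixels are chosen at random; one goes to each sub-image.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")

    sub_a, sub_b = [], []
    for top in range(0, height, 2):
        row_a, row_b = [], []
        for left in range(0, width, 2):
            # The four pixels of the current 2x2 cell, in row-major order.
            cell = [
                image[top][left], image[top][left + 1],
                image[top + 1][left], image[top + 1][left + 1],
            ]
            idx_a, idx_b = rng.sample(range(4), 2)  # two distinct positions
            row_a.append(cell[idx_a])
            row_b.append(cell[idx_b])
        sub_a.append(row_a)
        sub_b.append(row_b)
    return sub_a, sub_b


rng = random.Random(0)
noisy = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
sub_a, sub_b = sample_subimage_pair(noisy, rng)
```

Each sub-image has half the height and width of the input, and half of the pixels in every cell are discarded. A pair built this way therefore carries no original-scale information, which is the limitation the structural prompt is intended to address.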

Published

2025-04-11

How to Cite

Li, H., Zhang, W., Hu, X., Jiang, T., Chen, Z., & Wang, H. (2025). Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single Image Denoising. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 4734–4742. https://doi.org/10.1609/aaai.v39i5.32500

Section

AAAI Technical Track on Computer Vision IV