Realism Control One-step Diffusion for Real-world Image Super Resolution
DOI:
https://doi.org/10.1609/aaai.v40i13.38067
Abstract
Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they still struggle to balance fidelity and realism across diverse scenarios. Because OSD models for SR are usually trained or distilled at a single timestep, they lack flexible control mechanisms for adaptively prioritizing these competing objectives, which multi-step methods can manage inherently by adjusting the number of sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over the fidelity-realism trade-off during the noise prediction phase, with minimal modifications to the training paradigm and using the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and strengthen control over the trade-off. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, enhancing both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at the inference stage.
Published
2026-03-14
How to Cite
Wu, Z., Zheng, S., Jiang, P.-T., & Yuan, X. (2026). Realism Control One-step Diffusion for Real-world Image Super Resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 40(13), 10906-10914. https://doi.org/10.1609/aaai.v40i13.38067
Issue
Section
AAAI Technical Track on Computer Vision X