Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation

Authors

  • Wei Dong, McMaster University
  • Han Zhou, McMaster University
  • Junwei Lin, McMaster University
  • Jun Chen, McMaster University

DOI:

https://doi.org/10.1609/aaai.v40i5.37363

Abstract

Real-world dark images commonly exhibit not only low visibility and contrast but also complex noise and blur, posing significant restoration challenges. Existing methods often rely on paired data or fail to model dynamic illumination and blur characteristics, leading to poor generalization. To address these limitations, we propose a generative framework based on visual autoregressive (VAR) modeling, guided by perceptual priors from a vision-language model (VLM). Specifically, to supply informative conditioning cues for VAR models, we deploy an adaptive curve estimation scheme that modulates diverse illumination conditions based on VLM-derived visibility scores. In addition, we integrate dynamic, spatial-frequency-aware Rotary Positional Encodings (SF-RoPE) into VAR to enhance its ability to model structures degraded by blur. Furthermore, we propose a recursive phase-domain modulation strategy that mitigates blur-induced artifacts via bounded iterative refinement guided by VLM-assessed blur scores. Our framework is fully unsupervised and achieves state-of-the-art performance on benchmark datasets.
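
To make the idea of score-modulated curve adjustment concrete, the following is a minimal sketch and not the authors' implementation. It assumes the iterative quadratic curve x + αx(1 − x) familiar from earlier zero-reference curve estimation work, a single scalar α rather than a learned per-pixel map, and a hypothetical linear mapping from a visibility score in [0, 1] to α. The function names `adjust_curve` and `alpha_from_visibility` are illustrative only.

```python
import numpy as np


def alpha_from_visibility(score):
    """Hypothetical mapping (an assumption, not from the paper):
    a lower visibility score in [0, 1] yields a stronger brightening factor."""
    score = float(np.clip(score, 0.0, 1.0))
    return 1.0 - score


def adjust_curve(image, alpha, iterations=4):
    """Iteratively apply the quadratic curve x + alpha * x * (1 - x).

    `image` is expected to hold intensities in [0, 1]. For alpha in [0, 1]
    the curve is monotone and keeps values inside [0, 1].
    """
    out = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    for _ in range(iterations):
        out = out + alpha * out * (1.0 - out)
    return out


# Usage: a uniformly dark patch with an assumed visibility score of 0.2.
dark = np.full((2, 2), 0.1)
enhanced = adjust_curve(dark, alpha_from_visibility(0.2))
```

In this sketch a visibility score of 1.0 gives α = 0 and leaves the image unchanged, while lower scores brighten dark regions more aggressively without pushing values above 1.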

Published

2026-03-14

How to Cite

Dong, W., Zhou, H., Lin, J., & Chen, J. (2026). Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3641–3649. https://doi.org/10.1609/aaai.v40i5.37363

Issue

Vol. 40 No. 5 (2026)

Section

AAAI Technical Track on Computer Vision II