BokehCrafter: Taming Video Diffusion Models for Controllable Bokeh Rendering

Authors

  • Qiwen Wang, School of AIA, Huazhong University of Science and Technology
  • Liao Shen, School of AIA, Huazhong University of Science and Technology
  • Jiaqi Li, School of AIA, Huazhong University of Science and Technology
  • Tianqi Liu, School of AIA, Huazhong University of Science and Technology
  • Huiqiang Sun, School of AIA, Huazhong University of Science and Technology
  • Zihao Huang, School of AIA, Huazhong University of Science and Technology
  • Yachuan Huang, School of AIA, Huazhong University of Science and Technology
  • Xianrui Luo, School of AIA, Huazhong University of Science and Technology
  • Zhiguo Cao, School of AIA, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i12.37969

Abstract

Bokeh is used in photography to emphasize a selected subject by smoothly blurring the out-of-focus regions with appealing highlights. While recent advances have achieved impressive results in rendering realistic blur, existing frameworks typically rely on disparity maps and bokeh-relevant inputs (e.g., focal distance and blur size), and face significant challenges in video bokeh rendering due to limited temporal consistency. In this paper, we propose BokehCrafter, the first video diffusion framework that generates temporally coherent and visually pleasing bokeh effects from all-in-focus video inputs under user-friendly input conditions. Specifically, we leverage a dual-stream attention mechanism that integrates a reference image branch and a rendering instruction branch. We propose a Bokeh Image Extraction (BIE) module and a CLIP-based text encoder to extract image and text features, respectively, whose outputs are fused via a Text-Image Fusion (TIF) module to enable fine-grained and controllable bokeh rendering. To support the novel capabilities of our model, we construct Video Bokeh Scenes (VBS), a large-scale dataset of diverse bokeh videos with corresponding rendering instructions, spanning a wide range of scenes and rendering settings. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods in both bokeh rendering quality and temporal consistency.
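The abstract describes fusing image features (from the BIE module) with text features (from a CLIP-based encoder) via a Text-Image Fusion module. The paper does not specify the fusion mechanism in this abstract; the sketch below is a hypothetical illustration of one common choice, cross-attention with a residual connection, where image tokens attend to text tokens. All shapes, names, and the random projections are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_image_fusion(img_feats, txt_feats, d_k=64, seed=0):
    """Hypothetical TIF sketch: image tokens query text tokens via
    cross-attention; the attended text information is added residually.
    Projection weights are random stand-ins for learned parameters."""
    rng = np.random.default_rng(seed)
    d_img = img_feats.shape[-1]
    d_txt = txt_feats.shape[-1]
    W_q = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_k = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    W_v = rng.standard_normal((d_txt, d_img)) / np.sqrt(d_txt)
    Q = img_feats @ W_q                    # (N_img, d_k)
    K = txt_feats @ W_k                    # (N_txt, d_k)
    V = txt_feats @ W_v                    # (N_txt, d_img)
    attn = softmax(Q @ K.T / np.sqrt(d_k)) # (N_img, N_txt), rows sum to 1
    return img_feats + attn @ V            # residual fusion, same shape as input

# Illustrative token shapes: 196 image tokens (BIE-like), 77 text tokens (CLIP-like).
img = np.random.default_rng(1).standard_normal((196, 128))
txt = np.random.default_rng(2).standard_normal((77, 512))
fused = text_image_fusion(img, txt)
print(fused.shape)  # (196, 128)
```

The residual form keeps the fused output in the image-feature space, so a downstream diffusion backbone could consume it in place of the original image tokens.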

Published

2026-03-14

How to Cite

Wang, Q., Shen, L., Li, J., Liu, T., Sun, H., Huang, Z., … Cao, Z. (2026). BokehCrafter: Taming Video Diffusion Models for Controllable Bokeh Rendering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10029–10037. https://doi.org/10.1609/aaai.v40i12.37969

Section

AAAI Technical Track on Computer Vision IX