BokehCrafter: Taming Video Diffusion Models for Controllable Bokeh Rendering

Authors

  • Qiwen Wang, School of AIA, Huazhong University of Science and Technology
  • Liao Shen, School of AIA, Huazhong University of Science and Technology
  • Jiaqi Li, School of AIA, Huazhong University of Science and Technology
  • Tianqi Liu, School of AIA, Huazhong University of Science and Technology
  • Huiqiang Sun, School of AIA, Huazhong University of Science and Technology
  • Zihao Huang, School of AIA, Huazhong University of Science and Technology
  • Yachuan Huang, School of AIA, Huazhong University of Science and Technology
  • Xianrui Luo, School of AIA, Huazhong University of Science and Technology
  • Zhiguo Cao, School of AIA, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i12.37969

Abstract

Bokeh is used in photography to emphasize a selected subject by smoothly blurring the out-of-focus regions with appealing highlights. While recent advances have achieved impressive results in rendering realistic blur, existing frameworks typically rely on disparity maps and bokeh-relevant inputs (e.g., focal distance and blur size), and face significant challenges in video bokeh rendering due to limited temporal consistency. In this paper, we propose BokehCrafter, the first video diffusion framework that generates temporally coherent and visually pleasing bokeh effects from all-in-focus video inputs under user-friendly input conditions. Specifically, we leverage a dual-stream attention mechanism that integrates a reference image branch and a rendering instruction branch. We propose a Bokeh Image Extraction (BIE) module and a CLIP-based text encoder to extract image and text features, respectively, whose outputs are fused via a Text-Image Fusion (TIF) module to enable fine-grained and controllable bokeh rendering. To support the novel capabilities of our model, we construct Video Bokeh Scenes (VBS), a large-scale dataset of diverse bokeh videos with corresponding rendering instructions, spanning a wide range of scenes and rendering settings. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods in both bokeh rendering quality and temporal consistency.
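The abstract describes fusing image features (from the BIE module) with text features (from a CLIP-based encoder) via a Text-Image Fusion module. The paper does not specify the fusion mechanism in this abstract; the sketch below is a hypothetical illustration of one common choice, cross-attention with a residual connection, where image tokens attend to text tokens. All shapes, names, and the random projections are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_image_fusion(img_feats, txt_feats, d_k=64, seed=0):
    """Hypothetical TIF sketch: image tokens query text tokens via
    cross-attention; the attended text information is added residually.
    Projection weights are random stand-ins for learned parameters."""
    rng = np.random.default_rng(seed)
    d_img = img_feats.shape[-1]
    d_txt = txt_feats.shape[-1]
    W_q = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_k = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    W_v = rng.standard_normal((d_txt, d_img)) / np.sqrt(d_txt)
    Q = img_feats @ W_q                    # (N_img, d_k)
    K = txt_feats @ W_k                    # (N_txt, d_k)
    V = txt_feats @ W_v                    # (N_txt, d_img)
    attn = softmax(Q @ K.T / np.sqrt(d_k)) # (N_img, N_txt), rows sum to 1
    return img_feats + attn @ V            # residual fusion, same shape as input

# Illustrative token shapes: 196 image tokens (BIE-like), 77 text tokens (CLIP-like).
img = np.random.default_rng(1).standard_normal((196, 128))
txt = np.random.default_rng(2).standard_normal((77, 512))
fused = text_image_fusion(img, txt)
print(fused.shape)  # (196, 128)
```

The residual form keeps the fused output in the image-feature space, so a downstream diffusion backbone could consume it in place of the original image tokens.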

Published

2026-03-14

How to Cite

Wang, Q., Shen, L., Li, J., Liu, T., Sun, H., Huang, Z., … Cao, Z. (2026). BokehCrafter: Taming Video Diffusion Models for Controllable Bokeh Rendering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10029–10037. https://doi.org/10.1609/aaai.v40i12.37969

Section

AAAI Technical Track on Computer Vision IX