Rethinking Direct Preference Optimization in Diffusion Models
DOI: https://doi.org/10.1609/aaai.v40i7.37480
Abstract
Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While Direct Preference Optimization (DPO) has established a foundation for preference learning in large language models (LLMs), its extension to diffusion models remains limited in alignment performance. In this work, we propose an enhanced version of Diffusion-DPO by introducing a stable reference model update strategy. This strategy facilitates the exploration of better alignment solutions while maintaining training stability. Moreover, we design a timestep-aware optimization strategy that further boosts performance by addressing preference learning imbalance across timesteps. Through the synergistic combination of our exploration and timestep-aware optimization, our method significantly improves the alignment performance of Diffusion-DPO on human preference evaluation benchmarks, achieving state-of-the-art results.
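The abstract does not spell out the training objective or the update rules. As a rough orientation only, the sketch below shows the standard Diffusion-DPO loss (Wallace et al., 2024) together with two hypothetical hooks matching the abstract's description: a per-timestep scalar `timestep_weight` standing in for the timestep-aware strategy, and an EMA-style `update_reference` as one plausible form of a stable reference model update. The function names, the EMA form, and the scalar reweighting are assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn.functional as F


def diffusion_dpo_loss(model, ref_model, x_t_w, x_t_l, t, eps_w, eps_l,
                       beta=5000.0, timestep_weight=None):
    """Diffusion-DPO step on a preference pair.

    x_t_w / x_t_l: noised latents of the preferred / dispreferred images at
    timestep t; eps_w / eps_l: the Gaussian noise used to produce them.
    timestep_weight (assumed): optional 1-D tensor w(t) rebalancing the loss
    across timesteps, sketching the timestep-aware strategy.
    """
    # Per-sample denoising errors for the policy and the reference model.
    err_w = (model(x_t_w, t) - eps_w).pow(2).mean(dim=(1, 2, 3))
    err_l = (model(x_t_l, t) - eps_l).pow(2).mean(dim=(1, 2, 3))
    with torch.no_grad():
        ref_err_w = (ref_model(x_t_w, t) - eps_w).pow(2).mean(dim=(1, 2, 3))
        ref_err_l = (ref_model(x_t_l, t) - eps_l).pow(2).mean(dim=(1, 2, 3))

    # Implicit reward margin: how much more the policy improves on the
    # preferred sample than on the dispreferred one, relative to the reference.
    margin = (err_w - ref_err_w) - (err_l - ref_err_l)
    loss = -F.logsigmoid(-beta * margin)

    if timestep_weight is not None:
        loss = loss * timestep_weight[t]  # assumed per-timestep reweighting
    return loss.mean()


@torch.no_grad()
def update_reference(ref_model, model, decay=0.999):
    """Assumed 'stable reference update': instead of keeping the reference
    frozen, let it slowly track the policy via an exponential moving average."""
    for p_ref, p in zip(ref_model.parameters(), model.parameters()):
        p_ref.mul_(decay).add_(p.detach(), alpha=1.0 - decay)
```

In a training loop, `update_reference` would be called after each optimizer step (or every few steps), so the reference drifts toward better-aligned solutions while still regularizing the policy.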
Published
2026-03-14
How to Cite
Kang, J., Lim, S., Baek, K., & Shim, H. (2026). Rethinking Direct Preference Optimization in Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5611–5619. https://doi.org/10.1609/aaai.v40i7.37480
Issue
Vol. 40 No. 7 (2026)
Section
AAAI Technical Track on Computer Vision IV