FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus

Authors

  • Qiaoqiao Jin, ByteDance Inc.
  • Siming Fu, ByteDance Inc.
  • Dong She, ByteDance Inc.
  • Weinan Jia, University of Science and Technology of China
  • Hualiang Wang, Hong Kong University of Science and Technology
  • Mu Liu, ByteDance Inc.
  • Jidong Jiang, ByteDance Inc.

DOI:

https://doi.org/10.1609/aaai.v40i7.37469

Abstract

Multi-subject personalized image generation aims to synthesize customized images containing multiple specified subjects without requiring test-time optimization. However, achieving fine-grained independent control over multiple subjects remains challenging due to difficulties in preserving subject fidelity and preventing cross-subject attribute leakage. We present FocusDPO, a framework that adaptively identifies focus regions based on dynamic semantic correspondence and supervision image complexity. During training, our method progressively adjusts these focal areas across noise timesteps, implementing a weighted strategy that rewards information-rich patches while penalizing regions with low prediction confidence. The framework dynamically adjusts focus allocation during the DPO process according to the semantic complexity of reference images and establishes robust correspondence mappings between generated and reference subjects. Extensive experiments demonstrate that our method substantially enhances the performance of existing pre-trained personalized generation models, achieving state-of-the-art results on both single-subject and multi-subject personalized image synthesis benchmarks. Our method effectively mitigates attribute leakage while preserving superior subject fidelity across diverse generation scenarios, advancing the frontier of controllable multi-subject image synthesis.
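
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it takes the standard Diffusion-DPO objective for one preference pair and, as an assumption, replaces the uniform sum over patches with a per-patch focus weight. The function name `weighted_dpo_loss`, the `focus` array, and the `beta` default are all hypothetical, and how FocusDPO actually derives its focus regions from semantic correspondence and image complexity is not specified here.

```python
import numpy as np


def weighted_dpo_loss(err_w_policy, err_w_ref, err_l_policy, err_l_ref,
                      focus, beta=1.0):
    """Patch-weighted Diffusion-DPO-style loss for a single preference pair.

    err_w_* / err_l_*: per-patch squared denoising errors on the preferred
        (w) and rejected (l) image, for the trained policy and the frozen
        reference model. Shape: (num_patches,).
    focus: hypothetical non-negative per-patch weights with a positive sum.
    beta: hypothetical preference-strength coefficient.
    """
    focus = np.asarray(focus, dtype=float)
    focus = focus / focus.sum()  # normalise so the weights sum to 1

    # How much the policy's error differs from the reference's, per patch.
    diff_w = np.asarray(err_w_policy) - np.asarray(err_w_ref)
    diff_l = np.asarray(err_l_policy) - np.asarray(err_l_ref)

    # Focus-weighted margin: negative when the policy improves more on the
    # preferred image than on the rejected one inside high-focus patches.
    margin = np.sum(focus * (diff_w - diff_l))

    # -log(sigmoid(-beta * margin)) computed stably as log(1 + exp(beta * margin)).
    return np.logaddexp(0.0, beta * margin)


if __name__ == "__main__":
    err = np.ones(4)
    focus = np.array([1.0, 0.0, 0.0, 0.0])  # all focus on patch 0
    improved = err.copy()
    improved[0] = 0.5  # policy denoises the focused patch better
    print(weighted_dpo_loss(err, err, err, err, focus))
    print(weighted_dpo_loss(improved, err, err, err, focus))
```

In this sketch, an error reduction on the preferred image lowers the loss only in proportion to the focus weight of the patch where it occurs, so improvements in zero-weight patches leave the loss unchanged.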

Published

2026-03-14

How to Cite

Jin, Q., Fu, S., She, D., Jia, W., Wang, H., Liu, M., & Jiang, J. (2026). FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5512–5520. https://doi.org/10.1609/aaai.v40i7.37469

Section

AAAI Technical Track on Computer Vision IV