IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation

Authors

  • Donghao Zhou, The Chinese University of Hong Kong
  • Jingyu Lin, Monash University
  • Guibao Shen, The Hong Kong University of Science and Technology (Guangzhou)
  • Quande Liu, Kling Team, Kuaishou Technology
  • Jialin Gao, The Chinese University of Hong Kong
  • Lihao Liu, Amazon
  • Lan Du, Monash University
  • Cunjian Chen, Monash University
  • Chi-Wing Fu, The Chinese University of Hong Kong
  • Xiaowei Hu, South China University of Technology
  • Pheng-Ann Heng, The Chinese University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v40i16.38365

Abstract

Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery, which extracts cohesive character identities, and Re-denoising Identity Injection, which re-denoises images to inject identities while preserving desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.

Published

2026-03-14

How to Cite

Zhou, D., Lin, J., Shen, G., Liu, Q., Gao, J., Liu, L., Du, L., Chen, C., Fu, C.-W., Hu, X., & Heng, P.-A. (2026). IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13593-13601. https://doi.org/10.1609/aaai.v40i16.38365

Section

AAAI Technical Track on Computer Vision XIII