FreeMem: Enhancing Consistency in Long Video Generation via Tuning-Free Memory

Authors

  • Jibin Peng Tianjin University
  • Di Lin Tianjin University
  • Zhecheng Xu Tianjin University
  • Haoran Lu Tianjin University
  • Ruonan Liu Shanghai Jiao Tong University
  • Wuyuan Xie Shenzhen University
  • Miaohui Wang Shenzhen University
  • Lingyu Liang South China University of Technology
  • Yi Wang Shenzhen University
  • Qing Guo Nankai University

DOI:

https://doi.org/10.1609/aaai.v40i10.37783

Abstract

Text-to-Video (T2V) generation has advanced greatly, yet maintaining consistency remains challenging, especially for tuning-free long video generation. We attribute the consistency problem to deviations that accumulate at three levels in long video generation: uncorrelated random noise produces an initial deviation between frames; discrepancies in semantic feature tokens between denoising network blocks accumulate as the frame count grows, leading to greater deviations; and attention mechanisms struggle to capture global relationships across distant frames in long videos. To address these, we propose FreeMem, a tuning-free framework that leverages hierarchical memory update and injection: the noise memory stabilizes consistency by manipulating low- and high-frequency components in the initial noise space; the token memory combats inconsistency through adaptive fusion of historical and current semantic feature tokens between denoising network blocks; and the attention memory establishes a persistent cache to model long-range relationships within self-attention layers. Evaluated on VBench, FreeMem improves subject and background consistency metrics across various methods, offering a practical solution for low-cost, high-consistency long video generation.
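
As a rough illustration of what "adaptive fusion of historical and current semantic feature tokens" could look like, the sketch below blends a stored token vector with the current one, weighting the stored vector by cosine similarity. This is not the FreeMem implementation: the abstract does not specify the fusion rule, so the function names (`cosine`, `fuse_token`, `update_memory`), the similarity-based weight, and the `max_weight` and `momentum` parameters are all assumptions chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse_token(memory, current, max_weight=0.5):
    """Blend a historical token with the current token.

    Hypothetical rule: the weight on the memory grows with similarity, so
    memory is injected only where it agrees with the current content.
    Negative similarity is clamped to zero (no injection).
    """
    w = max_weight * max(0.0, cosine(memory, current))
    return [w * m + (1.0 - w) * c for m, c in zip(memory, current)]


def update_memory(memory, fused, momentum=0.9):
    """Hypothetical memory update: exponential moving average toward the fused token."""
    return [momentum * m + (1.0 - momentum) * f for m, f in zip(memory, fused)]


if __name__ == "__main__":
    memory = [2.0, 0.0]
    current = [1.0, 0.0]
    fused = fuse_token(memory, current)
    print(fused)
    print(update_memory(memory, fused))
```

Under this assumed rule, a current token that is orthogonal to the memory passes through unchanged, while a well-aligned token is pulled toward the stored history by at most `max_weight`.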

Published

2026-03-14

How to Cite

Peng, J., Lin, D., Xu, Z., Lu, H., Liu, R., Xie, W., Wang, M., Liang, L., Wang, Y., & Guo, Q. (2026). FreeMem: Enhancing Consistency in Long Video Generation via Tuning-Free Memory. Proceedings of the AAAI Conference on Artificial Intelligence, 40(10), 8340-8348. https://doi.org/10.1609/aaai.v40i10.37783

Issue

Section

AAAI Technical Track on Computer Vision VII