Rewind and Render: Towards Factually Accurate Text-to-Video Generation with Distilled Knowledge Retrieval

Daniel Lee; Arjun Chandra; Yang Zhou; Yunyao Li; Simone Conia

doi:10.1609/aaai.v39i28.35356

Authors

Daniel Lee Adobe
Arjun Chandra Boston University
Yang Zhou Adobe Research
Yunyao Li Adobe
Simone Conia Sapienza University of Rome

Abstract

Text-to-Video (T2V) models, despite recent advancements, struggle with factual accuracy, especially for knowledge-dense content. We introduce FACT-V (Factual Accuracy in Content Translation to Video), a system integrating multi-source knowledge retrieval into T2V pipelines. FACT-V offers two key benefits: i) improved factual accuracy of generated videos through dynamically retrieved information, and ii) increased interpretability by providing users with the augmented prompt information. A preliminary evaluation demonstrates the potential of knowledge-augmented approaches in improving the accuracy and reliability of T2V systems, particularly for entity-specific or time-sensitive prompts.
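The abstract does not specify FACT-V's knowledge sources, retrieval method, or prompt template. The sketch below is therefore only a minimal illustration of the general idea of knowledge-augmented prompting. It assumes hypothetical in-memory knowledge sources (`KnowledgeSource`), naive case-insensitive matching on entity names, and an illustrative template that appends facts as trailing context. `augment_prompt` and `AugmentedPrompt` are hypothetical names, not FACT-V's API. The retrieved snippets are returned alongside the augmented prompt, which mirrors the interpretability benefit described above.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeSource:
    """Hypothetical knowledge source mapping entity names to short facts."""
    name: str
    facts: dict[str, str]

    def lookup(self, prompt: str) -> list[str]:
        # Naive retrieval (assumption): return every fact whose entity name
        # appears in the prompt, ignoring case.
        lowered = prompt.lower()
        return [
            f"{entity}: {fact}"
            for entity, fact in self.facts.items()
            if entity.lower() in lowered
        ]


@dataclass
class AugmentedPrompt:
    """Keeps the user's prompt and the retrieved snippets side by side."""
    original: str
    retrieved: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Hypothetical template: append retrieved facts as trailing context.
        if not self.retrieved:
            return self.original
        return f"{self.original}. Context: {'; '.join(self.retrieved)}"


def augment_prompt(prompt: str, sources: list[KnowledgeSource]) -> AugmentedPrompt:
    """Query every source in order and merge snippets, dropping exact duplicates."""
    retrieved: list[str] = []
    for source in sources:
        for snippet in source.lookup(prompt):
            if snippet not in retrieved:
                retrieved.append(snippet)
    return AugmentedPrompt(original=prompt, retrieved=retrieved)


# Usage with toy, hand-written sources (not FACT-V's actual sources).
encyclopedia = KnowledgeSource(
    "encyclopedia", {"Eiffel Tower": "wrought-iron lattice tower in Paris"}
)
gazetteer = KnowledgeSource("gazetteer", {"Paris": "capital of France"})

result = augment_prompt(
    "A drone shot of the Eiffel Tower in Paris at sunset", [encyclopedia, gazetteer]
)
print(result.text)       # prompt that would be passed to a T2V model
print(result.retrieved)  # snippets that could be shown to the user
```

Keeping `retrieved` separate from the final `text` means the prompt that drives generation and the evidence behind it can be displayed together. A real system would replace the substring match with proper entity linking and retrieval.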

Published

2025-04-11

How to Cite

Lee, D., Chandra, A., Zhou, Y., Li, Y., & Conia, S. (2025). Rewind and Render: Towards Factually Accurate Text-to-Video Generation with Distilled Knowledge Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29652–29654. https://doi.org/10.1609/aaai.v39i28.35356

Issue

Vol. 39 No. 28: IAAI-25, EAAI-25, AAAI-25 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track