Yariv, Guy, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, and Yossi Adi. “Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation”. Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6639–6647. Accessed July 24, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/28486.