Yariv, G., Gat, I., Benaim, S., Wolf, L., Schwartz, I., & Adi, Y. (2024). Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6639–6647. https://doi.org/10.1609/aaai.v38i7.28486