[1] G. Yariv, I. Gat, S. Benaim, L. Wolf, I. Schwartz, and Y. Adi, “Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation”, AAAI, vol. 38, no. 7, pp. 6639–6647, Mar. 2024.