[1] H. Zhou, X. Guo, Y. Zhu, and A. W.-K. Kong, “MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment”, AAAI, vol. 40, no. 16, pp. 13620–13628, Mar. 2026.