Zhou, H., Guo, X., Zhu, Y., & Kong, A. W.-K. (2026). MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13620–13628. https://doi.org/10.1609/aaai.v40i16.38368