ZHAO, T.; DU, J.; XUE, Z.; LIANG, M.; LI, A.; MENG, X.; LIU, D. ST-VLM: A Spatial-to-Image Multimodal Spatial-Temporal Prediction Framework with Vision-Language Model. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 40, n. 19, p. 16441-16449, 2026. DOI: 10.1609/aaai.v40i19.38683. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/38683. Accessed on: 3 May 2026.