Ilaslan, M. F., Köksal, A., Lin, K. Q., Satar, B., Shou, M. Z., & Xu, Q. (2025). VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4), 3886–3894. https://doi.org/10.1609/aaai.v39i4.32406