Schumann, R., Zhu, W., Feng, W., Fu, T.-J., Riezler, S., & Wang, W. Y. (2024). VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18924–18933. https://doi.org/10.1609/aaai.v38i17.29858