Schumann, Raphael, et al. “VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, Mar. 2024, pp. 18924-33, doi:10.1609/aaai.v38i17.29858.