1. Schumann R, Zhu W, Feng W, Fu T-J, Riezler S, Wang WY. VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View. AAAI [Internet]. 2024 Mar. 24 [cited 2026 May 13];38(17):18924-33. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/29858