LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward

Authors

  • Yi Zhao, The Hong Kong Polytechnic University
  • Siqi Wang, The Hong Kong Polytechnic University
  • Jing Li, The Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v40i41.40804

Abstract

Navigation instruction generation for visually impaired (VI) individuals (NIG-VI) is critical yet relatively underexplored. This study focuses on generating precise, in-situ, step-by-step navigation instructions that are practically usable for VI users. Specifically, we propose LaF-GRPO (LLM-as-Follower GRPO), in which an LLM simulates how a VI user would respond to a navigation instruction, and that simulated response provides the feedback reward that guides the post-training of a Vision-Language Model (VLM). This improves instruction accuracy and usability while reducing the need for costly real-world data collection. To address the scarcity of dedicated benchmarks in this field, we introduce NIG4VI, a 27k-sample open-source dataset for training and evaluation. It comprises diverse navigation scenarios with accurate spatial coordinates, supporting detailed and open-ended in-situ instruction generation. Experiments on NIG4VI demonstrate the effectiveness of LaF-GRPO on quantitative metrics (e.g., Zero-(LaF-GRPO) improves BLEU by 14%; SFT+(LaF-GRPO) reaches a METEOR of 0.542 vs. 0.323 for GPT-4o), and qualitative analysis further confirms that our method yields more intuitive and safer instructions.
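
The reward loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_follower` is a hypothetical keyword-based stand-in for the LLM follower, `follower_reward` is an assumed binary match reward, and the action vocabulary is invented for illustration. Only the group-relative normalization (reward minus the group mean, divided by the group standard deviation) follows the standard GRPO formulation; the use of the population standard deviation and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def toy_follower(instruction):
    """Hypothetical stand-in for the LLM follower.

    Maps an instruction to the action a simulated user would take,
    using a simple keyword lookup instead of a real LLM call.
    """
    text = instruction.lower()
    for action in ("left", "right", "forward", "stop"):
        if action in text:
            return action
    return "unknown"


def follower_reward(instruction, reference_action):
    """Assumed reward: 1.0 if the simulated follower takes the reference action."""
    return 1.0 if toy_follower(instruction) == reference_action else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of candidate instructions sampled for the same scene (made-up text).
candidates = [
    "Turn slightly left to avoid the bench.",
    "Go forward.",
    "Turn left now.",
    "Keep walking.",
]
rewards = [follower_reward(c, "left") for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch, candidates that the simulated follower interprets correctly receive positive advantages and the rest receive negative ones. If every candidate in a group earns the same reward, all advantages are zero and that group contributes no learning signal.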

Published

2026-03-14

How to Cite

Zhao, Y., Wang, S., & Li, J. (2026). LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34994–35002. https://doi.org/10.1609/aaai.v40i41.40804

Section

AAAI Technical Track on Natural Language Processing VI