Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Authors

  • Tzu-Jung Lin, National Taiwan University
  • Jia-Fong Yeh, National Taiwan University
  • Hung-Ting Su, National Taiwan University
  • Chung-Yi Lin, National Taiwan University
  • Yi-Ting Chen, National Yang Ming Chiao Tung University
  • Winston H. Hsu, National Taiwan University

DOI:

https://doi.org/10.1609/aaai.v40i22.38909

Abstract

In open-vocabulary mobile manipulation (OVMM), task success often hinges on selecting an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs two cross-modal representations, Affordance RGB and Obstacle Map+, to align semantics with spatial context, which enables reasoning beyond the egocentric limitations of RGB perception. To keep interaction guided by task-relevant affordances, we use coarse semantic priors from VLMs to steer the search toward promising regions and then refine placements with geometric constraints, which reduces the risk of converging to local optima. Evaluated on five diverse OVMM tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware, multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.
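The coarse-to-fine idea in the abstract can be illustrated with a toy 2D search. This is a minimal sketch under stated assumptions, not the authors' implementation: the hypothetical `toy_score` function stands in for a semantic prior that a VLM might supply, the hypothetical `toy_feasible` function stands in for geometric constraints such as reachability, and the grid-then-perturb loop is a generic refinement scheme chosen only for illustration. All names, bounds, and distances are invented.

```python
import math
import random


def coarse_to_fine_search(score, feasible, bounds, coarse_n=9, top_k=3,
                          iters=4, samples=20, radius=0.5, seed=0):
    """Return the best feasible (x, y) found, or None if no grid cell is feasible."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds

    # Coarse stage: score a regular grid and keep the top-k feasible cells.
    grid = [
        (xmin + (xmax - xmin) * i / (coarse_n - 1),
         ymin + (ymax - ymin) * j / (coarse_n - 1))
        for i in range(coarse_n)
        for j in range(coarse_n)
    ]
    seeds = sorted((p for p in grid if feasible(p)), key=score, reverse=True)[:top_k]
    if not seeds:
        return None

    # Fine stage: perturb each seed inside a shrinking radius, keep only
    # feasible candidates, and carry the top-k forward to the next round.
    for _step in range(iters):
        candidates = list(seeds)  # keeping old seeds means the best score never drops
        for sx, sy in seeds:
            for _sample in range(samples):
                p = (min(max(sx + rng.uniform(-radius, radius), xmin), xmax),
                     min(max(sy + rng.uniform(-radius, radius), ymin), ymax))
                if feasible(p):
                    candidates.append(p)
        seeds = sorted(candidates, key=score, reverse=True)[:top_k]
        radius *= 0.5
    return seeds[0]


# Hypothetical toy scene: a target object at (2, 2) that is best approached
# from its -y side (for example, a drawer that opens toward smaller y).
TARGET = (2.0, 2.0)


def toy_score(p):
    # Assumed affordance prior: prefer standing about 0.6 m in front of the target.
    dx, dy = p[0] - TARGET[0], p[1] - TARGET[1]
    return -(dx ** 2 + (dy + 0.6) ** 2)


def toy_feasible(p):
    # Assumed geometric constraint: the base must be within arm's reach,
    # but not so close that it would collide with the object.
    return 0.4 <= math.dist(p, TARGET) <= 0.9


best = coarse_to_fine_search(toy_score, toy_feasible, ((0.0, 4.0), (0.0, 4.0)))
print(best)
```

Because each refinement round retains the previous seeds, the returned placement is always feasible and scores at least as well as the best coarse grid cell. A real system would replace both toy functions with perception-driven scoring and collision or reachability checks.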

Published

2026-03-14

How to Cite

Lin, T.-J., Yeh, J.-F., Su, H.-T., Lin, C.-Y., Chen, Y.-T., & Hsu, W. H. (2026). Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18443–18451. https://doi.org/10.1609/aaai.v40i22.38909

Issue

Vol. 40 No. 22 (2026)

Section

AAAI Technical Track on Intelligent Robotics