Language and Gesture in Virtual Reality: Is a Gesture Worth 1000 Words?

Authors

  • Padraig Higgins (University of Maryland, Baltimore County; Army Research Lab)
  • Cory J. Hayes (Army Research Lab)
  • Stephanie Lukin (Army Research Lab)
  • Cynthia Matuszek (University of Maryland, Baltimore County)

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36947

Abstract

Robots are increasingly incorporating multimodal information and human signals to resolve ambiguity in embodied human-robot interaction. Harnessing signals such as gestures may expedite robot exploration in large, outdoor urban environments that support disaster recovery operations, where speech may be unclear due to noise or the challenges of a dynamic and dangerous environment. Despite this potential, capturing human gesture and properly grounding it in crowded, outdoor environments remains a challenge. In this work, we propose a method to model human gesture and ground it to spoken language instructions given to a robot for execution in large spaces. We implement our method in virtual reality to develop a workflow for faster future data collection. We present a series of proposed experiments that compare a language-only baseline to our proposed approach of language supplemented by gesture, and discuss how our approach has the potential to reinforce the human's intent and detect discrepancies between gesture and spoken instructions in these large and crowded environments.

Published

2025-11-23

How to Cite

Higgins, P., Hayes, C. J., Lukin, S., & Matuszek, C. (2025). Language and Gesture in Virtual Reality: Is a Gesture Worth 1000 Words?. Proceedings of the AAAI Symposium Series, 7(1), 658-662. https://doi.org/10.1609/aaaiss.v7i1.36947

Section

Unifying Representations for Robot Application Development