Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions

Jesse Thomason; Jivko Sinapov; Raymond Mooney; Peter Stone

doi:10.1609/aaai.v32i1.11966

Authors

Jesse Thomason University of Texas at Austin
Jivko Sinapov Tufts University
Raymond Mooney University of Texas at Austin
Peter Stone University of Texas at Austin

DOI:

https://doi.org/10.1609/aaai.v32i1.11966

Keywords:

Multi-modal grounding, NLP, Human-robot interaction

Abstract

A major goal of grounded language learning research is to enable robots to connect language predicates to their physical, interactive perception of the world. Coupling object exploratory behaviors such as grasping, lifting, and looking with multiple sensory modalities (e.g., audio, haptics, and vision) enables a robot to ground non-visual words like "heavy" as well as visual words like "red". A major limitation of existing approaches to multi-modal language grounding is that a robot has to exhaustively explore training objects with a variety of actions whenever it learns a new language predicate. This paper proposes a method for guiding a robot's behavioral exploration policy when learning a novel predicate, based on known grounded predicates and the novel predicate's linguistic relationship to them. We demonstrate our approach on two datasets in which a robot explored large sets of objects and was tasked with learning to recognize whether novel words applied to those objects.
