Interactive Visual Task Learning for Robots
DOI:
https://doi.org/10.1609/aaai.v38i21.30567
Keywords:
Robots, Human-AI interaction (including Human-robot interaction), Natural language processing and speech recognition
Abstract
We present a demonstrable framework for robots to learn novel visual concepts and visual tasks via in-situ linguistic interactions with human users. Previous approaches in computer vision have either used large pre-trained visual models to infer novel objects zero-shot, or added novel concepts along with their attributes and representations to a concept hierarchy. We extend the approaches that focus on learning visual concept hierarchies and take this ability one step further to demonstrate novel task solving on robots along with the learned visual concepts. To enable a visual concept learner to solve robotics tasks one-shot, we developed two distinct techniques. First, we propose a novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which propagates information from a novel concept being taught to its parent nodes within a concept hierarchy. This information propagation allows all concepts in the hierarchy to update as novel concepts are taught in a continual learning setting. Second, we represent a visual task as a scene graph with language annotations, allowing us to create novel permutations of a demonstrated task zero-shot in-situ. Combining the two techniques, we present a demonstration on a real robot that learns visual tasks and concepts in one shot from in-situ interactions with human users and generalizes to perform a novel visual task of the same type zero-shot. As shown by the studies in the main conference paper, our system achieves a 50% success rate at solving the whole task correctly with generalization, whereas the baseline achieves 14% without any ability to generalize to novel tasks and concepts. We will demonstrate our working interactive learning pipeline in person at AAAI 2024 with our robot and other required hardware.
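To make the hierarchy-update idea concrete, the sketch below shows a minimal concept hierarchy in which teaching a new leaf concept also updates its ancestors. The node structure, the mixing weight `alpha`, and the specific update rule are illustrative assumptions for this example, not the actual Hi-Viscont formulation, which is described in the main conference paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


@dataclass
class ConceptNode:
    """A node in the visual concept hierarchy (e.g. 'object' -> 'block' -> 'red block')."""
    name: str
    feature: np.ndarray                       # visual prototype for this concept
    parent: Optional["ConceptNode"] = None
    children: List["ConceptNode"] = field(default_factory=list)


def add_novel_concept(hierarchy: Dict[str, ConceptNode],
                      name: str,
                      feature: np.ndarray,
                      parent_name: str,
                      alpha: float = 0.1) -> ConceptNode:
    """Insert a newly taught concept and propagate its feature to all ancestors.

    `alpha` is a hypothetical mixing weight; the point is only to illustrate how
    information from a novel concept can flow upward so that every concept in the
    hierarchy stays current as new concepts are taught continually.
    """
    parent = hierarchy[parent_name]
    node = ConceptNode(name=name, feature=feature, parent=parent)
    parent.children.append(node)
    hierarchy[name] = node

    # Walk up the hierarchy, nudging each ancestor's prototype toward the new child.
    ancestor = parent
    while ancestor is not None:
        ancestor.feature = (1.0 - alpha) * ancestor.feature + alpha * feature
        ancestor = ancestor.parent
    return node


# Usage: teach "block" under "object", then "red block" under "block".
root = ConceptNode("object", feature=np.zeros(512))
hierarchy = {"object": root}
add_novel_concept(hierarchy, "block", np.random.rand(512), parent_name="object")
add_novel_concept(hierarchy, "red block", np.random.rand(512), parent_name="block")
```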
Published
2024-03-24
How to Cite
Gu, W., Sah, A., & Gopalan, N. (2024). Interactive Visual Task Learning for Robots. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23793-23795. https://doi.org/10.1609/aaai.v38i21.30567
Section
AAAI Demonstration Track