Interactive Visual Task Learning for Robots


  • Weiwei Gu, Arizona State University
  • Anant Sah, Arizona State University
  • Nakul Gopalan, Arizona State University



Keywords: Robots, Human-AI interaction (including Human-robot interaction), Natural language processing and speech recognition


We present a demonstrable framework for robots to learn novel visual concepts and visual tasks via in-situ linguistic interactions with human users. Previous approaches in computer vision have either used large pre-trained visual models to infer novel objects zero-shot, or added novel concepts, along with their attributes and representations, to a concept hierarchy. We extend the approaches that learn visual concept hierarchies and take this ability one step further by demonstrating novel task solving on robots with the learned visual concepts. To enable a visual concept learner to solve robotics tasks one-shot, we developed two distinct techniques. First, we propose a novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which propagates information about a newly taught concept to its parent nodes within a concept hierarchy. This information propagation allows all concepts in a hierarchy to update as novel concepts are taught in a continual learning setting. Second, we represent a visual task as a scene graph with language annotations, which allows us to create novel permutations of a demonstrated task zero-shot and in situ. Combining the two techniques, we present a demonstration on a real robot that learns visual tasks and concepts one-shot from in-situ interactions with human users, and generalizes to perform a novel visual task of the same type zero-shot. As shown by the studies in the main conference paper, our system achieves a success rate of 50% on solving the whole task correctly with generalization, whereas the baseline achieves 14% and cannot generalize to novel tasks or concepts. We will demonstrate our working interactive learning pipeline in person at AAAI 2024 with our robot and other required hardware.
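To make the two techniques in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each concept holds a plain list-of-floats embedding and that "propagating to parent nodes" can be approximated by a running mean, which is a stand-in for the learned update in Hi-Viscont. All names here (`Concept`, `teach`, `SceneGraph`, `with_substitution`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    name: str
    embedding: List[float]
    parent: Optional["Concept"] = None
    count: int = 1  # number of examples folded into this embedding


def teach(hierarchy: Dict[str, Concept], name: str,
          embedding: List[float], parent_name: str) -> Concept:
    """Add a new concept, then fold its embedding into every ancestor.

    Assumption: a running mean stands in for the learned update rule.
    """
    node = Concept(name, list(embedding), hierarchy[parent_name])
    hierarchy[name] = node
    ancestor = node.parent
    while ancestor is not None:
        n = ancestor.count
        ancestor.embedding = [
            (a * n + e) / (n + 1) for a, e in zip(ancestor.embedding, embedding)
        ]
        ancestor.count = n + 1
        ancestor = ancestor.parent
    return node


@dataclass
class SceneGraph:
    nodes: Dict[str, str]               # node id -> concept name (language annotation)
    edges: List[Tuple[str, str, str]]   # (source id, relation, target id)

    def with_substitution(self, node_id: str, new_concept: str) -> "SceneGraph":
        """Return a copy of the task with one node bound to a different concept."""
        nodes = dict(self.nodes)
        nodes[node_id] = new_concept
        return SceneGraph(nodes, list(self.edges))


if __name__ == "__main__":
    hierarchy: Dict[str, Concept] = {}
    hierarchy["object"] = Concept("object", [0.0, 0.0])
    hierarchy["block"] = Concept("block", [1.0, 1.0], parent=hierarchy["object"])
    teach(hierarchy, "red_block", [3.0, 1.0], "block")
    print(hierarchy["block"].embedding, hierarchy["object"].embedding)

    demo = SceneGraph({"a": "block", "b": "red_block"}, [("b", "on_top_of", "a")])
    print(demo.with_substitution("b", "green_block"))
```

In this sketch, teaching `red_block` updates both `block` and `object`, mirroring the idea that ancestors change as new concepts arrive. The `with_substitution` call shows how a demonstrated task structure could be reused with a different concept while the relations stay fixed.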




How to Cite

Gu, W., Sah, A., & Gopalan, N. (2024). Interactive Visual Task Learning for Robots. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23793-23795.