Towards Generalist Robot Learning from Internet Video: A Survey (Abstract Reprint)

Robert McCarthy; Daniel C.H. Tan; Dominik Schmidt; Fernando Acero; Nathan Herr; Yilun Du; Thomas G. Thuruthel; Zhibin Li

doi:10.1609/aaai.v40i47.41397

Authors

Robert McCarthy, University College London, United Kingdom
Daniel C.H. Tan, University College London, United Kingdom
Dominik Schmidt, Weco AI, United Kingdom
Fernando Acero, University College London, United Kingdom
Nathan Herr, University College London, United Kingdom
Yilun Du, Massachusetts Institute of Technology, United States of America
Thomas G. Thuruthel, University College London, United Kingdom
Zhibin Li, University College London, United Kingdom

Abstract

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
