Towards Generalist Robot Learning from Internet Video: A Survey (Abstract Reprint)

Authors

  • Robert McCarthy (University College London, United Kingdom)
  • Daniel C.H. Tan (University College London, United Kingdom)
  • Dominik Schmidt (Weco AI, United Kingdom)
  • Fernando Acero (University College London, United Kingdom)
  • Nathan Herr (University College London, United Kingdom)
  • Yilun Du (Massachusetts Institute of Technology, United States of America)
  • Thomas G. Thuruthel (University College London, United Kingdom)
  • Zhibin Li (University College London, United Kingdom)

DOI:

https://doi.org/10.1609/aaai.v40i47.41397

Abstract

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.

Published

2026-03-14

How to Cite

McCarthy, R., Tan, D. C., Schmidt, D., Acero, F., Herr, N., Du, Y., … Li, Z. (2026). Towards Generalist Robot Learning from Internet Video: A Survey (Abstract Reprint). Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39882–39882. https://doi.org/10.1609/aaai.v40i47.41397