Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari

Authors

  • Benjamin Ayton, Massachusetts Institute of Technology
  • Masataro Asai, MIT-IBM Watson AI Lab

DOI:

https://doi.org/10.1609/icaps.v32i1.19841

Keywords:

Width-based Planning, Active Learning, Representation Learning, Variational Autoencoder, Atari Learning Environment

Abstract

Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches compute a feature vector for each screen using either a hand-designed feature set (Rollout-IW) or a variational autoencoder trained on game screens (VAE-IW), and prune screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results across 55 Atari games demonstrate that Olive outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy learning (π-IW, DQN) trained with 100 times the training budget by 30-to-22 and 31-to-17, respectively, and a state-of-the-art data-efficient reinforcement learning method (EfficientZero) trained with the same training budget and run with 1.8 times the planning budget by 18-to-7 on the Atari 100k benchmark, all without any policy learning. The source code and the appendix are available at github.com/ibm/atari-active-learning and arxiv.org/abs/2109.15310.
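
The novelty-based pruning mentioned in the abstract can be illustrated with a minimal sketch. It assumes each screen has already been encoded as a binary feature vector and applies a simplified width-1 novelty test: a screen is kept only if it sets at least one feature to a value not seen earlier in the search. The names `NoveltyTable` and `prune_non_novel` are hypothetical and are not taken from the Olive source code, which also involves rollouts, an online-trained VAE, and active learning that this sketch omits.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# A (feature index, feature value) pair; hypothetical type alias for this sketch.
FeaturePair = Tuple[int, bool]


class NoveltyTable:
    """Records every (index, value) feature pair observed so far."""

    def __init__(self) -> None:
        self.seen: Set[FeaturePair] = set()

    def is_novel(self, features: Iterable[bool]) -> bool:
        """Return True if any pair is new, and record the new pairs."""
        pairs = set(enumerate(features))
        new_pairs = pairs - self.seen
        self.seen |= new_pairs
        return bool(new_pairs)


def prune_non_novel(screens: Sequence[Sequence[bool]]) -> List[int]:
    """Return the indices of screens that pass the width-1 novelty test,
    visiting the screens in the order given."""
    table = NoveltyTable()
    return [i for i, feats in enumerate(screens) if table.is_novel(feats)]


if __name__ == "__main__":
    # Toy feature vectors standing in for encoded game screens (assumed data).
    toy_screens = [
        [False, False, True],
        [False, False, True],   # identical to the first screen
        [True, False, True],    # feature 0 takes a new value
        [False, False, True],   # nothing new
    ]
    print(prune_non_novel(toy_screens))
```

In this toy run only the first and third screens survive, because the second and fourth repeat feature values that were already recorded. How informative the features are determines which screens get pruned, which is why the quality of the learned representation matters for this style of search.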

Published

2022-06-13

How to Cite

Ayton, B., & Asai, M. (2022). Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari. Proceedings of the International Conference on Automated Planning and Scheduling, 32(1), 547-555. https://doi.org/10.1609/icaps.v32i1.19841