RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

Authors

  • Yuanying Cai IIIS, Tsinghua University
  • Chuheng Zhang Microsoft Research
  • Wei Shen Hulu
  • Xuyun Zhang Macquarie University
  • Wenjie Ruan University of Exeter
  • Longbo Huang IIIS, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v37i6.25842

Keywords:

ML: Reinforcement Learning Algorithms, ML: Transfer, Domain Adaptation, Multi-Task Learning, ML: Unsupervised & Self-Supervised Learning

Abstract

Inspired by the recent success of sequence modeling in RL and the use of masked language models for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains an encoder combined with transformer blocks to predict the masked states or actions in a trajectory. RePreM is simple yet effective compared to existing representation pre-training methods in RL. By relying on sequence modeling, it avoids algorithmic sophistication (such as data augmentation or estimating multiple models) and produces a representation that captures long-term dynamics well. Empirically, we demonstrate the effectiveness of RePreM in various tasks, including dynamics prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. Moreover, we show that RePreM scales well with dataset size, dataset quality, and the scale of the encoder, which indicates its potential towards big RL models.
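To make the idea concrete, below is a minimal, hypothetical sketch of masked trajectory modeling in the spirit described by the abstract: an encoder maps states and actions to token embeddings, a transformer processes the trajectory with some tokens replaced by a learned mask token, and prediction heads reconstruct the masked entries. This is not the authors' implementation; the module names, dimensions, masking rate, and continuous-regression loss are all illustrative assumptions.

```python
# Illustrative sketch only; not the RePreM reference code.
import torch
import torch.nn as nn


class MaskedTrajectoryModel(nn.Module):
    """Encode a trajectory of (state, action) tokens and reconstruct masked entries."""

    def __init__(self, state_dim=8, action_dim=4, embed_dim=64,
                 n_layers=2, n_heads=4, max_len=32):
        super().__init__()
        self.state_encoder = nn.Linear(state_dim, embed_dim)    # stand-in for a deeper encoder
        self.action_encoder = nn.Linear(action_dim, embed_dim)
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))  # learned [MASK] embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, 2 * max_len, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.state_head = nn.Linear(embed_dim, state_dim)       # predicts masked states
        self.action_head = nn.Linear(embed_dim, action_dim)     # predicts masked actions

    def forward(self, states, actions, mask):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        # mask: (B, 2T) boolean, True where a token is masked out
        tokens = torch.stack(
            [self.state_encoder(states), self.action_encoder(actions)], dim=2
        ).flatten(1, 2)                              # interleave: s_1, a_1, s_2, a_2, ...
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        hidden = self.transformer(tokens + self.pos_embed[:, : tokens.size(1)])
        return self.state_head(hidden[:, 0::2]), self.action_head(hidden[:, 1::2])


# One pre-training step on random data: mask ~15% of tokens and regress their values.
B, T = 16, 10
states, actions = torch.randn(B, T, 8), torch.randn(B, T, 4)
mask = torch.rand(B, 2 * T) < 0.15
model = MaskedTrajectoryModel()
pred_s, pred_a = model(states, actions, mask)
s_mask, a_mask = mask[:, 0::2], mask[:, 1::2]
loss = ((pred_s - states) ** 2)[s_mask].mean() + ((pred_a - actions) ** 2)[a_mask].mean()
loss.backward()
```

After pre-training, one would discard the prediction heads and reuse the encoder (and optionally the transformer) as the state representation for downstream RL, which is the transfer setting the abstract refers to.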

Published

2023-06-26

How to Cite

Cai, Y., Zhang, C., Shen, W., Zhang, X., Ruan, W., & Huang, L. (2023). RePreM: Representation Pre-training with Masked Model for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 6879-6887. https://doi.org/10.1609/aaai.v37i6.25842

Section

AAAI Technical Track on Machine Learning I