RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v37i6.25842
Keywords:
ML: Reinforcement Learning Algorithms; ML: Transfer, Domain Adaptation, Multi-Task Learning; ML: Unsupervised & Self-Supervised Learning
Abstract
Inspired by the recent success of sequence modeling in RL and the use of masked language models for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains an encoder combined with transformer blocks to predict the masked states or actions in a trajectory. RePreM is simple yet effective compared with existing representation pre-training methods in RL. By relying on sequence modeling, it avoids algorithmic sophistication (such as data augmentation or the estimation of multiple models) and produces representations that capture long-term dynamics well. Empirically, we demonstrate the effectiveness of RePreM on various tasks, including dynamics prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. Moreover, we show that RePreM scales well with dataset size, dataset quality, and the size of the encoder, which indicates its potential toward large RL models.
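To make the objective in the abstract concrete, below is a minimal sketch of masked trajectory pre-training in this spirit, not the authors' implementation: an encoder embeds states and actions into a token sequence, a fraction of tokens is replaced by a learned mask token, and transformer blocks reconstruct the masked states and actions from context. All module names, dimensions, the 15% mask ratio, and the MSE reconstruction loss for continuous states and actions are illustrative assumptions.

```python
# Sketch only: a masked trajectory model in the spirit of RePreM.
import torch
import torch.nn as nn


class MaskedTrajectoryModel(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        # Encoders embed states and actions into a shared token space.
        self.state_encoder = nn.Linear(state_dim, d_model)
        self.action_encoder = nn.Linear(action_dim, d_model)
        # A learned token replaces the embeddings at masked positions.
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos_emb = nn.Parameter(torch.zeros(1, 512, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        # Heads reconstruct masked states/actions from trajectory context.
        self.state_head = nn.Linear(d_model, state_dim)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, states, actions, mask_ratio=0.15):
        # Interleave tokens as s_0, a_0, s_1, a_1, ...
        B, T, _ = states.shape
        s_tok = self.state_encoder(states)                         # (B, T, d)
        a_tok = self.action_encoder(actions)                       # (B, T, d)
        tokens = torch.stack([s_tok, a_tok], dim=2).flatten(1, 2)  # (B, 2T, d)
        # Randomly mask a fraction of tokens; the transformer must
        # reconstruct them from the surrounding trajectory.
        mask = torch.rand(B, 2 * T, device=tokens.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        h = self.transformer(tokens + self.pos_emb[:, : 2 * T])
        s_pred = self.state_head(h[:, 0::2])   # even positions are states
        a_pred = self.action_head(h[:, 1::2])  # odd positions are actions
        s_mask, a_mask = mask[:, 0::2], mask[:, 1::2]
        # Reconstruction loss is computed only on the masked positions.
        return ((s_pred - states) ** 2)[s_mask].mean() + \
               ((a_pred - actions) ** 2)[a_mask].mean()


# Example: one pre-training step on a batch of offline trajectories
# (batch of 8 trajectories, 32 steps, 17-dim states, 6-dim actions).
model = MaskedTrajectoryModel(state_dim=17, action_dim=6)
states, actions = torch.randn(8, 32, 17), torch.randn(8, 32, 6)
loss = model(states, actions)
loss.backward()
```

After pre-training, the state encoder (and optionally the transformer) would be reused as the representation for downstream RL; discrete actions would instead use an embedding table and a cross-entropy reconstruction loss.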
Published
2023-06-26
How to Cite
Cai, Y., Zhang, C., Shen, W., Zhang, X., Ruan, W., & Huang, L. (2023). RePreM: Representation Pre-training with Masked Model for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 6879-6887. https://doi.org/10.1609/aaai.v37i6.25842
Section
AAAI Technical Track on Machine Learning I