SEAPoT-RL: Selective Exploration Algorithm for Policy Transfer in RL

Authors

  • Akshay Narayan, National University of Singapore
  • Zhuoru Li, National University of Singapore
  • Tze-Yun Leong, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v31i1.11104

Keywords:

transfer learning, policy transfer

Abstract

We propose a new method for transferring a policy from a source task to a target task in model-based reinforcement learning. Our work is motivated by scenarios in which a robotic agent operates in similar but challenging environments, such as hospital wards, that differ in structural arrangement or in obstacles such as furniture. We address problems that require the agent to respond quickly in new scenarios by adapting its incomplete prior knowledge. We present an efficient selective exploration strategy that maximally reuses the source task policy. Reuse is made efficient by identifying the sub-spaces that differ in the target environment, which limits the exploration needed in the target task. We empirically show that SEAPoT outperforms existing state-of-the-art policy reuse methods in terms of jump start and cumulative average reward.
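
The general idea of reusing a source policy everywhere except in the sub-spaces that differ in the target environment can be illustrated with a toy sketch. This is not the SEAPoT algorithm itself, whose details the abstract does not give. The corridor world, the tabular models, and the helper names `differing_states` and `select_action` are all assumptions made for illustration, and uniform random exploration stands in for whatever exploration strategy the paper uses.

```python
import random


def differing_states(source_model, target_model):
    """Return the states whose transitions differ between two models.

    Each model is a hypothetical table mapping (state, action) -> next_state.
    A learning agent would estimate these tables from experience; here they
    are built by hand.
    """
    changed = set()
    for (state, action), next_state in target_model.items():
        if source_model.get((state, action)) != next_state:
            changed.add(state)
    return changed


def select_action(state, source_policy, changed, actions, rng):
    """Reuse the source policy in unchanged states; explore in changed ones."""
    if state in changed:
        # Exploration is confined to the changed sub-space (assumed uniform).
        return rng.choice(actions)
    # Everywhere else the source policy is reused directly.
    return source_policy[state]


# Toy corridor with states 0..4 (an assumed environment, not from the paper).
actions = ["left", "right"]
source_model = {}
for s in range(5):
    source_model[(s, "left")] = max(s - 1, 0)
    source_model[(s, "right")] = min(s + 1, 4)

# Target task: a new obstacle blocks moving right from state 2.
target_model = dict(source_model)
target_model[(2, "right")] = 2

source_policy = {s: "right" for s in range(5)}
changed = differing_states(source_model, target_model)  # {2}
rng = random.Random(0)

for s in range(5):
    print(s, select_action(s, source_policy, changed, actions, rng))
```

In this sketch only state 2 is flagged as changed, so the agent follows the source policy in the other four states and spends its exploration budget on the single state where the environments disagree.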

Published

2017-02-12

How to Cite

Narayan, A., Li, Z., & Leong, T.-Y. (2017). SEAPoT-RL: Selective Exploration Algorithm for Policy Transfer in RL. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11104