PiCor: Multi-Task Deep Reinforcement Learning with Policy Correction

Authors

  • Fengshuo Bai, School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences (CASIA)
  • Hongming Zhang, University of Alberta
  • Tianyang Tao, Université Paris-Saclay
  • Zhiheng Wu, School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences (CASIA)
  • Yanna Wang, Institute of Automation, Chinese Academy of Sciences (CASIA)
  • Bo Xu, Institute of Automation, Chinese Academy of Sciences (CASIA); Nanjing Artificial Intelligence Research of IA

DOI:

https://doi.org/10.1609/aaai.v37i6.25825

Keywords:

ML: Reinforcement Learning Algorithms; ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

Multi-task deep reinforcement learning (DRL) ambitiously aims to train a general agent that masters multiple tasks simultaneously. However, varying learning speeds across tasks, compounded by negative gradient interference, make policy learning inefficient. In this work, we propose PiCor, an efficient multi-task DRL framework that splits learning into a policy optimization phase and a policy correction phase. The policy optimization phase improves the policy on a single sampled task using any DRL algorithm, without considering the other tasks. The policy correction phase first constructs an adaptively adjusted performance constraint set. The intermediate policy learned in the first phase is then constrained to this set, which controls negative interference and balances learning speeds across tasks. Empirically, we demonstrate that PiCor outperforms previous methods and significantly improves sample efficiency on simulated robotic manipulation and continuous control tasks. We additionally show that adaptive weight adjustment can further improve data efficiency and performance.
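
The two-phase structure described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm: the abstract does not specify how the constraint set is built or how the policy is constrained to it, so everything below is an assumption made for illustration. The "policy" is a parameter vector, each task's performance is a quadratic loss, the constraint set is "no task's loss may rise by more than a fixed slack" (not adaptively adjusted), and the correction is a simple backtracking toward the previous parameters. All function names and parameters are hypothetical.

```python
import numpy as np

def task_loss(theta, target):
    """Toy per-task loss: squared distance to that task's optimum."""
    return float(np.sum((theta - target) ** 2))

def optimize_step(theta, target, lr=0.2):
    """Phase 1 (toy): one gradient step on the sampled task only,
    ignoring every other task."""
    grad = 2.0 * (theta - target)
    return theta - lr * grad

def correct_step(theta_old, theta_new, targets, slack, shrink=0.5, max_iters=20):
    """Phase 2 (toy): pull the intermediate parameters back toward the old
    ones until no task's loss exceeds its previous loss by more than `slack`.
    The fixed slack is an assumed stand-in for a performance constraint set."""
    bounds = [task_loss(theta_old, t) + slack for t in targets]
    alpha = 1.0
    for _ in range(max_iters):
        candidate = theta_old + alpha * (theta_new - theta_old)
        if all(task_loss(candidate, t) <= b for t, b in zip(targets, bounds)):
            return candidate
        alpha *= shrink
    # No acceptable step found: keep the old parameters.
    return theta_old

def train(targets, steps=200, slack=0.05, seed=0):
    """Alternate the two phases on a randomly sampled task at each step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(targets[0])
    for _ in range(steps):
        i = rng.integers(len(targets))
        proposal = optimize_step(theta, targets[i])
        theta = correct_step(theta, proposal, targets, slack)
    return theta
```

With two tasks whose optima are `[1, 0]` and `[0, 1]`, a full gradient step toward the first task from the origin would raise the second task's loss beyond the slack, so the correction in this toy halves the step. A real multi-task DRL setup would replace the quadratic losses with estimated policy returns and the gradient step with an actual DRL update.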

Published

2023-06-26

How to Cite

Bai, F., Zhang, H., Tao, T., Wu, Z., Wang, Y., & Xu, B. (2023). PiCor: Multi-Task Deep Reinforcement Learning with Policy Correction. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 6728-6736. https://doi.org/10.1609/aaai.v37i6.25825

Section

AAAI Technical Track on Machine Learning I