Offline Quantum Reinforcement Learning in a Conservative Manner


  • Zhihao Cheng, The University of Sydney
  • Kaining Zhang, The University of Sydney
  • Li Shen, JD Explore Academy
  • Dacheng Tao, JD Explore Academy; The University of Sydney



ML: Quantum Machine Learning, ML: Reinforcement Learning Algorithms


Recently, empowering reinforcement learning (RL) with quantum computing, dubbed quantum RL (QRL), has attracted much attention as a way to reap the quantum advantage. However, current QRL algorithms follow an online learning scheme: the policy, which runs on a quantum computer, must interact with the environment to collect experiences, which can be expensive and dangerous in practical applications. In this paper, we address this problem in an offline learning manner. Specifically, we develop the first offline quantum RL (offline QRL) algorithm, named CQ2L (Conservative Quantum Q-learning), which learns from offline samples and requires no interaction with the environment. CQ2L uses variational quantum circuits (VQCs), improved with data re-uploading and scaling parameters, to represent the Q-value functions of agents. To suppress the overestimation of Q-values that arises when learning from offline data, we first employ a double Q-learning framework to reduce the overestimation bias, and then design a penalty term that encourages conservative Q-values. Extensive experiments demonstrate that CQ2L successfully solves offline QRL tasks that its online counterpart could not.
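The abstract combines three ingredients: a VQC Q-function with data re-uploading and scaling parameters, a double Q-learning target, and a conservative penalty. The following is a minimal pure-Python sketch that puts them together on a classically simulated single qubit. It is not the authors' implementation. The circuit layout (an RX encoding followed by a trainable RY per feature, with two actions read out as scaled ⟨Z⟩ and ⟨X⟩), the reading of "scaling parameters" as trainable input and output scales, the Double-DQN-style target, and the CQL-style log-sum-exp penalty are all assumptions. Names such as `VQCParams`, `q_values`, and `conservative_double_q_loss` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[complex, complex]  # single-qubit amplitudes (alpha, beta)


def rx(theta: float, psi: State) -> State:
    """Apply RX(theta) = exp(-i*theta*X/2) to a single-qubit state."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    a, b = psi
    return (c * a - 1j * s * b, -1j * s * a + c * b)


def ry(theta: float, psi: State) -> State:
    """Apply RY(theta) = exp(-i*theta*Y/2) to a single-qubit state."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    a, b = psi
    return (c * a - s * b, s * a + c * b)


@dataclass
class VQCParams:
    """Hypothetical parameter layout for a 1-qubit, 2-action Q-network."""
    thetas: List[List[float]]    # [n_layers][n_features] trainable RY angles
    in_scale: List[List[float]]  # [n_layers][n_features] input scaling factors (assumed)
    out_scale: List[float]       # [2] output scaling, one per action (assumed)


def q_values(p: VQCParams, s: Sequence[float]) -> List[float]:
    """Data re-uploading: the state features are encoded again in every layer."""
    psi: State = (1 + 0j, 0j)  # start in |0>
    for layer_theta, layer_scale in zip(p.thetas, p.in_scale):
        for x, theta, lam in zip(s, layer_theta, layer_scale):
            psi = rx(lam * x, psi)  # encode one scaled feature
            psi = ry(theta, psi)    # trainable rotation
    a, b = psi
    exp_z = abs(a) ** 2 - abs(b) ** 2     # <Z>
    exp_x = 2 * (a.conjugate() * b).real  # <X>
    return [p.out_scale[0] * exp_z, p.out_scale[1] * exp_x]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def conservative_double_q_loss(online: VQCParams, target: VQCParams, batch,
                               gamma: float = 0.99, alpha: float = 1.0) -> float:
    """Mean TD error plus a conservative penalty over offline transitions.

    batch: list of (s, a, r, s_next, done) tuples from a fixed dataset;
    no environment interaction happens here.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        q_s = q_values(online, s)
        # Double Q-learning: online params pick a*, target params evaluate it.
        q_next = q_values(online, s_next)
        a_star = max(range(len(q_next)), key=lambda i: q_next[i])
        # y would be held constant (no gradient) in an actual training step.
        y = r + gamma * (1.0 - done) * q_values(target, s_next)[a_star]
        td = (q_s[a] - y) ** 2
        # CQL-style penalty (assumed form): log-sum-exp over actions minus logged action.
        penalty = logsumexp(q_s) - q_s[a]
        total += td + alpha * penalty
    return total / len(batch)


if __name__ == "__main__":
    params = VQCParams(thetas=[[0.1, -0.2], [0.3, 0.05]],
                       in_scale=[[1.0, 1.0], [1.0, 1.0]],
                       out_scale=[1.0, 1.0])
    batch = [([0.5, -0.1], 0, 1.0, [0.4, 0.0], 0.0),
             ([0.4, 0.0], 1, 0.0, [0.3, 0.2], 1.0)]
    print(conservative_double_q_loss(params, params, batch))
```

Only the loss value is computed here. A trainer would differentiate it with respect to `thetas`, `in_scale`, and `out_scale` (on hardware, for example, via the parameter-shift rule), hold `y` fixed, and periodically copy the online parameters into `target`. The penalty is never negative, because log-sum-exp over actions is at least the largest Q-value; minimizing it lowers Q-values across all actions while raising the one actually taken in the dataset.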




How to Cite

Cheng, Z., Zhang, K., Shen, L., & Tao, D. (2023). Offline Quantum Reinforcement Learning in a Conservative Manner. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7148-7156.



AAAI Technical Track on Machine Learning I