Offline Quantum Reinforcement Learning in a Conservative Manner

Authors

  • Zhihao Cheng, The University of Sydney
  • Kaining Zhang, The University of Sydney
  • Li Shen, JD Explore Academy
  • Dacheng Tao, JD Explore Academy & The University of Sydney

DOI:

https://doi.org/10.1609/aaai.v37i6.25872

Keywords:

ML: Quantum Machine Learning, ML: Reinforcement Learning Algorithms

Abstract

Recently, empowering reinforcement learning (RL) with quantum computing to reap a quantum advantage has attracted much attention; this line of work is dubbed quantum RL (QRL). However, current QRL algorithms employ an online learning scheme: the policy run on a quantum computer must interact with the environment to collect experience, which can be expensive and dangerous in practical applications. In this paper, we address this problem in an offline learning manner. Specifically, we develop the first offline quantum RL (offline QRL) algorithm, named CQ2L (Conservative Quantum Q-learning), which learns from offline samples and requires no interaction with the environment. CQ2L represents the agent's Q-value function with variational quantum circuits (VQCs) that are improved with data re-uploading and scaling parameters. To suppress the overestimation of Q-values caused by learning from offline data, we first employ a double Q-learning framework to reduce the overestimation bias, and then design a penalty term that encourages conservative Q-value estimates. We conduct extensive experiments demonstrating that CQ2L can solve offline QRL tasks that its online counterpart cannot.
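The abstract names three ingredients: a data re-uploading VQC with trainable output scaling as the Q-network, double Q-learning targets, and a conservative penalty. The sketch below is not the authors' implementation; it is a minimal illustration in PennyLane-style Python of how these pieces could fit together. All dimensions, hyperparameters, and names (q_values, double_q_target, conservative_loss) are illustrative assumptions, and the penalty follows the standard CQL-style logsumexp regularizer that the abstract's description suggests.

import pennylane as qml
from pennylane import numpy as np

n_qubits = 2     # assumed: one qubit per state feature (toy setting)
n_layers = 3     # assumed number of re-uploading layers
n_actions = 2    # assumed discrete action space

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(weights, state):
    # Data re-uploading: the state is encoded anew before each variational layer.
    for layer in range(n_layers):
        for w in range(n_qubits):
            qml.RX(state[w], wires=w)              # re-upload state feature
            qml.RY(weights[layer, w, 0], wires=w)  # trainable rotation
            qml.RZ(weights[layer, w, 1], wires=w)  # trainable rotation
        for w in range(n_qubits - 1):
            qml.CNOT(wires=[w, w + 1])             # entangling layer
    # One Pauli-Z expectation per action (toy choice: one wire per action).
    return [qml.expval(qml.PauliZ(a)) for a in range(n_actions)]

def q_values(weights, out_scale, state):
    # Trainable output scaling maps expectations in [-1, 1] to the Q-value range.
    return out_scale * np.stack(circuit(weights, state))

def double_q_target(weights_a, weights_b, scale_a, scale_b, reward, s_next, gamma=0.99):
    # Double Q-learning: one Q-network selects the greedy action, the other
    # evaluates it, which reduces overestimation bias.
    a_star = np.argmax(q_values(weights_a, scale_a, s_next))
    return reward + gamma * q_values(weights_b, scale_b, s_next)[a_star]

def conservative_loss(weights, out_scale, batch, targets, alpha=1.0):
    # TD loss plus a CQL-style penalty: the logsumexp term pushes down the
    # Q-values of all actions while -q[a] pushes up actions seen in the data.
    loss = 0.0
    for (s, a), y in zip(batch, targets):
        q = q_values(weights, out_scale, s)
        td = (q[a] - y) ** 2
        penalty = np.log(np.sum(np.exp(q))) - q[a]
        loss = loss + td + alpha * penalty
    return loss / len(batch)

Under this reading, the penalty keeps Q-value estimates for out-of-distribution actions conservative, which is the usual remedy for offline overestimation; the exact circuit ansatz, observable choice, and penalty weighting used in the paper may differ.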

Published

2023-06-26

How to Cite

Cheng, Z., Zhang, K., Shen, L., & Tao, D. (2023). Offline Quantum Reinforcement Learning in a Conservative Manner. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7148-7156. https://doi.org/10.1609/aaai.v37i6.25872

Section

AAAI Technical Track on Machine Learning I