Fast Counterfactual Inference for History-Based Reinforcement Learning

Authors

  • Haichuan Gao, Tsinghua University
  • Tianren Zhang, Tsinghua University
  • Zhile Yang, University of Leeds
  • Yuqing Guo, Tsinghua University
  • Jinsheng Ren, Tsinghua University
  • Shangqi Guo, Tsinghua University
  • Feng Chen, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v37i6.25924

Keywords:

ML: Reinforcement Learning Algorithms

Abstract

Incorporating sequence-to-sequence models into history-based Reinforcement Learning (RL) provides a general way to extend RL to partially observable tasks. These models compress the history space according to the correlations between historical observations and rewards. However, they do not adjust for the confounding correlations introduced by data sampling, and thus assign high beliefs to uninformative historical observations, which limits the compression of the history space. Counterfactual Inference (CI), which estimates causal effects by intervening on a single variable at a time, is a promising way to adjust for such confounding. However, directly applying single-variable interventions to a huge number of historical observations is computationally infeasible. This paper proposes to perform CI on observation sub-spaces instead of single observations and develops a coarse-to-fine CI algorithm, called Tree-based History Counterfactual Inference (T-HCI), that reduces the number of interventions exponentially. We show that T-HCI is computationally feasible in practice and brings significant sample-efficiency gains on various challenging partially observable tasks, including Maze, BabyAI, and robot manipulation tasks.
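To make the coarse-to-fine idea concrete, the sketch below is a minimal, hypothetical illustration of intervening on whole observation sub-spaces before refining to single observations. It assumes a user-supplied `intervention_effect` routine (an assumption for illustration, not the paper's estimator) that masks a given subset of historical observations and returns the change in some outcome estimate, such as predicted return:

```python
# A minimal, hypothetical sketch of coarse-to-fine counterfactual inference
# over a history of observations. `intervention_effect` is an assumed
# user-supplied function that masks the given subset of observations and
# returns the change in an outcome estimate (e.g., predicted return);
# it is not the paper's exact estimator or tree construction.
from typing import Callable, List, Sequence


def tree_hci(indices: Sequence[int],
             intervention_effect: Callable[[Sequence[int]], float],
             threshold: float) -> List[int]:
    """Return the indices of observations judged causally relevant."""
    if not indices:
        return []
    # Coarse step: one intervention on the whole sub-space. If masking the
    # entire subset barely changes the outcome, prune it in one shot.
    if intervention_effect(indices) < threshold:
        return []
    # Fine step: a singleton that survives pruning is kept as relevant.
    if len(indices) == 1:
        return list(indices)
    # Refine: split the sub-space in half and recurse on both halves.
    mid = len(indices) // 2
    return (tree_hci(indices[:mid], intervention_effect, threshold)
            + tree_hci(indices[mid:], intervention_effect, threshold))


# Illustrative usage: suppose only observation 7 in a 16-step history
# actually influences the reward.
if __name__ == "__main__":
    effect = lambda subset: 1.0 if 7 in subset else 0.0
    print(tree_hci(range(16), intervention_effect=effect, threshold=0.5))
    # -> [7]
```

Because an entire uninformative sub-space is discarded with a single intervention, the search descends only a logarithmic number of tree levels and, when relevant observations are sparse, needs far fewer interventions than intervening on each of the n observations separately, consistent with the exponential reduction described in the abstract.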

Published

2023-06-26

How to Cite

Gao, H., Zhang, T., Yang, Z., Guo, Y., Ren, J., Guo, S., & Chen, F. (2023). Fast Counterfactual Inference for History-Based Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7613-7623. https://doi.org/10.1609/aaai.v37i6.25924

Section

AAAI Technical Track on Machine Learning I