Reinforcement Learning Explainability via Model Transforms (Student Abstract)

Authors

  • Mira Finkelstein, The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering
  • Lucy Liu, Harvard University, School of Engineering and Applied Sciences
  • Yoav Kolumbus, The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering
  • David C. Parkes, Harvard University, School of Engineering and Applied Sciences
  • Jeffrey S. Rosenshein, The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering
  • Sarah Keren, Technion - Israel Institute of Technology, Taub Faculty of Computer Science

DOI:

https://doi.org/10.1609/aaai.v36i11.21608

Keywords:

Explainable-RL, Reinforcement Learning Algorithms, Planning, Model-Based Reasoning

Abstract

Understanding the emerging behaviors of reinforcement learning agents may be difficult because such agents are often trained using highly complex and expressive models. In recent years, most approaches developed for explaining agent behaviors rely on domain knowledge or on an analysis of the agent’s learned policy. For some domains, relevant knowledge may not be available or may be insufficient for producing meaningful explanations. We suggest using formal model abstractions and transforms, previously used mainly for expediting the search for optimal policies, to automatically explain discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. We formally define this problem of Reinforcement Learning Policy Explanation (RLPE), suggest a class of transforms which can be used for explaining emergent behaviors, and suggest methods for searching efficiently for an explanation. We demonstrate the approach on standard benchmarks.
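
The idea of explaining a discrepancy by transforming the observer's model can be illustrated with a toy sketch. This is not the authors' implementation: the chain environment, the three transforms, and every function name below are hypothetical, and the brute-force search over transform subsets only stands in for the efficient search methods the abstract mentions. The sketch assumes an observer who expects the agent to walk right toward a large reward, while the agent is observed walking left; it looks for the smallest set of model changes under which the observed behavior becomes optimal.

```python
from itertools import combinations

# Hypothetical 5-state chain: states 0 and 4 are terminal, actions move left/right.
N_STATES = 5
ACTIONS = (-1, +1)
TERMINALS = {0, 4}


def base_model():
    """The observer's assumed model: small reward on the left, large on the right."""
    return {"rewards": {0: 0.3, 4: 1.0}, "blocked": set()}


def step(model, s, a):
    """Deterministic transition; a blocked move leaves the agent in place with no reward."""
    nxt = s + a
    if (s, nxt) in model["blocked"]:
        return s, 0.0
    return nxt, model["rewards"].get(nxt, 0.0)


def q_value(model, v, s, a, gamma):
    nxt, r = step(model, s, a)
    return r + gamma * v[nxt]


def greedy_policy(model, gamma=0.9, sweeps=100):
    """Value iteration followed by greedy action selection in non-terminal states."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s not in TERMINALS:
                v[s] = max(q_value(model, v, s, a, gamma) for a in ACTIONS)
    return {
        s: max(ACTIONS, key=lambda a: q_value(model, v, s, a, gamma))
        for s in range(N_STATES)
        if s not in TERMINALS
    }


# Hypothetical transforms: each one edits a copy of the observer's model in place.
def raise_left_reward(m):
    m["rewards"][0] = 0.5


def block_last_step(m):
    m["blocked"].add((3, 4))


def remove_right_reward(m):
    m["rewards"][4] = 0.0


TRANSFORMS = [
    ("raise_left_reward", raise_left_reward),
    ("block_last_step", block_last_step),
    ("remove_right_reward", remove_right_reward),
]


def explain(observed_policy):
    """Return the first smallest transform set whose optimal policy matches the observation."""
    for k in range(len(TRANSFORMS) + 1):
        for combo in combinations(TRANSFORMS, k):
            model = base_model()
            for _, transform in combo:
                transform(model)
            if greedy_policy(model) == observed_policy:
                return [name for name, _ in combo]
    return None


if __name__ == "__main__":
    observed = {1: -1, 2: -1, 3: -1}  # the agent always moves left
    print(explain(observed))
```

In this toy setting the search returns the single transform `block_last_step`: if the final step to the right is blocked, moving left is optimal, which accounts for the observed behavior. Enumerating subsets by increasing size favors the smallest explanation but scales exponentially with the number of transforms, so it is only suitable for illustration.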

Published

2022-06-28

How to Cite

Finkelstein, M., Liu, L., Kolumbus, Y., Parkes, D. C., Rosenshein, J. S., & Keren, S. (2022). Reinforcement Learning Explainability via Model Transforms (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12943-12944. https://doi.org/10.1609/aaai.v36i11.21608