Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Authors

  • Ran Tian, University of California, Berkeley
  • Liting Sun, University of California, Berkeley
  • Masayoshi Tomizuka, University of California, Berkeley

DOI:

https://doi.org/10.1609/aaai.v35i7.16750

Keywords:

Learning Human Values and Preferences

Abstract

Classical game-theoretic approaches for multi-agent systems, in both the forward policy design problem and the inverse reward learning problem, often make strong rationality assumptions: agents are assumed to perfectly maximize expected utilities under uncertainty. Such assumptions, however, substantially mismatch observed human behaviors such as satisficing with sub-optimal decisions, risk seeking, and loss aversion. Drawing on iterative reasoning models and cumulative prospect theory, we propose a new game-theoretic framework, the bounded risk-sensitive Markov Game (BRSMG), that captures two aspects of realistic human behavior: bounded intelligence and risk sensitivity. General solutions to both the forward policy design problem and the inverse reward learning problem are provided, together with theoretical analysis and simulation verification. We validate the proposed forward policy design algorithm and inverse reward learning algorithm in a two-player navigation scenario. The results show that agents in our framework exhibit bounded-intelligence, risk-averse, and risk-seeking behaviors. Moreover, in the inverse reward learning task, the proposed bounded risk-sensitive inverse learning algorithm outperforms a baseline risk-neutral inverse learning algorithm: it learns not only more accurate reward values but also the intelligence levels and risk-measure parameters of agents from demonstrations.
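The abstract attributes risk-averse, risk-seeking, and loss-averse behavior to cumulative prospect theory (CPT). As an illustration only, here is a minimal sketch of how CPT scores a discrete prospect, assuming the standard Tversky-Kahneman (1992) parametric value and probability-weighting functions with their commonly cited parameter estimates and a reference point of 0. The function names `value`, `weight`, and `cpt_value` and their defaults are hypothetical choices for this sketch; the BRSMG paper's own parameterization and reference point may differ.

```python
from typing import Sequence

# Sketch assuming Tversky-Kahneman (1992) functional forms and parameter
# estimates; not necessarily the parameterization used in the BRSMG paper.


def value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    """Value function relative to a reference point of 0.

    Concave for gains (alpha < 1), convex for losses (beta < 1), and
    steeper for losses than for gains (loss-aversion coefficient lam > 1).
    """
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta


def weight(p: float, gamma: float) -> float:
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:  # also absorbs floating-point overshoot of cumulative sums
        return 1.0
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def cpt_value(
    outcomes: Sequence[float],
    probs: Sequence[float],
    alpha: float = 0.88,
    beta: float = 0.88,
    lam: float = 2.25,
    gamma: float = 0.61,
    delta: float = 0.69,
) -> float:
    """CPT value of a discrete prospect (outcome x_i with probability p_i)."""
    pairs = list(zip(outcomes, probs))
    # Rank-dependent weighting: gains are ranked from best to worst and
    # losses from worst to best; each outcome's decision weight is the
    # increment of weighted cumulative probability that it adds.
    gains = sorted((xp for xp in pairs if xp[0] > 0), key=lambda xp: -xp[0])
    losses = sorted((xp for xp in pairs if xp[0] < 0), key=lambda xp: xp[0])
    total = 0.0
    for ranked, g in ((gains, gamma), (losses, delta)):
        cum = 0.0
        for x, p in ranked:
            decision_weight = weight(cum + p, g) - weight(cum, g)
            total += decision_weight * value(x, alpha, beta, lam)
            cum += p
    return total  # zero outcomes contribute value(0) = 0


if __name__ == "__main__":
    # Same expected value (50), yet CPT prefers the sure gain: risk-averse for gains.
    print(cpt_value([50.0], [1.0]), cpt_value([0.0, 100.0], [0.5, 0.5]))
    # Same expected value (-50), yet CPT prefers the gamble: risk-seeking for losses.
    print(cpt_value([-50.0], [1.0]), cpt_value([0.0, -100.0], [0.5, 0.5]))
```

Setting `alpha = beta = lam = gamma = delta = 1` makes both functions linear, so `cpt_value` reduces to the ordinary expected value, i.e., the risk-neutral evaluation assumed by classical expected-utility formulations.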


Published

2021-05-18

How to Cite

Tian, R., Sun, L., & Tomizuka, M. (2021). Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory. Proceedings of the AAAI Conference on Artificial Intelligence, 35(7), 6011-6020. https://doi.org/10.1609/aaai.v35i7.16750

Issue

Vol. 35 No. 7

Section

AAAI Technical Track on Humans and AI