TY  - JOUR
AU  - Huang, Wenjie
AU  - Pham, Viet Hai
AU  - Haskell, William Benjamin
PY  - 2020/04/03
Y2  - 2024/03/28
TI  - Model and Reinforcement Learning for Markov Games with Risk Preferences
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 34
IS  - 02
SE  - AAAI Technical Track: Game Theory and Economic Paradigms
DO  - 10.1609/aaai.v34i02.5574
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/5574
SP  - 2022-2029
AB  - We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" arising from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems and covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
ER  - 