Reinforcement Learning under Threats


  • Victor Gallego Instituto de Ciencias Matemáticas
  • Roi Naveiro Instituto de Ciencias Matemáticas
  • David Rios Insua Instituto de Ciencias Matemáticas



In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, in such non-stationary environments, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have modeled the whole multi-agent system as a game. Instead, we address the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme, resulting in a new learning framework for TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
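The core idea of augmenting Q-learning with an opponent model can be illustrated with a toy sketch. The code below is an assumption-laden illustration, not the paper's actual algorithm or experiments: it indexes the Q-table by both the DM's and the adversary's actions, keeps a simple count-based model of the adversary, and takes expectations over that model when computing targets and best responses. The single-state matching game, the learning rates, and all function names are hypothetical choices made for this sketch.

```python
import numpy as np

# Hypothetical sketch of opponent-aware Q-learning in a TMDP-like setting.
# The environment (a single-state matching game) and all hyperparameters
# are illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)

n_states, n_actions, n_threats = 1, 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.1

# Q is indexed by (state, DM action, adversary action).
Q = np.zeros((n_states, n_actions, n_threats))
# Count-based opponent model: observed adversary actions per state
# (initialized to 1, a uniform Dirichlet-style prior).
opp_counts = np.ones((n_states, n_threats))

def opponent_probs(s):
    """Estimated distribution over the adversary's actions in state s."""
    return opp_counts[s] / opp_counts[s].sum()

def best_response(s):
    """DM action maximizing expected Q under the opponent model."""
    return int(np.argmax(Q[s] @ opponent_probs(s)))

def update(s, a, b, r, s_next):
    """Q update whose target averages over the modeled opponent."""
    target = r + gamma * np.max(Q[s_next] @ opponent_probs(s_next))
    Q[s, a, b] += alpha * (target - Q[s, a, b])
    opp_counts[s, b] += 1  # refresh the opponent model

# Toy loop: the adversary plays action 1 with probability 0.8; the DM
# earns reward 1 for matching the adversary's action, 0 otherwise.
for _ in range(2000):
    s = 0
    b = int(rng.random() < 0.8)
    a = best_response(s) if rng.random() > eps else int(rng.integers(n_actions))
    r = 1.0 if a == b else 0.0
    update(s, a, b, r, s)
```

Under this setup the DM learns to match the adversary's most frequent action, since the expected Q-value under the estimated opponent distribution favors it; a level-k scheme would replace the count-based model with a recursively modeled opponent.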




How to Cite

Gallego, V., Naveiro, R., & Insua, D. R. (2019). Reinforcement Learning under Threats. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9939-9940.



Student Abstract Track