Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning

Authors

  • Macheng Shen, Massachusetts Institute of Technology
  • Jonathan P. How, Massachusetts Institute of Technology

DOI:

https://doi.org/10.1609/icaps.v31i1.16006

Keywords:

Multi-agent Planning And Learning, Representations For Learned Models In Planning, Learning To Improve The Effectiveness Of Planning & Scheduling Systems, Learning Domain And Action Models For Planning

Abstract

This paper studies decision-making in two-player scenarios where the type (e.g., adversary, neutral, or teammate) of the other agent (the opponent) is uncertain to the decision-making agent (the protagonist), which is an abstraction of security-domain applications. In these settings, the reward for the protagonist agent depends on the type of the opponent, but this type is private information known only to the opponent itself and thus hidden from the protagonist. In contrast, as is often the case, the type of the protagonist agent is assumed to be known to the opponent, and this information asymmetry significantly complicates the protagonist's decision-making. In particular, to determine the best actions to take, the protagonist agent must infer the opponent's type from observations and agent modeling. To address this problem, this paper presents an opponent-type deduction module based on Bayes' rule. This inference module takes as input the imagined opponent decision-making rule (the opponent model) together with the observed history of the opponent's states and actions, and outputs a belief over the opponent's hidden type. A multi-agent reinforcement learning approach is used to develop this game-theoretic opponent model through self-play, which avoids the expensive data-collection step of interacting with a real opponent; this multi-agent approach also captures the strategic interaction and reasoning between agents. In addition, we apply ensemble training to avoid over-fitting to a single opponent model during training, so the learned protagonist policy is also effective against unseen opponents. Experimental results show that the proposed game-theoretic modeling, explicit opponent-type inference, and ensemble training significantly improve decision-making performance over baseline approaches and generalize well against adversaries that were not seen during training.
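
For illustration, the following is a minimal sketch (not taken from the paper) of the Bayes'-rule belief update that such an inference module performs. It assumes a learned opponent model exposed as a hypothetical function `opponent_policy(state, t)` that returns action probabilities for an opponent of candidate type `t`; the names and interface are assumptions made only for this example.

```python
import numpy as np

def update_type_belief(belief, state, action, opponent_policy):
    """Bayes'-rule update of the belief over the opponent's hidden type.

    belief          : array of shape (num_types,), the current P(type)
    state, action   : the opponent's observed state and the action it took
    opponent_policy : learned opponent model; opponent_policy(state, t) returns
                      the action-probability vector for an opponent of type t
                      (hypothetical interface, for illustration only)
    """
    # Likelihood of the observed action under each candidate type.
    likelihoods = np.array(
        [opponent_policy(state, t)[action] for t in range(len(belief))]
    )
    posterior = belief * likelihoods           # P(type) * P(action | state, type)
    posterior /= posterior.sum() + 1e-12       # normalize to get P(type | history)
    return posterior


# Example: three candidate types (adversary, neutral, teammate), uniform prior.
belief = np.ones(3) / 3.0
# belief = update_type_belief(belief, state, action, opponent_policy)  # per observed step
```

Applying this update once per observed opponent transition accumulates evidence over the interaction history, which is the role the paper ascribes to the opponent-type inference module.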

Published

2021-05-17

How to Cite

Shen, M., & How, J. P. (2021). Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning. Proceedings of the International Conference on Automated Planning and Scheduling, 31(1), 578-587. https://doi.org/10.1609/icaps.v31i1.16006