Reinforcement Learning Via Practice and Critique Advice

Authors

  • Kshitij Judah, Oregon State University
  • Saikat Roy, Oregon State University
  • Alan Fern, Oregon State University
  • Thomas Dietterich, Oregon State University

DOI

https://doi.org/10.1609/aaai.v24i1.7690

Keywords

Reinforcement Learning, Human-Computer Interaction, Intelligent User Interfaces

Abstract

We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions, where advice is gathered. During each critique session, the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate our approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. We report results from a substantial evaluation involving ten end-users, which show the promise of this approach and also highlight the challenges involved in inserting end-users into the RL loop.
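As a rough illustration of the combined objective described in the abstract, the sketch below trains a softmax policy on a loss that linearly combines a practice term (a REINFORCE-style policy-gradient loss on world experience) with a critique term derived from end-user good/bad action labels. This is a minimal sketch under assumptions of ours, not the paper's exact formulation: the weight `lambda_critique`, the log-likelihood form of the critique loss, and all function names are illustrative; the paper's actual loss functions and optimization details are given in the full text.

```python
# Hypothetical sketch of a linearly combined practice + critique loss.
# The specific loss forms and lambda_critique are assumptions, not the
# paper's exact method.
import numpy as np

N_FEATURES, N_ACTIONS = 8, 4
theta = np.zeros((N_ACTIONS, N_FEATURES))  # parametric policy weights

def action_probs(state_features):
    """Softmax policy pi(a | s; theta)."""
    logits = theta @ state_features
    logits -= logits.max()                  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def grad_log_pi(state_features, action):
    """Gradient of log pi(a | s) for a softmax policy:
    d log pi(a)/d theta[b] = (1[a == b] - pi(b)) * s."""
    g = -np.outer(action_probs(state_features), state_features)
    g[action] += state_features
    return g

def practice_loss_grad(trajectory):
    """REINFORCE-style practice term: negative of the ascent direction
    sum_t R * grad log pi(a_t | s_t) over one (s, a, r) trajectory."""
    returns = sum(r for _, _, r in trajectory)
    ascent = sum(returns * grad_log_pi(s, a) for s, a, _ in trajectory)
    return -ascent

def critique_loss_grad(critiques):
    """Critique term: raise log-probability of 'good' actions (label +1)
    and lower it for 'bad' ones (label -1)."""
    grad = np.zeros_like(theta)
    for s, a, label in critiques:
        grad -= label * grad_log_pi(s, a)   # minimize -label * log pi(a|s)
    return grad

def combined_update(trajectory, critiques, lr=0.01, lambda_critique=0.5):
    """One gradient step on L = L_practice + lambda * L_critique."""
    global theta
    grad = practice_loss_grad(trajectory) + \
        lambda_critique * critique_loss_grad(critiques)
    theta -= lr * grad
```

The single scalar `lambda_critique` is what makes this a linear combination of the two information sources: setting it to zero recovers pure practice-based RL, while larger values let the end-user's labels dominate the update.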

Published

2010-07-03

How to Cite

Judah, K., Roy, S., Fern, A., & Dietterich, T. (2010). Reinforcement Learning Via Practice and Critique Advice. Proceedings of the AAAI Conference on Artificial Intelligence, 24(1), 481-486. https://doi.org/10.1609/aaai.v24i1.7690