Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

Tim Brys; Ann Nowé; Daniel Kudenko; Matthew Taylor

doi:10.1609/aaai.v28i1.8998

Authors

Tim Brys Vrije Universiteit Brussel
Ann Nowé Vrije Universiteit Brussel
Daniel Kudenko University of York
Matthew Taylor Washington State University

DOI:

https://doi.org/10.1609/aaai.v28i1.8998

Keywords:

Reinforcement Learning, Multi-Objectivization, Ensemble Techniques

Abstract

Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-)optimal solutions for any objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
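
The control loop described in the abstract (one Q-function per objective learned in parallel, a confidence score per objective, and action selection from the most confident objective) can be illustrated with a minimal Python sketch. The abstract does not define the confidence metric, so the one used here is an assumption: each objective keeps several value samples per state-action pair, and confidence is a Welch-style t statistic separating the best-looking action from the worst-looking one. The names `confidence`, `select_action`, `td_update`, and the nested-dictionary layout `q_samples[objective][state][action]` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random
from statistics import mean, variance


def confidence(samples_by_action):
    """Assumed metric: t-like statistic between the best and worst action.

    samples_by_action maps each action to a list of at least two value samples.
    """
    means = {a: mean(s) for a, s in samples_by_action.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    if best == worst:
        # Single action, or all actions look identical: no confidence.
        return 0.0
    sb, sw = samples_by_action[best], samples_by_action[worst]
    gap = means[best] - means[worst]
    stderr = math.sqrt(variance(sb) / len(sb) + variance(sw) / len(sw))
    if stderr == 0.0:
        return math.inf
    return gap / stderr


def select_action(q_samples, state, epsilon=0.0, rng=random):
    """Pick the most confident objective, then act epsilon-greedily on it.

    q_samples[objective][state][action] is a list of value samples.
    Returns (chosen_objective, action).
    """
    scores = {o: confidence(q[state]) for o, q in q_samples.items()}
    chosen = max(scores, key=scores.get)
    actions = list(q_samples[chosen][state])
    if rng.random() < epsilon:
        return chosen, rng.choice(actions)
    means = {a: mean(q_samples[chosen][state][a]) for a in actions}
    return chosen, max(means, key=means.get)


def td_update(q_samples, rewards, state, action, next_state,
              alpha=0.1, gamma=0.99):
    """One Q-learning step for every objective, each with its own reward.

    rewards[objective] is that objective's (possibly shaped) reward signal.
    Each sample index k is treated as its own learner (an assumption).
    """
    for o, q in q_samples.items():
        for k in range(len(q[state][action])):
            best_next = max(s[k] for s in q[next_state].values())
            target = rewards[o] + gamma * best_next
            q[state][action][k] += alpha * (target - q[state][action][k])


# Toy usage: two objectives, one state, two actions (made-up numbers).
q = {
    "base": {"s": {"left": [0.0, 1.0, 0.5], "right": [0.6, 0.1, 0.9]}},
    "shaped": {"s": {"left": [0.0, 0.1, 0.05], "right": [1.0, 1.1, 1.05]}},
}
objective, action = select_action(q, "s")
```

In the toy data, the "shaped" objective separates its two actions far more clearly than "base" does, so its estimates are the ones used to act. Any other confidence measure could be swapped in by replacing `confidence` alone, since selection only needs a comparable score per objective.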
