Rating-Based Reinforcement Learning

Authors

  • Devin White, University of Texas at San Antonio
  • Mingkang Wu, University of Texas at San Antonio
  • Ellen Novoseller, DEVCOM Army Research Laboratory
  • Vernon J. Lawhern, DEVCOM Army Research Laboratory
  • Nicholas Waytowich, DEVCOM Army Research Laboratory
  • Yongcan Cao, University of Texas at San Antonio

DOI:

https://doi.org/10.1609/aaai.v38i9.28886

Keywords:

HAI: Learning Human Values and Preferences, HAI: Human-in-the-loop Machine Learning

Abstract

This paper develops a novel rating-based reinforcement learning approach that uses human ratings to provide guidance during reinforcement learning. Unlike the existing preference-based and ranking-based reinforcement learning paradigms, which rely on humans' relative preferences over sample pairs, the proposed rating-based reinforcement learning approach relies on human evaluations of individual trajectories, without relative comparisons between sample pairs. The rating-based reinforcement learning approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new approach.
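The abstract's core idea, learning a reward model from per-trajectory rating labels via a multi-class loss rather than pairwise preferences, can be illustrated with a minimal sketch. The parameterization below is an assumption, not the paper's actual model: it treats each rating class as having a hypothetical scalar center, scores a trajectory's predicted return against each center, and applies multi-class cross-entropy against the human's rating label.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rating_loss(predicted_return, rating, class_centers, temperature=1.0):
    """Multi-class cross-entropy for a single rated trajectory.

    predicted_return: scalar return estimate from the reward model.
    rating: integer rating class given by the human (0 = worst).
    class_centers: hypothetical per-class return centers; a class whose
        center is closer to the predicted return receives a higher
        probability under this (assumed) parameterization.
    """
    logits = [-abs(predicted_return - c) / temperature for c in class_centers]
    probs = softmax(logits)
    return -math.log(probs[rating])

# Usage: with four rating classes, a trajectory rated 3 ("best") incurs
# lower loss when the model predicts a high return than a low one.
centers = [0.0, 1.0, 2.0, 3.0]
loss_high = rating_loss(3.0, 3, centers)
loss_low = rating_loss(0.0, 3, centers)
```

Minimizing this loss over many rated trajectories pushes the reward model's return estimates toward the region consistent with each human rating, after which a standard RL algorithm can optimize against the learned reward.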

Published

2024-03-24

How to Cite

White, D., Wu, M., Novoseller, E., Lawhern, V. J., Waytowich, N., & Cao, Y. (2024). Rating-Based Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(9), 10207-10215. https://doi.org/10.1609/aaai.v38i9.28886

Section

AAAI Technical Track on Humans and AI