ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

Rohan Saphal; Balaraman Ravindran; Dheevatsa Mudigere; Sasikanth Avancha; Bharat Kaul

doi:10.1609/aaai.v34i10.7225

Authors

Rohan Saphal, Indian Institute of Technology Madras
Balaraman Ravindran, Indian Institute of Technology Madras
Dheevatsa Mudigere, Facebook Inc
Sasikanth Avancha, Intel Labs
Bharat Kaul, Intel Labs

DOI:

https://doi.org/10.1609/aaai.v34i10.7225

Abstract

Reinforcement learning algorithms are sensitive to hyper-parameters and must be tuned for each specific environment to perform well. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present a methodology that creates multiple models for an ensemble from a single training instance, through directed perturbation of the model parameters at regular intervals. As a result of the perturbation, a single model converges to several local minima during the optimization process. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient and computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) approaches.
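The training scheme described in the abstract (perturb the parameters at regular intervals, save a snapshot each time, ensemble the snapshots at evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify how the perturbation is directed or which ensembling strategies are used, so the sketch makes three assumptions: isotropic Gaussian noise stands in for the directed perturbation, majority voting is used for discrete actions, and element-wise averaging is used for continuous actions. The names `train_with_snapshots`, `majority_vote`, `mean_action`, and the toy quadratic loss are hypothetical.

```python
import random
from collections import Counter


def train_with_snapshots(params, grad_fn, steps, interval,
                         lr=0.1, noise_scale=0.5, seed=0):
    """Run gradient descent; every `interval` steps, save a snapshot of
    the parameters and then perturb them before training continues."""
    rng = random.Random(seed)
    params = list(params)
    snapshots = []
    for step in range(1, steps + 1):
        grads = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
        if step % interval == 0:
            # Save the parameters reached so far as one ensemble member.
            snapshots.append(list(params))
            # Assumption: plain Gaussian noise replaces the paper's
            # "directed" perturbation, which the abstract does not define.
            params = [p + rng.gauss(0.0, noise_scale) for p in params]
    return snapshots


def majority_vote(actions):
    """Discrete ensembling (assumed strategy): most common action wins."""
    return Counter(actions).most_common(1)[0][0]


def mean_action(actions):
    """Continuous ensembling (assumed strategy): element-wise average."""
    n = len(actions)
    return [sum(column) / n for column in zip(*actions)]


# Toy stand-in for a policy loss: a quadratic with its minimum at 3.0.
def toy_grad(params):
    return [2.0 * (p - 3.0) for p in params]


snapshots = train_with_snapshots([0.0, 0.0], toy_grad, steps=30, interval=10)
discrete_choice = majority_vote([1, 0, 1])
continuous_choice = mean_action([[1.0, 2.0], [3.0, 4.0]])
```

In this sketch a single training run of 30 steps yields three saved parameter sets, so the ensemble costs no additional training samples beyond the one run, which is the property the abstract emphasizes.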
