ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)


  • Rohan Saphal, Indian Institute of Technology Madras
  • Balaraman Ravindran, Indian Institute of Technology Madras
  • Dheevatsa Mudigere, Facebook Inc
  • Sasikanth Avancha, Intel Labs
  • Bharat Kaul, Intel Labs



Reinforcement learning algorithms are sensitive to hyper-parameters and require environment-specific tuning to perform well. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present a methodology for creating multiple models from a single training instance, for use in an ensemble, through directed perturbation of the model parameters at regular intervals. As a result of the perturbation, a single model converges to several local minima over the course of the optimization process. By saving the model parameters at each such minimum, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient and computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) approaches.
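The procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the form of the perturbation, the update rule, or the ensembling strategies, so everything here is an assumption made for illustration: `update_step` and `perturb` are hypothetical caller-supplied functions, the model is assumed to have settled into a minimum by the end of each interval, and majority voting (discrete actions) and element-wise averaging (continuous actions) are two common ensembling choices, not necessarily the ones the authors evaluate.

```python
from collections import Counter
from typing import Callable, List, Sequence

Params = List[float]


def train_with_snapshots(
    params: Params,
    update_step: Callable[[Params], Params],  # hypothetical: one optimization step
    perturb: Callable[[Params], Params],      # hypothetical: the directed perturbation
    total_steps: int,
    interval: int,
) -> List[Params]:
    """Run a single training instance and collect one snapshot per interval."""
    snapshots: List[Params] = []
    for step in range(1, total_steps + 1):
        params = update_step(params)
        if step % interval == 0:
            # Assumption: the model sits near a local minimum at this point.
            snapshots.append(list(params))  # store a copy of the parameters
            params = perturb(params)        # move away before training resumes
    return snapshots


def majority_vote(actions: Sequence[int]) -> int:
    """Discrete control: return the most frequent action among the policies.

    Ties are resolved in favor of the action that appears first.
    """
    return Counter(actions).most_common(1)[0][0]


def mean_action(actions: Sequence[Sequence[float]]) -> List[float]:
    """Continuous control: average the policies' action vectors element-wise."""
    n = len(actions)
    return [sum(dim) / n for dim in zip(*actions)]
```

At evaluation time, each saved snapshot would be loaded as a separate policy, every policy would propose an action for the current state, and one of the two combiners would select the action actually executed. Only one training run is needed, which is where the sample-efficiency argument in the abstract comes from.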




How to Cite

Saphal, R., Ravindran, B., Mudigere, D., Avancha, S., & Kaul, B. (2020). ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 34(10), 13905-13906.



Student Abstract Track