OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Authors

  • Jinyi Liu, College of Intelligence and Computing, Tianjin University
  • Zhi Wang, Independent Researcher
  • Yan Zheng, College of Intelligence and Computing, Tianjin University
  • Jianye Hao, College of Intelligence and Computing, Tianjin University
  • Chenjia Bai, Shanghai AI Laboratory
  • Junjie Ye, Independent Researcher
  • Zhen Wang, Northwestern Polytechnical University
  • Haiyin Piao, Northwestern Polytechnical University
  • Yang Sun, SADRI institute

DOI:

https://doi.org/10.1609/aaai.v38i12.29303

Keywords:

ML: Reinforcement Learning

Abstract

In reinforcement learning, optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration toward less-explored areas, which are characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may over-explore high-noise areas and thereby reduce exploration efficiency. Hence, when exploring noisy environments, optimism-driven exploration remains the foundation, but it also helps to curb unnecessary over-exploration of high-noise areas. In this work, we propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control. OVD-Explorer introduces a new measure of the policy's exploration ability that accounts for noise from an optimistic perspective, and leverages gradient ascent to drive exploration. In practice, OVD-Explorer can be easily integrated with continuous-control RL algorithms. Extensive evaluations on MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.
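The abstract's central claim, that optimism which rewards every kind of uncertainty can be lured into high-noise areas, can be illustrated with a toy one-dimensional example. This is a minimal sketch and not OVD-Explorer's measure, which the paper itself defines; it swaps in a simple, commonly used split of uncertainty into epistemic uncertainty (disagreement across an ensemble of quantile critics) and aleatoric noise (spread within each critic's return distribution). The functions `true_mean`, `noise_std`, `disagreement`, the three-critic ensemble and all hyperparameters are hypothetical. Both variants move the action by gradient ascent on their score, loosely echoing the abstract's use of gradient ascent to drive exploration.

```python
from statistics import NormalDist, fmean, pstdev

# Hypothetical toy problem: a 1-D action a in [-1, 1]. These functions stand in
# for learned critics and are chosen only to make the effect visible.
def true_mean(a):       # mean return, best at a = 0
    return -a * a

def noise_std(a):       # aleatoric noise: large for a > 0 (a "high-noise area")
    return 0.1 + 2.0 * max(0.0, a)

def disagreement(a):    # epistemic spread: large for a < 0 (genuinely under-explored)
    return 0.1 + 1.0 * max(0.0, -a)

N_QUANTILES = 8
TAUS = [(2 * i + 1) / (2 * N_QUANTILES) for i in range(N_QUANTILES)]  # quantile midpoints
Z = [NormalDist().inv_cdf(t) for t in TAUS]                            # standard-normal quantiles
CRITIC_OFFSETS = [-1.0, 0.0, 1.0]  # hypothetical ensemble of 3 quantile critics

def quantile_critics(a):
    """Return one list of return quantiles per (hypothetical) critic at action a."""
    return [[true_mean(a) + off * disagreement(a) + z * noise_std(a) for z in Z]
            for off in CRITIC_OFFSETS]

def uncertainty(a):
    """Split uncertainty at action a into (mean, epistemic, aleatoric)."""
    qs = quantile_critics(a)
    means = [fmean(q) for q in qs]
    epistemic = pstdev(means)                 # critics disagree: reducible by exploring
    aleatoric = fmean([pstdev(q) for q in qs])  # spread within a critic: irreducible noise
    return fmean(means), epistemic, aleatoric

def optimistic_score(a, beta=1.0):
    mean, epi, alea = uncertainty(a)
    return mean + beta * (epi ** 2 + alea ** 2) ** 0.5  # all uncertainty earns a bonus

def noise_aware_score(a, beta=1.0, lam=1.0):
    mean, epi, alea = uncertainty(a)
    return mean + beta * epi - lam * alea  # bonus for epistemic, penalty for noise

def gradient_ascent(score, a0=0.0, lr=0.05, steps=200, h=1e-3):
    a = a0
    for _ in range(steps):
        grad = (score(a + h) - score(a - h)) / (2 * h)  # central finite difference
        a = min(1.0, max(-1.0, a + lr * grad))           # keep a inside the action bounds
    return a

a_opt = gradient_ascent(optimistic_score)
a_noise = gradient_ascent(noise_aware_score)
print(f"optimistic: {a_opt:+.3f}  noise-aware: {a_noise:+.3f}")
```

Starting from a = 0, the purely optimistic score climbs into the high-noise region (a > 0), because noise inflates its bonus, whereas the noise-aware score moves toward the genuinely under-explored region and settles near a ≈ -0.41.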

Published

2024-03-24

How to Cite

Liu, J., Wang, Z., Zheng, Y., Hao, J., Bai, C., Ye, J., Wang, Z., Piao, H., & Sun, Y. (2024). OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13954-13962. https://doi.org/10.1609/aaai.v38i12.29303

Issue

Vol. 38 No. 12

Section

AAAI Technical Track on Machine Learning III