Noisy Derivative-Free Optimization With Value Suppression


  • Hong Wang, Nanjing University
  • Hong Qian, Nanjing University
  • Yang Yu, Nanjing University



Keywords: Derivative-free Optimization, Noisy Environment, Direct Policy Search, Value Suppression


Derivative-free optimization has shown advantages in solving sophisticated problems such as policy search when the environment is noise-free. Many real-world environments, however, are noisy, so solution evaluations are inaccurate. Noisy evaluation can severely harm derivative-free optimization, because it may make a worse solution look better. Sampling (evaluating a solution repeatedly) is a straightforward way to reduce noise, while previous studies have shown that delaying noise handling until the point of comparison (i.e., threshold selection) can be helpful for derivative-free optimization. This work delays noise handling further and proposes a simple noise-handling mechanism, value suppression. With value suppression, nothing is done about noise until the best-so-far solution has not improved for a period; the value of the best-so-far solution is then suppressed and the optimization continues. Experiments on synthetic problems as well as reinforcement learning tasks verify that value suppression can be significantly more effective than the previous methods.
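The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the underlying optimizer here is a plain Gaussian random search chosen for brevity, and the suppression rule (re-evaluate the best-so-far solution a few times and replace its stored value with the mean of the fresh evaluations if that mean is worse) is an assumption made for illustration. The names `search_with_value_suppression`, `patience`, `resample`, and `step` are hypothetical.

```python
import random


def noisy_sphere(x, rng, sigma=1.0):
    """Toy objective (assumed for illustration): sum of squares plus Gaussian noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, sigma)


def search_with_value_suppression(f, dim, budget, patience=20, resample=5,
                                  step=0.3, seed=0):
    """Minimize a noisy function f(x, rng) within `budget` evaluations.

    Noise is ignored until the best-so-far solution has not improved for
    `patience` consecutive candidates. Then its stored value is suppressed
    (made no better than the mean of `resample` fresh evaluations) and the
    search continues.
    """
    rng = random.Random(seed)
    best_x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    best_val = f(best_x, rng)
    used, stall, suppressions = 1, 0, 0

    while used < budget:
        # Propose a candidate near the best-so-far solution.
        cand = [v + rng.gauss(0.0, step) for v in best_x]
        val = f(cand, rng)
        used += 1

        if val < best_val:
            best_x, best_val, stall = cand, val, 0
        else:
            stall += 1

        # Value suppression: triggered only after a period without improvement.
        if stall >= patience and used + resample <= budget:
            fresh = [f(best_x, rng) for _ in range(resample)]
            used += resample
            # Assumed rule: the stored value may only get worse (larger),
            # which removes an overly optimistic noisy evaluation.
            best_val = max(best_val, sum(fresh) / resample)
            suppressions += 1
            stall = 0

    return best_x, best_val, used, suppressions


if __name__ == "__main__":
    x, val, used, n_sup = search_with_value_suppression(noisy_sphere, dim=5, budget=500)
    print("stored value:", val, "evaluations:", used, "suppressions:", n_sup)
```

The point of the design is that no evaluations are spent on noise handling while the search is still making progress; extra evaluations are used only when a possibly over-optimistic best-so-far value is blocking further improvement.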




How to Cite

Wang, H., Qian, H., & Yu, Y. (2018). Noisy Derivative-Free Optimization With Value Suppression. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).



AAAI Technical Track: Heuristic Search and Optimization