Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint

Authors

  • Peng Zhai, Academy for Engineering and Technology, Fudan University, Shanghai, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China
  • Jie Luo, Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Shanghai, China
  • Zhiyan Dong, Academy for Engineering and Technology, Fudan University, Shanghai, China; Ji Hua Laboratory, Foshan, China; Engineering Research Center of AI and Robotics, Shanghai, China
  • Lihua Zhang, Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China
  • Shunli Wang, Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China
  • Dingkang Yang, Academy for Engineering and Technology, Fudan University, Shanghai, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China

DOI:

https://doi.org/10.1609/aaai.v36i5.20481

Keywords:

Intelligent Robotics (ROB), Machine Learning (ML)

Abstract

Robust adversarial reinforcement learning is an effective method for training agents to cope with uncertain disturbances and modeling errors in real environments. However, for systems that are sensitive to disturbances or difficult to stabilize, it is easier to learn a powerful adversary than to establish a stable control policy. An improperly strong adversary can destabilize the system, bias the sampling process, make training unstable, and even reduce the robustness of the learned policy. In this study, we consider the problem of ensuring system stability during training within the adversarial reinforcement learning architecture. The dissipation principle of robust H-infinity control is extended to the Markov Decision Process, and robust stability constraints are derived from the L2-gain performance of the reinforcement learning system. On this basis, we propose a dissipation-inequation-constrained adversarial reinforcement learning architecture, which ensures the stability of the system during training by imposing constraints on both the normal and the adversarial agents. In principle, the architecture can be applied to a large family of deep reinforcement learning algorithms. Experiments in the MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and allows it to adapt to more powerful adversaries. Flight experiments on a real quadcopter indicate that the policy trained in simulation can be deployed directly to the real environment, and our controller outperforms the hardware-in-the-loop-based PID controller. Both our theoretical and empirical results offer new and critical perspectives on the adversarial reinforcement learning architecture from a rigorous robust control viewpoint.
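For readers unfamiliar with the dissipation-based notion of robustness invoked in the abstract, a minimal sketch of the standard L2-gain dissipation inequality from robust H-infinity control is given below. The symbols (storage function V, disturbance w, performance output z, gain bound gamma) are textbook robust-control notation rather than the paper's own, and the discrete-time trajectory form is only one plausible way such a constraint could be imposed on sampled rollouts during training; it should not be read as the paper's exact formulation.

% Continuous-time dissipation inequality: a storage function V >= 0
% certifies that the L2 gain from disturbance w to output z is at most gamma.
V(x(t_1)) - V(x(t_0)) \le \int_{t_0}^{t_1} \left( \gamma^2 \|w(t)\|^2 - \|z(t)\|^2 \right) \, dt

% Discrete-time, per-step analogue; summing it along a sampled trajectory of
% length T yields a constraint that could be enforced on both agents in training:
V(x_{t+1}) - V(x_t) \le \gamma^2 \|w_t\|^2 - \|z_t\|^2, \quad t = 0, \dots, T-1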


Published

2022-06-28

How to Cite

Zhai, P., Luo, J., Dong, Z., Zhang, L., Wang, S., & Yang, D. (2022). Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint. Proceedings of the AAAI Conference on Artificial Intelligence, 36(5), 5431-5439. https://doi.org/10.1609/aaai.v36i5.20481

Issue

Vol. 36 No. 5 (2022)

Section

AAAI Technical Track on Intelligent Robotics