Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint

Peng Zhai; Jie Luo; Zhiyan Dong; Lihua Zhang; Shunli Wang; Dingkang Yang

doi:10.1609/aaai.v36i5.20481

Authors

Peng Zhai: Academy for Engineering and Technology, Fudan University, Shanghai, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China
Jie Luo: Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Shanghai, China
Zhiyan Dong: Academy for Engineering and Technology, Fudan University, Shanghai, China; Ji Hua Laboratory, Foshan, China; Engineering Research Center of AI and Robotics, Shanghai, China
Lihua Zhang: Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China
Shunli Wang: Academy for Engineering and Technology, Fudan University, Shanghai, China; Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China
Dingkang Yang: Academy for Engineering and Technology, Fudan University, Shanghai, China; Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China

DOI:

https://doi.org/10.1609/aaai.v36i5.20481

Keywords:

Intelligent Robotics (ROB), Machine Learning (ML)

Abstract

Robust adversarial reinforcement learning is an effective method to train agents to manage uncertain disturbances and modeling errors in real environments. However, for systems that are sensitive to disturbances or those that are difficult to stabilize, it is easier to learn a powerful adversary than to establish a stable control policy. An improperly strong adversary can destabilize the system, introduce biases in the sampling process, make the learning process unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training in the adversarial reinforcement learning architecture. The dissipative principle of robust H-infinity control is extended to the Markov Decision Process, and robust stability constraints are obtained based on L2 gain performance in the reinforcement learning system. Thus, we propose a dissipation-inequation-constraint-based adversarial reinforcement learning architecture. This architecture ensures the stability of the system during training by imposing constraints on the normal and adversarial agents. Theoretically, this architecture can be applied to a large family of deep reinforcement learning algorithms. Results of experiments in MuJoCo and GymFC environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Results of the flight experiments on a real quadcopter indicate that our method can directly deploy the policy trained in the simulation environment to the real environment, and our controller outperforms the PID controller based on hardware-in-the-loop. Both our theoretical and empirical results provide new and critical outlooks on the adversarial reinforcement learning architecture from a rigorous robust control perspective.
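
For readers unfamiliar with the L2 gain condition the abstract refers to, the following is a minimal sketch of the standard discrete-time dissipation inequality from robust control. It is the textbook form, not necessarily the exact constraint derived in the paper. All symbols are assumptions introduced here for illustration: s_t is the state, w_t the disturbance (the adversary's action), z_t the performance output, V a nonnegative storage function, and gamma > 0 the gain bound.

```latex
% Dissipation inequality with supply rate \gamma^2 \|w\|^2 - \|z\|^2
% (assumed notation; V \ge 0 is a storage function on the state space)
V(s_{t+1}) - V(s_t) \;\le\; \gamma^2 \lVert w_t \rVert^2 - \lVert z_t \rVert^2
\qquad \text{for all } t \ge 0 .

% Summing over t = 0, \dots, T-1 and using V(s_T) \ge 0 gives the L2 gain bound
\sum_{t=0}^{T-1} \lVert z_t \rVert^2
\;\le\; \gamma^2 \sum_{t=0}^{T-1} \lVert w_t \rVert^2 + V(s_0) .
```

Read this way, the output energy is bounded by gamma squared times the disturbance energy plus a term that depends only on the initial state, which is the sense in which a bounded adversary cannot destabilize a system that satisfies the inequality.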
