Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective

Authors

  • Wanying Wang, Department of Mathematics, College of Science, Shanghai University
  • Yichen Zhu, Midea Group
  • Yirui Zhou, Department of Mathematics, College of Science, Shanghai University
  • Chaomin Shen, East China Normal University
  • Jian Tang, Midea Group
  • Zhiyuan Xu, Midea Group
  • Yaxin Peng, Department of Mathematics, College of Science, Shanghai University
  • Yangchun Zhang, Department of Mathematics, College of Science, Shanghai University

DOI:

https://doi.org/10.1609/aaai.v38i14.29490

Keywords:

ML: Imitation Learning & Inverse Reinforcement Learning

Abstract

Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in imitation learning. This paper investigates gradient explosion in two types of GAIL: GAIL with a deterministic policy (DE-GAIL) and GAIL with a stochastic policy (ST-GAIL). We begin with the observation that DE-GAIL training can be highly unstable at the beginning of the training phase and end in divergence. In contrast, ST-GAIL training remains stable and converges reliably. To shed light on this disparity, we provide a theoretical explanation. By establishing a probabilistic lower bound for GAIL, we demonstrate that gradient explosion is an inevitable outcome for DE-GAIL due to occasionally large expert-imitator policy disparity, whereas ST-GAIL does not suffer from this issue. To substantiate our assertion, we illustrate how modifications to the reward function can mitigate gradient explosion. Finally, we propose CREDO, a simple yet effective strategy that clips the reward function during the training phase, allowing GAIL to enjoy high data efficiency and stable trainability.
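
The idea of clipping a reward can be illustrated with a minimal sketch. This is not the authors' CREDO implementation: the abstract does not specify the reward form or the clipping rule, so the sketch assumes a commonly used GAIL-style reward r = -log(1 - D), where D is the discriminator output, and a hypothetical threshold r_max. The function names and the values of eps and r_max are illustrative assumptions only.

```python
import math

def gail_reward(d: float, eps: float = 1e-12) -> float:
    """Assumed reward form r = -log(1 - D), with D in [0, 1].

    The reward grows without bound as D approaches 1; eps (an
    illustrative choice) only guards against log(0).
    """
    return -math.log(max(1.0 - d, eps))

def clipped_reward(d: float, r_max: float = 5.0) -> float:
    """Cap the assumed reward at r_max (a hypothetical threshold)."""
    return min(gail_reward(d), r_max)

if __name__ == "__main__":
    # Compare raw and clipped rewards as the discriminator output nears 1.
    for d in (0.5, 0.99, 0.999999):
        print(f"D={d}: raw={gail_reward(d):.3f}, clipped={clipped_reward(d):.3f}")
```

Under these assumptions, the raw reward becomes very large when the discriminator output is close to 1, while the clipped reward stays bounded by r_max, so the magnitude of the reward signal passed to the policy update is bounded as well.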

Published

2024-03-24

How to Cite

Wang, W., Zhu, Y., Zhou, Y., Shen, C., Tang, J., Xu, Z., … Zhang, Y. (2024). Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 15625–15633. https://doi.org/10.1609/aaai.v38i14.29490

Section

AAAI Technical Track on Machine Learning V