Reward Certification for Policy Smoothed Reinforcement Learning

Ronghui Mu; Leandro Soriano Marcolino; Yanghao Zhang; Tianle Zhang; Xiaowei Huang; Wenjie Ruan

doi:10.1609/aaai.v38i19.30139

Authors

Ronghui Mu University of Liverpool
Leandro Soriano Marcolino Lancaster University
Yanghao Zhang University of Liverpool
Tianle Zhang University of Liverpool
Xiaowei Huang University of Liverpool
Wenjie Ruan University of Liverpool

Keywords:

General

Abstract

Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks. Recent studies have introduced "smoothed policies" to enhance its robustness. Yet it remains challenging to establish a provable guarantee that certifies a bound on the total reward of a smoothed policy. Prior methods relied primarily on computing bounds using Lipschitz continuity or on calculating the probability that the cumulative reward exceeds specific thresholds. However, these techniques are only suited to continuous perturbations of the RL agent's observations and are restricted to perturbations bounded by the ℓ2-norm. To address these limitations, this paper proposes a general black-box certification method, called ReCePS, which is capable of directly certifying the cumulative reward of the smoothed policy under various ℓp-norm bounded perturbations. Furthermore, we extend our methodology to certify perturbations on action spaces. Our approach leverages f-divergence to measure the distinction between the original distribution and the perturbed distribution, and then determines the certification bound by solving a convex optimisation problem. We provide a comprehensive theoretical analysis and run experiments in multiple environments. Our results show that our method not only improves the tightness of the certified lower bound of the mean cumulative reward but also demonstrates better efficiency than state-of-the-art methods.
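
To illustrate the f-divergence ingredient mentioned in the abstract, here is a minimal sketch, not the ReCePS algorithm itself. It assumes isotropic Gaussian smoothing noise on the observation (the abstract does not name the noise distribution) and uses the KL divergence, which is one member of the f-divergence family. For two Gaussians with the same standard deviation sigma whose means differ by a perturbation delta, the KL divergence has the closed form ||delta||^2 / (2 sigma^2). The function names `kl_gaussian_shift` and `kl_monte_carlo` are hypothetical.

```python
import random


def kl_gaussian_shift(delta, sigma):
    """Closed-form KL( N(x + delta, sigma^2 I) || N(x, sigma^2 I) ).

    Assumption for illustration: isotropic Gaussian smoothing noise.
    """
    return sum(d * d for d in delta) / (2.0 * sigma ** 2)


def kl_monte_carlo(delta, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same KL divergence, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample from the perturbed distribution P = N(delta, sigma^2 I),
        # taking the clean observation x to be the origin.
        z = [d + rng.gauss(0.0, sigma) for d in delta]
        # log p(z) - log q(z), where Q = N(0, sigma^2 I) is the original.
        sq_dist_q = sum(v * v for v in z)
        sq_dist_p = sum((v - d) ** 2 for v, d in zip(z, delta))
        total += (sq_dist_q - sq_dist_p) / (2.0 * sigma ** 2)
    return total / n_samples


if __name__ == "__main__":
    delta, sigma = [0.1, 0.2], 0.2
    print("closed form:", kl_gaussian_shift(delta, sigma))
    print("monte carlo:", kl_monte_carlo(delta, sigma))
```

The divergence grows with the size of the perturbation and shrinks as the smoothing noise widens, which is the quantity a divergence-constrained certification problem would take as its budget.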
