Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning
Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time-window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time-windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as a passenger arrives while satisfying a hard bound on acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk.