MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control
Using reinforcement learning for traffic signal control has attracted increasing interests recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and achieved better performances compared with traditional transportation methods. However, current reinforcement learning models rely on tremendous training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to empower quick learning from scratch, but little attention is paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named as MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which includes periodically alternate individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the-state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.