DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

Authors

  • Mohammadhosein Hasanbeig, University of Oxford
  • Natasha Yogananda Jeppu, University of Oxford
  • Alessandro Abate, University of Oxford
  • Tom Melham, University of Oxford
  • Daniel Kroening, University of Oxford

Keywords

Reinforcement Learning, (Deep) Neural Network Algorithms, Temporal Planning, Sequential Decision Making

Abstract

This paper proposes DeepSynth, a method for effectively training deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian and progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for the synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton, so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach copes with both high-dimensional, low-level features and unknown, sparse, non-Markovian rewards. We evaluated DeepSynth's performance in a set of experiments that includes the Atari game Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of two orders of magnitude in the number of iterations required for policy synthesis, and a significant improvement in scalability.
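
The enrichment of the state space described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Automaton` class, the `product_state` helper, the labels "key" and "door", and the toy trace are all hypothetical names chosen for illustration, and the automaton is written by hand here rather than synthesised from traces. The sketch only shows the general idea of pairing each low-level state with the current automaton state, so that the same environment state can be told apart depending on which high-level objectives have already been achieved.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple


@dataclass
class Automaton:
    """Hypothetical deterministic automaton over high-level labels."""
    initial: int
    # Maps (automaton state, label) to the next automaton state.
    transitions: Dict[Tuple[int, str], int] = field(default_factory=dict)

    def step(self, state: int, label: str) -> int:
        # Assumption for this sketch: a label with no matching edge
        # leaves the automaton state unchanged.
        return self.transitions.get((state, label), state)


def product_state(env_state: Hashable, automaton_state: int) -> Tuple[Hashable, int]:
    """Pair the low-level environment state with the automaton state."""
    return (env_state, automaton_state)


# Toy task (assumed): the agent must observe "key" before "door".
automaton = Automaton(initial=0, transitions={(0, "key"): 1, (1, "door"): 2})

# A made-up trace of (environment state, observed label) pairs.
trace = [("s0", "empty"), ("s1", "door"), ("s2", "key"), ("s1", "door")]

q = automaton.initial
visited: List[Tuple[Hashable, int]] = []
for env_state, label in trace:
    q = automaton.step(q, label)
    visited.append(product_state(env_state, q))

print(visited)
```

In this toy trace the environment state "s1" is visited twice, once before and once after the key is found. The paired states differ in their automaton component, which is the kind of information a policy over the enriched state space could condition on.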

Published

2021-05-18

How to Cite

Hasanbeig, M., Yogananda Jeppu, N., Abate, A., Melham, T., & Kroening, D. (2021). DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7647-7656. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16935

Section

AAAI Technical Track on Machine Learning II