Real-Time Recurrent Reinforcement Learning

Julian Lemmel; Radu Grosu

doi:10.1609/aaai.v39i17.34001

Authors

Julian Lemmel, Vienna University of Technology; DatenVorsprung GmbH
Radu Grosu, Vienna University of Technology

DOI: https://doi.org/10.1609/aaai.v39i17.34001

Abstract

We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that exploits temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
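
To make the second component more concrete, here is a minimal sketch of temporal difference learning with eligibility traces, i.e. online TD(lambda). It is not the authors' implementation: it assumes a linear value function over hand-written feature vectors, and the function names, the two-state chain, and the hyperparameters (alpha, gamma, lam) are hypothetical choices made for illustration. In the setting the abstract describes, the gradient fed into the trace would come from online differentiation of the recurrent backbone, which this sketch omits.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda_step(w, z, x, x_next, reward, done,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """One online TD(lambda) update for a linear value function v(s) = w . x(s).

    w: weights, z: eligibility trace, x / x_next: feature vectors of the
    current and next state. Returns the new weights, new trace, and TD error.
    """
    v = dot(w, x)
    v_next = 0.0 if done else dot(w, x_next)
    delta = reward + gamma * v_next - v  # TD error
    # Accumulating trace: decay the old trace, then add the gradient of v
    # with respect to w, which is simply x for a linear value function.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # Every weight moves in proportion to its eligibility.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z, delta


def evaluate_chain(episodes=200):
    """Hypothetical toy task: state 0 -> state 1 -> terminal, reward 1 at the end."""
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features per state
    terminal = [0.0, 0.0]                # placeholder, ignored when done=True
    w = [0.0, 0.0]
    for _ in range(episodes):
        z = [0.0, 0.0]  # traces are reset at the start of each episode
        w, z, _ = td_lambda_step(w, z, features[0], features[1], 0.0, False)
        w, z, _ = td_lambda_step(w, z, features[1], terminal, 1.0, True)
    return w


if __name__ == "__main__":
    print(evaluate_chain())
```

Each update uses only the current transition and the running trace, so no stored trajectory or backward pass through time is needed. That locality is the property the abstract relies on when it calls the learning rule biologically plausible.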
