Attentive Experience Replay
Experience replay (ER) has become an important component of deep reinforcement learning (RL) algorithms. ER enables RL algorithms to reuse past experiences to update the current policy. By reusing a previously visited state for training, the RL agent can learn a more accurate value estimate and make better decisions in that state. However, as the policy is continually updated, some states in past experiences become rarely visited, and optimizing over these states might not improve the overall performance of the current policy. To tackle this issue, we propose a new replay strategy that prioritizes transitions whose states are frequently visited by the current policy. We introduce Attentive Experience Replay (AER), a novel experience replay algorithm that samples transitions according to the similarity between their states and the agent's current state. We couple AER with different off-policy algorithms and demonstrate that AER yields consistent improvements on a suite of OpenAI Gym tasks.
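The sampling idea described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the selection scheme, so the code below makes two assumptions: similarity is the negative Euclidean distance between state vectors, and a batch is formed by drawing a larger uniform candidate set and keeping the candidates most similar to the agent's state. The names `similarity`, `attentive_sample`, and `candidate_factor` are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# A transition is (state, action, reward, next_state, done); states are float vectors.
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure: negative Euclidean distance (higher is more similar)."""
    return -math.dist(a, b)


def attentive_sample(
    buffer: List[Transition],
    current_state: Sequence[float],
    batch_size: int,
    candidate_factor: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Return up to `batch_size` transitions whose states are closest to `current_state`.

    Assumed scheme: draw `batch_size * candidate_factor` candidates uniformly
    without replacement, then keep the most similar ones.
    """
    rng = rng or random.Random()
    n_candidates = min(len(buffer), batch_size * candidate_factor)
    candidates = rng.sample(buffer, n_candidates)
    # Sort candidates from most to least similar to the agent's state.
    candidates.sort(key=lambda t: similarity(t[0], current_state), reverse=True)
    return candidates[:batch_size]


if __name__ == "__main__":
    # Toy buffer: states lie on a line, so similarity depends only on the first coordinate.
    toy_buffer = [((float(i), 0.0), 0, 0.0, (float(i + 1), 0.0), False) for i in range(100)]
    batch = attentive_sample(toy_buffer, (10.0, 0.0), batch_size=4, rng=random.Random(0))
    print([t[0] for t in batch])
```

Under these assumptions, `candidate_factor = 1` reduces to uniform sampling, while larger values concentrate the batch on states near the agent's state. The returned batch can be passed to any off-policy update in place of a uniformly sampled batch.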