BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Zachary Lipton; Xiujun Li; Jianfeng Gao; Lihong Li; Faisal Ahmed; Li Deng

doi:10.1609/aaai.v32i1.11946

Authors

Zachary Lipton Carnegie Mellon University
Xiujun Li Microsoft Research Redmond
Jianfeng Gao Microsoft Research Redmond
Lihong Li Google Inc.
Faisal Ahmed Microsoft Research Redmond
Li Deng Citadel

DOI:

https://doi.org/10.1609/aaai.v32i1.11946

Keywords:

task-oriented dialogue, deep reinforcement learning, exploration, policy learning

Abstract

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
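
The two ideas named in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the source: the Q-function is linear rather than a deep network, the weight posterior is a factorized Gaussian parameterized by `mu` and `rho`, and training of the posterior is omitted entirely. The names `GaussianLinearQ`, `thompson_action`, and `spike_replay_buffer` are hypothetical and chosen only for illustration. The sketch shows Thompson sampling as a single Monte Carlo draw of the weights followed by a greedy action under that draw, and replay buffer spiking as pre-filling the buffer with transitions from a few successful episodes.

```python
import numpy as np


def softplus(rho):
    """Map an unconstrained parameter to a positive standard deviation."""
    return np.log1p(np.exp(rho))


class GaussianLinearQ:
    """Hypothetical linear Q-function with a factorized Gaussian over weights.

    This is an illustrative stand-in, not the architecture from the paper.
    """

    def __init__(self, state_dim, num_actions, init_rho=-3.0):
        # Posterior mean and (pre-softplus) scale for every weight.
        self.mu = np.zeros((state_dim, num_actions))
        self.rho = np.full((state_dim, num_actions), init_rho)

    def sample_weights(self, rng):
        # One Monte Carlo sample: w = mu + sigma * eps, with eps ~ N(0, I).
        eps = rng.standard_normal(self.mu.shape)
        return self.mu + softplus(self.rho) * eps

    def thompson_action(self, state, rng):
        # Thompson sampling: draw one weight sample, then act greedily
        # with respect to the Q-values implied by that sample.
        weights = self.sample_weights(rng)
        q_values = state @ weights
        return int(np.argmax(q_values))


def spike_replay_buffer(buffer, successful_episodes):
    """Pre-fill a replay buffer (a plain list here) with transitions
    from a few successful episodes before Q-learning begins."""
    for episode in successful_episodes:
        buffer.extend(episode)
    return buffer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = GaussianLinearQ(state_dim=4, num_actions=3)
    state = rng.standard_normal(4)
    print("sampled action:", model.thompson_action(state, rng))

    # Each transition is a hypothetical (state, action, reward, next_state) tuple.
    demo_episode = [(state, 0, 0.0, state), (state, 1, 1.0, state)]
    buffer = spike_replay_buffer([], [demo_episode])
    print("buffer size after spiking:", len(buffer))
```

Because the action depends on a random draw of the weights, actions whose value is uncertain under the posterior are chosen more often than a purely greedy policy would choose them, and this effect shrinks as the posterior scale shrinks.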

Published

2018-04-27

How to Cite

Lipton, Z., Li, X., Gao, J., Li, L., Ahmed, F., & Deng, L. (2018). BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11946

Issue

Vol. 32 No. 1 (2018): Thirty-Second AAAI Conference on Artificial Intelligence

Section

Main Track: NLP and Machine Learning
