(1) Wu, Y.; Li, X.; Liu, J.; Gao, J.; Yang, Y. Switch-Based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning. AAAI 2019, 33, 7289-7296.