(1) Wang, C.; Zhou, H.; Hu, Y.; Huo, Y.; Li, B.; Liu, T.; Xiao, T.; Zhu, J. ESRL: Efficient Sampling-Based Reinforcement Learning for Sequence Generation. AAAI 2024, 38, 19107-19115.