Response Enhanced Semi-supervised Dialogue Query Generation

Authors

  • Jianheng Huang: School of Informatics, Xiamen University, China; Shanghai Artificial Intelligence Laboratory, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China
  • Ante Wang: School of Informatics, Xiamen University, China; Shanghai Artificial Intelligence Laboratory, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China
  • Linfeng Gao: School of Informatics, Xiamen University, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China
  • Linfeng Song: Tencent AI Lab
  • Jinsong Su: School of Informatics, Xiamen University, China; Shanghai Artificial Intelligence Laboratory, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China

DOI:

https://doi.org/10.1609/aaai.v38i16.29790

Keywords:

NLP: Conversational AI/Dialog Systems, NLP: Generation

Abstract

Leveraging vast and continually updated knowledge from the Internet is considered an important ability for a dialogue system. The dialogue query generation task was therefore proposed: it generates search queries from dialogue histories, which are then submitted to a search engine to retrieve relevant websites. Previous efforts were devoted to collecting conversations with annotated queries and training a query producer (QP) via standard supervised learning. However, these studies still face the challenges of data scarcity and domain adaptation. To address these issues, we propose SemiDQG, a semi-supervised learning framework that improves model performance with unlabeled conversations. Based on the observation that the search query is typically related to the topic of the dialogue response, we train a response-augmented query producer (RA) to provide rich and effective training signals for QP. We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries, which are used to construct pseudo instances for training QP and RA. Then, we adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals. Experimental results and in-depth analysis on three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. In particular, SemiDQG significantly surpasses ChatGPT and competitive baselines. Our code is available at https://github.com/DeepLearnXMU/SemiDQG.
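
The two training stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or what the RA-generated queries are compared against, so the unigram F1 score, the comparison against QP-generated candidates, the threshold value, and all function names below are assumptions made for illustration only.

```python
from collections import Counter


def token_f1(a: str, b: str) -> float:
    """Unigram F1 overlap between two whitespace-tokenized strings.

    Assumption: a stand-in similarity measure, not the one used in the paper.
    """
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def select_pseudo_query(ra_candidates, qp_candidates, threshold=0.5):
    """Pick the RA candidate most similar to any QP candidate.

    Returns None when no candidate reaches the (hypothetical) threshold,
    in which case the unlabeled conversation would simply be skipped.
    """
    best, best_score = None, 0.0
    for ra_query in ra_candidates:
        score = max(
            (token_f1(ra_query, qp_query) for qp_query in qp_candidates),
            default=0.0,
        )
        if score > best_score:
            best, best_score = ra_query, score
    return best if best_score >= threshold else None


def reinforce_loss(log_prob: float, reward: float, baseline: float = 0.0) -> float:
    """REINFORCE loss for one sampled query: -(reward - baseline) * log p(query).

    Here `reward` stands for an RA-provided score of the QP-sampled query;
    how that score is computed is left unspecified.
    """
    return -(reward - baseline) * log_prob


if __name__ == "__main__":
    ra = ["eiffel tower height", "paris weather"]
    qp = ["height of eiffel tower"]
    print(select_pseudo_query(ra, qp))
    print(reinforce_loss(log_prob=-2.0, reward=0.5))
```

In this sketch, a selected pseudo query would be paired with its dialogue history to form a pseudo training instance, and the REINFORCE term would be summed over sampled queries in a second stage; in a real system the log-probabilities and rewards would come from the QP and RA models rather than from scalars.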

Published

2024-03-24

How to Cite

Huang, J., Wang, A., Gao, L., Song, L., & Su, J. (2024). Response Enhanced Semi-supervised Dialogue Query Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18307–18315. https://doi.org/10.1609/aaai.v38i16.29790

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Natural Language Processing I