C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Yutai Hou; Sanyuan Chen; Wanxiang Che; Cheng Chen; Ting Liu

doi:10.1609/aaai.v35i14.17540

Authors

Yutai Hou Harbin Institute of Technology
Sanyuan Chen Harbin Institute of Technology
Wanxiang Che Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Cheng Chen Harbin Institute of Technology
Ting Liu Harbin Institute of Technology

Keywords:

Conversational AI/Dialog Systems

Abstract

Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while preserving their semantics. Unlike previous DA works that reconstruct utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances with the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows the model to consider the relations between generated instances and encourages diversity. In addition, encoding multiple existing utterances gives C2C a wider view of existing expressions, which helps reduce generation that duplicates existing data. Experiments on the ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%↑) and 5.76 (13.6%↑) F-scores, respectively, when there are only hundreds of training utterances. Code: https://github.com/Sanyuan-Chen/C2C-DA.
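
To make the cluster-to-cluster idea concrete, the following is a minimal sketch of how training pairs for such a model might be assembled: utterances are grouped by shared semantics, and each group is split into an input cluster (to encode jointly) and a target cluster (to decode jointly). The frame strings, the toy utterances, and the helper names `group_by_semantics` and `make_c2c_pairs` are illustrative assumptions, not taken from the paper or its released code, and the fixed prefix split is a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data: (delexicalized utterance, semantic frame) pairs.
# The frame string is an assumed grouping key, not the paper's format.
DATA: List[Tuple[str, str]] = [
    ("show me flights from <from_city> to <to_city>", "flight|from_city,to_city"),
    ("i need to fly from <from_city> to <to_city>", "flight|from_city,to_city"),
    ("are there flights to <to_city> leaving <from_city>", "flight|from_city,to_city"),
    ("what is the fare to <to_city>", "airfare|to_city"),
]


def group_by_semantics(data: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect utterances that share the same semantic frame."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for text, frame in data:
        clusters[frame].append(text)
    return dict(clusters)


def make_c2c_pairs(
    clusters: Dict[str, List[str]], n_input: int
) -> List[Tuple[List[str], List[str]]]:
    """Split each cluster into (input cluster, target cluster).

    Clusters with too few utterances to leave a non-empty target are skipped.
    """
    pairs = []
    for utterances in clusters.values():
        if len(utterances) <= n_input:
            continue
        pairs.append((utterances[:n_input], utterances[n_input:]))
    return pairs


clusters = group_by_semantics(DATA)
pairs = make_c2c_pairs(clusters, n_input=2)
for source_cluster, target_cluster in pairs:
    print("encode jointly:", source_cluster)
    print("decode jointly:", target_cluster)
```

Each resulting pair holds several same-semantics utterances on both sides, which is what distinguishes this setup from one-by-one sequence-to-sequence paraphrasing.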
