Search and Learn: Improving Semantic Coverage for Data-to-Text Generation

Authors

  • Shailza Jolly (TU Kaiserslautern, Germany; DFKI GmbH, Germany)
  • Zi Xuan Zhang (University of Alberta, Canada)
  • Andreas Dengel (TU Kaiserslautern, Germany; DFKI GmbH, Germany)
  • Lili Mou (University of Alberta, Canada)

DOI:

https://doi.org/10.1609/aaai.v36i10.21332

Keywords:

Speech & Natural Language Processing (SNLP), Machine Learning (ML)

Abstract

Data-to-text generation systems aim to generate textual descriptions from input data (often represented in tabular form). A typical system requires a large number of training samples to learn the correspondence between tables and texts. However, large training sets are expensive to obtain, which limits the applicability of such approaches in real-world scenarios. In this work, we focus on few-shot data-to-text generation. We observe that, while fine-tuned pretrained language models may generate plausible sentences, they suffer from low semantic coverage in the few-shot setting; that is, important input slots tend to be missing from the generated text. To address this, we propose a search-and-learning approach that leverages pretrained language models but inserts the missing slots to improve semantic coverage. We further fine-tune our system on the search results to smooth out the search noise, yielding better-quality text and substantially improving inference efficiency. Experiments show that our model achieves strong performance on the E2E and WikiBio datasets. In particular, we cover 98.35% of the input slots on E2E, largely alleviating the low-coverage problem.
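The abstract describes the search step only at a high level. The following is a minimal, self-contained sketch of that idea, not the authors' code: detect which slot values are missing from a draft sentence, then greedily insert each one at the position a scoring function prefers. The function names, the string-matching slot check, and the toy scorer are all illustrative assumptions; the paper's actual search scores candidates with a pretrained language model.

```python
from typing import Callable, Dict, List


def missing_slots(slots: Dict[str, str], text: str) -> Dict[str, str]:
    """Return slot-value pairs whose value string is absent from the text.

    A simple surface-match heuristic; stand-in for the paper's coverage check.
    """
    lowered = text.lower()
    return {k: v for k, v in slots.items() if v.lower() not in lowered}


def insert_missing_slots(
    slots: Dict[str, str],
    text: str,
    score_fn: Callable[[str], float],
) -> str:
    """Greedily insert each missing slot value at the best-scoring position."""
    for _slot, value in missing_slots(slots, text).items():
        tokens: List[str] = text.split()
        # Try every insertion point and keep the candidate the scorer prefers.
        candidates = [
            " ".join(tokens[:i] + [value] + tokens[i:])
            for i in range(len(tokens) + 1)
        ]
        text = max(candidates, key=score_fn)
    return text


if __name__ == "__main__":
    table = {"name": "Blue Spice", "food": "Italian", "area": "riverside"}
    draft = "Blue Spice serves Italian food"
    # Toy scorer: push the inserted value toward the end of the sentence.
    # A real system would use pretrained-LM log-likelihood here instead.
    print(insert_missing_slots(table, draft, lambda s: s.find("riverside")))
```

Per the abstract, a second "learn" phase then fine-tunes the generator on such search outputs, so the position-by-position search is amortized into the model and largely avoided at inference time.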

Published

2022-06-28

How to Cite

Jolly, S., Zhang, Z. X., Dengel, A., & Mou, L. (2022). Search and Learn: Improving Semantic Coverage for Data-to-Text Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10858-10866. https://doi.org/10.1609/aaai.v36i10.21332

Issue

Vol. 36 No. 10 (2022)

Section

AAAI Technical Track on Speech and Natural Language Processing