InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Authors

  • Pierre Jean A. Colombo, Laboratoire des Signaux et Systèmes (L2S), CentraleSupelec CNRS Université Paris-Saclay
  • Chloé Clavel, Télécom ParisTech, Université Paris Saclay
  • Pablo Piantanida, Laboratoire des Signaux et Systèmes (L2S), CentraleSupelec CNRS Université Paris-Saclay

DOI:

https://doi.org/10.1609/aaai.v36i10.21299

Keywords:

Speech & Natural Language Processing (SNLP), Machine Learning (ML)

Abstract

Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Human annotation campaigns are also time-consuming and involve non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy for quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics that address the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, which makes it possible to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and two-figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.
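To make the idea of comparing texts through information measures more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the vocabulary, the probability values, and the function names are all hypothetical, and the distributions are written by hand rather than produced by a masked language model. It only illustrates why a divergence between distributions over a vocabulary can tolerate a synonym where an exact string match would not, and how swapping the measure (here KL divergence or L1 distance) changes the scoring criterion.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )

def l1_distance(p, q):
    """L1 distance between two discrete distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical 4-word vocabulary (assumption, for illustration only).
vocab = ["cat", "feline", "dog", "car"]

# Hand-written stand-ins for the distributions a masked language model
# might assign over the vocabulary; these numbers are invented.
reference_dist = [0.55, 0.35, 0.05, 0.05]  # reference text says "cat"
synonym_dist = [0.40, 0.50, 0.05, 0.05]    # candidate says "feline"
unrelated_dist = [0.05, 0.05, 0.10, 0.80]  # candidate says "car"

# Lower divergence means the candidate is closer to the reference.
kl_synonym = kl_divergence(reference_dist, synonym_dist)
kl_unrelated = kl_divergence(reference_dist, unrelated_dist)
l1_synonym = l1_distance(reference_dist, synonym_dist)
l1_unrelated = l1_distance(reference_dist, unrelated_dist)

print("KL synonym vs. unrelated:", kl_synonym, kl_unrelated)
print("L1 synonym vs. unrelated:", l1_synonym, l1_unrelated)
```

In this toy setting the synonym candidate ends up much closer to the reference than the unrelated one under both measures, whereas an exact-match comparison of "cat" against "feline" would treat both candidates as equally wrong.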

Published

2022-06-28

How to Cite

Colombo, P. J. A., Clavel, C., & Piantanida, P. (2022). InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10554-10562. https://doi.org/10.1609/aaai.v36i10.21299

Issue

Section

AAAI Technical Track on Speech and Natural Language Processing