InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Pierre Jean A. Colombo; Chloé Clavel; Pablo Piantanida

doi:10.1609/aaai.v36i10.21299

Authors

Pierre Jean A. Colombo, Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Université Paris-Saclay
Chloé Clavel, Télécom ParisTech, Université Paris-Saclay
Pablo Piantanida, Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Université Paris-Saclay

DOI:

https://doi.org/10.1609/aaai.v36i10.21299

Keywords:

Speech & Natural Language Processing (SNLP), Machine Learning (ML)

Abstract

Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and involve non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy for quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics and that address the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, which makes it possible to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and two-figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.
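
The contrast the abstract draws between exact-match metrics and distribution-based comparison can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vocabulary, the hand-written probability vectors (standing in for what a masked language model might output), and the choice of KL divergence as the information measure. The abstract does not say which information measures InfoLM uses or how its distributions are built, so this is not the authors' implementation.

```python
import math


def exact_match_overlap(candidate, reference):
    """Fraction of candidate tokens that appear verbatim in the reference."""
    ref = set(reference)
    return sum(tok in ref for tok in candidate) / len(candidate)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    KL is only one possible information measure, chosen here as an assumption.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical vocabulary and hand-written distributions (not model outputs).
vocab = ["film", "movie", "great", "bad"]
reference_dist = [0.45, 0.40, 0.10, 0.05]  # context: "a great film"
synonym_dist = [0.40, 0.45, 0.10, 0.05]    # context: "a great movie"
unrelated_dist = [0.05, 0.05, 0.10, 0.80]  # context with a different meaning

# Exact matching penalises the synonym "movie" as a plain mismatch.
overlap = exact_match_overlap(["great", "movie"], ["great", "film"])

# A divergence between distributions stays small for the synonym case
# and grows for the unrelated case.
kl_syn = kl_divergence(reference_dist, synonym_dist)
kl_unrel = kl_divergence(reference_dist, unrelated_dist)

print(f"exact-match overlap: {overlap:.2f}")
print(f"KL (synonym):   {kl_syn:.4f}")
print(f"KL (unrelated): {kl_unrel:.4f}")
```

In this toy setting, swapping "film" for "movie" halves the exact-match score, while the divergence between the two near-identical distributions remains close to zero. That is the kind of robustness to synonyms the abstract attributes to using a pre-trained masked language model.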
