Leveraging Lexical Substitutes for Unsupervised Word Sense Induction

Domagoj Alagić; Jan Šnajder; Sebastian Padó

doi:10.1609/aaai.v32i1.12017

Authors

Domagoj Alagić University of Zagreb
Jan Šnajder University of Zagreb
Sebastian Padó University of Stuttgart, Institut für Maschinelle Sprachverarbeitung

DOI:

https://doi.org/10.1609/aaai.v32i1.12017

Keywords:

word sense induction, lexical substitution

Abstract

Word sense induction is the most prominent unsupervised approach to lexical disambiguation. It clusters word instances, typically represented by their bag-of-words contexts. Therefore, uninformative and ambiguous contexts present a major challenge. In this paper, we investigate the use of an alternative instance representation based on lexical substitutes, i.e., contextually suitable, meaning-preserving replacements. Using lexical substitutes predicted by a state-of-the-art automatic system and a simple clustering algorithm, we outperform bag-of-words instance representations and compete with much more complex structured probabilistic models. Furthermore, we show that an oracle based on manually labeled lexical substitutes yields substantially higher performance still. Taken together, these results provide evidence for a complementarity between word sense induction and lexical substitution that has not been given much consideration before.
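
To illustrate the idea of clustering instances by their lexical substitutes, here is a minimal sketch in Python. It is not the authors' system: the substitute sets are hand-written toy data for a hypothetical target word "bank", and the Jaccard similarity, average-link merging, and `threshold` value are assumptions chosen only to keep the example small.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of substitutes."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, instances):
    """Average-link similarity between two clusters of instance indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(instances[i], instances[j]) for i, j in pairs) / len(pairs)


def induce_senses(instances, threshold=0.2):
    """Greedy agglomerative clustering over substitute sets.

    Repeatedly merges the most similar pair of clusters and stops
    once the best pair falls below `threshold` (an assumed cut-off).
    """
    clusters = [[i] for i in range(len(instances))]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average-link similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], instances),
        )
        if cluster_similarity(clusters[a], clusters[b], instances) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


# Toy data (hypothetical): substitutes for four occurrences of "bank".
toy_instances = [
    {"shore", "riverside", "edge"},
    {"lender", "institution", "firm"},
    {"riverside", "shore", "embankment"},
    {"institution", "lender", "company"},
]

print(induce_senses(toy_instances))
```

Each resulting cluster groups instance indices that share substitutes and can be read as one induced sense. In a realistic setting the substitute sets would come from an automatic lexical substitution system or from manual annotation rather than being written by hand.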
