Unsupervised Phrasal Near-Synonym Generation from Text Corpora


  • Dishan Gupta, Carnegie Mellon University
  • Jaime Carbonell, Carnegie Mellon University
  • Anatole Gershman, Carnegie Mellon University
  • Steve Klein, Meaningful Machines, LLC
  • David Miller, Meaningful Machines, LLC




Keywords: Phrasal Synonyms, Paraphrase Acquisition, Monolingual Corpora, Distributional Similarity


Unsupervised discovery of synonymous phrases is useful in a variety of tasks, ranging from text mining and search engines to semantic analysis and machine translation. This paper presents an unsupervised, corpus-based conditional model, the Near-Synonym System (NeSS), for finding phrasal synonyms and near-synonyms; it requires only a large monolingual corpus. The method is based on maximizing information-theoretic combinations of shared contexts and is parallelizable for large-scale processing. An evaluation framework based on crowd-sourced judgments is proposed, and NeSS is compared with alternative methods; for multi-word phrases it considerably outperforms both previously published approaches and thesaurus lookup. Moreover, the results show that the statistical scoring functions and the overall scalability of the system matter more than language-specific NLP tools. The method is language-independent and practically usable, owing to its accuracy and to real-time performance achieved through parallel decomposition.
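To make the idea of scoring candidates by shared contexts concrete, the following is a minimal Python sketch of generic distributional similarity. It is not the NeSS scoring function, which the abstract does not specify. The function names, the two-word context window, and the positive-PMI weighting are all assumptions chosen for illustration, and the five-sentence corpus is a toy example.

```python
import math
from collections import Counter, defaultdict


def context_counts(sentences, phrases, window=2):
    """Count the left and right word contexts (up to `window` words) of each phrase.

    Hypothetical helper: the window size and the tokenization are assumptions.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for phrase in phrases:
            words = phrase.lower().split()
            n = len(words)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == words:
                    left = tuple(tokens[max(0, i - window):i])
                    right = tuple(tokens[i + n:i + n + window])
                    if left:
                        counts[phrase][("L", left)] += 1
                    if right:
                        counts[phrase][("R", right)] += 1
    return counts


def shared_context_score(counts, query, candidate):
    """Sum positive-PMI weights over the contexts shared by query and candidate.

    Assumption: PMI stands in for the paper's information-theoretic scoring.
    """
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    context_totals = Counter()
    for c in counts.values():
        context_totals.update(c)
    candidate_total = sum(counts[candidate].values())
    score = 0.0
    for ctx in counts[query].keys() & counts[candidate].keys():
        p_joint = counts[candidate][ctx] / total
        p_candidate = candidate_total / total
        p_context = context_totals[ctx] / total
        score += max(0.0, math.log(p_joint / (p_candidate * p_context)))
    return score


# Toy corpus, invented for this illustration only.
corpus = [
    "they decided to put off the meeting until friday",
    "they decided to postpone the meeting until friday",
    "we had to put off the launch again",
    "we had to postpone the launch again",
    "they gathered to celebrate the victory last night",
]
phrases = ["put off", "postpone", "celebrate"]
counts = context_counts(corpus, phrases)

# Rank the candidates for the query phrase by shared-context score.
ranking = sorted(
    ["postpone", "celebrate"],
    key=lambda cand: shared_context_score(counts, "put off", cand),
    reverse=True,
)
print(ranking)
```

In this toy setting "postpone" shares every context with "put off" and ranks first, while "celebrate" shares none. Because each query phrase is scored independently of the others, work of this kind can be split across workers, which is one plausible reading of the parallel decomposition mentioned above.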




How to Cite

Gupta, D., Carbonell, J., Gershman, A., Klein, S., & Miller, D. (2015). Unsupervised Phrasal Near-Synonym Generation from Text Corpora. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9504