Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length

Authors

  • Ashequl Qadir, University of Utah
  • Pablo Mendes, IBM Research
  • Daniel Gruhl, IBM Research
  • Neal Lewis, IBM Research

DOI:

https://doi.org/10.1609/aaai.v29i1.9519

Keywords:

lexicon induction, social media

Abstract

With the rise of social media, learning from informal text has become increasingly important. We present a novel semantic lexicon induction approach that is able to learn new vocabulary from social media. Our method is robust to the idiosyncrasies of informal and open-domain text corpora. Unlike previous work, it does not impose restrictions on the lexical features of candidate terms (e.g., by restricting entries to nouns or noun phrases), while still being able to accurately learn multiword phrases of variable length. Starting with a few seed terms for a semantic category, our method first explores the context around seed terms in a corpus and identifies context patterns that are relevant to the category. These patterns are used to extract candidate terms, i.e., multiword segments that are further analyzed to ensure meaningful term boundary segmentation. We show that our approach is able to learn high-quality semantic lexicons from informally written social media text on Twitter, and can achieve accuracy as high as 92% in the top 100 learned category members.
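
The seed-to-pattern-to-candidate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a single left and right neighbor word as a "context pattern", the `min_seeds` threshold, and the ranking by number of matching patterns are all assumptions made for illustration. The pattern relatedness scoring and term boundary analysis from the paper are not reproduced here.

```python
from collections import defaultdict


def find_patterns(tweets, seeds, max_len=3):
    """Map each (left word, right word) context to the seed terms it surrounds."""
    pattern_seeds = defaultdict(set)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in range(1, max_len + 1):           # candidate term length in words
            for i in range(1, len(tokens) - n):   # keep one word of context on each side
                term = " ".join(tokens[i:i + n])
                if term in seeds:
                    pattern_seeds[(tokens[i - 1], tokens[i + n])].add(term)
    return pattern_seeds


def extract_candidates(tweets, patterns, seeds, max_len=3):
    """Collect non-seed terms of 1..max_len words that occur inside a kept pattern."""
    candidate_patterns = defaultdict(set)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in range(1, max_len + 1):
            for i in range(1, len(tokens) - n):
                context = (tokens[i - 1], tokens[i + n])
                term = " ".join(tokens[i:i + n])
                if context in patterns and term not in seeds:
                    candidate_patterns[term].add(context)
    return candidate_patterns


def induce_lexicon(tweets, seeds, min_seeds=2, max_len=3):
    """Rank candidate terms by how many distinct kept patterns extract them."""
    seeds = {s.lower() for s in seeds}
    pattern_seeds = find_patterns(tweets, seeds, max_len)
    # Assumption: a pattern is category-relevant if it surrounds enough distinct seeds.
    patterns = {p for p, matched in pattern_seeds.items() if len(matched) >= min_seeds}
    candidates = extract_candidates(tweets, patterns, seeds, max_len)
    return sorted(candidates, key=lambda t: (-len(candidates[t]), t))
```

Called with a handful of tweets and seed terms such as `{"pizza", "sushi"}`, `induce_lexicon` returns candidate terms of varying length that share contexts with the seeds, without restricting candidates by part of speech.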

Published

2015-02-19

How to Cite

Qadir, A., Mendes, P., Gruhl, D., & Lewis, N. (2015). Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9519