MimicProp: Learning to Incorporate Lexicon Knowledge into Distributed Word Representation for Social Media Analysis

Muheng Yan; Yu-Ru Lin; Rebecca Hwa; Ali Mert Ertugrul; Meiqi Guo; Wen-Ting Chung

doi:10.1609/icwsm.v14i1.7339

Authors

Muheng Yan University of Pittsburgh
Yu-Ru Lin University of Pittsburgh
Rebecca Hwa University of Pittsburgh
Ali Mert Ertugrul University of Pittsburgh
Meiqi Guo University of Pittsburgh
Wen-Ting Chung University of Pittsburgh

DOI:

https://doi.org/10.1609/icwsm.v14i1.7339

Abstract

Lexicon-based methods and word embeddings are two widely used approaches for analyzing texts in social media. The choice of approach can have a significant impact on the reliability of the text analysis. For example, lexicons provide manually curated, domain-specific attributes about a limited set of words, while word embeddings learn to encode some loose semantic interpretations for a much broader set of words. Text analysis can benefit from a representation that offers both the broad coverage of word embeddings and the domain knowledge of lexicons. This paper presents MimicProp, a new graph-based method that learns a lexicon-aligned word embedding. Our approach improves over prior graph-based methods in terms of its interpretability (i.e., lexicon attributes can be recovered) and generalizability (i.e., new words can be learned to incorporate lexicon knowledge). It also effectively improves the performance of downstream analysis applications, such as text classification.

Published

2020-05-26

How to Cite

Yan, M., Lin, Y.-R., Hwa, R., Ertugrul, A. M., Guo, M., & Chung, W.-T. (2020). MimicProp: Learning to Incorporate Lexicon Knowledge into Distributed Word Representation for Social Media Analysis. Proceedings of the International AAAI Conference on Web and Social Media, 14(1), 738–749. https://doi.org/10.1609/icwsm.v14i1.7339

Issue

Vol. 14 (2020): Fourteenth International AAAI Conference on Web and Social Media

Section

Full Papers
