Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention

Authors

  • Xiangkai Zeng Beihang University
  • Cheng Yang Tsinghua University
  • Cunchao Tu Tsinghua University
  • Zhiyuan Liu Tsinghua University
  • Maosong Sun Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v32i1.11982

Keywords:

lexicon expansion, hierarchical classification, machine learning, natural language processing

Abstract

Linguistic Inquiry and Word Count (LIWC) is a word counting software tool which has been used for quantitative text analysis in many fields. Due to its success and popularity, the core lexicon has been translated into Chinese and many other languages. However, the lexicon only contains several thousand of words, which is deficient compared with the number of common words in Chinese. Current approaches often require manually expanding the lexicon, but it often takes too much time and requires linguistic experts to extend the lexicon. To address this issue, we propose to expand the LIWC lexicon automatically. Specifically, we consider it as a hierarchical classification problem and utilize the Sequence-to-Sequence model to classify words in the lexicon. Moreover, we use the sememe information with the attention mechanism to capture the exact meanings of a word, so that we can expand a more precise and comprehensive lexicon. The experimental results show that our model has a better understanding of word meanings with the help of sememes and achieves significant and consistent improvements compared with the state-of-the-art methods. The source code of this paper can be obtained from https://github.com/thunlp/Auto_CLIWC.

Downloads

Published

2018-04-27

How to Cite

Zeng, X., Yang, C., Tu, C., Liu, Z., & Sun, M. (2018). Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11982