An Embedding-based Joint Sentiment-Topic Model for Short Texts

Ayan Sengupta; William Scott Paka; Suman Roy; Gaurav Ranjan; Tanmoy Chakraborty

doi:10.1609/icwsm.v15i1.18090

Authors

Ayan Sengupta, Optum Global Advantage (OGA), UnitedHealth Group
William Scott Paka, IIIT Delhi
Suman Roy, Optum Global Advantage (OGA), UnitedHealth Group
Gaurav Ranjan, Optum Global Advantage (OGA), UnitedHealth Group
Tanmoy Chakraborty, IIIT Delhi

DOI:

https://doi.org/10.1609/icwsm.v15i1.18090

Keywords:

Subjectivity in textual data; sentiment analysis; polarity/opinion identification and extraction; linguistic analyses of social media behavior; text categorization; topic recognition; demographic/gender/age identification

Abstract

Short text is a popular avenue for sharing feedback, opinions and reviews on social media, e-commerce platforms, etc. Many companies need to extract meaningful information (which may include thematic content as well as semantic polarity) from such short texts to understand users' behaviour. However, obtaining high-quality, sentiment-associated and human-interpretable themes remains a challenge for short texts. In this paper we develop ELJST, an embedding-enhanced generative joint sentiment-topic model that can discover more coherent and diverse topics from short texts. It uses a Markov Random Field regularizer that can be seen as a generalisation of skip-gram based models. Further, it can leverage higher-order semantic information appearing in word embeddings, such as self-attention weights, in graphical models. Our results show an average improvement of 10% in topic coherence and 5% in topic diversification over baselines. Finally, ELJST helps understand users' behaviour at a more granular level that can be explained. All of this can bring significant value to service and healthcare industries, which often deal with customers.
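
To make the topic diversification figure more concrete, one common way to measure diversity is the share of distinct words among the top-k words of all topics. The sketch below assumes that definition, which may differ from the metric used in the paper. The function name `topic_diversity` and the toy topics are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def topic_diversity(top_words: Dict[str, List[str]], k: int = 3) -> float:
    """Fraction of distinct words among the top-k words of all topics.

    Assumed definition for illustration; not taken from the paper.
    """
    # Collect the first k words of every topic into one flat list.
    selected = [w for words in top_words.values() for w in words[:k]]
    if not selected:
        return 0.0
    # 1.0 means no word is shared between topics; lower means more overlap.
    return len(set(selected)) / len(selected)


# Hypothetical top words per topic, invented purely for illustration.
toy_topics = {
    "billing": ["claim", "refund", "charge"],
    "service": ["agent", "wait", "refund"],
}
diversity = topic_diversity(toy_topics, k=3)
```

Under this definition, the word shared by the two toy topics lowers the score below 1.0, so a model whose topics repeat the same top words is penalised.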
