Explaining (Sarcastic) Utterances to Enhance Affect Understanding in Multimodal Dialogues

Shivani Kumar; Ishani Mondal; Md Shad Akhtar; Tanmoy Chakraborty

doi:10.1609/aaai.v37i11.26526

Authors

Shivani Kumar Indraprastha Institute of Information Technology Delhi, India
Ishani Mondal University of Maryland, College Park
Md Shad Akhtar Indraprastha Institute of Information Technology Delhi, India
Tanmoy Chakraborty Indian Institute of Technology Delhi, India

Keywords:

SNLP: Conversational AI/Dialogue Systems, SNLP: Applications, SNLP: Discourse, Pragmatics & Argument Mining, SNLP: Generation, SNLP: Information Extraction, SNLP: Speech and Multimodality, SNLP: Summarization, SNLP: Text Classification

Abstract

Conversations emerge as the primary medium for exchanging ideas and conceptions. From the listener’s perspective, identifying various affective qualities, such as sarcasm, humour, and emotions, is paramount for comprehending the true connotation of the emitted utterance. However, one of the major hurdles in learning these affect dimensions is the presence of figurative language, viz. irony, metaphor, or sarcasm. We hypothesize that any detection system incorporating an exhaustive and explicit presentation of the emitted utterance would improve the overall comprehension of the dialogue. To this end, we explore the task of Sarcasm Explanation in Dialogues (SED), which aims to unfold the hidden irony behind sarcastic utterances. We propose MOSES, a deep neural network that takes a multimodal (sarcastic) dialogue instance as input and generates a natural language sentence as its explanation. Subsequently, we leverage the generated explanation for various natural language understanding tasks in a conversational dialogue setup, such as sarcasm detection, humour identification, and emotion recognition. Our evaluation shows that MOSES outperforms the state-of-the-art system for SED by an average of ∼2% on different evaluation metrics, such as ROUGE, BLEU, and METEOR. Further, we observe that leveraging the generated explanation advances three downstream tasks for affect classification: an average improvement of ∼14% in F1-score on the sarcasm detection task and ∼2% on the humour identification and emotion recognition tasks. We also perform extensive analyses to assess the quality of the results.
