Well, Now We Know! Unveiling Sarcasm: Initiating and Exploring Multimodal Conversations with Reasoning

Authors

  • Gopendra Vikram Singh, Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
  • Mauajama Firdaus, Department of Computing Science, University of Alberta, Canada
  • Dushyant Singh Chauhan, Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
  • Asif Ekbal, Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
  • Pushpak Bhattacharyya, Indian Institute of Technology Bombay, India

DOI:

https://doi.org/10.1609/aaai.v38i17.29864

Keywords:

NLP: Text Classification, NLP: Generation

Abstract

Sarcasm is a widespread linguistic phenomenon that is difficult to explain because of its subjective nature, the absence of contextual cues, and deeply rooted personal perspectives. Although sarcasm identification has been studied extensively in dialogue analysis, merely detecting sarcasm does not enable conversational systems to comprehend the underlying meaning of a conversation and generate fitting responses. Capturing what a sarcastic utterance actually means requires not only detecting the sarcasm but also pinpointing where it originates and the rationale behind it. In this paper, we examine the discourse structure of conversations containing sarcasm and introduce a novel task, Sarcasm Initiation and Reasoning in Conversations (SIRC). Set in a multimodal environment and covering both English and code-mixed interactions, the task aims to identify the trigger, or starting point, of sarcasm. It additionally involves producing a natural language explanation that rationalizes the sarcastic dialogues. To this end, we introduce the Sarcasm Initiation and Reasoning Dataset (SIRD), which provides sarcasm initiation annotations and reasoning. We develop a model named Sarcasm Initiation and Reasoning Generation (SIRG) that encompasses textual, audio, and visual representations. To integrate these modalities, we introduce a shared fusion method that employs cross-attention mechanisms. Our experiments on the SIRD dataset demonstrate that the proposed framework establishes a new benchmark for both sarcasm initiation and its reasoning generation in multimodal conversations. The code and dataset can be accessed from https://www.iitp.ac.in/~ai-nlp-ml/resources.html#sarcasm-explain and https://github.com/GussailRaat/SIRG-Sarcasm-Initiation-and-Reasoning-Generation.
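
The abstract mentions a fusion method built on cross-attention across textual, audio, and visual representations but gives no further detail. As a rough illustration of the general technique only, the sketch below lets text features attend to audio and to visual features with scaled dot-product attention and then averages the results. It is not the SIRG architecture: the function names (`cross_attention`, `fuse`), the averaging step, the toy feature values, and the lack of learned projections are all assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.

    No learned projection matrices are used here (an assumption made to keep
    the sketch small); a real model would project queries, keys, and values.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def fuse(text, audio, visual):
    """Hypothetical fusion step: average text with its audio- and visual-attended views."""
    text_to_audio = cross_attention(text, audio, audio)
    text_to_visual = cross_attention(text, visual, visual)
    return [
        [(t + a + v) / 3.0 for t, a, v in zip(t_row, a_row, v_row)]
        for t_row, a_row, v_row in zip(text, text_to_audio, text_to_visual)
    ]


# Toy features: 2 text tokens, 3 audio frames, 1 video frame, all 2-dimensional.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
visual = [[0.2, 0.8]]
fused = fuse(text, audio, visual)
```

The fused output has one row per text token, so it can stand in for the text sequence in a downstream classifier or decoder. The released SIRG code linked above is the reference for how the authors actually combine the modalities.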

Published

2024-03-24

How to Cite

Singh, G. V., Firdaus, M., Chauhan, D. S., Ekbal, A., & Bhattacharyya, P. (2024). Well, Now We Know! Unveiling Sarcasm: Initiating and Exploring Multimodal Conversations with Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18981-18989. https://doi.org/10.1609/aaai.v38i17.29864

Issue

Vol. 38 No. 17 (2024)

Section

AAAI Technical Track on Natural Language Processing II