Humor Knowledge Enriched Transformer for Understanding Multimodal Humor

Md Kamrul Hasan; Sangwu Lee; Wasifur Rahman; Amir Zadeh; Rada Mihalcea; Louis-Philippe Morency; Ehsan Hoque

doi:10.1609/aaai.v35i14.17534

Authors

Md Kamrul Hasan Department of Computer Science, University of Rochester, USA
Sangwu Lee Department of Computer Science, University of Rochester, USA
Wasifur Rahman Department of Computer Science, University of Rochester, USA
Amir Zadeh Language Technologies Institute, CMU, USA
Rada Mihalcea Computer Science & Engineering, University of Michigan, USA
Louis-Philippe Morency Language Technologies Institute, CMU, USA
Ehsan Hoque Department of Computer Science, University of Rochester, USA

Keywords:

Language Grounding & Multi-modal NLP

Abstract

Recognizing humor in a video utterance requires understanding its verbal and non-verbal components as well as incorporating the appropriate context and external knowledge. In this paper, we propose the Humor Knowledge enriched Transformer (HKT), which captures the gist of a multimodal humorous expression by integrating the preceding context and external knowledge. We incorporate humor-centric external knowledge into the model by capturing the ambiguity and sentiment present in the language. We encode the language, acoustic, vision, and humor-centric features separately using Transformer-based encoders, followed by a cross-attention layer that exchanges information among them. Our model achieves 77.36% and 79.41% accuracy in humorous punchline detection on the UR-FUNNY and MUStARD datasets, respectively, setting a new state of the art on both with margins of 4.93% and 2.94%. Furthermore, we demonstrate that our model can capture interpretable, humor-inducing patterns from all modalities.
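The idea of encoding each modality separately and then exchanging information through cross-attention can be illustrated with a minimal sketch. This is not the authors' implementation: it shows a single-head scaled dot-product cross-attention step in NumPy, in which language tokens attend over acoustic frames. The feature dimension, sequence lengths, random weights, and the names `cross_attention`, `language`, and `acoustic` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head cross-attention: query_seq attends over context_seq.

    query_seq:   (n_query, d) features from one modality
    context_seq: (n_context, d) features from another modality
    Returns the attended features (n_query, d) and the attention
    weights (n_query, n_context).
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights


# Hypothetical inputs standing in for the outputs of per-modality encoders.
rng = np.random.default_rng(0)
d = 8                                  # assumed shared feature dimension
language = rng.normal(size=(5, d))     # 5 language tokens
acoustic = rng.normal(size=(7, d))     # 7 acoustic frames
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

attended, weights = cross_attention(language, acoustic, w_q, w_k, w_v)
```

Each row of `weights` shows how strongly one language token draws on each acoustic frame. Attention maps of this kind are one common way to inspect which cross-modal cues a model relies on.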
