Humor Knowledge Enriched Transformer for Understanding Multimodal Humor

Authors

  • Md Kamrul Hasan, Department of Computer Science, University of Rochester, USA
  • Sangwu Lee, Department of Computer Science, University of Rochester, USA
  • Wasifur Rahman, Department of Computer Science, University of Rochester, USA
  • Amir Zadeh, Language Technologies Institute, CMU, USA
  • Rada Mihalcea, Computer Science & Engineering, University of Michigan, USA
  • Louis-Philippe Morency, Language Technologies Institute, CMU, USA
  • Ehsan Hoque, Department of Computer Science, University of Rochester, USA

Keywords:

Language Grounding & Multi-modal NLP

Abstract

Recognizing humor in a video utterance requires understanding its verbal and non-verbal components as well as incorporating the appropriate context and external knowledge. In this paper, we propose the Humor Knowledge enriched Transformer (HKT), which captures the gist of a multimodal humorous expression by integrating the preceding context and external knowledge. We incorporate humor-centric external knowledge into the model by capturing the ambiguity and sentiment present in the language. We encode the language, acoustic, vision, and humor-centric features separately using Transformer-based encoders, followed by a cross-attention layer that exchanges information among them. Our model achieves 77.36% and 79.41% accuracy in humorous punchline detection on the UR-FUNNY and MUStARD datasets, respectively, setting a new state of the art on both datasets by margins of 4.93% and 2.94%. Furthermore, we demonstrate that our model can capture interpretable, humor-inducing patterns from all modalities.
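The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a single head, and hypothetical toy vectors standing in for encoder outputs. The function names `softmax` and `cross_attention` and the variables `language` and `acoustic` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from another (assumed form, no learned weights)."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


# Hypothetical toy encoder outputs: 2 language tokens, 3 acoustic frames, 2-d each
language = [[1.0, 0.0], [0.0, 1.0]]
acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each language token gathers information from the acoustic sequence
language_with_acoustic = cross_attention(language, acoustic, acoustic)
```

In this sketch the output has one row per language token, and each row is a weighted average of the acoustic vectors, so a token draws most heavily on the frames it is most similar to. A full model would apply the same exchange between every pair of modalities and would learn the query, key, and value projections.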

Published

2021-05-18

How to Cite

Hasan, M. K., Lee, S., Rahman, W., Zadeh, A., Mihalcea, R., Morency, L.-P., & Hoque, E. (2021). Humor Knowledge Enriched Transformer for Understanding Multimodal Humor. Proceedings of the AAAI Conference on Artificial Intelligence, 35(14), 12972-12980. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17534

Section

AAAI Technical Track on Speech and Natural Language Processing I