CMNet: Contrastive Magnification Network for Micro-Expression Recognition

Authors

  • Mengting Wei — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
  • Xingxun Jiang — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
  • Wenming Zheng — Key Laboratory of Child Development and Learning Science of Ministry of Education
  • Yuan Zong — Key Laboratory of Child Development and Learning Science of Ministry of Education
  • Cheng Lu — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Information Science and Engineering, Southeast University, Nanjing, China
  • Jiateng Liu — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

DOI:

https://doi.org/10.1609/aaai.v37i1.25083

Keywords:

CMS: Affective Computing, CMS: Applications, CV: Applications, CV: Video Understanding & Activity Analysis

Abstract

Micro-Expression Recognition (MER) is challenging because Micro-Expression (ME) motion is too subtle to distinguish. This hurdle can be tackled by magnifying the motion intensity so that movements can be captured more accurately. However, existing magnification strategies tend to treat features of facial images, which contain more than just intensity clues, as intensity features, leaving the intensity representation unreliable. In addition, the intensity variation over time, which is crucial for encoding movements, is also neglected. To this end, we provide a reliable scheme to extract intensity clues while considering their variation on the time scale. First, we devise an Intensity Distillation (ID) loss to acquire the intensity clues by contrasting the differences between frames, given that the difference within the same video lies only in the intensity. Then, the intensity clues are calibrated to follow the trend of the original video. Specifically, because the original video lacks ground-truth intensity annotations, we build the intensity tendency by assigning each intensity vacancy an uncertain value, which guides the extracted intensity clues to converge toward this trend rather than toward fixed values. A Wilcoxon rank sum test (Wrst) method is employed to implement the calibration. Experimental results on three public ME databases, i.e., CASME II, SAMM, and SMIC-HS, validate the superiority of our method over state-of-the-art approaches.
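The abstract names two concrete ingredients: a contrastive Intensity Distillation loss over frame differences, and a Wilcoxon rank-sum test to calibrate the extracted intensity clues. The paper's exact formulations are not reproduced on this page, so the sketch below is only an illustration under assumed details: an InfoNCE-style cosine-similarity contrastive loss standing in for the ID loss, and a normal-approximation rank-sum test; the function names and feature vectors are hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cosine(a, b):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

def id_loss(anchor, positive, negatives, tau=0.1):
    # InfoNCE-style contrastive loss (an assumed stand-in for the paper's
    # ID loss): pull the anchor's frame-difference feature toward the
    # positive and push it away from the negatives.
    pos = math.exp(_cosine(anchor, positive) / tau)
    neg = sum(math.exp(_cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

def wilcoxon_rank_sum(x, y):
    # Two-sided Wilcoxon rank-sum test via the normal approximation;
    # tied values receive average ranks. Returns (z statistic, p-value).
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w = sum(rank[v] for v in x)            # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p under N(0, 1)
    return z, p
```

In this reading, two intensity sequences whose ranks differ systematically yield a large |z| and a small p, flagging extracted intensity clues that depart from the video's intensity tendency; the contrastive term then only has to separate intensity from the other facial-appearance factors shared across frames of the same video.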

Published

2023-06-26

How to Cite

Wei, M., Jiang, X., Zheng, W., Zong, Y., Lu, C., & Liu, J. (2023). CMNet: Contrastive Magnification Network for Micro-Expression Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 119-127. https://doi.org/10.1609/aaai.v37i1.25083

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems