CMNet: Contrastive Magnification Network for Micro-Expression Recognition

Authors

  • Mengting Wei — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
  • Xingxun Jiang — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
  • Wenming Zheng — Key Laboratory of Child Development and Learning Science of Ministry of Education
  • Yuan Zong — Key Laboratory of Child Development and Learning Science of Ministry of Education
  • Cheng Lu — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Information Science and Engineering, Southeast University, Nanjing, China
  • Jiateng Liu — Key Laboratory of Child Development and Learning Science of Ministry of Education; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

DOI:

https://doi.org/10.1609/aaai.v37i1.25083

Keywords:

CMS: Affective Computing, CMS: Applications, CV: Applications, CV: Video Understanding & Activity Analysis

Abstract

Micro-Expression Recognition (MER) is challenging because Micro-Expression (ME) motion is too subtle to distinguish. This hurdle can be tackled by magnifying the motion intensity so that movements can be captured more accurately. However, existing magnification strategies tend to treat features of facial images, which contain more than just intensity clues, as intensity features, leaving the intensity representation unreliable. In addition, the intensity variation over time, which is crucial for encoding movements, is also neglected. To this end, we provide a reliable scheme to extract intensity clues while considering their variation on the time scale. First, we devise an Intensity Distillation (ID) loss to acquire the intensity clues by contrasting the differences between frames, given that the difference within the same video lies only in the intensity. Then, the intensity clues are calibrated to follow the trend of the original video. Specifically, because the original video lacks ground-truth intensity annotations, we build the intensity tendency by assigning each intensity vacancy an uncertain value, which guides the extracted intensity clues to converge toward this trend rather than toward fixed values. A Wilcoxon rank sum test (Wrst) method is employed to implement the calibration. Experimental results on three public ME databases, i.e., CASME II, SAMM, and SMIC-HS, validate the superiority of our method over state-of-the-art approaches.
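The abstract names two concrete ingredients: a contrastive Intensity Distillation loss over frame differences, and a Wilcoxon rank-sum test to calibrate the extracted intensity clues. The paper's exact formulations are not reproduced on this page, so the sketch below is only an illustration under assumed details: an InfoNCE-style cosine-similarity contrastive loss standing in for the ID loss, and a normal-approximation rank-sum test; the function names and feature vectors are hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cosine(a, b):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

def id_loss(anchor, positive, negatives, tau=0.1):
    # InfoNCE-style contrastive loss (an assumed stand-in for the paper's
    # ID loss): pull the anchor's frame-difference feature toward the
    # positive and push it away from the negatives.
    pos = math.exp(_cosine(anchor, positive) / tau)
    neg = sum(math.exp(_cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

def wilcoxon_rank_sum(x, y):
    # Two-sided Wilcoxon rank-sum test via the normal approximation;
    # tied values receive average ranks. Returns (z statistic, p-value).
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w = sum(rank[v] for v in x)            # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p under N(0, 1)
    return z, p
```

In this reading, two intensity sequences whose ranks differ systematically yield a large |z| and a small p, flagging extracted intensity clues that depart from the video's intensity tendency; the contrastive term then only has to separate intensity from the other facial-appearance factors shared across frames of the same video.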

Published

2023-06-26

How to Cite

Wei, M., Jiang, X., Zheng, W., Zong, Y., Lu, C., & Liu, J. (2023). CMNet: Contrastive Magnification Network for Micro-Expression Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 119-127. https://doi.org/10.1609/aaai.v37i1.25083

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems