TY  - JOUR
AU  - Zou, Xinyi
AU  - Yan, Yan
AU  - Xue, Jing-Hao
AU  - Chen, Si
AU  - Wang, Hanzi
PY  - 2022/06/28
Y2  - 2024/03/28
TI  - When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 36
IS  - 5
SE  - AAAI Technical Track on Humans and AI
DO  - 10.1609/aaai.v36i5.20474
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/20474
SP  - 5367
EP  - 5375
AB  - Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) focuses mainly on basic expressions and thus fails to capture the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound-expression training data, which are laborious to collect and often require the professional guidance of psychologists. In this paper, we study compound FER in the cross-domain few-shot learning setting, where only a few images of novel classes from the target domain are required as a reference. In particular, we aim to identify unseen compound expressions with a model trained on easily accessible basic-expression datasets. To alleviate the problem of limited base classes in our FER task, we propose a novel Emotion Guided Similarity Network (EGS-Net), consisting of an emotion branch and a similarity branch, based on a two-stage learning framework. Specifically, in the first stage, the similarity branch is jointly trained with the emotion branch in a multi-task fashion. With the regularization of the emotion branch, we prevent the similarity branch from overfitting to sampled base classes that overlap heavily across different episodes. In the second stage, the emotion branch and the similarity branch play a “two-student game”, alternately learning from each other and thereby further improving the inference ability of the similarity branch on unseen compound expressions. Experimental results on both in-the-lab and in-the-wild compound-expression datasets demonstrate the superiority of our proposed method over several state-of-the-art methods.
ER  -