HKAFER: Achieve Visual Parameter-Efficient Fine-Tuning via Heterogeneous Kronecker Adaptation for Facial Expression Recognition

Authors

  • Yu Gao, Harbin Institute of Technology, Shenzhen
  • Haoyu Ji, Harbin Institute of Technology, Shenzhen
  • Zhiyong Wang, Harbin Institute of Technology, Shenzhen
  • Wenze Huang, Harbin Institute of Technology, Shenzhen
  • Qian Dong, Harbin Institute of Technology, Shenzhen
  • Zhihao Yang, Harbin Institute of Technology, Shenzhen
  • Xueting Liu, Southern University of Science and Technology
  • Weihong Ren, Harbin Institute of Technology, Shenzhen
  • Honghai Liu, Harbin Institute of Technology, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v40i6.42416

Abstract

Facial Expression Recognition (FER) seeks to classify affective states from facial images and remains challenging due to variations in real-world conditions. The task becomes particularly difficult in unconstrained environments characterized by partial occlusions, varied head poses, and similar factors. To address these problems, current approaches rely on large numbers of learnable parameters and complex model architectures, which inevitably lead to overfitting and cause FER models to focus on non-discriminative facial regions. In this work, we propose HKAFER, a model that adaptively enhances visual expression representations by efficiently fine-tuning the image encoder of large Visual Foundation Models (VFMs) and Vision-Language Models (VLMs). Specifically, we introduce Heterogeneous Kronecker Adaptation (HeKA), which places multiple Kronecker-product adapters of different scales in parallel, offering markedly diverse subspaces in which to learn the incremental matrices. We further propose a Dual-Branch Interactive Router (DBIR) that dynamically assigns weights to the adapters, promoting collaboration and information flow among them. In this way, HKAFER effectively captures robust spatial features and the associations between facial regions. Experimental results demonstrate that the proposed model not only outperforms state-of-the-art methods on several FER benchmarks but also uses significantly fewer trainable parameters.
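The abstract gives only a high-level description of HeKA. As a minimal illustrative sketch (not the authors' implementation), the core idea can be expressed as follows: each adapter parameterizes an incremental weight matrix ΔW as a Kronecker product A ⊗ B of two small factors, several adapters with different factor scales run in parallel, and a router mixes their outputs. All shapes, the zero-initialization, and the fixed softmax router weights below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768  # hypothetical hidden width of one encoder layer

def kron_adapter(d, a_shape):
    """One Kronecker-product adapter: DeltaW = A kron B, with factor
    shapes chosen so A kron B is (d, d). The trainable parameter count
    is a1*a2 + (d/a1)*(d/a2) instead of d*d for a full matrix."""
    a1, a2 = a_shape
    A = rng.normal(0.0, 0.02, (a1, a2))
    B = np.zeros((d // a1, d // a2))  # zero-init so DeltaW starts at 0
    return A, B

def delta_w(A, B):
    return np.kron(A, B)

# "Heterogeneous": parallel adapters at different factor scales,
# giving structurally different subspaces for the increment.
scales = [(2, 2), (4, 4), (8, 8)]
adapters = [kron_adapter(d, s) for s in scales]

# A router assigns mixing weights to the adapters; here it is just a
# fixed softmax over placeholder logits, standing in for DBIR.
logits = np.array([0.5, 1.0, 0.2])
w = np.exp(logits) / np.exp(logits).sum()

W0 = rng.normal(0.0, 0.02, (d, d))  # frozen pretrained weight
W = W0 + sum(wi * delta_w(A, B) for wi, (A, B) in zip(w, adapters))
```

Because each B factor is zero-initialized, every ΔW is zero at the start, so fine-tuning begins exactly at the pretrained weights; only the small factors (and the router) would be trained, which is where the parameter savings come from.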

Published

2026-03-14

How to Cite

Gao, Y., Ji, H., Wang, Z., Huang, W., Dong, Q., Yang, Z., … Liu, H. (2026). HKAFER: Achieve Visual Parameter-Efficient Fine-Tuning via Heterogeneous Kronecker Adaptation for Facial Expression Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4203–4211. https://doi.org/10.1609/aaai.v40i6.42416

Issue

Section

AAAI Technical Track on Computer Vision III