Fair Facial Attribute Recognition via Group-Decoupled Vision Transformer with Mask-Guided Correlation Suppression

Authors

  • Huichang Huang, Xiamen University of Technology
  • Kunchi Li, Xiamen University of Technology
  • Si Chen, Xiamen University of Technology
  • Da-Han Wang, Xiamen University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i7.37415

Abstract

Facial Attribute Recognition (FAR) holds significant potential for wide-ranging applications. However, conventionally trained FAR models exhibit unfairness, largely due to data bias: certain sensitive attributes correlate statistically with target attributes. To address this, we propose a group-attention mechanism: each image is first assigned to a subgroup defined jointly by the sensitive and target attributes (e.g., male/short hair, male/long hair, female/short hair, female/long hair). Within the attention mechanism, a distinct set of Query parameters is used for each group, while the Key and Value parameters are shared across all groups. Because the group-specific Query parameters are trained only on their subgroup's data, the aforementioned bias is effectively mitigated. Integrating this Group-Attention into the Vision Transformer (ViT) yields our novel Group-Decoupled ViT (GD-ViT) model. Moreover, to further attenuate the statistical correlation between sensitive and target attributes, we propose a Mask-Guided Correlation Suppression learning strategy. In Stage 1, a min-max dual-loss optimization trains GD-ViT to localize key regions that are related to sensitive attributes yet irrelevant to the target attributes. In Stage 2, a second GD-ViT is trained with the sensitive regions identified in Stage 1 masked out, and the masked output is fused (as an intermediate input) with the model's intermediate outputs. This weakens features from sensitive-attribute regions while enhancing the others, suppressing the learning of key features tied to sensitive attributes. As a result, the model is encouraged to focus on regions intrinsic to the target attribute, and the learning of sensitive and target attributes is balanced. Extensive experiments on three benchmark datasets for fair facial attribute recognition demonstrate that our method achieves superior performance.
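The core idea of the group-attention — a separate Query projection per subgroup, with Key and Value projections shared across groups — can be sketched as below. This is a minimal single-head NumPy illustration under our own assumptions (function names, shapes, and the per-sample loop are ours), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_decoupled_attention(tokens, group_ids, W_Q_groups, W_K, W_V):
    """Single-head attention with group-specific Query projections.

    tokens:     (B, N, d) token embeddings per image
    group_ids:  (B,) subgroup index of each image (e.g., male/long-hair)
    W_Q_groups: (G, d, d) one Query projection per subgroup (group-specific)
    W_K, W_V:   (d, d) Key/Value projections shared by all subgroups
    """
    B, N, d = tokens.shape
    out = np.empty_like(tokens)
    for b in range(B):
        Q = tokens[b] @ W_Q_groups[group_ids[b]]  # Query chosen by subgroup
        K = tokens[b] @ W_K                        # shared Key
        V = tokens[b] @ W_V                        # shared Value
        attn = softmax(Q @ K.T / np.sqrt(d))       # (N, N) attention weights
        out[b] = attn @ V
    return out
```

With identical input tokens, two images in different subgroups attend differently (different Queries), while two images in the same subgroup produce identical outputs — which is the decoupling the abstract describes.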

Published

2026-03-14

How to Cite

Huang, H., Li, K., Chen, S., & Wang, D.-H. (2026). Fair Facial Attribute Recognition via Group-Decoupled Vision Transformer with Mask-Guided Correlation Suppression. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5022–5030. https://doi.org/10.1609/aaai.v40i7.37415

Section

AAAI Technical Track on Computer Vision IV