Fair Facial Attribute Recognition via Group-Decoupled Vision Transformer with Mask-Guided Correlation Suppression

Authors

  • Huichang Huang, Xiamen University of Technology
  • Kunchi Li, Xiamen University of Technology
  • Si Chen, Xiamen University of Technology
  • Da-Han Wang, Xiamen University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i7.37415

Abstract

Facial Attribute Recognition (FAR) holds significant potential for wide-ranging applications. However, conventionally trained FAR models exhibit unfairness, largely due to data bias: certain sensitive attributes correlate statistically with target attributes. To address this, we propose a group-attention mechanism: each image is first assigned to a subgroup defined jointly by the sensitive and target attributes (e.g., male/short hair, male/long hair, female/short hair, female/long hair). Within the attention mechanism, a distinct set of Query parameters is used for each group, while the Key and Value parameters are shared across all groups. Because the group-specific Query parameters are trained only on their subgroup's data, the aforementioned bias is effectively mitigated. Integrating this Group-Attention into the Vision Transformer (ViT) yields our novel Group-Decoupled ViT (GD-ViT) model. Moreover, to further attenuate the statistical correlation between sensitive and target attributes, we propose a Mask-Guided Correlation Suppression learning strategy. In Stage 1, a min-max dual-loss optimization trains GD-ViT to localize key regions that are related to sensitive attributes yet irrelevant to the target attributes. In Stage 2, a second GD-ViT is trained with the sensitive regions identified in Stage 1 masked out, and the masked output is fused (as an intermediate input) with the model's intermediate outputs. This weakens features from sensitive-attribute regions while enhancing the others, suppressing the learning of key features tied to sensitive attributes. As a result, the model is encouraged to focus on regions intrinsic to the target attribute, and the learning of sensitive and target attributes is balanced. Extensive experiments on three benchmark datasets for fair facial attribute recognition demonstrate that our method achieves superior performance.
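The core idea of the group-attention — a separate Query projection per subgroup, with Key and Value projections shared across groups — can be sketched as below. This is a minimal single-head NumPy illustration under our own assumptions (function names, shapes, and the per-sample loop are ours), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_decoupled_attention(tokens, group_ids, W_Q_groups, W_K, W_V):
    """Single-head attention with group-specific Query projections.

    tokens:     (B, N, d) token embeddings per image
    group_ids:  (B,) subgroup index of each image (e.g., male/long-hair)
    W_Q_groups: (G, d, d) one Query projection per subgroup (group-specific)
    W_K, W_V:   (d, d) Key/Value projections shared by all subgroups
    """
    B, N, d = tokens.shape
    out = np.empty_like(tokens)
    for b in range(B):
        Q = tokens[b] @ W_Q_groups[group_ids[b]]  # Query chosen by subgroup
        K = tokens[b] @ W_K                        # shared Key
        V = tokens[b] @ W_V                        # shared Value
        attn = softmax(Q @ K.T / np.sqrt(d))       # (N, N) attention weights
        out[b] = attn @ V
    return out
```

With identical input tokens, two images in different subgroups attend differently (different Queries), while two images in the same subgroup produce identical outputs — which is the decoupling the abstract describes.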

Published

2026-03-14

How to Cite

Huang, H., Li, K., Chen, S., & Wang, D.-H. (2026). Fair Facial Attribute Recognition via Group-Decoupled Vision Transformer with Mask-Guided Correlation Suppression. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5022–5030. https://doi.org/10.1609/aaai.v40i7.37415

Section

AAAI Technical Track on Computer Vision IV