Evaluating Robustness of Vision Transformers on Imbalanced Datasets (Student Abstract)


  • Kevin Li Georgia Institute of Technology
  • Rahul Duggal Georgia Institute of Technology
  • Duen Horng Chau Georgia Institute of Technology




Long-Tailed Class Distribution, Imbalanced Dataset, Vision Transformer, Loss Reweighting


Data in the real world is commonly imbalanced across classes. Training neural networks on imbalanced datasets often leads to poor performance on rare classes. Existing work in this area has primarily focused on Convolutional Neural Networks (CNNs), which are increasingly being replaced by Self-Attention-based Vision Transformers (ViTs). Fundamentally, ViTs differ from CNNs in that they offer the flexibility of learning the appropriate inductive bias conducive to improving performance. This work is among the first to evaluate the performance of ViTs under class imbalance. We find that accuracy degradation in the presence of class imbalance is much more prominent in ViTs than in CNNs. This degradation can be partially mitigated through loss reweighting, a popular strategy that increases the loss contributed by rare classes. We investigate the impact of loss reweighting on different components of a ViT, namely, the patch embedding, the self-attention backbone, and the linear classifier. Our ongoing investigations reveal that loss reweighting mostly impacts the linear classifier and the self-attention backbone, while having a negligible effect on the embedding layer.
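The abstract above describes loss reweighting only at a high level. As an illustration, the following sketch shows one common instantiation, inverse-frequency reweighting, in plain Python; the function names and the specific weighting scheme are assumptions for illustration, not the authors' exact method.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: rare classes receive larger weights.
    Normalized so that the average weight over samples is 1.
    (Illustrative scheme; the paper does not specify the exact weighting.)"""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def reweighted_nll(probs, labels, weights):
    """Weighted negative log-likelihood: each sample's loss is scaled by
    its class weight, boosting the contribution of rare classes."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# Toy example: class 0 appears 3x more often than class 1,
# so class 1 samples get a 3x larger weight.
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]
w = class_weights(labels)
loss = reweighted_nll(probs, labels, w)
```

In practice such per-class weights are typically passed directly to a framework loss, e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`, rather than computed by hand as above.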




How to Cite

Li, K., Duggal, R., & Chau, D. H. (2023). Evaluating Robustness of Vision Transformers on Imbalanced Datasets (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 16252-16253. https://doi.org/10.1609/aaai.v37i13.26986