HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition

Authors

  • Jinpeng Zheng, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Shao-Yuan Li, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; State Key Lab. for Novel Software Technology, Nanjing University
  • Gan Xu, College of Information Engineering, Zhejiang University of Technology
  • Wenhai Wan, School of Computer Science and Technology, Huazhong University of Science and Technology
  • Zijian Tao, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Songcan Chen, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Kangkan Wang, School of Computer Science and Engineering, Nanjing University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i34.40116

Abstract

Long-tail recognition remains challenging for pre-trained foundation models such as CLIP, which often suffer performance degradation under imbalanced data. This stems not only from overfitting and underfitting during fine-tuning but, more fundamentally, from bias inherited from the long-tail distribution of their massive pre-training datasets. To address this, we propose HGLTR (Hierarchy-Guided Long-Tail Recognition), a novel framework that calibrates pre-trained models by injecting objective class-hierarchy knowledge. We argue that the semantic proximity defined by a hierarchy provides a robust, data-independent prior that counteracts model bias. Our method is designed specifically for the dual-modality architecture of vision-language models. At the feature level, we align image embeddings with a hierarchy-guided text similarity structure. At the classifier level, we employ a distillation loss that regularizes predictions with soft labels derived from the hierarchy. This dual-level injection effectively transfers knowledge from head to tail classes. Experiments on ImageNet-LT, Places-LT, and iNaturalist 2018 demonstrate that HGLTR achieves state-of-the-art performance, particularly on tail-class accuracy, highlighting the importance of leveraging structural priors to calibrate foundation models for real-world data.
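
The abstract names two concrete ingredients: soft labels derived from a class hierarchy that regularize the classifier via distillation, and a feature-level alignment between image embeddings and a hierarchy-guided text similarity structure. The PyTorch sketch below is only one illustrative reading of those two sentences, not the paper's actual formulation; every function name, the negative-tree-distance softmax for the soft labels, the KL form of the distillation loss, and the MSE alignment objective are assumptions made here for concreteness.

```python
import torch
import torch.nn.functional as F

def hierarchy_soft_labels(tree_dist, targets, tau=1.0):
    """Convert a [C, C] pairwise tree-distance matrix into [B, C] soft
    labels: classes closer to the ground truth in the hierarchy receive
    more probability mass."""
    logits = -tree_dist[targets] / tau          # nearer class => larger logit
    return F.softmax(logits, dim=-1)

def classifier_distillation_loss(pred_logits, soft_labels):
    """KL divergence from the model's predictions to the hierarchy-derived
    soft labels (a classifier-level regularizer)."""
    log_p = F.log_softmax(pred_logits, dim=-1)
    return F.kl_div(log_p, soft_labels, reduction="batchmean")

def feature_alignment_loss(img_emb, txt_emb, targets):
    """Match the pairwise similarity of image embeddings in a batch to the
    pairwise similarity of their classes' text embeddings (a feature-level
    alignment)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim_img = img @ img.t()                         # [B, B] image-image
    sim_txt = (txt @ txt.t())[targets][:, targets]  # [B, B] class-class
    return F.mse_loss(sim_img, sim_txt)

# Toy usage with random tensors standing in for CLIP outputs.
B, C, D = 8, 10, 512
img_emb = torch.randn(B, D)                     # image encoder outputs
txt_emb = torch.randn(C, D)                     # per-class prompt embeddings
logits = torch.randn(B, C)                      # classifier logits
targets = torch.randint(0, C, (B,))
tree_dist = torch.randint(1, 6, (C, C)).float() # stand-in hierarchy distances
tree_dist = (tree_dist + tree_dist.t()) / 2
tree_dist.fill_diagonal_(0)

soft = hierarchy_soft_labels(tree_dist, targets)
total = (F.cross_entropy(logits, targets)
         + classifier_distillation_loss(logits, soft)
         + feature_alignment_loss(img_emb, txt_emb, targets))
```

Since the abstract describes the hierarchy as a data-independent prior, the `tree_dist` matrix in such a setup would presumably come from an objective taxonomy (e.g., WordNet for ImageNet classes) rather than from the training data itself.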

Published

2026-03-14

How to Cite

Zheng, J., Li, S.-Y., Xu, G., Wan, W., Tao, Z., Chen, S., & Wang, K. (2026). HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28821–28829. https://doi.org/10.1609/aaai.v40i34.40116

Section

AAAI Technical Track on Machine Learning XI