HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition

Authors

  • Jinpeng Zheng, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Shao-Yuan Li, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; State Key Lab. for Novel Software Technology, Nanjing University
  • Gan Xu, College of Information Engineering, Zhejiang University of Technology
  • Wenhai Wan, School of Computer Science and Technology, Huazhong University of Science and Technology
  • Zijian Tao, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Songcan Chen, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
  • Kangkan Wang, School of Computer Science and Engineering, Nanjing University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i34.40116

Abstract

Long-tail recognition remains challenging for pre-trained foundation models such as CLIP, which often suffer performance degradation under imbalanced data. This stems not only from overfitting and underfitting during fine-tuning but, more fundamentally, from bias inherited from the long-tail distribution of their massive pre-training datasets. To address this, we propose HGLTR (Hierarchy-Guided Long-Tail Recognition), a novel framework that calibrates pre-trained models by injecting objective class-hierarchy knowledge. We argue that the semantic proximity defined by a hierarchy provides a robust, data-independent prior that counteracts model bias. Our method is designed specifically for the dual-modality architecture of vision-language models. At the feature level, we align image embeddings with a hierarchy-guided text similarity structure. At the classifier level, we employ a distillation loss that regularizes predictions with soft labels derived from the hierarchy. This dual-level injection effectively transfers knowledge from head to tail classes. Experiments on ImageNet-LT, Places-LT, and iNaturalist 2018 demonstrate that HGLTR achieves state-of-the-art performance, particularly on tail-class accuracy, highlighting the importance of leveraging structural priors to calibrate foundation models for real-world data.
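
The abstract names two concrete ingredients: soft labels derived from a class hierarchy that regularize the classifier via distillation, and a feature-level alignment between image embeddings and a hierarchy-guided text similarity structure. The PyTorch sketch below is only one illustrative reading of those two sentences, not the paper's actual formulation; every function name, the negative-tree-distance softmax for the soft labels, the KL form of the distillation loss, and the MSE alignment objective are assumptions made here for concreteness.

```python
import torch
import torch.nn.functional as F

def hierarchy_soft_labels(tree_dist, targets, tau=1.0):
    """Convert a [C, C] pairwise tree-distance matrix into [B, C] soft
    labels: classes closer to the ground truth in the hierarchy receive
    more probability mass."""
    logits = -tree_dist[targets] / tau          # nearer class => larger logit
    return F.softmax(logits, dim=-1)

def classifier_distillation_loss(pred_logits, soft_labels):
    """KL divergence from the model's predictions to the hierarchy-derived
    soft labels (a classifier-level regularizer)."""
    log_p = F.log_softmax(pred_logits, dim=-1)
    return F.kl_div(log_p, soft_labels, reduction="batchmean")

def feature_alignment_loss(img_emb, txt_emb, targets):
    """Match the pairwise similarity of image embeddings in a batch to the
    pairwise similarity of their classes' text embeddings (a feature-level
    alignment)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim_img = img @ img.t()                         # [B, B] image-image
    sim_txt = (txt @ txt.t())[targets][:, targets]  # [B, B] class-class
    return F.mse_loss(sim_img, sim_txt)

# Toy usage with random tensors standing in for CLIP outputs.
B, C, D = 8, 10, 512
img_emb = torch.randn(B, D)                     # image encoder outputs
txt_emb = torch.randn(C, D)                     # per-class prompt embeddings
logits = torch.randn(B, C)                      # classifier logits
targets = torch.randint(0, C, (B,))
tree_dist = torch.randint(1, 6, (C, C)).float() # stand-in hierarchy distances
tree_dist = (tree_dist + tree_dist.t()) / 2
tree_dist.fill_diagonal_(0)

soft = hierarchy_soft_labels(tree_dist, targets)
total = (F.cross_entropy(logits, targets)
         + classifier_distillation_loss(logits, soft)
         + feature_alignment_loss(img_emb, txt_emb, targets))
```

Since the abstract describes the hierarchy as a data-independent prior, the `tree_dist` matrix in such a setup would presumably come from an objective taxonomy (e.g., WordNet for ImageNet classes) rather than from the training data itself.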

Published

2026-03-14

How to Cite

Zheng, J., Li, S.-Y., Xu, G., Wan, W., Tao, Z., Chen, S., & Wang, K. (2026). HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28821–28829. https://doi.org/10.1609/aaai.v40i34.40116

Section

AAAI Technical Track on Machine Learning XI