Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection

Authors

  • Hongquan Zhang, East China Normal University; Chongqing Institute of East China Normal University
  • Bin-Bin Gao, Tencent YouTu Lab
  • Yi Zeng, Tencent YouTu Lab
  • Xudong Tian, East China Normal University; Chongqing Institute of East China Normal University
  • Xin Tan, East China Normal University; Chongqing Institute of East China Normal University
  • Zhizhong Zhang, East China Normal University; Chongqing Institute of East China Normal University
  • Yanyun Qu, Xiamen University
  • Jun Liu, Tencent YouTu Lab
  • Yuan Xie, East China Normal University; Chongqing Institute of East China Normal University

DOI:

https://doi.org/10.1609/aaai.v38i7.28537

Keywords:

CV: Object Detection & Categorization, ML: Life-Long and Continual Learning

Abstract

Class-incremental object detection (CIOD) is a capability that real-world applications demand: an object detector must continuously adapt to new tasks without forgetting previously learned ones, and the main challenge is catastrophic forgetting. Many distillation- and replay-based methods have been proposed to alleviate this problem. However, they typically learn on a purely visual backbone and neglect the powerful representation capabilities of textual cues, which limits their performance to some extent. In this paper, we propose a task-aware language-image representation to mitigate catastrophic forgetting, introducing a new paradigm for language-image-based CIOD. First, we demonstrate the significant advantage of language-image detectors in mitigating catastrophic forgetting. Second, we propose a method for learning task-aware language-image representations that overcomes the drawback of directly using a language-image detector for CIOD. More specifically, in the training stage we learn the language-image representations of different tasks through an insulating approach, and in the inference stage we use the alignment scores produced by the task-specific language-image representations. With the proposed method, language-image detectors become more practical for CIOD. We conduct extensive experiments on COCO 2017 and Pascal VOC 2007 and demonstrate that the proposed method achieves state-of-the-art results under various CIOD settings.

Published

2024-03-24

How to Cite

Zhang, H., Gao, B.-B., Zeng, Y., Tian, X., Tan, X., Zhang, Z., Qu, Y., Liu, J., & Xie, Y. (2024). Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7096-7104. https://doi.org/10.1609/aaai.v38i7.28537

Issue

Section

AAAI Technical Track on Computer Vision VI