RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning

Kanghoon Yoon; Kibum Kim; Jaehyeong Jeon; Yeonjun In; Donghyun Kim; Chanyoung Park

doi:10.1609/aaai.v39i9.33036

Authors

Kanghoon Yoon, Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology
Kibum Kim, Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology
Jaehyeong Jeon, Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology
Yeonjun In, Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology
Donghyun Kim, Department of Artificial Intelligence, Korea University
Chanyoung Park, Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v39i9.33036

Abstract

Scene Graph Generation (SGG) research has suffered from two fundamental challenges: the long-tailed predicate distribution and semantic ambiguity between predicates. These challenges lead to a bias towards head predicates in SGG models, favoring dominant general predicates while overlooking fine-grained predicates. In this paper, we address the challenges of SGG by framing it as a multi-label classification problem with partial annotation, where relevant labels of fine-grained predicates are missing. Under this new framing, we propose Retrieval-Augmented Scene Graph Generation (RA-SGG), which identifies potential instances to be multi-labeled and enriches the single label with multi-labels that are semantically similar to the original label by retrieving relevant samples from our established memory bank. Based on the augmented relations (i.e., discovered multi-labels), we apply multi-prototype learning to train our SGG model. Comprehensive experiments demonstrate that RA-SGG outperforms state-of-the-art baselines by up to 3.6% on VG and 5.9% on GQA, particularly in terms of F@K, showing that RA-SGG effectively alleviates the issue of biased prediction caused by the long-tailed distribution and semantic ambiguity of predicates.
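
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the number of retrieved samples, or the rule for accepting an extra label. The sketch assumes cosine similarity, a nearest-neighbor lookup, and a simple vote threshold; the function names (`cosine`, `augment_labels`), the parameters (`k`, `min_votes`), and the toy predicates are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def augment_labels(query, label, memory_bank, k=3, min_votes=2):
    """Return the original label plus any label that appears at least
    `min_votes` times among the `k` memory entries most similar to `query`.

    `memory_bank` is a list of (embedding, predicate_label) pairs.
    The voting rule is an assumption made for illustration only.
    """
    ranked = sorted(
        memory_bank,
        key=lambda entry: cosine(query, entry[0]),
        reverse=True,
    )
    votes = Counter(lbl for _, lbl in ranked[:k])
    extra = {lbl for lbl, n in votes.items() if n >= min_votes and lbl != label}
    return {label} | extra


# Toy memory bank: 2-D embeddings paired with predicate labels (illustrative values).
memory_bank = [
    ([1.0, 0.1], "standing on"),
    ([0.9, 0.2], "standing on"),
    ([0.8, 0.3], "on"),
    ([-1.0, 0.5], "holding"),
]

# An instance annotated only with the general predicate "on".
labels = augment_labels([1.0, 0.15], "on", memory_bank)
print(sorted(labels))
```

In this toy setup, an instance annotated with the general predicate "on" also picks up the fine-grained predicate "standing on", because two of its three nearest memory entries carry that label. The resulting label set is the kind of multi-label target that a downstream training step would consume.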
