From Scene to Object: Enhancing Open-Vocabulary Object Detection via Foreground-Background Context Reasoning

Yanqi Li; Jianwei Niu; Ningbo Gu; Tao Ren

doi:10.1609/aaai.v40i8.37590

Authors

Yanqi Li: Beihang University; Zhongguancun Laboratory
Jianwei Niu: Beihang University; Zhongguancun Laboratory; Hangzhou Innovation Institute of Beihang University
Ningbo Gu: Beihang University; Hangzhou Innovation Institute of Beihang University
Tao Ren: University of the Chinese Academy of Sciences

DOI: https://doi.org/10.1609/aaai.v40i8.37590

Abstract

Open-Vocabulary Object Detection (OVOD) aims to detect both known and novel categories in complex visual scenes, surpassing the limitations of conventional closed-set detectors. Recent advances in vision-language models (VLMs) like CLIP have enabled zero-shot recognition by aligning visual features with large-scale textual embeddings. However, current OVOD approaches often fall short by overlooking critical contextual and semantic cues necessary for discovering a broader range of novel objects. To address this, we propose BFDet, a scene-to-object reasoning framework that leverages the complementary strengths of Large Language Models (LLMs) and VLMs. BFDet introduces a novel scene-to-object reasoning mechanism grounded in foreground-background context interaction. It first uses high-confidence objects to infer the scene-level background. This scene background then guides the discovery of foreground objects by prompting an LLM to generate scene-sensitive novel object candidates. These candidates are subsequently verified through cross-modal alignment and used as high-quality pseudo-labels to enrich detector training. Designed as a plug-and-play module, BFDet integrates seamlessly into existing detection pipelines and consistently improves performance on novel categories across COCO and LVIS benchmarks.
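
The scene-to-object loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Detection`, `build_scene_prompt`, `propose_candidates`, and `verify_candidates`, the prompt wording, both thresholds, and the use of plain cosine similarity as the cross-modal check are all assumptions made for illustration. The LLM and the text encoder are passed in as ordinary callables so that no particular model API is implied.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence, Tuple


@dataclass
class Detection:
    """A detector output: class label plus confidence (hypothetical type)."""
    label: str
    score: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_scene_prompt(detections: List[Detection], min_score: float = 0.8) -> str:
    """Step 1: keep only high-confidence objects and ask for the scene.

    The 0.8 cutoff and the prompt text are assumptions, not values from the paper.
    """
    anchors = sorted({d.label for d in detections if d.score >= min_score})
    return (
        "An image contains: " + ", ".join(anchors) + ". "
        "Name the likely scene, then list other objects that commonly appear in it, "
        "separated by commas."
    )


def propose_candidates(prompt: str, llm: Callable[[str], str]) -> List[str]:
    """Step 2: let an LLM (any callable str -> str) suggest scene-specific objects."""
    reply = llm(prompt)
    return [name.strip() for name in reply.split(",") if name.strip()]


def verify_candidates(
    region_embedding: Sequence[float],
    candidates: List[str],
    text_embed: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Step 3: keep candidates whose text embedding aligns with the region.

    Survivors would serve as pseudo-labels; the threshold is an assumption.
    """
    scored = [(name, cosine(region_embedding, text_embed(name))) for name in candidates]
    kept = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With toy stand-ins for the models, the three steps chain as follows: high-confidence detections such as "oven" and "sink" form the prompt, a stubbed LLM returns a comma-separated candidate list, and only candidates whose embeddings align with an unlabeled region's embedding are kept as pseudo-labels. In a real pipeline the two callables would wrap an actual LLM and a VLM text encoder such as CLIP's.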
