OW-DAR: Dual-Granularity Adaptive Reconstruction-Error Modeling for Open-World Object Detection
DOI:
https://doi.org/10.1609/aaai.v40i14.38182
Abstract
Open-world object detection (OWOD) aims to detect both known and unknown objects in dynamic environments. However, only known classes are labeled during training, making it challenging for detectors to recognize unknown objects during inference. Existing methods typically rely on supervision from known categories, which leads models to overconfidently misclassify visually similar unknowns as known and dissimilar ones as background. This known-class prior bias limits the model’s ability to detect unknown objects. In this paper, we propose a novel method, OW-DAR, which enhances foreground-background separability through collaborative fine-grained and coarse-grained modeling. At the fine-grained level, we propose Fine-grained Masked Reconstruction (FMR), which randomly masks regions of the feature map to guide the reconstruction toward semantic structures rather than memorizing low-level patterns. At the coarse-grained level, we propose Adaptive Region-based Error Aggregation (AREA), which operates on object proposals to aggregate reconstruction errors. This enables the model to attend to semantically ambiguous foreground-background boundaries while suppressing the influence of local outliers during optimization. Finally, we leverage the resulting robust reconstruction errors to perform unsupervised foreground-background modeling, enabling probabilistic estimation of potential unknown objects. We validate the effectiveness of OW-DAR on a standard OWOD benchmark. Experimental results demonstrate that OW-DAR consistently outperforms existing state-of-the-art methods, achieving a +18.8 improvement in unknown object recall (U-Recall).
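The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the two ideas it names: masking random regions of a feature map before reconstruction, and aggregating per-location reconstruction error over object proposals. Everything concrete here is an assumption for illustration and not the authors' method: the helper names, the patch size and mask ratio, the stand-in "decoder" that fills masked cells with the mean of visible cells, and the use of a median as an outlier-resistant aggregate in place of the paper's adaptive aggregation.

```python
import numpy as np


def random_patch_mask(height, width, patch, ratio, rng):
    """Return a boolean (height, width) mask; True marks masked locations.

    Hypothetical helper: masks whole square patches, chosen at random.
    """
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    n_patches = grid_h * grid_w
    n_masked = int(round(ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand the patch grid to feature-map resolution, then crop any overhang.
    full = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    return full[:height, :width]


def reconstruction_error(features, reconstructed):
    """Squared error averaged over channels: (C, H, W) -> (H, W)."""
    return ((features - reconstructed) ** 2).mean(axis=0)


def aggregate_region_error(error_map, boxes):
    """Median error inside each (x1, y1, x2, y2) box, in feature-map cells.

    Assumption: the median stands in for an outlier-resistant aggregate.
    """
    return np.array(
        [np.median(error_map[y1:y2, x1:x2]) for x1, y1, x2, y2 in boxes]
    )


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))  # toy (C, H, W) feature map
mask = random_patch_mask(16, 16, patch=4, ratio=0.5, rng=rng)

# Stand-in "decoder": fill masked cells with the per-channel mean of the
# visible cells. A real model would use a learned reconstruction network.
visible_mean = features[:, ~mask].mean(axis=1)[:, None, None]
reconstructed = np.where(mask, visible_mean, features)

error_map = reconstruction_error(features, reconstructed)
boxes = [(0, 0, 8, 8), (8, 8, 16, 16)]  # toy proposals
scores = aggregate_region_error(error_map, boxes)
```

In this toy setup the error is zero at visible cells and positive only where the mask hid the input, so a per-proposal score reflects how poorly the masked content inside that box was reconstructed. How such scores are turned into a foreground probability is not specified in the abstract and is left out here.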
Published
2026-03-14
How to Cite
Ye, L., Xi, X., & Luo, R. (2026). OW-DAR: Dual-Granularity Adaptive Reconstruction-Error Modeling for Open-World Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11946–11954. https://doi.org/10.1609/aaai.v40i14.38182
Issue
Section
AAAI Technical Track on Computer Vision XI