Spotting the Unseen: Reciprocal Consensus Network Guided by Visual Archetypes
DOI
https://doi.org/10.1609/aaai.v38i11.29149
Keywords
ML: Deep Learning Algorithms, APP: Humanities & Computational Social Science, CV: Applications, CV: Object Detection & Categorization
Abstract
Humans often need only a few visual archetypes to spot novel objects. Motivated by this observation, we present a strategy rooted in "spotting the unseen": establishing dense correspondences between potential query image regions and a visual archetype. To this end, we propose the Consensus Network (CoNet). Our method leverages relational patterns within and across images via an Auto-Correlation Representation (ACR) and a Mutual-Correlation Representation (MCR). Within each image, the ACR module encodes local self-similarity and global context simultaneously. Between the query and support images, the MCR module computes the cross-correlation between the two image representations and introduces a reciprocal consistency constraint, which excludes outliers and enhances model robustness. To overcome the challenge of low-resource training data, particularly in one-shot learning scenarios, we incorporate an adaptive margin strategy to better handle diverse instances. Experimental results across diverse domains, including object detection in natural scenes and text spotting in both historical manuscripts and natural scenes, demonstrate the effectiveness and strong generalization ability of the proposed method. Our code is available at: https://github.com/infinite-hwb/conet
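The reciprocal consistency constraint described in the abstract can be read as a mutual nearest-neighbor check between query and support region descriptors: a correspondence is kept only if each side's best match points back at the other. The following NumPy sketch illustrates that idea; it is an assumption-laden illustration, not the paper's implementation, and the function name and descriptor shapes are hypothetical.

```python
import numpy as np

def reciprocal_consensus(query_feats, support_feats):
    """Mutual nearest-neighbor filtering between two sets of region
    descriptors (rows). Returns (query_idx, support_idx) pairs that
    agree reciprocally; one-sided matches are dropped as outliers."""
    # L2-normalize so the cross-correlation is cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    sim = q @ s.T  # cross-correlation matrix, shape (n_query, n_support)

    best_s_for_q = sim.argmax(axis=1)  # support region preferred by each query region
    best_q_for_s = sim.argmax(axis=0)  # query region preferred by each support region

    # Keep a pair (i, j) only when the preference is mutual.
    return [(i, j) for i, j in enumerate(best_s_for_q) if best_q_for_s[j] == i]
```

For example, a query region that is only weakly similar to a support region is rejected when that support region prefers a different query region, which is the outlier-exclusion behavior the abstract attributes to the MCR module.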
Published
2024-03-24
How to Cite
Hu, W., Zhan, H., Ma, X., Lu, Y., & Suen, C. Y. (2024). Spotting the Unseen: Reciprocal Consensus Network Guided by Visual Archetypes. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12556–12564. https://doi.org/10.1609/aaai.v38i11.29149
Section
AAAI Technical Track on Machine Learning II