Instance Mining with Class Feature Banks for Weakly Supervised Object Detection
Keywords:Object Detection & Categorization
AbstractRecent progress on weakly supervised object detection (WSOD) is characterized by formulating WSOD as a Multiple Instance Learning (MIL) problem and taking online refinement with the selected region proposals from MIL. However, MIL inclines to select the most discriminative part rather than the entire instance as the top-scoring region proposals, which leads to weak localization capability for weakly supervised object detectors. We attribute this problem to the limited intra-class diversity within a single image. Specifically, due to the lack of annotated bounding boxes, the network tends to focus on the most common parts of each class and neglect the diverse parts of objects. To solve the problem, we introduce a novel Instance Mining with Class Feature Banks (IM-CFB) framework, which includes a Class Feature Banks (CFB) module and a Feature Guided Instance Mining (FGIM) algorithm. Concretely, Class Feature Banks (CFB) consist of sub-banks for each class, which are utilized to collect diversity information from a broader view. At the training stage, the RoI features of reliable region proposals are recorded and updated in the CFB. Then, FGIM leverages the features recorded in the CFB to ameliorate the region proposal selection of the MIL branch. Extensive experiments conducted on two publicly available datasets, Pascal VOC 2007 and 2012, demonstrate the effectiveness of our method. More remarkably, our method achieves 54.3% on mAP and 70.7% on CorLoc on Pascal VOC 2007. When further re-trained by a Fast-RCNN detector, we obtain to-date the best reported mAP and CorLoc of 55.8% and 72.2%, respectively.
How to Cite
Yin, Y., Deng, J., Zhou, W., & Li, H. (2021). Instance Mining with Class Feature Banks for Weakly Supervised Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3190-3198. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16429
AAAI Technical Track on Computer Vision III