FAS-Net: Construct Effective Features Adaptively for Multi-Scale Object Detection
Feature pyramid is the mainstream method for multi-scale object detection. In most detectors with feature pyramid, each proposal is predicted based on feature grids pooled from only one feature level, which is assigned heuristically. Recent studies report that the feature representation extracted using this method is sub-optimal, since they ignore the valid information exists on other unselected layers of the feature pyramid. To address this issue, researchers present to fuse valid information across all feature levels. However, these methods can be further improved: the feature fusion strategies, which use common operation (element-wise max or sum) in most detectors, should be replaced by a more flexible way. In this work, a novel method called feature adaptive selection subnetwork (FAS-Net) is proposed to construct effective features for detecting objects of different scales. Particularly, its adaption consists of two level: global attention and local adaptive selection. First, we model the global context of each feature map with global attention based feature selection module (GAFSM), which can strengthen the effective features across each layer adaptively. Then we extract the features of each region of interest (RoI) on the entire feature pyramid to construct a RoI feature pyramid. Finally, the RoI feature pyramid is sent to the feature adaptive selection module (FASM) to integrate the strengthened features according to the input adaptively. Our FAS-Net can be easily extended to other two-stage object detectors with feature pyramid, and supports to analyze the importance of different feature levels for multi-scale objects quantitatively. Besides, FAS-Net can also be further applied to instance segmentation task and get consistent improvements. Experiments on PASCAL07/12 and MSCOCO17 demonstrate the effectiveness and generalization of the proposed method.