Dictionary Learning Inspired Deep Network for Scene Recognition
Scene recognition remains one of the most challenging problems in image understanding. With the help of fully connected layers (FCL) and rectified linear units (ReLu), deep networks can extract the moderately sparse and discriminative feature representation required for scene recognition. However, few methods consider exploiting a sparsity model for learning the feature representation in order to provide enhanced discriminative capability. In this paper, we replace the conventional FCL and ReLu with a new dictionary learning layer, that is composed of a finite number of recurrent units to simultaneously enhance the sparse representation and discriminative abilities of features via the determination of optimal dictionaries. In addition, with the help of the structure of the dictionary, we propose a new label discriminative regressor to boost the discrimination ability. We also propose new constraints to prevent overfitting by incorporating the advantage of the Mahalanobis and Euclidean distances to balance the recognition accuracy and generalization performance. Our proposed approach is evaluated using various scene datasets and shows superior performance to many state-of-the-art approaches.