SA-BNN: State-Aware Binary Neural Network

Authors

  • Chunlei Liu, Beihang University; The University of Adelaide
  • Peng Chen, The University of Adelaide
  • Bohan Zhuang, Monash University
  • Chunhua Shen, The University of Adelaide
  • Baochang Zhang, Beihang University
  • Wenrui Ding, Beihang University

DOI:

https://doi.org/10.1609/aaai.v35i3.16306

Keywords:

Other Foundations of Computer Vision, Learning on the Edge & Model Compression

Abstract

Binary Neural Networks (BNNs) have recently received significant attention due to their memory and computation efficiency. However, the considerable accuracy gap between BNNs and their full-precision counterparts hinders their deployment on resource-constrained platforms. One of the main reasons for this performance gap is frequent weight flipping, which is caused by misleading weight updates in BNNs. To address this issue, we propose a state-aware binary neural network (SA-BNN) equipped with a well-designed state-aware gradient. SA-BNN is inspired by the observation that frequent weight flips are more likely to occur when the gradient magnitude is identical for all quantization states {-1, 1}. Accordingly, we propose to employ independent gradient coefficients for the different states when updating the weights. Furthermore, we analyze the effectiveness of the state-aware gradient in suppressing the frequent weight flip problem. Experiments on ImageNet show that the proposed SA-BNN outperforms the current state-of-the-art methods (e.g., Bi-Real Net) by more than 3% when using a ResNet architecture. Specifically, we achieve 61.7%, 65.5% and 68.7% Top-1 accuracy with ResNet-18, ResNet-34 and ResNet-50 on ImageNet, respectively.
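
The idea of using independent gradient coefficients per quantization state can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact gradient formula, so the code below assumes a clipped straight-through estimator for the sign function and scales its gradient by a coefficient chosen according to the weight's current binary state. The names `state_aware_grad`, `coeff_pos`, `coeff_neg` and `clip` are hypothetical, and the coefficients are fixed constants here purely for illustration.

```python
# Illustrative sketch only; NOT the authors' implementation.
# Assumption: the backward pass of sign() is a clipped straight-through
# estimator whose scale depends on the weight's current binary state.

def binarize(w):
    """Forward pass: map a latent real-valued weight to {-1, +1}."""
    return 1.0 if w >= 0.0 else -1.0


def state_aware_grad(w, upstream_grad, coeff_pos, coeff_neg, clip=1.0):
    """Backward pass with a separate gradient coefficient per state.

    coeff_pos scales the gradient when the weight binarizes to +1,
    coeff_neg when it binarizes to -1 (both names are hypothetical).
    Setting both to 1.0 gives an ordinary clipped straight-through estimator.
    """
    if abs(w) > clip:
        return 0.0  # outside the clipping range no gradient is passed
    coeff = coeff_pos if binarize(w) > 0 else coeff_neg
    return coeff * upstream_grad


def sgd_step(weights, upstream_grads, lr, coeff_pos, coeff_neg):
    """One plain SGD update of the latent real-valued weights."""
    return [
        w - lr * state_aware_grad(w, g, coeff_pos, coeff_neg)
        for w, g in zip(weights, upstream_grads)
    ]


def count_flips(before, after):
    """Number of weights whose binary state changed after the update."""
    return sum(binarize(b) != binarize(a) for b, a in zip(before, after))


if __name__ == "__main__":
    weights = [0.05, -0.05]   # latent weights close to the sign boundary
    grads = [1.0, -1.0]       # gradients pushing both weights across zero

    same = sgd_step(weights, grads, lr=0.1, coeff_pos=1.0, coeff_neg=1.0)
    aware = sgd_step(weights, grads, lr=0.1, coeff_pos=0.2, coeff_neg=1.0)

    print("flips with identical coefficients:", count_flips(weights, same))
    print("flips with per-state coefficients:", count_flips(weights, aware))
```

In this toy setting, damping the coefficient for the +1 state keeps the first weight from crossing zero, while the second weight still flips. How the coefficients are actually chosen or learned is specified in the paper, not in this sketch.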

Published

2021-05-18

How to Cite

Liu, C., Chen, P., Zhuang, B., Shen, C., Zhang, B., & Ding, W. (2021). SA-BNN: State-Aware Binary Neural Network. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2091-2099. https://doi.org/10.1609/aaai.v35i3.16306

Section

AAAI Technical Track on Computer Vision II