Adaptive Convolutional ReLUs
DOI:
https://doi.org/10.1609/aaai.v34i04.5805Abstract
Rectified linear units (ReLUs) are currently the most popular activation function used in neural networks. Although ReLUs can solve the gradient vanishing problem and accelerate training convergence, it suffers from the dying ReLU problem in which some neurons are never activated if the weights are not updated properly. In this work, we propose a novel activation function, known as the adaptive convolutional ReLU (ConvReLU), that can better mimic brain neuron activation behaviors and overcome the dying ReLU problem. With our novel parameter sharing scheme, ConvReLUs can be applied to convolution layers that allow each input neuron to be activated by different trainable thresholds without involving a large number of extra parameters. We employ the zero initialization scheme in ConvReLU to encourage trainable thresholds to be close to zero. Finally, we develop a partial replacement strategy that only replaces the ReLUs in the early layers of the network. This resolves the dying ReLU problem and retains sparse representations for linear classifiers. Experimental results demonstrate that our proposed ConvReLU has consistently better performance compared to ReLU, LeakyReLU, and PReLU. In addition, the partial replacement strategy is shown to be effective not only for our ConvReLU but also for LeakyReLU and PReLU.