Volume 28, Issue 5
Dying ReLU and Initialization: Theory and Numerical Examples

Lu Lu, Yeonjong Shin, Yanhui Su, George Em Karniadakis

Commun. Comput. Phys., 28 (2020), pp. 1671-1706.

Published online: 2020-11

  • Abstract

The dying ReLU refers to the problem in which ReLU neurons become inactive and output only 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about the phenomenon theoretically. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. A common way of initializing weights and biases uses symmetric probability distributions, an approach that suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We show that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
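The depth dependence stated above can be explored with a small Monte Carlo experiment. The sketch below is an illustration under our own assumptions, not the authors' code or their randomized asymmetric initialization: it uses NumPy, a 1-D input drawn uniformly from [-1, 1], a fixed width, symmetric He-normal weights N(0, 2/fan_in) and zero biases. The helper names `is_born_dead` and `dead_fraction` are hypothetical. A network is counted as dead at initialization when every neuron of its last hidden layer takes the same value on all sampled inputs, so any output layer on top of it would be a constant function.

```python
import numpy as np


def is_born_dead(depth, width, rng, n_samples=256, tol=1e-12):
    """Return True if a freshly initialized ReLU net is constant on sampled inputs.

    Illustrative assumptions (not the paper's exact setup): 1-D input drawn
    uniformly from [-1, 1], He-normal weights N(0, 2/fan_in), zero biases.
    """
    x = rng.uniform(-1.0, 1.0, size=(n_samples, 1))
    h = x
    for _ in range(depth):
        fan_in = h.shape[1]
        # Symmetric (zero-mean Gaussian) weights with He scaling.
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, width))
        b = np.zeros(width)  # zero biases, a degenerate symmetric choice
        h = np.maximum(h @ W + b, 0.0)  # ReLU activation
    # Dead: each last-hidden-layer neuron has the same value for every input.
    return np.ptp(h, axis=0).max() < tol


def dead_fraction(depth, width, trials=200, seed=0):
    """Fraction of independently initialized networks that are dead at init."""
    rng = np.random.default_rng(seed)
    return float(np.mean([is_born_dead(depth, width, rng) for _ in range(trials)]))


for depth in (1, 5, 10, 20, 50):
    print(depth, dead_fraction(depth, width=2))
```

With the width held at 2, the printed fraction should grow with depth toward 1. Raising `width` should slow this growth, since every neuron in a layer must be inactive at once for that layer to die, which fits the abstract's framing of the result as a statement about depth going to infinity.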

  • Keywords

Neural network, Dying ReLU, Vanishing/Exploding gradient, Randomized asymmetric initialization.

  • AMS Subject Headings

60J05, 62M45, 68U99

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTeX
  • RIS
  • TXT
@Article{CiCP-28-1671,
  author  = {Lu, Lu and Shin, Yeonjong and Su, Yanhui and Em Karniadakis, George},
  title   = {Dying ReLU and Initialization: Theory and Numerical Examples},
  journal = {Communications in Computational Physics},
  year    = {2020},
  volume  = {28},
  number  = {5},
  pages   = {1671--1706},
  issn    = {1991-7120},
  doi     = {10.4208/cicp.OA-2020-0165},
  url     = {http://global-sci.org/intro/article_detail/cicp/18393.html}
}
TY  - JOUR
T1  - Dying ReLU and Initialization: Theory and Numerical Examples
AU  - Lu, Lu
AU  - Shin, Yeonjong
AU  - Su, Yanhui
AU  - Em Karniadakis, George
JO  - Communications in Computational Physics
VL  - 28
IS  - 5
SP  - 1671
EP  - 1706
PY  - 2020
DA  - 2020/11
SN  - 1991-7120
DO  - 10.4208/cicp.OA-2020-0165
UR  - https://global-sci.org/intro/article_detail/cicp/18393.html
KW  - Neural network
KW  - Dying ReLU
KW  - Vanishing/Exploding gradient
KW  - Randomized asymmetric initialization
ER  -

Lu Lu, Yeonjong Shin, Yanhui Su & George Em Karniadakis. (2020). Dying ReLU and Initialization: Theory and Numerical Examples. Communications in Computational Physics. 28 (5). 1671-1706. doi:10.4208/cicp.OA-2020-0165