Volume 2, Issue 2
A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with Non-Isolated Local Minima

Taehee Ko & Xiantao Li

J. Mach. Learn., 2 (2023), pp. 138-160.

Published online: 2023-06

Category: Theory

  • Abstract

Loss functions with non-isolated minima have emerged in several machine-learning problems, creating a gap between theoretical predictions and observations in practice. In this paper, we formulate a new type of local convexity condition that is suitable for describing the behavior of loss functions near non-isolated minima. We show that this condition is general enough to encompass many existing conditions. In addition, we study the local convergence of the stochastic gradient descent (SGD) method under this mild condition by adopting the notion of stochastic stability. In the convergence analysis, we establish concentration inequalities for the SGD iterates, which can be used to interpret empirical observations from practical training results.

  • General Summary

Much of the success of modern machine learning applications is attributed to the feasibility of non-convex optimization via effective optimization techniques such as stochastic gradient descent (SGD). At the theoretical level, however, it is still not clear why SGD works for non-convex optimization, especially when non-isolated local minima are present. In this paper, we present a local convergence analysis of SGD in non-convex optimization with non-isolated minima. We propose new non-convex conditions that characterize a large class of non-convex loss functions with non-isolated minima. Under those conditions, we prove concentration inequalities for SGD that characterize its convergence behavior. Numerical results confirm the existence of a deep neural network satisfying the weakest of the proposed non-convex conditions.
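To make the setting concrete, here is a minimal sketch (ours, not from the paper) of SGD on a toy loss whose minima are non-isolated: $f(w) = (\|w\|^2 - 1)^2$ in two dimensions, where every point on the unit circle is a global minimum, so the minimum set is a connected manifold rather than a collection of isolated points. The noise model and step size below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Toy loss with a non-isolated minimum set: f(w) = (||w||^2 - 1)^2.
# Every w with ||w|| = 1 is a global minimum, so the minima form a
# circle (a connected manifold), not isolated points.
def loss(w):
    return (np.dot(w, w) - 1.0) ** 2

def grad(w):
    # Exact gradient of f; additive noise below mimics a stochastic gradient.
    return 4.0 * (np.dot(w, w) - 1.0) * w

rng = np.random.default_rng(0)
w = np.array([2.0, 0.5])   # initialize away from the minimum set
eta = 0.05                 # step size (illustrative choice)

for t in range(2000):
    noise = 0.01 * rng.standard_normal(2)  # stochastic gradient noise
    w = w - eta * (grad(w) + noise)

# SGD settles near *some* point of the circle ||w|| = 1; which point
# depends on the initialization and the noise realization.
print(loss(w), np.linalg.norm(w))
```

Because the limit is a point on a manifold of minima rather than a unique minimizer, classical convergence statements to a fixed point do not apply directly, which is the gap the paper's concentration inequalities address.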


  • Copyright

COPYRIGHT: © Global Science Press

  • BibTeX
  • RIS
  • TXT
@Article{JML-2-138,
  author  = {Ko, Taehee and Li, Xiantao},
  title   = {A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with Non-Isolated Local Minima},
  journal = {Journal of Machine Learning},
  year    = {2023},
  volume  = {2},
  number  = {2},
  pages   = {138--160},
  issn    = {2790-2048},
  doi     = {10.4208/jml.230106},
  url     = {http://global-sci.org/intro/article_detail/jml/21759.html}
}
TY  - JOUR
T1  - A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with Non-Isolated Local Minima
AU  - Ko, Taehee
AU  - Li, Xiantao
JO  - Journal of Machine Learning
VL  - 2
IS  - 2
SP  - 138
EP  - 160
PY  - 2023
DA  - 2023/06
SN  - 2790-2048
DO  - 10.4208/jml.230106
UR  - https://global-sci.org/intro/article_detail/jml/21759.html
KW  - Stochastic Gradient Descent
KW  - Stochastic Stability
KW  - Non-Convex Optimization
KW  - Local Convergence
KW  - Non-Isolated Minima
ER  -

Taehee Ko & Xiantao Li. (2023). A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with Non-Isolated Local Minima. Journal of Machine Learning. 2 (2). 138-160. doi:10.4208/jml.230106