Machine learning in Madrid
Monday, November 22, 2021, 12:00-13:00
Speaker: Arnulf Jentzen (University of Münster)
Title: Convergence analysis for gradient descent optimization methods in the training of artificial neural networks
Abstract: Gradient descent (GD) type optimization methods are the standard instrument for training artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains an open problem, even in the simplest situation of the plain vanilla GD optimization method with random initializations, to prove (or disprove) the conjecture that the true risk of the GD optimization method in the training of ANNs with ReLU activation converges to zero as the width/depth of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity. In this talk we prove this conjecture in the situation where the probability distribution of the input data is equivalent to the continuous uniform distribution on a compact interval, where the probability distributions for the random initializations of the ANN parameters are standard normal distributions, and where the target function under consideration is continuous and piecewise affine linear.
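For readers unfamiliar with the setting, the following is a minimal sketch of the kind of training procedure the abstract describes, not the speaker's construction or proof. It assumes a shallow (one hidden layer) ReLU network with scalar input, inputs on a uniform grid over [0, 1] standing in for the uniform distribution, the hypothetical target |x - 0.5| as an example of a continuous piecewise affine function, and a grid-based mean squared error as a stand-in for the true risk. All function names, the learning rate, and the grid size are illustrative assumptions.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def target(x):
    # Hypothetical continuous piecewise affine target on [0, 1] (an assumption).
    return abs(x - 0.5)


def init_params(width, rng):
    # All parameters are drawn from the standard normal distribution.
    return {
        "w": [rng.gauss(0.0, 1.0) for _ in range(width)],  # inner weights
        "b": [rng.gauss(0.0, 1.0) for _ in range(width)],  # inner biases
        "v": [rng.gauss(0.0, 1.0) for _ in range(width)],  # outer weights
        "c": rng.gauss(0.0, 1.0),                          # outer bias
    }


def predict(p, x):
    hidden = (v * relu(w * x + b) for w, b, v in zip(p["w"], p["b"], p["v"]))
    return p["c"] + sum(hidden)


def risk(p, xs):
    # Mean squared error on the grid xs, used here in place of the true risk.
    total = 0.0
    for x in xs:
        d = predict(p, x) - target(x)
        total += d * d
    return total / len(xs)


def gd_step(p, xs, lr):
    # One plain gradient descent step on the grid-based risk.
    width = len(p["w"])
    gw, gb, gv, gc = [0.0] * width, [0.0] * width, [0.0] * width, 0.0
    for x in xs:
        err = 2.0 * (predict(p, x) - target(x)) / len(xs)
        gc += err
        for i in range(width):
            pre = p["w"][i] * x + p["b"][i]
            if pre > 0.0:  # ReLU active; its derivative is taken as 0 otherwise
                gv[i] += err * pre
                gw[i] += err * p["v"][i] * x
                gb[i] += err * p["v"][i]
    return {
        "w": [w - lr * g for w, g in zip(p["w"], gw)],
        "b": [b - lr * g for b, g in zip(p["b"], gb)],
        "v": [v - lr * g for v, g in zip(p["v"], gv)],
        "c": p["c"] - lr * gc,
    }


def best_of_inits(width, n_inits, n_steps, lr, seed=0):
    # Run GD from several independent random initializations and
    # keep the smallest final risk among them.
    rng = random.Random(seed)
    xs = [i / 50 for i in range(51)]  # uniform grid on [0, 1]
    best, finals = float("inf"), []
    for _ in range(n_inits):
        p = init_params(width, rng)
        for _ in range(n_steps):
            p = gd_step(p, xs, lr)
        r = risk(p, xs)
        finals.append(r)
        if r < best:
            best = r
    return best, finals


if __name__ == "__main__":
    best, finals = best_of_inits(width=8, n_inits=5, n_steps=500, lr=0.02)
    print("final risks:", finals)
    print("best risk:", best)
```

The three arguments width, n_inits, and n_steps correspond to the three quantities that the conjecture sends to infinity. The sketch only lets one experiment with them numerically and says nothing about convergence by itself.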