Ioannis Mitliagkas on studying momentum dynamics for faster training, better scaling and easier tuning
March 12, 2019
This talk revolves around Polyak’s momentum method for gradient descent, also known simply as ‘momentum’. Its stochastic variant, stochastic gradient descent with momentum (momentum SGD), is one of the most commonly used optimization methods in deep learning. Throughout the talk we will study a number of important properties of this versatile method, and see how this understanding can be used to engineer better deep learning systems.
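For reference, Polyak’s momentum (heavy-ball) update can be written in the following standard form; the symbols α (step size) and μ (momentum coefficient) are our notation, not necessarily the talk’s:

$$
x_{t+1} = x_t - \alpha \nabla f(x_t) + \mu \, (x_t - x_{t-1})
$$

The stochastic variant, momentum SGD, replaces the full gradient $\nabla f(x_t)$ with a mini-batch estimate; setting μ = 0 recovers plain (stochastic) gradient descent.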