ADINE: An adaptive momentum method for stochastic gradient descent
V. Srinivasan, A.R. Sankar,
Published by the Association for Computing Machinery
2018
Pages: 249–256
Abstract
Momentum based learning algorithms are one of the most successful learning algorithms in both convex and non-convex optimization. Two major momentum based techniques that achieved tremendous success in gradient-based optimization are Polyak’s heavy ball method and Nesterov’s accelerated gradient. A crucial step in all the momentum based methods is the choice of the momentum parameter m, which is always set to less than 1. Although the choice of m < 1 is justified only under very strong theoretical assumptions, it works well in practice. In this paper we propose a new momentum based method ADINE, which relaxes the constraint of m < 1 and allows the learning algorithm to use adaptive higher momentum. We motivate our relaxation on m by experimentally verifying that a higher momentum (≥ 1) can help escape saddles much faster. ADINE uses this intuition and helps weigh the previous updates more, inherently setting the momentum parameter to be greater in the optimization method. To the best of our knowledge, the idea of increased momentum is first of its kind and is very novel. We evaluate this on deep neural networks and show that ADINE helps the learning algorithm to converge much faster without compromising on the generalization error. © 2018 Association for Computing Machinery.
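For readers less familiar with the heavy-ball update the abstract refers to, below is a minimal NumPy sketch of Polyak's classical momentum step with the conventional m < 1. It is only illustrative: it does not reproduce ADINE's adaptive rule for choosing m, which the abstract describes only at a high level, and the function name heavy_ball_sgd and the toy quadratic objective are assumptions made for the example.

```python
import numpy as np

def heavy_ball_sgd(grad, w0, lr=0.1, m=0.9, steps=100):
    """Polyak's heavy-ball (classical momentum) update:

        v_{t+1} = m * v_t - lr * grad(w_t)
        w_{t+1} = w_t + v_{t+1}

    The momentum parameter m is conventionally kept below 1; ADINE
    relaxes exactly this constraint by adapting m and allowing values
    >= 1 (the adaptive rule is defined in the paper and is not
    reproduced here).
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = m * v - lr * grad(w)  # velocity accumulates past gradients
        w = w + v                 # parameter step along the velocity
    return w

# Toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_final = heavy_ball_sgd(grad=lambda w: w, w0=[5.0, -3.0])
print(w_final)  # approaches the minimizer at the origin
```

With m fixed below 1, the velocity is a geometrically decaying average of past gradients; the relaxation described in the abstract amounts to letting the weight placed on previous updates grow, which is what the authors call adaptive higher momentum.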
About the journal
Journal: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery