TensorFlow Step 8: Nesterov's accelerated gradient descent + L2 regularization

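L2 regularization: C = C0 + lambda/(2n) * sum(w^2)

Nesterov's accelerated gradient descent

http://www.javashuo.com/article/p-snymsggd-cn.html

Study the figure above carefully and the point becomes clear: Nesterov momentum differs from classical momentum only in which gradient is used, the one at point B or the one at point C.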
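To make the two ideas concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes a toy unregularized cost C0 = 0.5 * sum(a_i * (w_i - t_i)^2); the names a, t, step, and train, and the hyperparameter values, are illustrative and do not come from the original post. The L2 term adds (lambda/n) * w to the gradient, and the two momentum variants differ only in the point at which that gradient is evaluated: the current weights for classical momentum, the look-ahead point w + mu * v for Nesterov.

```python
def cost(w, a, t, lam, n):
    """Regularized cost C = C0 + lam/(2n) * sum(w^2) for the toy C0."""
    c0 = 0.5 * sum(ai * (wi - ti) ** 2 for wi, ai, ti in zip(w, a, t))
    return c0 + lam / (2.0 * n) * sum(wi ** 2 for wi in w)


def grad_c0(w, a, t):
    """Gradient of the toy cost C0 = 0.5 * sum(a_i * (w_i - t_i)^2)."""
    return [ai * (wi - ti) for wi, ai, ti in zip(w, a, t)]


def step(w, v, a, t, lr, mu, lam, n, nesterov):
    """One momentum update; returns the new weights and velocity."""
    # Classical momentum uses the gradient at the current weights;
    # Nesterov uses the gradient at the look-ahead point w + mu * v.
    point = [wi + mu * vi for wi, vi in zip(w, v)] if nesterov else w
    # Gradient of C is grad(C0) + (lam / n) * w, taken at `point`.
    g = [gi + lam / n * pi for gi, pi in zip(grad_c0(point, a, t), point)]
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def train(a, t, lr=0.1, mu=0.9, lam=1.0, n=10, steps=500, nesterov=True):
    """Run `steps` updates from w = 0, v = 0 and return the final weights."""
    w = [0.0] * len(a)
    v = [0.0] * len(a)
    for _ in range(steps):
        w, v = step(w, v, a, t, lr, mu, lam, n, nesterov)
    return w


# Hypothetical problem data: per-weight curvatures and unregularized optimum.
a = [1.0, 2.0]
t = [3.0, -1.0]

w_nesterov = train(a, t, nesterov=True)
w_classical = train(a, t, nesterov=False)

# Closed-form minimizer of the regularized cost (lam = 1.0, n = 10):
# a_i * (w_i - t_i) + (lam / n) * w_i = 0.
w_star = [ai * ti / (ai + 1.0 / 10) for ai, ti in zip(a, t)]
```

Both variants settle on w_star, which is pulled toward zero relative to t by the L2 penalty. In TensorFlow itself, the Keras SGD optimizer accepts momentum and nesterov=True arguments, and its update is written in a slightly different but related form than the sketch above.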