I always thought gradient descent was simple, but recently I found that a gradient descent routine I had written was extremely slow. I finally tracked down the reason: the choice of step size matters a lot. A variant of gradient descent called backtracking line search is very efficient; the algorithm is described in the figure below (a screenshot from the lecture linked at the end).
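(Since the screenshot is not reproduced here, the following is my own rough sketch of the standard rule for a general multivariate f, not the lecture's exact slide; the helper name backtracking_gd, the default parameters, and the stopping test are illustrative.)

import numpy as np

def backtracking_gd(f, grad, x0, alpha=0.25, beta=0.8, tol=1e-4, max_iter=300):
    """Gradient descent with backtracking line search (Armijo condition):
    start each iteration with t = 1 and shrink t <- beta*t until
    f(x - t*g) <= f(x) - alpha * t * ||g||^2, then take the step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        t = 1.0
        while f(x - t * g) > f(x) - alpha * t * np.dot(g, g):
            t *= beta
        x_new = x - t * g
        if abs(f(x) - f(x_new)) < tol:  # illustrative stopping test
            return x_new
        x = x_new
    return x

For the toy problem below, backtracking_gd(lambda x: (x-3)**2, lambda x: 2*(x-3), 0.0) returns a value near 3.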
Here is a simple demonstration. Consider the unconstrained optimization problem:
minimize y = (x-3)*(x-3)
Below is the Python code comparing the two methods.
# optimization test, y = (x-3)^2
from matplotlib.pyplot import figure, plot, show, xlabel, ylabel, legend

def f(x):
    "The function we want to minimize"
    return (x-3)**2

def f_grad(x):
    "gradient of function f"
    return 2*(x-3)

x = 0
y = f(x)
err = 1.0
maxIter = 300
curve = [y]
it = 0
step = 0.1

# The method I used before: it looks reasonable, but it is slow
while err > 1e-4 and it < maxIter:
    it += 1
    gradient = f_grad(x)
    new_x = x - gradient * step
    new_y = f(new_x)
    new_err = abs(new_y - y)
    if new_y > y:  # if there is any sign of divergence, shrink the step size
        step *= 0.8
    err, x, y = new_err, new_x, new_y
    print('err:', err, ', y:', y)
    curve.append(y)
print('iterations: ', it)

figure()
plot(curve, 'r*-')
xlabel('iterations')
ylabel('objective function value')

# Backtracking line search: much faster
x = 0
y = f(x)
err = 1.0
alpha = 0.25
beta = 0.8
curve2 = [y]
it = 0
while err > 1e-4 and it < maxIter:
    it += 1
    gradient = f_grad(x)
    step = 1.0
    while f(x - step * gradient) > y - alpha * step * gradient**2:
        step *= beta
    x = x - step * gradient
    new_y = f(x)
    err = y - new_y
    y = new_y
    print('err:', err, ', y:', y)
    curve2.append(y)
print('iterations: ', it)

plot(curve2, 'bo-')
legend(['gradient descent I used', 'backtracking line search'])
show()
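As a quick sanity check (my own addition, assuming SciPy is available), a library minimizer should find the same optimum, x* = 3, that both loops converge toward:

# Not part of the original post: cross-check the toy problem with SciPy's
# scalar minimizer, which should return x close to 3 for f(x) = (x-3)^2.
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda x: (x - 3) ** 2)
print(res.x)  # expected to be approximately 3.0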
The output is shown in the plot below:
Which one is better is obvious at a glance.
My method took 25 iterations, while backtracking line search took only 6. (What's more, the method I was using is not even guaranteed to converge: change the step size of the first method to 1 and you will see it stops before reaching the optimal solution. That is a bug to watch out for.)
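To see why step size 1 breaks the first method (my own illustration of the bug mentioned above): the update becomes new_x = x - 1*2*(x-3) = 6 - x, so starting from x = 0 the iterate just bounces between 0 and 6, f(x) stays at 9, and abs(new_y - y) is 0, which satisfies the 1e-4 stopping test immediately even though x is nowhere near 3.

# Illustration only (not in the original script): the step-size-1 failure mode.
x, step = 0, 1.0
for k in range(4):
    x = x - step * 2 * (x - 3)      # same update rule as the first method
    print(k, x, (x - 3) ** 2)       # x alternates 6, 0, 6, 0 while f(x) stays 9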
This is only a toy example; on the real optimization problem I actually work on, the efficiency gap between the two is even larger, roughly a factor of 10.
--
The screenshots in this article are from: https://www.youtube.com/watch?v=nvZF-t2ltSM
(an optimization course from CMU)