Machine Learning: Gradient Descent

Date: 2020-07-12

Tags: machine learning, gradient descent

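Gradient descent

Basic concepts

Gradient descent, also known as steepest descent, is the most commonly used method for solving unconstrained optimization problems. It is an iterative method: the main operation at each step is to compute the gradient vector of the objective function and take the negative gradient at the current position as the search direction. The objective function decreases fastest along that direction, which is where the name "steepest descent" comes from.

Gradient descent really comes down to a single formula:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$$

where $\theta$ denotes the parameters, $J(\theta)$ the objective function, and $\eta$ the learning rate.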

Formula derivation

[Image: formula derivation; the original image is unavailable]
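
One common way to arrive at the update rule is a first-order Taylor expansion. The sketch below assumes that $J$ is differentiable and that $\nabla J(\theta_t) \neq 0$; the symbols $\theta_t$ (current parameters) and $d$ (a unit search direction) are introduced here for illustration only.

```latex
% First-order approximation of J after a small step of size \eta along d
J(\theta_t + \eta d) \approx J(\theta_t) + \eta \, \nabla J(\theta_t)^{\top} d,
\qquad \|d\| = 1, \ \eta > 0

% Cauchy-Schwarz bounds the directional derivative from below
\nabla J(\theta_t)^{\top} d \;\ge\; -\|\nabla J(\theta_t)\|,
\quad \text{with equality iff } d = -\frac{\nabla J(\theta_t)}{\|\nabla J(\theta_t)\|}

% Stepping along the negative gradient (the norm is absorbed into \eta)
\theta_{t+1} = \theta_t - \eta \, \nabla J(\theta_t)
```

The approximation only holds for small steps, which is why the learning rate has to be kept reasonably small.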

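Basic gradient descent steps

Steps:

η is the learning rate and ε is the convergence threshold. Gradient descent is a staple of machine learning. At its core it iterates repeatedly and checks whether the stopping condition is satisfied, so the implementation relies on a loop.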

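st=>start: Choose a small positive learning rate η and a small positive threshold ε
op=>operation: Compute the partial derivatives at the current position and update the parameter values
cond=>condition: Parameter change smaller than ε?
sub1=>subroutine: Go back and iterate again
io=>inputoutput: Minimum found
e=>end: End
st->op->cond
cond(yes)->io->e
cond(no)->sub1(right)->op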
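
The flow above can be sketched as a short loop. This is a minimal illustration, not the author's implementation: the function name `gradient_descent`, its parameters, and the quadratic test function are all assumptions made for the example.

```python
def gradient_descent(grad, theta, eta=0.1, eps=1e-6, max_iter=10_000):
    """Iterate theta <- theta - eta * grad(theta) until the change is below eps."""
    for step in range(1, max_iter + 1):
        g = grad(theta)
        # Update every parameter from the same gradient evaluation
        new_theta = [t - eta * gi for t, gi in zip(theta, g)]
        # Largest change of any single parameter in this step
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < eps:
            break
    return theta, step


# Hypothetical objective f(a, b) = (a - 3)^2 + (b + 1)^2 with minimum at (3, -1)
def grad_f(theta):
    a, b = theta
    return [2 * (a - 3), 2 * (b + 1)]


theta_min, steps = gradient_descent(grad_f, [0.0, 0.0])
print(theta_min, steps)
```

The `max_iter` cap is a safety net: if the learning rate is too large the parameter change may never drop below the threshold.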

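Batch gradient descent (BGD)

Batch gradient descent (BGD) computes the gradient over the entire training set for every update, i.e.:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$$

where η is the learning rate, which controls the "strength" or step size of each update.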

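Advantages:

For a convex objective function it is guaranteed to reach the global optimum; for a non-convex objective function it is guaranteed to reach a local optimum.

Disadvantages:

It is slow; it is infeasible when the data set is very large; and it cannot be used for online optimization (it cannot handle new samples that are generated on the fly).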

Code implementation
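
The code in this section fits a line to six points. Written out, the quantities it computes are as follows; note that the code calls the learning rate `alpha` rather than η, and that `temp0`, `temp1`, and `diss` correspond to the two partial derivatives and the loss.

```latex
% Hypothesis and loss over m training points
h_\theta(x) = \theta_0 + \theta_1 x,
\qquad
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x_i) - y_i \bigr)^2

% Partial derivatives (temp0 and temp1 in the code)
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x_i) - y_i \bigr),
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x_i) - y_i \bigr) x_i

% Simultaneous update
\theta_0 \leftarrow \theta_0 - \alpha \frac{\partial J}{\partial \theta_0},
\qquad
\theta_1 \leftarrow \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}
```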

# Imports
# matplotlib is used for plotting
import matplotlib.pyplot as plt
# In Jupyter, uncomment the next line to display figures inline
# %matplotlib inline

# For convenience the data is split into x and y; each (x, y) pair is a point in the plane
x = [1, 2, 3, 4, 5, 6]
y = [13, 14, 20, 21, 25, 30]
print("Plotting the initial data...")
plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

# Hyperparameters
alpha = 0.01     # learning rate / step size
theta0 = 0       # θ0 (intercept)
theta1 = 0       # θ1 (slope)
epsilon = 0.001  # convergence threshold
m = len(x)

count = 0
loss = []

for epoch in range(1000):
    count += 1
    # Partial derivatives of J(θ), averaged over the whole training set
    temp0 = 0  # derivative of J(θ) with respect to θ0
    temp1 = 0  # derivative of J(θ) with respect to θ1
    for i in range(m):
        temp0 += (theta0 + theta1 * x[i] - y[i]) / m
        temp1 += ((theta0 + theta1 * x[i] - y[i]) / m) * x[i]

    # Update θ0 and θ1 simultaneously from the full-batch gradient
    theta0 = theta0 - alpha * temp0
    theta1 = theta1 - alpha * temp1

    # Loss J(θ) over the whole training set
    diss = 0
    for i in range(m):
        diss += 0.5 * (1 / m) * (theta0 + theta1 * x[i] - y[i]) ** 2
    loss.append(diss)

    # Optional early stop (left disabled, as in the original): the data is not
    # perfectly linear, so J(θ) never drops to epsilon on this data set.
    # if diss <= epsilon:
    #     break

print("Final result:")
print("Iterations: {}, final theta0: {}, final theta1: {}".format(count, theta0, theta1))
print("Fitted regression function: y={}+{}x\n".format(theta0, theta1))
print("Plotting the loss per iteration...")
plt.scatter(range(count), loss)
plt.show()

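Results

Stochastic gradient descent (SGD)

Stochastic gradient descent (SGD) computes the gradient of a single sample only, i.e. it updates the parameters for one training sample $x_i$ and its label $y_i$ at a time:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta; x_i, y_i)$$

If the learning rate is decreased gradually, SGD behaves much like BGD, and both end up converging well.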
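
A gradually decreasing learning rate can be sketched as below. The inverse-time schedule `eta0 / (1 + decay * step)`, the function name `sgd`, and all default values are assumptions chosen for illustration; the data is the same six points used in the code of this article.

```python
import random

x = [1, 2, 3, 4, 5, 6]
y = [13, 14, 20, 21, 25, 30]


def sgd(epochs=2000, eta0=0.02, decay=0.001, seed=0):
    """SGD for y = theta0 + theta1 * x with an inverse-time learning-rate decay."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    step = 0
    for _ in range(epochs):
        order = list(range(len(x)))
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            eta = eta0 / (1.0 + decay * step)  # learning rate shrinks over time
            err = theta0 + theta1 * x[i] - y[i]
            # Both parameters are updated from the same single-sample error
            theta0 -= eta * err
            theta1 -= eta * err * x[i]
            step += 1
    return theta0, theta1


theta0, theta1 = sgd()
print(theta0, theta1)
```

For reference, the exact least-squares fit for this data is y = 8.6 + 3.4x, so the printed values should land close to those numbers, with some residual noise from the random sampling.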

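Advantages:

Parameters are updated frequently, so optimization proceeds faster; it supports online optimization (it can handle new samples that are generated on the fly); and a certain amount of randomness gives it a chance of escaping local optima (the randomness comes from using the gradient of one sample in place of the gradient over all samples).

Disadvantages:

The randomness can complicate convergence: even after reaching the optimum, SGD keeps overshooting it, so its optimization path fluctuates far more than that of BGD.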
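
The online-optimization advantage can be illustrated with a small sketch in which samples arrive one at a time and each one triggers a single update. The names `online_update` and `sample_stream`, as well as the synthetic stream itself, are hypothetical and exist only for this example.

```python
def online_update(theta, sample, eta=0.01):
    """Apply one SGD step for a single (x, y) sample and return the new parameters."""
    theta0, theta1 = theta
    xi, yi = sample
    err = theta0 + theta1 * xi - yi
    return theta0 - eta * err, theta1 - eta * err * xi


def sample_stream():
    """Hypothetical data source that yields samples one at a time."""
    for xi in range(1, 7):
        yield xi, 2.0 * xi + 1.0


theta = (0.0, 0.0)
history = []
for sample in sample_stream():
    # No stored training set is needed: each new sample is used once, on arrival
    theta = online_update(theta, sample)
    history.append(theta)
print(history[-1])
```

BGD cannot work this way because every update needs the gradient over the complete training set.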

Code implementation

# Imports
# matplotlib is used for plotting
import matplotlib.pyplot as plt
import numpy as np
# In Jupyter, uncomment the next line to display figures inline
# %matplotlib inline

# For convenience the data is split into x and y; each (x, y) pair is a point in the plane
x = [1, 2, 3, 4, 5, 6]
y = [13, 14, 20, 21, 25, 30]
print("Plotting the initial data...")
plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

# Hyperparameters
alpha = 0.01     # learning rate / step size
theta0 = 0       # θ0 (intercept)
theta1 = 0       # θ1 (slope)
epsilon = 0.001  # convergence threshold
m = len(x)

count = 0
loss = []

for epoch in range(1000):
    count += 1
    # Pick one training sample at random
    rand_i = np.random.randint(0, m)

    # Gradient of the loss on that single sample
    error = theta0 + theta1 * x[rand_i] - y[rand_i]
    temp0 = error              # derivative with respect to θ0
    temp1 = error * x[rand_i]  # derivative with respect to θ1

    # Update θ0 and θ1 simultaneously
    theta0 = theta0 - alpha * temp0
    theta1 = theta1 - alpha * temp1

    # Loss on the sampled point after the update
    diss = 0.5 * (theta0 + theta1 * x[rand_i] - y[rand_i]) ** 2
    loss.append(diss)

    # Optional early stop (left disabled, as in the original): a single-sample
    # loss is noisy, so one small value does not imply convergence.
    # if diss <= epsilon:
    #     break

print("Final result:")
print("Iterations: {}, final theta0: {}, final theta1: {}".format(count, theta0, theta1))
print("Fitted regression function: y={}+{}x\n".format(theta0, theta1))
print("Plotting the loss per iteration...")
plt.scatter(range(count), loss)
plt.show()

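Results

Mini-batch gradient descent (MBGD)

Mini-batch gradient descent (MBGD) computes the gradient over a mini-batch of n samples:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta; x_{(i:i+n)}, y_{(i:i+n)})$$

MBGD is the most commonly used optimization method for training neural networks.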

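Advantages:

Parameter updates fluctuate less, so convergence is more stable and easier to achieve; and existing linear algebra libraries can be used to compute the gradient of many samples efficiently.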
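
As an illustration of the linear-algebra point, the mini-batch gradient for the line-fitting problem in this article can be computed with a single matrix product in NumPy. This is a sketch under assumptions: the design matrix `X`, the helper `minibatch_gradient`, and the batch size of 3 are introduced here for the example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([13, 14, 20, 21, 25, 30], dtype=float)

# Design matrix with a leading column of ones, so that theta = [theta0, theta1]
X = np.column_stack([np.ones_like(x), x])


def minibatch_gradient(theta, X_batch, y_batch):
    """Gradient of 1/(2n) * ||X_batch @ theta - y_batch||^2 with respect to theta."""
    residual = X_batch @ theta - y_batch
    return X_batch.T @ residual / len(y_batch)


rng = np.random.default_rng(0)
theta = np.zeros(2)

# Draw 3 distinct sample indices and take one update step
batch_idx = rng.choice(len(x), size=3, replace=False)
grad = minibatch_gradient(theta, X[batch_idx], y[batch_idx])
theta = theta - 0.01 * grad
print(batch_idx, grad, theta)
```

Passing the full `X` and `y` to the same function gives the BGD gradient, and passing a single row gives the SGD gradient, so the three variants differ only in which rows are selected.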

Code implementation

# Imports
# matplotlib is used for plotting
import matplotlib.pyplot as plt
import numpy as np
# In Jupyter, uncomment the next line to display figures inline
# %matplotlib inline

# For convenience the data is split into x and y; each (x, y) pair is a point in the plane
x = [1, 2, 3, 4, 5, 6]
y = [13, 14, 20, 21, 25, 30]
print("Plotting the initial data...")
plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

# Hyperparameters
alpha = 0.01     # learning rate / step size
theta0 = 0       # θ0 (intercept)
theta1 = 0       # θ1 (slope)
epsilon = 0.001  # convergence threshold
batch_size = 3   # number of samples per mini-batch
m = len(x)

count = 0
loss = []

for epoch in range(1000):
    count += 1
    # Draw a mini-batch of distinct sample indices
    batch = np.random.choice(m, batch_size, replace=False)

    # Partial derivatives of J(θ), averaged over the mini-batch
    temp0 = 0  # derivative with respect to θ0
    temp1 = 0  # derivative with respect to θ1
    for j in batch:
        temp0 += (theta0 + theta1 * x[j] - y[j]) / batch_size
        temp1 += ((theta0 + theta1 * x[j] - y[j]) / batch_size) * x[j]

    # Update θ0 and θ1 simultaneously
    theta0 = theta0 - alpha * temp0
    theta1 = theta1 - alpha * temp1

    # Loss on the mini-batch after the update
    diss = 0
    for j in batch:
        diss += 0.5 * (1 / batch_size) * (theta0 + theta1 * x[j] - y[j]) ** 2
    loss.append(diss)

    # Optional early stop (left disabled, as in the original): a mini-batch
    # loss is noisy, so one small value does not imply convergence.
    # if diss <= epsilon:
    #     break

print("Final result:")
print("Iterations: {}, final theta0: {}, final theta1: {}".format(count, theta0, theta1))
print("Fitted regression function: y={}+{}x\n".format(theta0, theta1))
print("Plotting the loss per iteration...")
plt.scatter(range(count), loss)
plt.show()

Results