Forward and Backward Propagation in a 4-Layer Neural Network

A 4-layer neural network: let $L$ denote the layer and $n$ the number of units in each layer, with $L = [L_1, L_2, L_3, L_4]$ and $n = [5, 5, 3, 1]$.
(Figure: structure of the 4-layer network)

1. Forward Propagation

(1) Vectorized form for a single sample

Each layer performs two steps: a linear computation followed by an activation function.
$z^{[1]} = W^{[1]}x+b^{[1]}$, $a^{[1]}=g^{[1]}(z^{[1]})$; input $x$, output $a^{[1]}$;

$z^{[2]} = W^{[2]}a^{[1]}+b^{[2]}$, $a^{[2]}=g^{[2]}(z^{[2]})$; input $a^{[1]}$, output $a^{[2]}$;

$z^{[3]} = W^{[3]}a^{[2]}+b^{[3]}$, $a^{[3]}=g^{[3]}(z^{[3]})$; input $a^{[2]}$, output $a^{[3]}$;

$z^{[4]} = W^{[4]}a^{[3]}+b^{[4]}$, $a^{[4]}=\sigma(z^{[4]})$; input $a^{[3]}$, output $a^{[4]}$;

where $g^{[L]}$ and $\sigma$ denote the activation functions.

The general form of the equations above:

$z^{[L]} = W^{[L]}a^{[L-1]}+b^{[L]}$, $a^{[L]}=g^{[L]}(z^{[L]})$; the raw input is $x = a^{[0]}$; input $a^{[L-1]}$, output $a^{[L]}$.

(2) Vectorized form for $m$ samples

$Z^{[L]} = W^{[L]}A^{[L-1]}+b^{[L]}$, $A^{[L]}=g^{[L]}(Z^{[L]})$; the raw input is $X = A^{[0]}$; input $A^{[L-1]}$, output $A^{[L]}$.
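The vectorized forward pass can be sketched in NumPy. This is a minimal illustration, not the article's own code: the layer sizes $n = [5, 5, 3, 1]$ follow the text, while the input dimension (5), ReLU for the hidden layers, and sigmoid for the output layer are assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Vectorized forward pass: Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l])."""
    A = X  # A[0] = X, one sample per column, shape (n_x, m)
    L = len(params) // 2  # number of layers
    for l in range(1, L + 1):
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b                          # linear step
        A = sigmoid(Z) if l == L else relu(Z)  # activation step
    return A

# Layer sizes from the text: n = [5, 5, 3, 1]; input dimension 5 is assumed
rng = np.random.default_rng(0)
sizes = [5, 5, 5, 3, 1]  # [n_x, n1, n2, n3, n4]
params = {}
for l in range(1, len(sizes)):
    params[f"W{l}"] = rng.standard_normal((sizes[l], sizes[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((sizes[l], 1))

X = rng.standard_normal((5, 10))  # m = 10 samples as columns
A4 = forward(X, params)
print(A4.shape)  # (1, 10): one sigmoid output per sample
```

Stacking the $m$ samples as columns of $X$ is what lets a single matrix product replace the per-sample loop.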

2. Backward Propagation

The backward pass is illustrated by the following figure:
(Figure: backward propagation through the layers)

(1) Backpropagation for a single training sample

$dZ^{[l]}=\frac{\partial J}{\partial Z^{[l]}}=\frac{\partial J}{\partial a^{[l]}}\cdot \frac{\partial a^{[l]}}{\partial Z^{[l]}}=da^{[l]}\cdot g^{[l]\prime}(Z^{[l]})$

$dW^{[l]}=\frac{\partial J}{\partial W^{[l]}}=\frac{\partial J}{\partial a^{[l]}}\cdot \frac{\partial a^{[l]}}{\partial Z^{[l]}}\cdot \frac{\partial Z^{[l]}}{\partial W^{[l]}}=dZ^{[l]}\cdot a^{[l-1]T}$

$db^{[l]}=\frac{\partial J}{\partial b^{[l]}}=\frac{\partial J}{\partial a^{[l]}}\cdot \frac{\partial a^{[l]}}{\partial Z^{[l]}}\cdot \frac{\partial Z^{[l]}}{\partial b^{[l]}}=dZ^{[l]}$

$da^{[l-1]}=\frac{\partial J}{\partial a^{[l-1]}}=W^{[l]T}\cdot dZ^{[l]}$

  • $dW^{[l]}$ and $db^{[l]}$ are used to update this layer's parameters; $da^{[l-1]}$ propagates the error signal back to the previous layer.
  • For a single training sample, the parameters $W$ and $b$ are updated as:
    $W := W - \alpha\frac{\partial J(W, b)}{\partial W}$; $b := b - \alpha\frac{\partial J(W, b)}{\partial b}$, where $J(W, b)$ is the loss function and $\alpha$ is the learning rate (step size).
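The update rule can be sketched for one layer's parameters. The shapes (a 3-unit layer fed by 5 inputs), the toy gradient values, and $\alpha = 0.1$ are all illustrative stand-ins:

```python
import numpy as np

alpha = 0.1  # learning rate / step size (assumed value)

# Toy parameters and gradients for one layer: W is (3, 5), b is (3, 1)
W = np.ones((3, 5))
b = np.zeros((3, 1))
dW = np.full((3, 5), 0.5)  # stands in for dJ/dW
db = np.full((3, 1), 0.2)  # stands in for dJ/db

# Gradient-descent step: W := W - alpha * dW, b := b - alpha * db
W = W - alpha * dW
b = b - alpha * db
```

Each entry of $W$ moves opposite to its gradient, scaled by $\alpha$; here every weight drops from 1 to 0.95 and every bias from 0 to -0.02.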
(2) Backpropagation over all training samples

$dZ^{[l]}=dA^{[l]}\cdot g^{[l]\prime}(Z^{[l]})$

$dW^{[l]}=\frac{1}{m}dZ^{[l]}\cdot A^{[l-1]T}$

$db^{[l]}=\frac{1}{m}\,\mathrm{np.sum}(dZ^{[l]}, \mathrm{axis}=1, \mathrm{keepdims=True})$

$dA^{[l-1]}=W^{[l]T}\cdot dZ^{[l]}$

  • For the full training set, the parameters $W$ and $b$ are updated as:
    $W := W - \alpha\, dW$, $b := b - \alpha\, db$
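The four vectorized formulas, plus the update step, can be combined into one backward step for a single layer. This is a minimal NumPy sketch under assumed shapes (a 3-unit ReLU layer fed by 5 inputs, $m = 8$ samples); the function and variable names are illustrative:

```python
import numpy as np

def backward_layer(dA, Z, A_prev, W, activation_grad):
    """One layer of vectorized backprop, following the four formulas above."""
    m = A_prev.shape[1]                         # number of samples
    dZ = dA * activation_grad(Z)                # dZ[l] = dA[l] . g'[l](Z[l])  (element-wise)
    dW = (dZ @ A_prev.T) / m                    # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m  # db[l] = (1/m) sum(dZ[l], axis=1)
    dA_prev = W.T @ dZ                          # dA[l-1] = W[l]^T dZ[l]
    return dW, db, dA_prev

def relu_grad(Z):
    # Derivative of ReLU: 1 where Z > 0, else 0
    return (Z > 0).astype(float)

rng = np.random.default_rng(1)
m = 8
A_prev = rng.standard_normal((5, m))  # activations from layer l-1
W = rng.standard_normal((3, 5)) * 0.01
b = np.zeros((3, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((3, m))      # error signal arriving from layer l+1

dW, db, dA_prev = backward_layer(dA, Z, A_prev, W, relu_grad)

# Gradient-descent update: W := W - alpha * dW, b := b - alpha * db
alpha = 0.01
W -= alpha * dW
b -= alpha * db
print(dW.shape, db.shape, dA_prev.shape)  # (3, 5) (3, 1) (5, 8)
```

Note that `dW` and `db` match the shapes of `W` and `b` (as the update requires), while `dA_prev` has the shape of `A_prev`, ready to be passed to the same function for layer $l-1$.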