Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

1 Introduction

Deep learning has dramatically advanced the state of the art in vision, language, and many other domains. Stochastic gradient descent (SGD) has proved to be an effective way of training deep networks, and SGD variants such as momentum (Sutskever et al., 2013) and Ada
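The transform the title refers to can be sketched as follows: a minimal NumPy illustration of the batch normalization forward pass, not the paper's reference implementation. The names `batch_norm_forward`, `gamma`, and `beta` are assumed here for illustration; `gamma` and `beta` correspond to the learnable scale and shift parameters.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x:     (N, D) mini-batch of activations
    gamma: (D,) learnable per-feature scale
    beta:  (D,) learnable per-feature shift
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta            # restore representational capacity

# Example: after normalization, each feature's mean is beta and its
# standard deviation is (approximately) gamma.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
y = batch_norm_forward(x, gamma=np.ones(4) * 2.0, beta=np.ones(4) * 0.5)
```

At inference time the paper replaces the per-batch statistics with population estimates accumulated during training, so the output no longer depends on the other examples in the mini-batch.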