1804.03235-Large scale distributed neural network training through online distillation.md

Date: 2021-01-13


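Existing modes of distributed model training:

- Distributed SGD
  - Parallel SGD: in large-scale training, the duration of each step is determined by the slowest machine.
  - Asynchronous SGD: workers operate on out-of-sync data, which may push weight updates in an unknown direction.
- Parallel multi-model training: multiple clusters each train a different model, and the models are then combined into the final model, but this will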