Mastering the game of Go without human knowledge (AlphaGo Zero)

Date: 2020-12-21


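AlphaGo's tree search was combined with deep neural networks that were trained by supervised learning from human expert knowledge and by reinforcement learning from self-play. AlphaGo Zero relies on reinforcement learning alone: a single neural network is trained to predict both move selection and position value. This network improves the strength of the tree search, which yields higher-quality move selection and stronger self-play in the next iteration, while the more accurate tree search in turn improves the network.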
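To make the feedback loop between search and network concrete, here is a minimal sketch of a training signal for one position. It assumes the loss form described in the AlphaGo Zero paper, a squared error on the value plus a cross-entropy between the search-derived move distribution and the network's move probabilities, with the weight-regularisation term left out. All function and parameter names (`visit_counts_to_policy`, `alphazero_style_loss`, `temperature`, and so on) are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw network outputs into move probabilities p."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visit_counts_to_policy(counts, temperature=1.0):
    """Turn tree-search visit counts into the improved move distribution pi.

    Assumption: pi is proportional to count ** (1 / temperature).
    """
    powered = [c ** (1.0 / temperature) for c in counts]
    total = sum(powered)
    return [x / total for x in powered]


def alphazero_style_loss(logits, value, search_policy, outcome):
    """Loss for one position: (z - v)^2 - sum(pi * log p).

    logits        -- raw policy outputs of the network
    value         -- network's value estimate v in [-1, 1]
    search_policy -- move distribution pi produced by the tree search
    outcome       -- final game result z from the current player's view
    The regularisation term is omitted in this sketch.
    """
    p = softmax(logits)
    value_loss = (outcome - value) ** 2
    policy_loss = -sum(
        pi * math.log(pk) for pi, pk in zip(search_policy, p) if pi > 0
    )
    return value_loss + policy_loss


# Toy position with three legal moves; the numbers are made up.
pi = visit_counts_to_policy([10, 30, 60])
untrained = alphazero_style_loss([0.0, 0.0, 0.0], 0.0, pi, 1.0)
matched = alphazero_style_loss([math.log(x) for x in pi], 0.0, pi, 1.0)
```

In this toy case `matched` is smaller than `untrained`, because the second set of logits already reproduces the search distribution. That is the sense in which the search acts as a teacher for the network: the search output is the target, and the network is pulled toward it after each round of self-play.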