RL Policy Gradient Methods (4): Asynchronous Advantage Actor-Critic (A3C)

Date: 2020-12-30


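This column summarizes policy gradient methods in the order used by https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html.

Contents: Principle Analysis, Algorithm Implementation, Overall Workflow, Code Implementation

A3C: [ paper | code ]

Principle Analysis

In A3C, the critic learns the value function, while multiple actors in parallel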