Reinforcement Learning (4): Actor-Critic Methods

Main idea
Policy Network (Actor)
Value Network (Critic)
Intuitive analogy
Train the Neural Networks
Concrete steps:
Update value network q using TD
Update policy network π using policy gradient
Actor-Critic Method
Summary of
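The two updates named in the outline (TD for the value network q, policy gradient for the policy network π) can be illustrated with a minimal tabular sketch. This is not the article's own code: the function name `actor_critic_step`, the tables `theta` and `q`, and the step sizes `alpha`, `beta`, and `gamma` are hypothetical choices, and lookup tables stand in for the neural networks so that the gradients can be written out by hand.

```python
import math


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, q, s, a, r, s_next, a_next, done,
                      gamma=0.9, alpha=0.1, beta=0.1):
    """Apply one actor-critic update in place and return the TD error.

    theta[s][a]: policy preferences (the actor), pi(.|s) = softmax(theta[s])
    q[s][a]:     action-value estimates (the critic)
    a_next:      action sampled from pi(.|s_next); only used to form the TD target
    """
    q_sa = q[s][a]

    # Critic: TD target and TD error (assumed SARSA-style target).
    target = r if done else r + gamma * q[s_next][a_next]
    td_error = q_sa - target

    # Gradient descent on 0.5 * td_error**2; for a table, dq/dw is 1 at (s, a).
    q[s][a] -= alpha * td_error

    # Actor: gradient ascent on q(s, a) * log pi(a|s).
    # For a softmax policy, d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += beta * q_sa * grad_log_pi

    return td_error
```

Calling `actor_critic_step` repeatedly on a single-state problem where action 1 always earns reward 1 raises `q[0][1]` toward 1 and shifts `softmax(theta[0])` toward action 1. With neural networks the same two updates would be computed by automatic differentiation rather than by the hand-written gradients used here.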