David Silver's "Reinforcement Learning" Course Notes, Lecture 3: Planning by Dynamic Programming

Dynamic programming (DP) is used to solve the planning problem for MDPs, that is, the case where the MDP's transition and reward model is fully known. The two main approaches are policy iteration and value iteration.

Contents: Introduction, Policy Evaluation, Policy Iteration
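To make the two approaches concrete, here is a minimal sketch on a toy problem. The 2-state, 2-action MDP, the discount factor, and all function names (`q_values`, `policy_evaluation`, `policy_iteration`, `value_iteration`) are hypothetical and chosen only for illustration; they are not taken from the lecture. Policy evaluation is done here by solving the linear Bellman expectation equation directly, which is only practical for small state spaces; an iterative sweep would be the usual alternative.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustration only, not from the lecture).
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch to the other state
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])
GAMMA = 0.9  # assumed discount factor


def q_values(v):
    """One-step lookahead: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]."""
    return R + GAMMA * np.einsum("ast,t->sa", P, v)


def policy_evaluation(policy):
    """Solve v = R_pi + gamma * P_pi v exactly for a deterministic policy."""
    n = R.shape[0]
    states = np.arange(n)
    P_pi = P[policy, states]      # row s is P[policy[s], s, :]
    R_pi = R[states, policy]
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, R_pi)


def policy_iteration():
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(R.shape[0], dtype=int)
    while True:
        v = policy_evaluation(policy)
        new_policy = q_values(v).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return v, policy
        policy = new_policy


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality backup until successive values differ by less than tol."""
    v = np.zeros(R.shape[0])
    while True:
        new_v = q_values(v).max(axis=1)
        if np.max(np.abs(new_v - v)) < tol:
            return new_v, q_values(new_v).argmax(axis=1)
        v = new_v


if __name__ == "__main__":
    print("policy iteration:", policy_iteration())
    print("value iteration: ", value_iteration())
```

On this toy MDP both routines should arrive at the same greedy policy: policy iteration gets there in a handful of evaluate/improve rounds, while value iteration never represents an explicit policy until the end and only approaches the optimal values in the limit.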