David Silver's "Reinforcement Learning" Course Notes, Lecture 4: Model-Free Prediction

Date: 2021-01-11


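Dynamic programming (DP) can solve MDP problems in which the environment is known, that is, S, A, P, R, and γ are all given. Depending on whether the policy is also given, the problem is further divided into prediction and control. In essence, in such a known-MDP problem the environment is known, that is, the transition matrix and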