Reinforcement Learning by David Silver, Notes 4: Model-Free Prediction

Date: 2021-01-02

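The earlier dynamic programming notes dealt with MDPs whose model is known. This note covers estimating the value function of an MDP when the model (the environment) is unknown. The main methods are:

MC (Monte-Carlo) methods: need neither the transition matrix nor the reward matrix, and are effective in non-Markov environments.
Temporal-difference (TD) methods.

Monte-Carlo Learning

Learns directly from episodes of experience.
Does not need the MDP's transitions or rewards.
Main idea: value = mean return.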
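The idea "value = mean return" can be illustrated with a minimal Python sketch. It assumes the first-visit variant of Monte-Carlo prediction, a discount factor gamma, and episodes given as lists of (state, reward) pairs; none of these details come from the notes above, and the function name mc_prediction is hypothetical.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the mean of sampled returns (first-visit Monte-Carlo).

    Assumed input format (not from the notes): each episode is a list of
    (state, reward) pairs, where reward is received after leaving state.
    """
    returns_sum = defaultdict(float)  # total return observed per state
    visit_count = defaultdict(int)    # number of episodes that visited the state

    for episode in episodes:
        # Scan backwards to compute the discounted return G_t at every step.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        # Only the first visit to each state in an episode contributes.
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state in seen:
                continue
            seen.add(state)
            returns_sum[state] += returns[t]
            visit_count[state] += 1

    # The value estimate is the empirical mean return.
    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}


if __name__ == "__main__":
    sample_episodes = [
        [("A", 1.0), ("B", 2.0)],
        [("A", 0.0), ("B", 0.0)],
    ]
    print(mc_prediction(sample_episodes))
```

No transition or reward model appears anywhere in the sketch: the estimate depends only on sampled episodes, which is what makes the method model-free.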