Deep Reinforcement Learning, David Silver (4): Model-Free Prediction

Date: 2019-12-11

Tags: deep reinforcement learning, David Silver, model-free prediction


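This lecture covers Monte-Carlo Learning, Temporal-Difference Learning, and TD(λ). Lecture 03 dealt with an MDP whose environment is known: after taking an action, we know which state we reach and what reward we receive. In practice, however, the state transitions and rewards are usually unknown. This setting is called model-free, meaning the environment model is unknown. This lecture discusses prediction, that is, estimating the value function of an MDP whose environment is unknown; the next lecture covers control.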
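To make the two prediction methods concrete, here is a minimal sketch in Python. The environment is an assumption that does not come from the post: a 5-state random walk that starts in the middle, moves left or right with equal probability, and pays reward 1 only when it exits on the right. The function names (`sample_episode`, `mc_prediction`, `td0_prediction`) are hypothetical, the discount is fixed at γ = 1, and the learner only ever sees sampled transitions, never the transition probabilities.

```python
import random

# Assumption: a 5-state random walk, states 0..4, terminating off either end.
N_STATES = 5
# Known closed-form values for this walk, used only to check the estimates.
TRUE_V = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def sample_episode(rng):
    """Sample one episode as a list of (state, reward, next_state).

    next_state is None when the step terminates the episode.
    """
    s = N_STATES // 2
    episode = []
    while True:
        s2 = s + rng.choice((-1, 1))
        if s2 < 0:                      # exit on the left: reward 0
            episode.append((s, 0.0, None))
            return episode
        if s2 >= N_STATES:              # exit on the right: reward 1
            episode.append((s, 1.0, None))
            return episode
        episode.append((s, 0.0, s2))
        s = s2


def mc_prediction(n_episodes, seed=0):
    """Every-visit Monte-Carlo prediction with an incremental mean (gamma = 1)."""
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    counts = [0] * N_STATES
    for _ in range(n_episodes):
        g = 0.0
        # Walk the episode backwards so g is the return from each visited state.
        for s, r, _ in reversed(sample_episode(rng)):
            g += r
            counts[s] += 1
            v[s] += (g - v[s]) / counts[s]
    return v


def td0_prediction(n_episodes, alpha=0.01, seed=0):
    """TD(0) prediction: bootstrap from the current estimate of the next state."""
    rng = random.Random(seed)
    v = [0.5] * N_STATES
    for _ in range(n_episodes):
        for s, r, s2 in sample_episode(rng):
            target = r + (v[s2] if s2 is not None else 0.0)
            v[s] += alpha * (target - v[s])
    return v
```

Monte-Carlo waits until an episode ends and averages complete returns, while TD(0) updates after every step toward the one-step target `r + V(s')`. With enough episodes, both estimates should approach `TRUE_V` on this toy problem.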
