Reinforcement Learning by David Silver, Notes 5: Model-Free Control

Model-free control: optimise the value function of an unknown MDP. On-policy learning: learn about policy π from experience sampled from π. Off-policy learning: learn about policy π from experience sampled from a different behaviour policy μ. On-
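To make the on-policy/off-policy distinction concrete, here is a minimal tabular sketch. It uses SARSA as the on-policy example and Q-learning as the off-policy example; these are the standard textbook instances, not something the excerpt above states. The function names, the dictionary-based `Q` table, and the default values of `alpha` and `gamma` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """On-policy: bootstrap from a_next, the action the policy itself
    sampled in s_next, so the policy being learned is the one acting."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=1.0):
    """Off-policy: bootstrap from the greedy action in s_next, regardless
    of which action the behaviour policy actually takes there."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def make_q():
    """Hypothetical toy table: two actions in state 's1' with known values."""
    Q = defaultdict(float)
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 3.0
    return Q
```

With the same transition (`s0`, `left`, reward 0, `s1`), the two updates differ only in the bootstrap term: SARSA uses the value of the action actually sampled next, while Q-learning uses the maximum over actions in the next state. If the sampled next action is not the greedy one, the two targets diverge.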