Reinforcement Learning by David Silver, Notes 5: Model-Free Control

Date: 2021-01-02

Model-free control: optimise the value function of an unknown MDP.

On-policy learning: learn about policy π from experience sampled from π.

Off-policy learning: learn about policy π from experience sampled from a different policy μ.

On-