The awkward Bellman optimality equation in RL

Date: 2020-12-24

The blog post "2017 Fall CS294 Lecture 6: Actor-critic introduction" includes a screenshot of one page from the book Reinforcement Learning: An Introduction (Sutton, 1998). That page concerns Vπ(s): the state-value function for policy π. Qπ(s,a): the action-value