The awkward Bellman optimality equation in RL

The blog post "2017 Fall CS294 Lecture 6: Actor-critic introduction" includes a screenshot of one page from the book Reinforcement Learning: An Introduction (Sutton 1998). That page covers Vπ(s): the state-value function for policy π, and Qπ(s,a): the action-value ...
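
For reference, the two value functions named above and the Bellman optimality equations from the title can be written out as follows. This is a sketch in standard textbook notation, not a reproduction of the screenshot: the discount factor \gamma, the transition probability P(s' \mid s, a), and the reward R(s, a, s') are assumed symbols that do not appear in the excerpt.

```latex
% State-value and action-value functions for a fixed policy \pi
V^{\pi}(s)   = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right]
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\; a_t = a \right]

% Bellman optimality equations (the max over actions replaces the average over \pi)
V^{*}(s)   = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma\, V^{*}(s') \right]
Q^{*}(s,a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]
```

The two optimality equations are tied together by V^{*}(s) = \max_{a} Q^{*}(s, a).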