On-policy and off-policy in reinforcement learning, explained

To start from an authoritative source: Sutton and Barto's Reinforcement Learning: An Introduction (second edition, draft of November 5, 2017), page 82, says: On-policy methods attempt to evaluate or improve the policy that is used to make decisions, whereas off-policy methods eva
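
As a rough illustration of the distinction, the sketch below contrasts the one-step update targets of SARSA (usually cited as an on-policy method) and Q-learning (usually cited as an off-policy method). These two algorithms are chosen here only as standard textbook examples; the function names `sarsa_target` and `q_learning_target` and their parameters are hypothetical and do not come from the quoted passage.

```python
from typing import Sequence


def sarsa_target(reward: float, gamma: float,
                 q_next: Sequence[float], next_action: int) -> float:
    """On-policy target: bootstrap from the action the behaviour policy
    actually selected in the next state (hypothetical helper)."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward: float, gamma: float,
                      q_next: Sequence[float]) -> float:
    """Off-policy target: bootstrap from the greedy action in the next
    state, whatever the behaviour policy actually does (hypothetical helper)."""
    return reward + gamma * max(q_next)


# Assumed toy numbers: two actions in the next state, and the behaviour
# policy happened to pick the non-greedy action 0 (for example, while exploring).
q_next = [1.0, 3.0]
on_policy = sarsa_target(reward=1.0, gamma=0.5, q_next=q_next, next_action=0)
off_policy = q_learning_target(reward=1.0, gamma=0.5, q_next=q_next)
print(on_policy, off_policy)
```

The two targets differ only when the action actually taken is not the greedy one: the SARSA target follows the policy that generates the data, while the Q-learning target evaluates the greedy policy regardless of how the data was collected.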