Reinforcement learning paper: Policy invariance under reward transformations: Theory and application to reward shaping

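"Policy invariance under reward transformations: Theory and application to reward shaping" is an important theoretical foundation for reward shaping and offers practical guidance for designing reward functions. Andrew Ng is one of the authors. The paper is available at http://luthuli.cs.uiuc.edu/~daf/courses/games/AIpapers/ng99policy.pdf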
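The paper's central result is that adding a potential-based term F(s, a, s') = γΦ(s') − Φ(s) to the reward leaves the optimal policy unchanged, for any bounded potential function Φ over states. The following is a minimal sketch of that idea, not code from the paper: the 5-state chain environment, the helper names (`step`, `base_reward`, `shaped_reward`, `greedy_policy`), the discount of 0.9, and the randomly drawn potential are all assumptions made for illustration.

```python
import random

GAMMA = 0.9
N_STATES = 5            # hypothetical chain: states 0..4, state 4 is an absorbing goal
GOAL = N_STATES - 1
ACTIONS = (-1, 1)       # step left or step right


def step(s, a):
    """Deterministic transition; the goal absorbs, the ends clip."""
    if s == GOAL:
        return GOAL
    return min(max(s + a, 0), GOAL)


def base_reward(s, a, s_next):
    """Reward 1 for entering the goal, 0 otherwise."""
    return 1.0 if (s != GOAL and s_next == GOAL) else 0.0


def shaped_reward(phi):
    """Return R'(s, a, s') = R(s, a, s') + GAMMA * phi[s'] - phi[s]."""
    def reward(s, a, s_next):
        return base_reward(s, a, s_next) + GAMMA * phi[s_next] - phi[s]
    return reward


def greedy_policy(reward, sweeps=500):
    """Synchronous Q-value iteration, then the greedy action per non-goal state."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        q = {
            (s, a): reward(s, a, step(s, a))
            + GAMMA * max(q[(step(s, a), b)] for b in ACTIONS)
            for s in range(N_STATES)
            for a in ACTIONS
        }
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


# An arbitrary potential: the invariance does not depend on choosing it well.
rng = random.Random(0)
phi = [rng.uniform(-5.0, 5.0) for _ in range(N_STATES)]

policy_base = greedy_policy(base_reward)
policy_shaped = greedy_policy(shaped_reward(phi))
print(policy_base, policy_shaped)
```

In this toy setting the two printed policies should agree, since the shaped Q-values differ from the original ones only by Φ(s), which is the same for every action in a given state. A well-chosen Φ (for example, one that grows toward the goal) can speed up learning in practice, while a shaping term that is not of this form may change which policy is optimal.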