Policy in Reinforcement Learning

From the last post about MDPs, we know the environment consists of 5 basic elements: the State Space of the environment; the Action Space that the environment allows; the Transition Matrix: The probabilitie