CS231N-14-Reinforcement Learning

Outline: What is Reinforcement Learning, Markov Decision Process (MDP), Value Function, Q-value Function, Bellman Equation, Q-learning, Policy Gradient.

This is the last section. So far, we have mainly talked about supervised learning lik