Reinforcement Learning: An Introduction, Chapter 2: Multi-armed Bandits

Table of Contents
Abstract
2.1 A k-armed Bandit Problem
2.2 Action-value Methods
2.3 The 10-armed Testbed
2.4 Incremental Implementation
2.5 Tracking a Nonstationary Problem
2.6 Optimistic Initial Values
2.7 Upper