Python code for the multi-armed bandit problem

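Assume a slot machine with k = 10 arms whose rewards follow Gaussian (normal) distributions. The mean and variance of the distribution for each arm are:
# the true mean of each action's reward
qa_star = np.array([0.2,-0.3,1.5,0.5,1.2,-1.6,-0.2,-1,1.1,-0.6])
# the variance of each action's reward
var_qa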
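The setup above can be sketched in runnable form as follows. This is a minimal illustration, not the original author's code: the values of `var_qa` are cut off in the text, so unit variance is assumed for every arm, and the helper names `pull` and `run_epsilon_greedy` as well as the epsilon-greedy strategy itself are hypothetical choices made for this example.

```python
import numpy as np

# True mean reward of each arm, as given in the text.
qa_star = np.array([0.2, -0.3, 1.5, 0.5, 1.2, -1.6, -0.2, -1, 1.1, -0.6])
# ASSUMPTION: the variance values are truncated in the text,
# so unit variance is used for every arm here.
var_qa = np.ones_like(qa_star)


def pull(arm, rng):
    """Draw one reward from the Gaussian distribution of the given arm."""
    return rng.normal(qa_star[arm], np.sqrt(var_qa[arm]))


def run_epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    """Hypothetical helper: epsilon-greedy with sample-average estimates."""
    rng = np.random.default_rng(seed)
    k = len(qa_star)
    q_est = np.zeros(k)               # estimated value of each arm
    counts = np.zeros(k, dtype=int)   # number of pulls of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))    # explore: pick a random arm
        else:
            arm = int(np.argmax(q_est))   # exploit: pick the best estimate
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the sample average for the pulled arm.
        q_est[arm] += (reward - q_est[arm]) / counts[arm]
    return q_est, counts


if __name__ == "__main__":
    q_est, counts = run_epsilon_greedy()
    print("estimated values:", np.round(q_est, 2))
    print("pull counts:", counts)
```

With enough steps the estimates in `q_est` should approach `qa_star` for the arms that are pulled often. The incremental update avoids storing every past reward, which is why it is used here instead of recomputing the mean at each step.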