1. Basics
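Building your own prediction algorithm is simple: an algorithm is nothing more than a class derived from AlgoBase that has an estimate method. This is the method that predict() calls. It takes an inner user id and an inner item id, and returns the estimated rating $\hat{r}_{ui}$: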

from surprise import AlgoBase
from surprise import Dataset
from surprise.model_selection import cross_validate


class MyOwnAlgorithm(AlgoBase):

    def __init__(self):
        # Always call base method before doing anything.
        AlgoBase.__init__(self)

    def estimate(self, u, i):
        # To store extra information about the prediction, you can also
        # return a dict holding the details alongside the estimate.
        details = {'info1': 'That was', 'info2': 'easy stuff :)'}
        return 3, details


data = Dataset.load_builtin('ml-100k')
algo = MyOwnAlgorithm()

cross_validate(algo, data, verbose=True)

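The code above implements the simplest possible custom prediction algorithm: it predicts a rating of 3 for every user and item.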
2. The fit() method
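Now, let's build a slightly smarter algorithm that predicts the average of all the ratings in the training set. Since this is a constant that does not depend on the current user or item, we would rather compute it once and for all. This can be done by defining the fit method: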

import numpy as np
from surprise import AlgoBase


class MyOwnAlgorithm(AlgoBase):

    def __init__(self):
        # Always call base method before doing anything.
        AlgoBase.__init__(self)

    def fit(self, trainset):
        # Here again: call base method before doing anything.
        AlgoBase.fit(self, trainset)

        # Compute the average rating. We might as well use the
        # trainset.global_mean attribute ;)
        self.the_mean = np.mean([r for (_, _, r) in
                                 self.trainset.all_ratings()])

        return self

    def estimate(self, u, i):
        # The same constant is predicted for every (user, item) pair.
        return self.the_mean

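The fit method is called, for example, by the cross_validate function at each fold of the cross-validation procedure (you can also call it yourself). Before doing anything else, you should call the base class fit() method.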
Note that the fit() method returns self. This allows expressions such as algo.fit(trainset).test(testset).
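The chaining pattern can be seen in isolation with a minimal sketch. ToyMeanAlgo below is a hypothetical stand-in written in plain Python, not a Surprise class, and its fit and test signatures are simplified assumptions; it only illustrates why returning self from fit makes the one-line expression possible.

```python
from statistics import mean


class ToyMeanAlgo:
    """Hypothetical stand-in for an AlgoBase subclass (not part of Surprise)."""

    def fit(self, ratings):
        # ratings: list of (user, item, rating) tuples.
        self.the_mean = mean(r for (_, _, r) in ratings)
        # Returning self is what makes fit(...).test(...) possible.
        return self

    def test(self, pairs):
        # Predict the same constant for every (user, item) pair.
        return [(u, i, self.the_mean) for (u, i) in pairs]


train = [("alice", "m1", 4.0), ("bob", "m1", 2.0), ("bob", "m2", 3.0)]
predictions = ToyMeanAlgo().fit(train).test([("alice", "m2")])
print(predictions)
```

If fit returned None instead, the chained call would fail with an AttributeError, and fitting and testing would have to be written as two separate statements.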
3. The trainset attribute
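Once the base class fit() method has returned, all the information you need about the current training set (rating values, etc.) is stored in the self.trainset attribute. This is a Trainset object that has many attributes and methods useful for prediction.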
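To illustrate its usage, let's build an algorithm that predicts the average of three values: the mean of all ratings, the mean rating of the user, and the mean rating of the item: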
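
    def estimate(self, u, i):

        # Start from the global mean, which is always available.
        sum_means = self.trainset.global_mean
        div = 1

        # Add the user's mean rating if the user is in the training set.
        if self.trainset.knows_user(u):
            sum_means += np.mean([r for (_, r) in self.trainset.ur[u]])
            div += 1
        # Add the item's mean rating if the item is in the training set.
        if self.trainset.knows_item(i):
            sum_means += np.mean([r for (_, r) in self.trainset.ir[i]])
            div += 1

        return sum_means / div
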
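To make the arithmetic concrete, here is a standalone sketch of the same mean-of-means rule. It assumes plain dictionaries with made-up ratings as stand-ins for trainset.ur and trainset.ir, and the mean_of_means helper is hypothetical; no Surprise objects are involved.

```python
from statistics import mean

# Toy stand-ins for the Trainset attributes used above (made-up data):
# ur maps an inner user id to [(item id, rating), ...],
# ir maps an inner item id to [(user id, rating), ...].
ur = {0: [(0, 4.0), (1, 2.0)], 1: [(0, 3.0)]}
ir = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 2.0)]}
global_mean = mean(r for ratings in ur.values() for (_, r) in ratings)


def mean_of_means(u, i):
    """Average the global mean with the user and item means when known."""
    means = [global_mean]
    if u in ur:  # plays the role of trainset.knows_user(u)
        means.append(mean(r for (_, r) in ur[u]))
    if i in ir:  # plays the role of trainset.knows_item(i)
        means.append(mean(r for (_, r) in ir[i]))
    return sum(means) / len(means)


print(mean_of_means(0, 0))    # user and item both known
print(mean_of_means(99, 1))   # unknown user, known item
print(mean_of_means(99, 99))  # nothing known: falls back to the global mean
```

The divisor grows only when a term is actually added, so an unknown user or item never drags the estimate toward zero.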
4. When prediction is impossible
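It is up to your algorithm to decide whether it can produce a prediction. If the prediction is impossible, you can raise the PredictionImpossible exception. You need to import it first: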
from surprise import PredictionImpossible
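This exception is caught by the predict() method, and the estimate $\hat{r}_{ui}$ is then set according to the default_prediction() method, which can be overridden. By default, it returns the mean of all ratings in the training set.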
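The control flow described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Surprise's actual implementation: the exception class is a local stand-in with the same name, and the estimate, default_prediction and predict functions, the known-user set and the fallback value 3.5 are all hypothetical.

```python
class PredictionImpossible(Exception):
    """Local stand-in for surprise.PredictionImpossible (illustration only)."""


def default_prediction():
    # Hypothetical fallback value, standing in for the training-set mean.
    return 3.5


def estimate(u, i):
    known_users = {0, 1}  # made-up set of users seen during training
    if u not in known_users:
        raise PredictionImpossible("User is unknown.")
    return 4.0


def predict(u, i):
    """Return (estimate, was_impossible), falling back when estimation fails."""
    try:
        return estimate(u, i), False
    except PredictionImpossible:
        return default_prediction(), True


print(predict(0, 0))  # estimation succeeds
print(predict(7, 0))  # estimation fails, the default is used instead
```

Raising an exception rather than returning a sentinel value keeps estimate free of fallback logic, so changing the fallback only requires overriding default_prediction().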
5. Similarities and baselines
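If your algorithm uses a similarity measure or baseline estimates, you will need to accept bsl_options and sim_options as parameters of the __init__ method and pass them to the base class.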

from surprise import AlgoBase
from surprise import PredictionImpossible


class MyOwnAlgorithm(AlgoBase):

    def __init__(self, sim_options={}, bsl_options={}):
        AlgoBase.__init__(self, sim_options=sim_options,
                          bsl_options=bsl_options)

    def fit(self, trainset):
        AlgoBase.fit(self, trainset)

        # Compute baselines and similarities
        self.bu, self.bi = self.compute_baselines()
        self.sim = self.compute_similarities()

        return self

    def estimate(self, u, i):
        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible('User and/or item is unknown.')

        # Compute similarities between u and v, where v describes all other
        # users that have also rated item i.
        neighbors = [(v, self.sim[u, v]) for (v, r) in self.trainset.ir[i]]
        # Sort these neighbors by similarity
        neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)

        print('The 3 nearest neighbors of user', str(u), 'are:')
        for v, sim_uv in neighbors[:3]:
            print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))

        # ... Aaaaand return the baseline estimate anyway ;)
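
The listing above stops at its final comment, before any value is returned. As a hedged illustration of the two ideas it relies on, the sketch below ranks neighbors by similarity and computes a baseline estimate. It assumes the common baseline form $\mu + b_u + b_i$ (global mean plus user and item offsets), which the text above does not spell out, and uses made-up lists in place of self.sim, self.bu, self.bi and self.trainset.ir. The nearest_neighbors and baseline_estimate helpers are hypothetical names.

```python
# Toy stand-ins (made-up values) for what fit() stores in the class above.
sim = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.4, 0.1],
    [0.9, 0.4, 1.0, 0.3],
    [0.5, 0.1, 0.3, 1.0],
]
ir = {0: [(1, 3.0), (2, 5.0), (3, 4.0)]}  # item 0 was rated by users 1, 2, 3
global_mean = 3.5
bu = [0.3, -0.2, 0.1, 0.0]  # per-user baseline offsets
bi = [0.4]                  # per-item baseline offsets


def nearest_neighbors(u, i, k=3):
    """Return up to k (user, similarity) pairs among the raters of item i."""
    neighbors = [(v, sim[u][v]) for (v, _) in ir[i]]
    return sorted(neighbors, key=lambda x: x[1], reverse=True)[:k]


def baseline_estimate(u, i):
    # Assumed form of the baseline: global mean + user offset + item offset.
    return global_mean + bu[u] + bi[i]


for v, sim_uv in nearest_neighbors(0, 0):
    print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))
print(baseline_estimate(0, 0))
```

Sorting the full neighbor list is fine for a handful of raters; for items with many ratings, heapq.nlargest would avoid sorting entries that are never used.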