构建推荐引擎最重要的任务之一是找到类似的用户. 这将指导创建将提供给这些用户的建议. 让我们看看如何建立它.
怎么做...?
- 创建文件并导入需要的包:
import json
import numpy as np
from pearson_score import pearson_score
- 让我们定义一个函数来查找输入用户的类似用户. 它需要三个输入参数: 数据库, 输入用户和我们正在寻找的类似用户的数量. 我们的第一步是检查用户是否存在于数据库中. 如果用户存在, 我们需要计算该用户与数据库中所有其他用户之间的Pearson相关分数:
# Finds a specified number of users who are similar to the input user
def find_similar_users(dataset, user, num_users):
if user not in dataset:
raise TypeError('User ' + user + ' not present in the dataset')
# Compute Pearson scores for all the users
scores = np.array([[x, pearson_score(dataset, user, x)]
for x in dataset if user != x])
# Sort the scores based on second column
scores_sorted = np.argsort(scores[:, 1])
# Sort the scores in decreasing order (highest score first)
scored_sorted_dec = scores_sorted[::-1]
# Extract top 'k' indices
top_k = scored_sorted_dec[0:num_users]
return scores[top_k]
- 我们来定义主要功能并加载输入数据库:
if __name__ == '__main__':
data_file = 'movie_ratings.json'
with open(data_file, 'r') as f:
data = json.loads(f.read())
user = 'John Carson'
print ("Users similar to " + user + ":\n")
similar_users = find_similar_users(data, user, 3)
print ("User Similarity score")
for item in similar_users:
print (item[0], '\t\t', round(float(item[1]), 2))
- 输出结果如下: