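One of the most important tasks in building a recommendation engine is finding users who are similar to one another. These similarities guide the recommendations that will be offered to those users. Let's see how to build this.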

How to do it...

Create a new file and import the required packages:

import json
import numpy as np

from pearson_score import pearson_score
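The pearson_score helper is imported from a separate module that is not shown in this section. As a rough sketch only, assuming the dataset is a dict that maps each user name to a dict of movie title to rating, such a function might compute the Pearson correlation over the movies both users have rated. The body below is an illustration written with the standard library, not the original module, and returning 0 for undefined cases is a choice made for this sketch.

```python
import math


def pearson_score(dataset, user1, user2):
    # Assumed layout: dataset[user][movie] -> numeric rating
    if user1 not in dataset or user2 not in dataset:
        raise KeyError('Both users must be present in the dataset')

    # Movies rated by both users
    common = [m for m in dataset[user1] if m in dataset[user2]]
    n = len(common)

    # With fewer than two shared ratings the correlation is undefined
    if n < 2:
        return 0.0

    r1 = [float(dataset[user1][m]) for m in common]
    r2 = [float(dataset[user2][m]) for m in common]
    mean1 = sum(r1) / n
    mean2 = sum(r2) / n

    # Sum of products of deviations, and sums of squared deviations
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(r1, r2))
    var1 = sum((a - mean1) ** 2 for a in r1)
    var2 = sum((b - mean2) ** 2 for b in r2)

    denom = math.sqrt(var1 * var2)
    if denom == 0:
        # One user gave the same rating to every shared movie
        return 0.0

    return cov / denom
```

A score near 1 means the two users rate their shared movies in the same pattern, and a score near -1 means they rate them in opposite patterns.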

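Let's define a function that finds users similar to the input user. It takes three input arguments: the dataset, the input user, and the number of similar users we are looking for. The first step is to check whether the user is present in the dataset. If the user exists, we compute the Pearson correlation score between this user and every other user in the dataset: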

# Finds a specified number of users who are similar to the input user


def find_similar_users(dataset, user, num_users):
    if user not in dataset:
        raise KeyError('User ' + user + ' not present in the dataset')

    # Compute Pearson scores for all the other users; names and scores are
    # kept separate so that the score array stays numeric
    others = [x for x in dataset if x != user]
    scores = np.array([pearson_score(dataset, user, x) for x in others])

    # Indices that sort the scores in decreasing order (highest score first)
    scores_sorted_dec = np.argsort(scores)[::-1]

    # Extract top 'k' indices
    top_k = scores_sorted_dec[:num_users]

    # Return (user, score) pairs for the top 'k' users
    return [(others[i], scores[i]) for i in top_k]
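The ranking step can also be written without NumPy. The sketch below is an alternative, not the recipe's code: it assumes the similarity scores have already been computed into a plain dict (the function name top_k_similar and the names and values in example_scores are made up for illustration) and uses heapq.nlargest from the standard library to pick the highest-scoring entries.

```python
import heapq


def top_k_similar(scores, k):
    # scores: dict mapping user name -> similarity score (hypothetical input)
    # Returns up to k (name, score) pairs, highest score first
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


# Made-up scores, for illustration only
example_scores = {'user_a': 0.91, 'user_b': -0.35, 'user_c': 0.47, 'user_d': 0.02}
print(top_k_similar(example_scores, 2))  # [('user_a', 0.91), ('user_c', 0.47)]
```

Because the scores are compared as numbers, negative correlations rank below positive ones, which is the ordering the recipe needs.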

Let's define the main function and load the input dataset:

if __name__ == '__main__':
    data_file = 'movie_ratings.json'

    with open(data_file, 'r') as f:
        data = json.loads(f.read())

    user = 'John Carson'
    print("Users similar to " + user + ":\n")
    similar_users = find_similar_users(data, user, 3)
    print("User Similarity score")
    for item in similar_users:
        print(item[0], '\t\t', round(float(item[1]), 2))
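The script assumes that movie_ratings.json holds a JSON object keyed by user name. The file's contents are not shown here, so the layout below (each user mapped to an object of movie title to rating, with invented names and ratings) is an assumption. The sketch parses such a structure from an inline string and lists the users that would be compared with the input user.

```python
import json

# Hypothetical sample in the assumed layout; not the real movie_ratings.json
sample = '''
{
    "User A": {"Movie 1": 4.0, "Movie 2": 2.5},
    "User B": {"Movie 1": 3.5, "Movie 3": 5.0},
    "User C": {"Movie 2": 1.0, "Movie 3": 4.5}
}
'''

data = json.loads(sample)
user = 'User A'

# Same membership check that find_similar_users performs
assert user in data

# Every other user is a candidate for comparison
candidates = [name for name in data if name != user]
print(candidates)  # ['User B', 'User C']
```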

Running the script prints the three users most similar to John Carson, each with its similarity score.
