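Let's apply everything covered so far to a real-world problem. We will build an SVM that predicts whether an event is taking place in a building, based on counts of the people entering and leaving it. The dataset is available at https://archive.ics.uci.edu/ml/datasets/CalIt2+Building+People+Counts. We will use a slightly modified version of this dataset so that it is easier to analyze. The modified data is in the building_event_binary.txt and building_event_multiclass.txt files provided to you.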

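Getting ready

Let's understand the data format before we start building the model. Each line in building_event_binary.txt consists of six comma-separated strings. The ordering of these six strings is as follows: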

Day
Date
Time
The number of people going out of the building
The number of people coming into the building
The output indicating whether or not it's an event

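The first five strings form the input data, and our task is to predict whether or not an event is taking place in the building. Note that the code in this recipe drops the date, so only four of them are actually used as features.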
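To make the format concrete, here is a small sketch that splits one line into its six fields. The sample line and the names `FIELDS` and `parse_line` are made up for illustration; the actual values in the file may differ.

```python
# Hypothetical sample line; the real file's values may differ.
sample_line = "Tuesday,07/26/05,12:30:00,21,23,noevent\n"

# Field names are illustrative labels for the six positions described above.
FIELDS = ["day", "date", "time", "people_out", "people_in", "label"]


def parse_line(line):
    """Split one comma-separated line into a dict keyed by field name."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_line(sample_line)
print(record["day"], record["people_out"], record["people_in"], record["label"])
```

Every field is still a string at this point, which is why the recipe later converts the data to numerical form.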

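Each line in building_event_multiclass.txt also consists of six comma-separated strings. This file is more granular than the previous one because the output is the exact type of event taking place in the building. The ordering of these six strings is as follows: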

Day
Date
Time
The number of people going out of the building
The number of people coming into the building
The output indicating the type of event

The first five strings form the input data, and our task is to predict what type of event is taking place in the building.

How to do it...

We will use event.py, which has already been provided to you, as a reference. Create a new Python file and add the following lines:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

input_file = 'building_event_binary.txt'
# input_file = 'building_event_multiclass.txt'

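# Read the data, keeping the day and dropping the date column
X = []
with open(input_file, 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = line.split(',')
        X.append([data[0]] + data[2:])

# X now holds every row as strings: day, time, people out, people in, label
X = np.array(X)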

Let's convert the data to numerical form:

# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
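As a quick illustration of what the label encoder does to a string column, here is a minimal sketch on a made-up array of day names (the `days` values are hypothetical, not taken from the data file):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical column of day names, standing in for X[:, 0].
days = np.array(["Tuesday", "Monday", "Tuesday", "Friday"])

encoder = LabelEncoder()
codes = encoder.fit_transform(days)

# classes_ holds the unique labels in sorted order;
# each integer code is an index into that array.
print(encoder.classes_)
print(codes)

# inverse_transform maps the integer codes back to the original strings.
print(encoder.inverse_transform(codes))
```

The codes follow the sorted order of the labels, so for day names the numbering is alphabetical rather than calendar order. One encoder is kept per string column so that the same mapping can be reused on new datapoints later.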

Let's train an SVM using a radial basis function kernel, Platt scaling, and class balancing:

# probability=True enables Platt scaling; 'balanced' weights classes by inverse frequency
params = {'kernel': 'rbf', 'probability': True, 'class_weight': 'balanced'}
classifier = SVC(**params)
classifier.fit(X, y)
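Class balancing gives each class a weight inversely proportional to how often it appears, so that rare events are not drowned out by the far more common non-events. The sketch below reproduces the formula that scikit-learn documents for its 'balanced' mode, using a made-up label array (`y_demo` is hypothetical):

```python
import numpy as np

# Hypothetical imbalanced labels: 8 samples of class 0, 2 of class 1.
y_demo = np.array([0] * 8 + [1] * 2)

n_samples = len(y_demo)
class_counts = np.bincount(y_demo)  # number of samples per class
n_classes = len(class_counts)

# Documented formula: n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * class_counts)
print(dict(enumerate(weights)))
```

Here the minority class ends up with four times the weight of the majority class, which matches the 8-to-2 imbalance in the labels.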

Let's evaluate it using cross-validation:

accuracy = cross_val_score(classifier, X, y, scoring='accuracy', cv=3)
print(f"Accuracy of the classifier: {round(100 * accuracy.mean(), 2)}%")
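To see what three-fold cross-validation does under the hood, the following sketch runs the folds by hand on synthetic data and compares the result with `cross_val_score`. The data from `make_classification` is only a stand-in for the building dataset, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data; not the building dataset.
X_demo, y_demo = make_classification(n_samples=150, n_features=4, random_state=0)

# Manual version: train on two folds, score on the held-out fold, repeat.
fold_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X_demo, y_demo):
    model = SVC(kernel="rbf")
    model.fit(X_demo[train_idx], y_demo[train_idx])
    fold_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

# Library version: for a classifier and an integer cv, stratified folds are used.
library_scores = cross_val_score(
    SVC(kernel="rbf"), X_demo, y_demo, scoring="accuracy", cv=3
)

print("Manual folds: ", np.round(fold_scores, 3))
print("Library folds:", np.round(library_scores, 3))
```

The reported accuracy is simply the mean of the per-fold scores, so each sample is used for testing exactly once.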

Let's test the classifier on a new datapoint:

input_data = ['Tuesday', '12:30:00', '21', '23']
input_data_encoded = [-1] * len(input_data)
count = 0  # index of the next label encoder to use
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(item)
    else:
        # transform() returns an array, so take its single element
        input_data_encoded[i] = int(label_encoder[count].transform([item])[0])
        count = count + 1

# predict() expects a 2D array of shape (n_samples, n_features)
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)

# Predict and print output for a particular datapoint
output_class = classifier.predict(input_data_encoded)
print("Output class:", label_encoder[-1].inverse_transform(output_class)[0])
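Because the classifier is trained with `probability=True`, it can also report a confidence for each class through Platt scaling. Here is a minimal sketch on synthetic data; the data and variable names are illustrative and not part of event.py.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; not the building dataset.
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)

# probability=True fits an extra calibration step (Platt scaling) during training.
model = SVC(kernel="rbf", probability=True, random_state=0)
model.fit(X_demo, y_demo)

# A single datapoint must be passed as a 2D array with one row.
sample = X_demo[0].reshape(1, -1)
probabilities = model.predict_proba(sample)[0]

# Columns of predict_proba follow the order of model.classes_.
for label, p in zip(model.classes_, probabilities):
    print(f"class {label}: {p:.3f}")
```

The calibration is fitted with an internal cross-validation, so on small datasets the most probable class from `predict_proba` can occasionally differ from the output of `predict`.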

The output is as follows:

Accuracy of the classifier: 93.95%
Output class: noevent

If you use building_event_multiclass.txt instead of building_event_binary.txt, the output is as follows:

Accuracy of the classifier: 65.33%
Output class: eventA

