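Let's apply all this knowledge to a real-world problem. We will build an SVM that uses counts of people entering and leaving a building to predict whether an event is taking place there. The dataset is available at https://archive.ics.uci.edu/ml/datasets/CalIt2+Building+People+Counts. We will use a slightly modified version of this dataset so that it is easier to analyze. The modified data is in the building_event_binary.txt and building_event_multiclass.txt files that have already been provided to you.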

Getting ready

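Let's understand the data format before we start building the model. Each line in building_event_binary.txt consists of six comma-separated strings. The order of these six strings is as follows: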

  • Day
  • Date
  • Time
  • The number of people going out of the building
  • The number of people coming into the building
  • The output indicating whether or not it's an event

The first five strings form the input data, and our task is to predict whether or not an event is taking place in the building.
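As a minimal sketch of this record layout, the snippet below splits one line into its input fields and its label. The sample line and the `parse_line` helper are hypothetical and only illustrate the format; the values are not taken from the dataset. Like the loading code later in this recipe, it keeps the day and skips the date.

```python
# Hypothetical sample line; the values are made up for illustration only.
sample_line = "Tuesday,07/26/05,12:30:00,21,23,noevent\n"


def parse_line(line):
    """Split one record into (features, label), skipping the date field."""
    fields = line.strip().split(',')
    if len(fields) != 6:
        raise ValueError("expected 6 comma-separated fields, got %d" % len(fields))
    day, date, time, n_out, n_in, label = fields
    # Keep the day, the time, and the two people counts as input features.
    features = [day, time, int(n_out), int(n_in)]
    return features, label


features, label = parse_line(sample_line)
print(features, label)
```

The day and time are still strings at this point, so they need to be converted to numbers before an SVM can use them.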

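Each line in building_event_multiclass.txt also consists of six comma-separated strings. This file is more fine-grained than the previous one, because the output is the exact type of event taking place in the building. The order of these six strings is as follows: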

  • Day
  • Date
  • Time
  • The number of people going out of the building
  • The number of people coming into the building
  • The output indicating the type of event

The first five strings form the input data, and our task is to predict what type of event is taking place in the building.
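Because the outputs in both files are strings, they have to be mapped to integers before training. The following is a minimal sketch of that mapping with scikit-learn's `LabelEncoder`. The label list is made up, and the names `eventB` and `eventC` are assumptions used only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical output labels; 'eventB' and 'eventC' are assumed names.
labels = ['noevent', 'eventA', 'noevent', 'eventB', 'eventC', 'eventA']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; each index is that label's code.
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps the integer codes back to the original strings.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

The same fitted encoder has to be reused at prediction time so that the integer codes keep the meaning they had during training.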

How to do it...

  • We will use event.py, which has already been provided to you, for reference. Create a new Python file and add the following lines:
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

input_file = 'building_event_binary.txt'
# input_file = 'building_event_multiclass.txt'

# Read the data, keeping the day and skipping the date column
X = []
with open(input_file, 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        data = line.split(',')
        X.append([data[0]] + data[2:])

X = np.array(X)
# All the data, features and labels alike, is now in X as strings
  • Let's convert the data into numerical form:
# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
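  • Let's train an SVM using a radial basis function kernel, Platt scaling, and class balancing:
# 'balanced' replaces the 'auto' option that was removed from scikit-learn
params = {'kernel': 'rbf', 'probability': True, 'class_weight': 'balanced'}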
classifier = SVC(**params)
classifier.fit(X, y)
  • Evaluate the classifier using cross-validation:
accuracy = cross_val_score(
    classifier,
    X,
    y,
    scoring='accuracy', cv=3
)
print("Accuracy of the classifier: " + str(round(100 * accuracy.mean(), 2)) + "%")
  • Test the classifier on a new datapoint:
input_data = ['Tuesday', '12:30:00', '21', '23']
input_data_encoded = [-1] * len(input_data)
count = 0
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(item)
    else:
        # Reuse the encoder fitted on this column; the value must have
        # appeared in the training data, otherwise transform raises ValueError
        input_data_encoded[i] = int(label_encoder[count].transform([item])[0])
        count = count + 1

# scikit-learn expects a 2D array of shape (n_samples, n_features)
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)

# Predict and print output for a particular datapoint
output_class = classifier.predict(input_data_encoded)
print("Output class:", label_encoder[-1].inverse_transform(output_class)[0])
  • The output is as follows:
Accuracy of the classifier: 93.95%
Output class: noevent
  • If you use building_event_multiclass.txt instead of building_event_binary.txt, the output is as follows:
Accuracy of the classifier: 65.33%
Output class: eventA
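The classifier above is trained with `probability=True`, which enables Platt scaling, but the recipe only prints the predicted class. As a rough sketch of how the class probabilities could also be inspected, the snippet below trains a similar SVM on synthetic data and calls `predict_proba`. The data and the names `X_demo`, `y_demo`, and `demo_clf` are assumptions for illustration and are not part of event.py.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, made-up data: two numeric features, two well-separated classes.
rng = np.random.RandomState(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(40, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(40, 2)),
])
y_demo = np.array([0] * 40 + [1] * 40)

# Same settings as the recipe: RBF kernel, Platt scaling, class balancing.
demo_clf = SVC(kernel='rbf', probability=True, class_weight='balanced',
               random_state=0)
demo_clf.fit(X_demo, y_demo)

# predict and predict_proba expect a 2D array, so reshape the single sample.
sample = np.array([5.0, 5.0]).reshape(1, -1)
predicted = demo_clf.predict(sample)
probabilities = demo_clf.predict_proba(sample)
print(predicted[0], probabilities)
```

Each row of `probabilities` sums to 1, and its columns follow the order of `demo_clf.classes_`. On the building data, the same call on `classifier` would give a probability for each event class.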
