
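An interesting application of SVMs is predicting traffic from related data. In the previous recipe, we used an SVM as a classifier. In this recipe, we will use it as a regressor to estimate traffic volume.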

Getting ready

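We will use the dataset available at https://archive.ics.uci.edu/ml/datasets/Dodgers+Loop+Sensor. It counts the number of vehicles passing by during baseball games played at the Los Angeles Dodgers' home stadium. We will use a slightly modified form of that dataset so that it is easier to analyze. You can use the traffic_data.txt file that has been provided to you. Each line in this file is a comma-separated string with the following fields, in this order: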

Day
Time
The opponent team
Whether or not a baseball game is going on
The number of cars passing by
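
To make the record layout concrete, here is a minimal sketch of how one line could be split into its five fields. The sample line and the parse_record helper are hypothetical and only illustrate the format described above; they are not taken from traffic_data.txt or traffic.py.

```python
# Hypothetical sample line in the five-column format described above
sample_line = "Tuesday,00:00,San Francisco,no,3\n"


def parse_record(line):
    """Split one comma-separated record into its five named fields."""
    day, time, opponent, game_on, num_cars = line.strip().split(',')
    return {
        'day': day,
        'time': time,
        'opponent': opponent,
        'game_on': game_on,
        'num_cars': int(num_cars),  # the last column is the regression target
    }


record = parse_record(sample_line)
print(record)
```

Only the last field is numeric; the other four are strings, which is why the recipe encodes them before training.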

How to do it...

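Let's see how to build an SVM regressor. We will use the traffic.py file that has been provided to you as a reference. Create a new Python file and add the following lines: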

# SVM regressor to estimate traffic

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR
import sklearn.metrics as sm

input_file = 'traffic_data.txt'

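# Read the data: one comma-separated record per line
X = []
with open(input_file, 'r') as f:
    for line in f:
        line = line.strip()  # safe even if the last line has no trailing newline
        if not line:
            continue  # skip blank lines
        X.append(line.split(','))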

X = np.array(X)

Encoding the data

# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
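
The encoding step above relies on LabelEncoder to turn each string column into integer codes. As a small illustration, the sketch below encodes a hypothetical column of day names (the values are made up for the example, not read from the dataset) and then maps the codes back to strings.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical column of day names, standing in for one string column of X
days = np.array(['Tuesday', 'Friday', 'Tuesday', 'Monday'])

encoder = preprocessing.LabelEncoder()
codes = encoder.fit_transform(days)

# classes_ holds the distinct labels in sorted order; each code is an index into it
print(list(encoder.classes_))
print(codes)

# inverse_transform recovers the original strings from the integer codes
print(encoder.inverse_transform(codes))
```

Because the codes follow the sorted order of the labels, they carry no meaning beyond identity, so 'Friday' receiving a smaller code than 'Monday' is an artifact of alphabetical sorting.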

Let's build and train an SVM regressor using a radial basis function (RBF) kernel:

params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)
regressor.fit(X, y)
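# In the preceding lines, C sets the penalty for training errors that fall
# outside the epsilon margin, and epsilon sets the margin around the
# prediction within which errors are not penalized.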
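
To illustrate what epsilon means here, the following sketch computes the epsilon-insensitive loss, max(0, |y - f(x)| - epsilon), for a few hypothetical target and prediction pairs. The epsilon_insensitive_loss helper is written for this example only and is not part of scikit-learn; it shows the per-sample error term, not the full SVR objective.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2):
    """Per-sample loss that ignores absolute errors no larger than epsilon."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


# Hypothetical targets and predictions
y_true = [10.0, 10.0, 10.0]
y_pred = [10.1, 10.5, 11.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2)
print(losses)
```

The first prediction is off by 0.1, which is inside the margin, so it incurs no loss. The other two are penalized only for the part of the error that exceeds epsilon, and C scales how strongly those penalties weigh against keeping the model smooth.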

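Let's check the regressor's performance. The code below computes the mean absolute error on the same data the model was trained on, so it is a training error rather than a cross-validated estimate: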

y_pred = regressor.predict(X)
print("Mean absolute error =", round(sm.mean_absolute_error(y, y_pred), 2))
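
For an estimate on held-out data, one option is k-fold cross-validation. The sketch below uses scikit-learn's cross_val_score with the same SVR parameters. The X_demo and y_demo arrays are synthetic stand-ins generated for this example so that it runs without traffic_data.txt; with the real data you would pass the encoded X and y instead.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption for this sketch, not the traffic dataset)
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 10, size=(200, 4))
y_demo = X_demo.sum(axis=1) + rng.randint(0, 3, size=200)

regressor_demo = SVR(kernel='rbf', C=10.0, epsilon=0.2)

# scikit-learn reports the negated MAE so that higher scores are always better
scores = cross_val_score(regressor_demo, X_demo, y_demo,
                         scoring='neg_mean_absolute_error', cv=5)
cv_mae = -scores.mean()
print("Cross-validated mean absolute error =", round(cv_mae, 2))
```

Each of the five folds is scored on data the model did not see during fitting, so the averaged error is usually higher than the training error and closer to what to expect on new samples.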

Testing a single data point

# Testing encoding on single data instance
input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
input_data_encoded = [-1] * len(input_data)
count = 0
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        # transform() returns a one-element array; take element 0 before int()
        input_data_encoded[i] = int(
            label_encoder[count].transform([input_data[i]])[0]
        )
        count = count + 1

input_data_encoded = np.array(input_data_encoded).reshape(1, -1)

# Predict and print output for a particular datapoint
print("Predicted traffic:", int(regressor.predict(input_data_encoded)[0]))
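
One caveat when encoding a new data point is that LabelEncoder.transform raises a ValueError for a label it did not see during fitting. The sketch below shows one possible guard. The safe_encode helper and the opponent names are hypothetical and are not part of traffic.py.

```python
from sklearn import preprocessing

# Hypothetical opponent names, standing in for one fitted column encoder
encoder = preprocessing.LabelEncoder()
encoder.fit(['Arizona', 'San Francisco', 'San Diego'])


def safe_encode(encoder, label):
    """Return the integer code for label, or None if it was not seen during fit."""
    if label in encoder.classes_:
        return int(encoder.transform([label])[0])
    return None


print(safe_encode(encoder, 'San Francisco'))
print(safe_encode(encoder, 'Boston'))
```

Returning None leaves it to the caller to decide whether to skip the sample, report an error, or fall back to a default.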

The output is as follows:

Mean absolute error = 4.08
Predicted traffic: 29
