Table of Contents

1. Pattern Recognition

1.1 Structure of a Pattern Recognition System

1.2 Feature Extraction

1.3 Feature-Vector Classification

2. Categories of Machine Learning

2.1 Supervised Learning

2.2 Unsupervised Learning

3. Supervised Learning: Classification

3.1 Supervised Learning and the Classification Problem

3.2 Popular Classifiers

4. Gaussian Mixture Models

4.1 The Gaussian Model

4.2 Gaussian Mixture Models (GMM)

5. Support Vector Machines (SVM)

5.1 A Simple Linear SVM

5.2 Limitations of the Linear SVM

5.3 Nonlinear Feature Mapping

5.4 The SVM Classifier

6. Lab Exercise

1. Pattern Recognition

1.1 Structure of a Pattern Recognition System

A pattern recognition system generally consists of two parts:

  • feature extraction
  • feature-vector classification

1.2 Feature Extraction

Feature extraction extracts feature vectors that are relevant to the task, e.g., spectral coefficients for speech recognition.

Typically, the feature vectors have a lower dimension than the raw input data. E.g., the spectral coefficients in speech recognition have dimension 39, while a frame of raw speech samples has dimension 256.
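
To make the speech example concrete, here is a minimal sketch of one common recipe (assuming the librosa library and a placeholder file path): 13 MFCCs stacked with their first- and second-order deltas give the 39-dimensional vectors mentioned above.

import numpy as np
import librosa

# 'speech.wav' is a placeholder path; 16 kHz is a typical ASR sampling rate
y, sr = librosa.load('speech.wav', sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
delta = librosa.feature.delta(mfcc)                  # first-order differences
delta2 = librosa.feature.delta(mfcc, order=2)        # second-order differences
features = np.vstack([mfcc, delta, delta2])          # shape (39, n_frames)
print(features.shape)                                # one 39-dim vector per frame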

1.3 Feature-Vector Classification

Feature-vector classification assigns a label (from K possible labels) to each feature vector, e.g., assigning a person ID to an unknown image in a face recognition system.


2. Categories of Machine Learning

Machine learning falls into two broad categories: supervised learning and unsupervised learning.

2.1 Supervised Learning

Supervised learning: learning with input patterns (feature vectors) and their desired outputs (labels).

Supervised learning has two typical applications (a small sketch of both follows the list):

  • Classification: constructing a classifier, e.g., by fitting a Gaussian distribution to the data of each class. The class labels of the input feature vectors divide the data into groups; the desired outputs are the class labels.

  • Regression: finding a function f(x) that fits the training data, where the desired outputs are continuous variables.
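
A minimal sketch of the two settings with scikit-learn (toy data invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])

# classification: the desired outputs are discrete class labels
y_cls = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_cls)
print(clf.predict([[1.6]]))     # -> a label, e.g. [1]

# regression: the desired outputs are continuous values
y_reg = np.array([0.1, 0.9, 2.1, 2.9])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[1.6]]))     # -> a real number close to 1.6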

 

2.2 Unsupervised Learning

Unsupervised learning: learning with input patterns (feature vectors) but without desired outputs or class labels.

Unsupervised learning has two typical applications:

  • Clustering (cluster analysis): finding the clusters in a dataset and how many there are (sketched below).

  • Pre-training: as a pre-training step of supervised learning models, e.g., deep neural networks.
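
A minimal clustering sketch (synthetic blobs; scikit-learn's KMeans assumed). No labels are given; the algorithm discovers the groups:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # blob around (0, 0)
               rng.normal(5, 1, (50, 2))])   # blob around (5, 5)
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.labels_[:5])          # cluster index assigned to each point
print(kmeans.cluster_centers_)     # the two discovered centres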

 

3. Supervised Learning: Classification

 

3.1 Supervised Learning and the Classification Problem

 

Classification is the process of learning a function that maps a data item into one of several pre-defined classes.

The goal of learning is to create a classification model, known as a classifier, which produces a class label as output when the feature vector of an unknown sample is presented to its input.

 

3.2 Popular Classifiers

The mainstream classifiers include (a mapping to common implementations follows the list):

  1. SVM -- Support Vector Machines
  2. GMM -- Gaussian Mixture Models
  3. Decision Trees
  4. Logistic Regression
  5. Linear Regression
  6. K-Nearest Neighbour
  7. LDA -- Linear Discriminant Analysis
  8. PLDA -- Probabilistic LDA
  9. HMM -- Hidden Markov Models
  10. DNN/CNN -- Deep / Convolutional Neural Networks
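
For orientation, most of these are available off the shelf in scikit-learn; the mapping below is my annotation, and PLDA, HMMs and deep networks live in other packages:

from sklearn.svm import SVC                                  # 1. SVM
from sklearn.mixture import GaussianMixture                  # 2. GMM (one per class)
from sklearn.tree import DecisionTreeClassifier              # 3. Decision Trees
from sklearn.linear_model import LogisticRegression          # 4. Logistic Regression
from sklearn.linear_model import LinearRegression            # 5. Linear Regression
from sklearn.neighbors import KNeighborsClassifier           # 6. K-Nearest Neighbour
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis  # 7. LDA
# 8. PLDA, 9. HMM (e.g., the hmmlearn package) and 10. DNN/CNN (e.g., Keras)
# are not part of scikit-learn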

4. Gaussian Mixture Models

 

4.1 The Gaussian Model

  • 1-dimensional case:

    p(x \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left( -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right)

where the mean \mu_k is the expectation E(x) and \sigma_k^2 is the variance, i.e., x follows the Gaussian (normal) distribution N(\mu_k, \sigma_k^2). p(x \mid C_k) is the probability density function (PDF) of a feature vector x belonging to the k-th class, so p(x \mid C_k) integrates to 1 over (-\infty, +\infty).

  • Extension to the N-dimensional case: x becomes a vector, with mean vector \mu_k and covariance matrix \Sigma_k.
  • Maximum-likelihood estimation: \mu_k and \Sigma_k are estimated from the training samples of class k (see the sketch below).
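
To make the two bullets concrete, a minimal sketch of a Gaussian classifier on synthetic data (scipy is assumed for the multivariate normal PDF): fit one Gaussian per class by maximum likelihood, then pick the class with the highest density.

import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(X):
    # maximum-likelihood estimates: sample mean and covariance
    return X.mean(axis=0), np.cov(X, rowvar=False)

def classify(x, params):
    # assign x to the class k with the largest log p(x | Ck)
    scores = [multivariate_normal(mu, cov).logpdf(x) for mu, cov in params]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 1.0, size=(100, 2))   # training samples of class 0
X1 = rng.normal([3, 3], 1.0, size=(100, 2))   # training samples of class 1
params = [fit_gaussian(X0), fit_gaussian(X1)]
print(classify([2.5, 2.8], params))           # -> 1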

4.2 Gaussian Mixture Models (GMM)
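
A GMM models the class-conditional density as a weighted sum of M Gaussians,

    p(x \mid C_k) = \sum_{m=1}^{M} c_m \, \mathcal{N}(x; \mu_m, \Sigma_m), \qquad \sum_{m=1}^{M} c_m = 1,

which can capture multi-modal class distributions that a single Gaussian cannot; the weights and component parameters are usually fitted with the EM algorithm. A minimal sketch (illustrative synthetic data, assuming scikit-learn's GaussianMixture; this is not the lab code of Section 6): fit one mixture per class, then classify by the larger class-conditional log-likelihood.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X0 = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
                rng.normal([2, 2], 0.5, (100, 2))])   # class 0: two clusters
X1 = rng.normal([5, 0], 0.8, (200, 2))                # class 1: one cluster

gmm0 = GaussianMixture(n_components=2, random_state=0).fit(X0)
gmm1 = GaussianMixture(n_components=2, random_state=0).fit(X1)

x = np.array([[1.8, 2.1]])
# score_samples returns log p(x | Ck) under each class's mixture
print(0 if gmm0.score_samples(x)[0] > gmm1.score_samples(x)[0] else 1)   # -> 0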

 

5. Support Vector Machines (SVM)

 

5.1 A Simple Linear SVM

A simple classification boundary is a hyperplane w·x + b = 0 separating the two classes.

However, we want to remove the influence of feature vectors far from the decision boundary, because their effect is usually small. We may also select only a few important data points (called support vectors) and weight them differently. Doing so gives the support vector machine.
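
A minimal sketch of this idea with scikit-learn's SVC on synthetic data: after training, only the support vectors are retained, and they alone determine the hyperplane w·x + b = 0.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.6, (50, 2)),
               rng.normal([4, 4], 0.6, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_.shape)   # only a handful of points are kept
print(clf.coef_, clf.intercept_)    # w and b of the separating hyperplane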

5.2 Limitations of the Linear SVM

In general, a linear SVM can handle simple classification tasks, but it runs into trouble on complex feature distributions, e.g., when the features are not linearly separable.
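
The classic XOR layout shows the limitation: no line separates the two classes, so a linear SVM cannot exceed 75% training accuracy, while an RBF kernel (see Section 5.3) fits all four points. A small demonstration:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR labels: not linearly separable

print(SVC(kernel='linear').fit(X, y).score(X, y))                   # at most 0.75
print(SVC(kernel='rbf', C=10.0, gamma=2.0).fit(X, y).score(X, y))   # 1.0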

 

5.3 Nonlinear Feature Mapping

Idea: when feature vectors are not linearly separable in the current low-dimensional (n-dim) space, one can usually find a mapping φ(x) such that, after x is mapped into a higher-dimensional space (e.g., (n+1)-dim), the mapped vectors become linearly separable.

Example (1-D → 2-D): the lift φ(x) = (x, x²) turns points on a line into points on a parabola (see the numeric sketch below).

 

Example (2-D → 3-D): φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²), the mapping that underlies the degree-2 polynomial kernel.
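
A numeric sketch of the 1-D → 2-D lift: the classes {−1, +1} vs {0} are not separable on the line, but become separable by a horizontal threshold in the (x, x²) plane.

import numpy as np

X = np.array([-1.0, 0.0, 1.0])
y = np.array([1, 0, 1])            # the two outer points vs. the middle point

phi = np.stack([X, X**2], axis=1)  # phi(x) = (x, x^2)
print(phi)
# [[-1.  1.]
#  [ 0.  0.]
#  [ 1.  1.]]  -> the rule x^2 > 0.5 now separates the two classes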

Commonly used kernels:
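
The standard choices, as implemented, e.g., in scikit-learn's SVC, with \gamma, r and d as hyperparameters:

  • Linear: K(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top \mathbf{y}
  • Polynomial: K(\mathbf{x}, \mathbf{y}) = (\gamma\, \mathbf{x}^\top \mathbf{y} + r)^d
  • RBF (Gaussian): K(\mathbf{x}, \mathbf{y}) = \exp(-\gamma \|\mathbf{x} - \mathbf{y}\|^2)
  • Sigmoid: K(\mathbf{x}, \mathbf{y}) = \tanh(\gamma\, \mathbf{x}^\top \mathbf{y} + r)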


5.4 The SVM Classifier

Idea: an SVM is best suited to binary classification. When there are more than two classes, the strategy is to run several SVMs, each reducing the task to a binary one: separating "class A" from "everything other than A". The class whose SVM produces the largest decision value is taken as the result, the k-th class (see the sketch below).
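
A minimal sketch of this one-vs-rest strategy, assuming scikit-learn's LinearSVC (which trains one binary SVM per class by default): the class whose SVM yields the largest decision value wins.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = LinearSVC().fit(X, y)             # one binary SVM per class (one-vs-rest)

scores = clf.decision_function(X[:1])   # shape (1, 3): one score per class
print(np.argmax(scores))                # index k of the winning class
print(clf.predict(X[:1]))               # the same answer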


6. Lab Exercise

####################################################
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
import matplotlib as mpl
import matplotlib.pyplot as plt

##################################################


def iris_type(s):
	# np.loadtxt reads the file in binary mode under Python 3, so the class
	# names arrive as bytes; hence the b'' prefixes on the keys
	it = {b'Iris-setosa': 0, b'Iris-versicolor': 1, b'Iris-virginica': 2}
	return it[s]

iris_feature = 'sepal length', 'sepal width', 'petal length', 'petal width'
 
def show_accuracy(a, b, tip):
	# fraction of positions where prediction a matches ground truth b
	acc = a.ravel() == b.ravel()
	print('%s Accuracy:%.3f' % (tip, np.mean(acc)))
 
if __name__ == '__main__':
	# load the data
	'''
	# Option 1: read the data with pandas
	data = pd.read_csv('iris.data', header=None)
	iris_types = data[4].unique()
	for i, t in enumerate(iris_types):
		data.loc[data[4] == t, 4] = i
	'''
	# Option 2: read the data with numpy
	data = np.loadtxt('iris.data', dtype=float, delimiter=',', converters={4: iris_type})
 
	x, y = np.split(data, (4,), axis=1)
	x = x[:, :2]  # keep only the first two features so the decision regions can be drawn in 2-D
	x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)
 
	# classifier
	# RBF (Gaussian) kernel:
	# clf = svm.SVC(C=0.8, kernel='rbf', gamma=50, decision_function_shape='ovr')
	# linear kernel:
	clf = svm.SVC(C=0.5, kernel='linear', decision_function_shape='ovr')
	clf.fit(x_train, y_train.ravel())
 
	# training / test accuracy
	print('training prediction:%.3f' % (clf.score(x_train, y_train)))
	y_hat = clf.predict(x_train)
	show_accuracy(y_hat, y_train, 'training data')
	print('test data prediction:%.3f' % (clf.score(x_test, y_test)))
	y_hat_test = clf.predict(x_test)
	show_accuracy(y_hat_test, y_test, 'testing data')
 
	# decision function
	print('decision_function:\n', clf.decision_function(x_train))
	# print('\npredict:\n', clf.predict(x_train).reshape(-1, 1))
	print('\npredict:\n', clf.predict(x_train))
 
	# plotting
	x1_min, x1_max = x[:, 0].min(), x[:, 0].max()
	x2_min, x2_max = x[:, 1].min(), x[:, 1].max()
	# generate a 200x200 grid of sample points over the feature range
	x1, x2 = np.mgrid[x1_min:x1_max:200j, x2_min:x2_max:200j]
	grid_test = np.stack((x1.flat, x2.flat), axis=1)
	# signed distance of each grid point to the decision boundary
	z = clf.decision_function(grid_test)
	print('the distance to decision plane:\n', z)

	# predicted class for each grid point
	grid_hat = clf.predict(grid_test)
	# reshape grid_hat to the same shape as x1
	grid_hat = grid_hat.reshape(x1.shape)
 
	cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
	cm_dark = mpl.colors.ListedColormap(['g', 'b', 'r'])

	# decision regions
	plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
	# all samples, coloured by true class
	plt.scatter(x[:, 0], x[:, 1], c=np.squeeze(y), edgecolors='k', s=50, cmap=cm_dark)
	# circle the test samples
	plt.scatter(x_test[:, 0], x_test[:, 1], s=120, facecolors='none', zorder=10)
	plt.xlabel(iris_feature[0], fontsize=20)
	plt.ylabel(iris_feature[1], fontsize=20)
	plt.xlim(x1_min, x1_max)
	plt.ylim(x2_min, x2_max)
	plt.title('SVM classification on the iris data', fontsize=30)
	plt.grid()
	plt.show()

-------

Method 2: tf.keras + SVM

First train a CNN; then pull out the activations of the CNN's fully connected layer and train an SVM on them.

import numpy as np
from sklearn.svm import SVC
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical


def svc(traindata, trainlabel, testdata, testlabel):
    print("Start training SVM...")
    svc_clf = SVC(C=1.0, kernel="rbf", cache_size=3000)
    svc_clf.fit(traindata, trainlabel)
    pred_testlabel = svc_clf.predict(testdata)
    accuracy = np.mean(pred_testlabel == testlabel)
    print("cnn-svm Accuracy:", accuracy)


# the CNN: convolution, pooling, then fully connected layers
model = keras.Sequential([
    layers.Conv2D(5, (3, 3), padding='valid', activation='tanh',
                  input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(10, (3, 3), activation='tanh'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    # the fully connected layer whose activations will feed the SVM
    layers.Dense(100, activation='tanh', name='fc_feature'),
    layers.Dense(10, activation='softmax'),
])
# (a pure fully connected network on flattened 784-dim inputs would also work:
#  Dense(100, activation='tanh') -> Dense(100, ...) -> Dense(10, 'softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

(X_train, y_train), _ = mnist.load_data()
# scale to [0, 1] and add the channel axis (tf.keras is channels-last)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# the CNN needs one-hot labels; the SVM uses the integer labels directly
Y_train = to_categorical(y_train, 10)

# split the 60000 training images: 42000 for training, 18000 held out
X_train_new, X_test = X_train[:42000], X_train[42000:]
Y_train_new, Y_test = Y_train[:42000], Y_train[42000:]
y_train_new, y_test_new = y_train[:42000], y_train[42000:]

model.fit(X_train_new, Y_train_new, batch_size=200, epochs=100,
          shuffle=True, verbose=1, validation_split=0.2)
print("Validation...")
val_loss, val_accuracy = model.evaluate(X_test, Y_test, batch_size=200)
print("val_loss: %f" % val_loss)
print("val_accuracy: %f" % val_accuracy)

# a sub-model that outputs the FC-layer activations as features
get_feature = keras.Model(inputs=model.input,
                          outputs=model.get_layer('fc_feature').output)
FC_train_feature = get_feature.predict(X_train_new)
FC_test_feature = get_feature.predict(X_test)
svc(FC_train_feature, y_train_new, FC_test_feature, y_test_new)

 

 
