论文读书笔记-Support vector machine

论文标题：Support vector machine这篇论文是马波老师在模式识别课上要求我们进行翻译的，讲的内容是支持向量机。考虑到既然他单独把这篇文章拿出来让我们翻译，足以见得支持向量机SVM在模式识别中的重要作用。下面是一些摘抄：1、定义首先肯定要说说SVM的定义: The basic SVM takes a set of input data and pre

Felven

2525人浏览 · 2012-12-16 13:01:24

Felven · 2012-12-16 13:01:24 发布

论文标题：Support vector machine

这篇论文是马波老师在模式识别课上要求我们进行翻译的，讲的内容是支持向量机。考虑到既然他单独把这篇文章拿出来让我们翻译，足以见得支持向量机SVM在模式识别中的重要作用。

下面是一些摘抄：

1、定义

首先肯定要说说SVM的定义: The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. 这里指出了SVM是接收一系列的输入，然后是把这些输入划分为两个类，并且是一个非概率的二元线性分类器。 An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.这里把SVM模型说成是空间中的一些点，这些点被分为两类，它们之间有一条尽可能宽的明显的鸿沟。More formally, a SVM constructs a hyperplane or a set of hyperplanes in a high-or infinite- dimensional space, which can be used for classification, regression, or other tasks.

2、准备条件

既然涉及到二分，显然并不是所有的点都能线性分开。For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. 通过把空间进行维度拓展，点就能变成可以二分的了。除了这一点，还有一个很重要的方面是两个类的分解区，这里指的就是一个超平面。SVM的关键之处就是让这个超平面尽可能的宽，这样才能够更好的区分这两类。 The hyperplanes in the higher-dimensional space are defined as the set of points whose dot product with a vector in that space is constant. 其中vector是数据源特征向量的线性组合，通过这种选择，使得为常数，其中K(x,y)的值随着y离x越远而越小

3、 Linear SVM

给定一个集合其中每个xi是一个p维实向量，我们想找到一个超平面把属于yi=1和yi=-1的点尽量区分开，其中任意的超平面可以写成 w·x-b=0，其中· 是点积，w是超平面的标准向量，确定了超平面与w的偏离程度。当训练数据可分时，我们可以找到两个超平面，这样问题简化为让这两个超平面之间的距离最远。其中这两个超平面可以表示为:

w·x-b=1, w·x-b=-1，用几何学的知识，我们知道这两者之间距离为，于是我们希望最小化||w||，同时我们要防止数据落入这两个超平面之间，于是我们得到一个最优化问题：

minimize(in w,b) ||w|| subject to yi(w·xi-b)≥1 i=1..n

为了求解这个最优化问题，由于||w||很难求，需要进行变形

Minimize(in w,b)

Subject to yi(w·xi-b)≥1

引入拉格朗日乘子α,又可变形如下：

求解这个问题可以使用standard quadratic programming techniques and programs, 最后w可以表示为

4、 Dual form

5、 soft margin

考虑到并不是所有的样本都能被正确标记，当不存在这样一个超平面能区分这些样本时，the soft margin method will choose a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. The method introduces slack variables,ξi, which measure the degree of misclassfication of the data xi

yi(w·xi-b)≥1-ξi i=1..n

这样变形后，最优化问题就变成了一个在较大边界和较小错误率之间的权衡问题，式子中也需要引入一个惩罚项:

Subject to yi(w·xi-b)≥1-ξi ξi≥0

然后这个问题可以再用拉格朗日乘子法进行求解

6、 dual form

7、 nonlinear classification

The algorithm is formally similar, except that every dot product is replaced by a nonlinear kernel function. 唯一的区别就是点积用核函数替代而已，其中核函数可以用Gaussian radial basis function:

Hyperbolic tangent等等，k(xi,xj)=φ(xi)·φ(xj)，w仍然是变化空间，只不过 . 针对w的点积可以写成 .

8、 parameter selection

The effectiveness of SVM depends on the selection of kernel, the kernel’s parameters, and soft margin parameter C. A common choice is a Gaussian kernel. Best combination of C and γ is often selected by a grid search with exponentially growing sequences of C and γ.