Machine Learning (1): Linear Regression

Mr_ZZTC

Published 2019-09-05 21:59:53

1. Introduction to Machine Learning

Step 1: Define a set of functions
Step 2: Goodness of function
Step 3: Pick the best function

2. Linear Regression

2.1 Introduction

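Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more mutually dependent variables. Given a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i = (x_{i1}; x_{i2}; \ldots; x_{id})$ and $y_i \in \mathbb{R}$, linear regression tries to learn a linear model that predicts the real-valued output label as accurately as possible.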
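
For reference, the linear model described above is usually written in the following general form. The weight vector $w = (w_1; w_2; \ldots; w_d)$ and the bias $b$ are the parameters to be learned; this is the standard textbook notation, not something specific to this article:

$f(x_i) = w^{T} x_i + b, \quad \text{such that } f(x_i) \simeq y_i$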

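Linear regression belongs to supervised learning, which divides into classification and regression; within regression, linear regression is the basic and classic part.

For these reasons, linear regression is well worth learning.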

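2.2 A Simple Made-Up Example

Background: the relationship between a man's height and his attractiveness score. Suppose we measure the heights of 10 men and survey a number of women, who rate each man's looks from 1 to 10, giving 10 paired data points. In this example we use linear regression from machine learning to build a model that predicts the score women would give to a man encountered later. P.S. This example is entirely fictional; call it a made-up example.

In mathematical terms: 10 data points, where $x_i$ is a man's height and $y_i$ is the score given by the women.

X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];

Y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1].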

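(1) Build the model

Drawing on real cases, we make reasonable assumptions for the made-up example:

Assume the best model relating attractiveness score and height is a linear model
Assume all the data are genuine
Assume none of the uncontrollable factors that could overturn this model exist ...

Model: $y = b + w \cdot x$ (w and b are parameters)

$x_i$: an attribute of the input x (here the only attribute is height)
$w_i$: weight
b: bias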
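
As a minimal sketch of the model above, in MATLAB like the code in Section 2.3: the helper name `predict` and the values of `w` and `b` are illustrative assumptions, hand-picked rather than learned.

```matlab
% Heights from Section 2.2
X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];

% Linear model y = b + w * x (predict is a hypothetical helper name)
predict = @(x, w, b) b + w .* x;

% Illustrative, hand-picked parameters (assumed, not learned)
w = 0.25;
b = -37;

y_hat = predict(X, w, b);   % One predicted score per height
disp(y_hat);
```

Finding values of `w` and `b` that fit the data well is the job of the next two steps.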

(2) Loss function

Training Data:

(x1,y1)	(x2,y2)	(x3,y3)	(x4,y4)	(x5,y5)	(x6,y6)	(x7,y7)	(x8,y8)	(x9,y9)	(x10,y10)
(160,2)	(163,3)	(167,4)	(168,5)	(171,6)	(173,7)	(181,8)	(183,9)	(185,10)	(150,1)

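Loss function:

The loss function (sometimes also called the cost function) is the criterion we use to find the best model. Here it can be understood as follows: give the two parameters w and b initial values, compute the attractiveness score that this initial hypothesis function predicts for each man, and take the difference from the true score. So that positive and negative errors do not cancel and only their magnitude counts, we square each difference and then sum the squares.

(We will then iterate; the parameters are best when L is smallest. The next part describes this.)

Written out for the 10 training examples: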

$L = \sum_{n = 1}^{10}(y(n) - (b + w \cdot x(n))) ^{2}$
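
The loss formula can be evaluated directly on the data from Section 2.2. The following is a small sketch; `loss` is a hypothetical helper name. With w = 0 and b = 0 every prediction is 0, so L is simply the sum of the squared scores.

```matlab
X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];  % Heights
Y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1];                      % Scores

% Sum of squared errors for the model y = b + w * x
% (loss is a hypothetical helper name)
loss = @(w, b) sum((Y - (b + w .* X)).^2);

L0 = loss(0, 0);   % Every prediction is 0, so this is sum(Y.^2)
fprintf('L(w = 0, b = 0) = %g\n', L0);   % prints 385
```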

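(3) Best function

Once the loss function is fixed, we need to find the best function, that is, the w and b for which L is smallest. Here we solve for them with gradient descent. Gradient descent is a method for finding a local optimum; because the squared loss above is convex in w and b, that local optimum is also the global one.

Gradient descent:

1. Pick initial values for w and b (for example, at random)

Here we choose w = 0 and b = 0 as the initial values.

2. Compute the partial derivatives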

$\frac{\partial L}{\partial w}= \sum_{n = 1}^{10}2(y(n) - (b + w \cdot x(n))) \cdot (-x(n))$

$\frac{\partial L}{\partial b}= \sum_{n = 1}^{10}2(y(n) - (b + w \cdot x(n))) \cdot (-1)$
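
One way to sanity-check the two partial derivatives is to compare them with central finite differences. The sketch below does that on the data from Section 2.2; the helper names, the test point `(w0, b0)` and the step `h` are assumptions chosen for illustration.

```matlab
X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];  % Heights
Y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1];                      % Scores

loss   = @(w, b) sum((Y - (b + w .* X)).^2);
grad_w = @(w, b) sum(2 .* (Y - (b + w .* X)) .* (-X));   % dL/dw
grad_b = @(w, b) sum(2 .* (Y - (b + w .* X)) .* (-1));   % dL/db

% Central finite differences at an arbitrary test point (assumed values)
w0 = 0.1; b0 = -1; h = 1e-4;
num_w = (loss(w0 + h, b0) - loss(w0 - h, b0)) / (2 * h);
num_b = (loss(w0, b0 + h) - loss(w0, b0 - h)) / (2 * h);

fprintf('dL/dw: analytic %.4f, numeric %.4f\n', grad_w(w0, b0), num_w);
fprintf('dL/db: analytic %.4f, numeric %.4f\n', grad_b(w0, b0), num_b);
```

At the starting point w = 0, b = 0 the same formulas give dL/dw = -19298 and dL/db = -110, so the first update increases both parameters.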

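3. Iterate

Here we introduce the learning rate $\eta$. It is an important hyperparameter in supervised learning and in deep learning: it determines whether the objective function converges to a local minimum and how long that takes. A suitable learning rate lets the objective function converge to a local minimum in a reasonable amount of time.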

$w(1) = w(0) - \eta\frac{\partial L}{\partial w}(w = w(0), b = b(0))$

$b(1) = b(0) - \eta\frac{\partial L}{\partial b}(w = w(0), b = b(0))$

......

$w(t) = w(t-1) - \eta\frac{\partial L}{\partial w}(w = w(t-1), b = b(t-1))$

$b(t) = b(t-1) - \eta\frac{\partial L}{\partial b}(w = w(t-1), b = b(t-1))$
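
The update rule can be sketched as a short loop on the data from Section 2.2. The learning rate `eta` and the step count `num_steps` are assumptions; `eta` is chosen very small because the heights are in the hundreds, which makes the gradient with respect to w large.

```matlab
X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];  % Heights
Y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1];                      % Scores

eta = 1e-6;         % Assumed learning rate
num_steps = 1000;   % Assumed number of iterations

w = 0; b = 0;       % Initial values from step 1
loss_history = zeros(1, num_steps + 1);
loss_history(1) = sum((Y - (b + w .* X)).^2);

for t = 1:num_steps
    residual = Y - (b + w .* X);
    dL_dw = sum(2 .* residual .* (-X));
    dL_db = sum(2 .* residual .* (-1));
    % Simultaneous update: both gradients were computed from the previous w and b
    w = w - eta * dL_dw;
    b = b - eta * dL_db;
    loss_history(t + 1) = sum((Y - (b + w .* X)).^2);
end

fprintf('w = %.5f, b = %.5f, L = %.4f\n', w, b, loss_history(end));
```

With inputs of this magnitude the loss shrinks at every step, but b moves only slowly, so after this many steps the parameters are still some way from the least-squares optimum.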

4. Keep iterating, and step by step march to victory
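
Whether those steps lead to victory depends on the learning rate described in step 3. The sketch below runs the same update with two assumed values of `eta` on the data from Section 2.2: with the smaller one the loss falls below its starting value of 385, and with the larger one each step overshoots and the loss blows up.

```matlab
X = [160, 163, 167, 168, 171, 173, 181, 183, 185, 150];  % Heights
Y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1];                      % Scores

etas = [1e-6, 1e-5];   % Assumed learning rates, for illustration only
num_steps = 20;        % Assumed number of iterations
final_loss = zeros(size(etas));

for k = 1:numel(etas)
    w = 0; b = 0;      % Same starting point for each learning rate
    for t = 1:num_steps
        residual = Y - (b + w .* X);
        dL_dw = sum(2 .* residual .* (-X));
        dL_db = sum(2 .* residual .* (-1));
        w = w - etas(k) * dL_dw;
        b = b - etas(k) * dL_db;
    end
    final_loss(k) = sum((Y - (b + w .* X)).^2);
    fprintf('eta = %g: L after %d steps = %g\n', etas(k), num_steps, final_loss(k));
end
```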

2.3 Code for the Made-Up Example (MATLAB)

%% Regression
%% Initialize
clear; close all; clc;

%% ========== Part 1: Load data ========== %%
fprintf('Loading data...\n');
% Note: these values differ from X and Y in Section 2.2
x_data = [1.60, 1.63, 1.67, 1.68, 1.71, 1.73, 1.75, 1.77, 1.79, 1.55]; % The height of the men
y_data = [2.51, 2.52, 2.53, 2.54, 2.55, 2.56, 2.57, 2.58, 2.59, 2.50]; % Male face value
plot(x_data, y_data, 'rx', 'MarkerSize', 10); % Scatter plot of the training data
xlabel('Height');
ylabel('Male face value');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ========== Part 2: Loss function ========== %%
m = length(y_data); % Number of training examples
% Model: y = b + w * x_data
w0 = 0; % Initial weight
b0 = 0; % Initial bias
L = 0;  % Sum of squared errors at (w0, b0)
for i = 1:m
    L1(i) = (y_data(i) - (b0 + w0 * x_data(i)))^2; % Squared error of example i
    L = L + L1(i);
end
fprintf('With w0 = 0, b0 = 0\n Cost computed = %f\n', L);
fprintf('Program paused. Press enter to continue.\n');
pause;

%% ========== Part 3: Gradient descent (first step) ========== %%
iterations = 200;
Yita = 0.0015; % Learning rate
L11_w = 0; % dL/dw at (w0, b0)
L11_b = 0; % dL/db at (w0, b0)
for i = 1:m % Loop index must match the index used in the body
    L1_w(i) = 2 * (y_data(i) - (b0 + w0 * x_data(i))) * (-x_data(i));
    L1_b(i) = 2 * (y_data(i) - (b0 + w0 * x_data(i))) * (-1);
    L11_w = L11_w + L1_w(i);
    L11_b = L11_b + L1_b(i);
end
w1 = w0 - Yita * L11_w;
b1 = b0 - Yita * L11_b;

L111 = 0; % Loss after the first update
for i = 1:m
    L11(i) = (y_data(i) - (b1 + w1 * x_data(i)))^2;
    L111 = L111 + L11(i);
end
fprintf('After one step: w1 = %f, b1 = %f, loss = %f\n', w1, b1, L111);

%% ========== Part 4: Iterate ========== %%
L22_w = zeros(1, iterations);   % dL/dw at each iteration
L22_b = zeros(1, iterations);   % dL/db at each iteration
w = zeros(1, iterations + 1);   % w(1) = 0 is the initial weight
b = zeros(1, iterations + 1);   % b(1) = 0 is the initial bias
for j = 1:iterations
    for i = 1:m
        L2_w(i) = 2 * (y_data(i) - (b(j) + w(j) * x_data(i))) * (-x_data(i));
        L2_b(i) = 2 * (y_data(i) - (b(j) + w(j) * x_data(i))) * (-1);
        L22_w(j) = L22_w(j) + L2_w(i); % Accumulate into the slot for iteration j
        L22_b(j) = L22_b(j) + L2_b(i);
    end
    w(j+1) = w(j) - Yita * L22_w(j);
    b(j+1) = b(j) - Yita * L22_b(j);
end

L122 = 0; % Loss at the final parameters
for i = 1:m
    L12(i) = (y_data(i) - (b(end) + w(end) * x_data(i)))^2;
    L122 = L122 + L12(i);
end
fprintf('After %d iterations: w = %f, b = %f, loss = %f\n', iterations, w(end), b(end), L122);

%% ========== Part 5: Visualize ========== %%
hold on
w_final = w(end);                % Parameters after the last iteration
b_final = b(end);
y1 = b1 + w1 * x_data;           % Fit after the first step
plot(x_data, y1, 'r');
y = b_final + w_final * x_data;  % Fit after the last iteration
plot(x_data, y);
