Computing R-squared in MATLAB

大作家佚名

Published 2019-02-25 10:22:38

How R-squared is computed

What is R-squared

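R-squared is a statistic you will meet again and again in data modeling, for example in econometric models. It sits alongside tools such as the significance test for regression coefficients, the F-test and the Durbin-Watson test. Any model fitted to data has some error, and how well the regression equation fits the observed values is called the goodness of fit. R-squared, also called the coefficient of determination, is one statistic that measures goodness of fit. It is computed as:
$R^2=1- \frac{\sum(y-\hat{y})^2}{\sum(y-\widetilde{y})^2}$
Here $\hat{y}$ is the fitted value and $\widetilde{y}$ is the mean of the observations $y$. The formula subtracts a ratio from 1. The numerator of that ratio is the residual sum of squares, which is the variation of y around the regression line. The denominator is the total sum of squares, which is the variation of y around its mean.

The residual $y-\hat{y}$ is the part the fitted equation cannot explain. Subtracting the unexplained share from 1 therefore leaves the explained share: the fraction of the variation in the dependent variable that the independent variables account for. A larger R-squared means the model explains the variation in y better. For a least-squares fit that includes a constant term, R-squared lies between 0 and 1. For other models the same formula can even become negative. In forecasting practice, people often adopt the model with the highest R-squared.

Besides R-squared, another measure of goodness of fit is the correlation coefficient, defined as: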

$r=\frac{\sum_{i=1}^{n}(x_i-\widetilde{x})(y_i-\widetilde{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\widetilde{x})^2\sum_{i=1}^{n}(y_i-\widetilde{y})^2}}$

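Here $\widetilde{x}$ and $\widetilde{y}$ are the sample means of x and y. For a simple linear regression (one independent variable, fitted by least squares with a constant term), it can be shown that the coefficient of determination is exactly the square of the correlation coefficient. The two measures provide complementary information.

Their most important difference is that the correlation coefficient has a sign. A positive value means the dependent variable increases with the independent variable, so the fitted line rises from left to right. A negative value means it decreases, so the line falls from left to right. The correlation coefficient is less directly interpretable than the coefficient of determination, but it carries a similar meaning: the closer it is to +1 or -1, the better the fit.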
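As a quick check of the two formulas above, the following MATLAB sketch does four things. It fits a straight line to a small, made-up data set (the numbers are illustrative only, not from the original post). It computes R-squared directly from its definition and r from its formula. It confirms that R² = r² for this one-predictor least-squares line with an intercept. Finally, it shows r turning negative when the trend is reversed.

```matlab
% Small illustrative data set (made-up values, roughly y = 2x)
x = [1 2 3 4 5 6];
y = [2.1 3.9 6.2 7.8 10.1 12.2];

% Least-squares straight line: yhat = p(1)*x + p(2)
p    = polyfit(x, y, 1);
yhat = polyval(p, x);

% R^2 from the definition: 1 - SSE/SST
sse = sum((y - yhat).^2);        % residual (unexplained) sum of squares
sst = sum((y - mean(y)).^2);     % total sum of squares about the mean
R2  = 1 - sse / sst;

% Pearson correlation coefficient from its formula
dx = x - mean(x);
dy = y - mean(y);
r  = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

fprintf('R^2 = %.6f, r = %.6f, r^2 = %.6f\n', R2, r, r^2);

% Reversing the trend flips the sign of r; r^2 (and R^2) are unchanged
y_dec = -y;
dyd   = y_dec - mean(y_dec);
r_dec = sum(dx .* dyd) / sqrt(sum(dx.^2) * sum(dyd.^2));
fprintf('decreasing data: r = %.6f\n', r_dec);
```

For a model with several predictors, R-squared no longer equals the square of a single pairwise r. The equality checked here is specific to the simple linear case.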

Worked example

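Below is a MATLAB function that computes R-squared (the author's information is included in the function header). It returns $R^2$ and the RMSE (root mean squared error).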
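For reference, the RMSE is computed in the last line of the function, `rmse = sqrt(mean((y(:) - f(:)).^2))`. That line corresponds to the following formula, where $n$ is the number of (y, f) pairs left after NaN values are removed:
$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$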

```matlab
function [r2 rmse] = rsquare(y,f,varargin)
% Compute coefficient of determination of data fit model and RMSE
%
% [r2 rmse] = rsquare(y,f)
% [r2 rmse] = rsquare(y,f,c)
%
% RSQUARE computes the coefficient of determination (R-square) value from
% actual data Y and model data F. The code uses a general version of 
% R-square, based on comparing the variability of the estimation errors 
% with the variability of the original values. RSQUARE also outputs the
% root mean squared error (RMSE) for the user's convenience.
%
% Note: RSQUARE ignores comparisons involving NaN values.
% 
% INPUTS
%   Y       : Actual data
%   F       : Model fit
%
% OPTION
%   C       : Constant term in model
%             R-square may be a questionable measure of fit when no
%             constant term is included in the model.
%   [DEFAULT] TRUE : Use traditional R-square computation
%            FALSE : Uses alternate R-square computation for model
%                    without constant term [R2 = 1 - SUM((Y-F).^2)/SUM(Y.^2)]
%
% OUTPUT 
%   R2      : Coefficient of determination
%   RMSE    : Root mean squared error
%
% EXAMPLE
%   x = 0:0.1:10;
%   y = 2.*x + 1 + randn(size(x));
%   p = polyfit(x,y,1);
%   f = polyval(p,x);
%   [r2 rmse] = rsquare(y,f);
%   figure; plot(x,y,'b-');
%   hold on; plot(x,f,'r-');
%   title(strcat(['R2 = ' num2str(r2) '; RMSE = ' num2str(rmse)]))
%   
% Jered R Wells
% 11/17/11
% jered [dot] wells [at] duke [dot] edu
%
% v1.2 (02/14/2012)
%
% Thanks to John D'Errico for useful comments and insight which has helped
% to improve this code. His code POLYFITN was consulted in the inclusion of
% the C-option (REF. File ID: #34765).

% Parse the optional C flag (defaults to TRUE)
if isempty(varargin); c = true; 
elseif length(varargin)>1; error 'Too many input arguments';
elseif ~islogical(varargin{1}); error 'C must be logical (TRUE||FALSE)';
else c = varargin{1}; 
end

% Compare inputs
if ~all(size(y)==size(f)); error 'Y and F must be the same size'; end

% Check for NaN: keep only pairs where both Y and F are valid
tmp = ~or(isnan(y),isnan(f));
y = y(tmp);
f = f(tmp);

% C = TRUE : traditional R^2 (SSE vs. variation about the mean), clamped at 0
% C = FALSE: uncentered R^2 (SSE vs. variation about zero)
if c; r2 = max(0,1 - sum((y(:)-f(:)).^2)/sum((y(:)-mean(y(:))).^2));
else r2 = 1 - sum((y(:)-f(:)).^2)/sum((y(:)).^2);
    if r2<0
    % http://web.maths.unsw.edu.au/~adelle/Garvan/Assays/GoodnessOfFit.html
        warning('Consider adding a constant term to your model') %#ok<WNTAG>
        r2 = 0;
    end
end

rmse = sqrt(mean((y(:) - f(:)).^2));
```
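The C option matters when the model has no constant term. The following is a minimal sketch on made-up data with a large offset. It fits a line forced through the origin and compares two forms of R-squared. The traditional (centered) R-squared goes strongly negative, and the function's default branch would clamp it to 0. The uncentered version used by the `c = false` branch compares the residuals with the variation of y about zero instead of about its mean. The inline formulas below mirror the two branches of `rsquare` and do not call the function itself.

```matlab
% Illustrative data with a large offset (made-up values)
x = (1:5)';
y = 10 + 0.1*x;

% Least-squares fit forced through the origin: f = b*x (no constant term)
b = x \ y;          % backslash solves min ||x*b - y|| in the least-squares sense
f = b * x;

sse = sum((y - f).^2);

% Traditional (centered) R^2: SSE compared with variation about mean(y)
R2_centered = 1 - sse / sum((y - mean(y)).^2);

% The default (c = true) branch clamps the centered value at 0
R2_clamped = max(0, R2_centered);

% Uncentered R^2, as in the c = false branch: SSE compared with sum(y.^2)
R2_uncentered = 1 - sse / sum(y.^2);

fprintf('centered: %.4f, clamped: %.4f, uncentered: %.4f\n', ...
        R2_centered, R2_clamped, R2_uncentered);
```

The centered value is far below zero because a line through the origin fits these data worse than the constant `mean(y)` does. That is the situation the function's warning about adding a constant term is aimed at.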

Usage

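```matlab
clc
clear all
close all

% Create input data following y = a*x + b (here a = 2, b = 1) plus Gaussian noise
x = 0:0.1:10;
y = 2.*x + 1 + randn(size(x));
p = polyfit(x,y,1)          % no semicolon: prints the fitted [slope intercept]
f = polyval(p,x);
[r2 rmse] = rsquare(y,f);
figure
plot(x,y,'b.');
hold on; 
plot(x,f,'r-');
axis equal
title(strcat(['R2 = ' num2str(r2) '; RMSE = ' num2str(rmse)]))
str = ['y = ' num2str(p(1)) 'x + ' num2str(p(2))];
% gtext waits for a mouse click in the figure and places the label there
gtext(str)
% Alternatively, place the label automatically at the data centroid:
% text(mean(x),mean(y),str)
```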
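The label in the script is built by concatenating `num2str` results around a literal `+`. A negative intercept would therefore be displayed as `+ -`. Below is a minimal sketch of one way to build the label with `sprintf` instead. The coefficient values are hypothetical stand-ins for a `polyfit(x, y, 1)` result, chosen with a negative intercept on purpose.

```matlab
% Hypothetical fitted coefficients [slope intercept], e.g. from polyfit(x, y, 1)
p = [2.0153 -0.4721];

% Concatenation as in the script produces a label like 'y = 2.0153x + -0.4721'
str_concat = ['y = ' num2str(p(1)) 'x + ' num2str(p(2))];

% Choose the operator from the intercept's sign, then print its magnitude
if p(2) >= 0, sgn = '+'; else, sgn = '-'; end
str = sprintf('y = %.4fx %s %.4f', p(1), sgn, abs(p(2)));

disp(str_concat);
disp(str);
```

The resulting `str` can be passed to `gtext(str)` or `text(mean(x),mean(y),str)` exactly as in the script. The `%.4f` format fixes the number of decimals shown, whereas `num2str` picks its own precision.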