The ADF Test for Time Series Data: the adfuller() Function and Its Fitted Model Coefficients

条件漫步 · Published 2021-03-18 13:05:44

@Created: 20210318
@Modified: 20210318

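1. Background

When fitting time series models such as Holt, Holt-Winters (ExponentialSmoothing), ARMA, and ARIMA, it matters whether the series is stationary, and ARMA-type models explicitly assume that it is. The data, or its n-th order difference, therefore needs to be tested for stationarity. A common way to do this is the ADF (Augmented Dickey-Fuller) test, which is one kind of unit root test.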

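In mathematics, a stationary random process, more precisely a strictly stationary (strict-sense stationary) random process, is also called a narrow-sense stationary process.

A stationary random process is one whose probability distribution at any fixed time and position is the same as at every other time and position. In other words, the statistical properties of the process do not change as time passes, so parameters such as the mean and the variance do not vary with time or position.

Ref: Baidu Baike (百度百科), entry "平稳随机过程" (stationary random process)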
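A small simulation can make the definition concrete. The sketch below is an illustration only: the path counts, seed, and variable names are assumptions chosen for the example. It compares white noise, whose variance is the same at every time point, with a random walk, whose variance grows over time and which is therefore not stationary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 400

# White noise: every observation is drawn from the same distribution.
white_noise = rng.normal(size=(n_paths, n_steps))
# Random walk: cumulative sum of white noise, so shocks accumulate over time.
random_walk = np.cumsum(white_noise, axis=1)

# Compare the variance across paths at an early and a late time point.
for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    early_var = paths[:, 99].var()
    late_var = paths[:, 399].var()
    print(f"{name}: variance at t=100 is {early_var:.1f}, at t=400 is {late_var:.1f}")
```

For the random walk the variance at step t is roughly t times the variance of a single shock, which is exactly the kind of time dependence a stationary process rules out.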

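2. Unit Root Test Theory

A unit root test is a specialized stationarity test, developed to check whether macroeconomic and monetary or financial data series have a particular statistical property. There are many unit root tests, including the ADF test, the PP test, and the NP test.
Ref: MBA智库百科

The null hypothesis of a unit root test is that the original series is non-stationary, that is, that it has a unit root.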

Ref: "单位根检验详解" (a detailed explanation of unit root tests)
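For reference, the regression behind the ADF test is commonly written as follows. The symbols here are the usual textbook notation and are assumptions of this sketch rather than notation taken from the article: $y_t$ is the series, $\Delta y_t = y_t - y_{t-1}$, $p$ is the number of lagged differences, and $\varepsilon_t$ is the error term.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t ,
\qquad
H_0 : \gamma = 0 \ \text{(unit root, non-stationary)},
\qquad
H_1 : \gamma < 0 .
```

The test statistic is the t-ratio of the estimate of $\gamma$, compared against Dickey-Fuller critical values rather than the usual t distribution. Dropping $\beta t$ gives the constant-only form, and dropping both $\alpha$ and $\beta t$ gives the form with no deterministic terms.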

3. The Python Interface

3.1 The adfuller API

Ref: official documentation, statsmodels.tsa.stattools.adfuller

pip install statsmodels

from statsmodels.tsa.stattools import adfuller
or, using the fully qualified name:
import statsmodels.tsa.stattools
statsmodels.tsa.stattools.adfuller()

adfuller(
    x,
    maxlag=None,
    regression="c",
    autolag="AIC",
    store=False,
    regresults=False,
)

Ref: "如何查看adfuller()函数的模型拟合系数" (how to view the fitted model coefficients of the adfuller() function)

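3.2 Parameters

x: array_like, 1d. The data series to test.
maxlag: the maximum lag included in the test. Defaults to 12*(nobs/100)^{1/4}.
regression: {'c', 'ct', 'ctt', 'nc'}. The constant and trend order to include in the regression. 'c': constant only (default). 'ct': constant and trend. 'ctt': constant, linear trend, and quadratic trend. 'nc': no constant and no trend. This is the spelling in statsmodels 0.12.2, the version used here; later releases rename the option to 'n'.
autolag: {'AIC', 'BIC', 't-stat', None}. The method used to choose the lag length automatically. If None, exactly maxlag lags are used. If 'AIC' (default) or 'BIC', the number of lags is chosen to minimize the corresponding information criterion. With 't-stat', the search starts at maxlag and drops one lag at a time until the t-statistic on the last lag is significant in a 5%-sized test.
store: bool. If True, a result instance is returned in addition to the ADF statistic. Default is False.
regresults: bool, optional. If True, the full regression results are returned. Default is False.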

Parameters
    ----------
    x : array_like, 1d
        The data series to test.
    maxlag : int
        Maximum lag which is included in test, default 12*(nobs/100)^{1/4}.
    regression : {"c","ct","ctt","nc"}
        Constant and trend order to include in regression.

        * "c" : constant only (default).
        * "ct" : constant and trend.
        * "ctt" : constant, and linear and quadratic trend.
        * "nc" : no constant, no trend.

    autolag : {"AIC", "BIC", "t-stat", None}
        Method to use when automatically determining the lag length among the
        values 0, 1, ..., maxlag.

        * If "AIC" (default) or "BIC", then the number of lags is chosen
          to minimize the corresponding information criterion.
        * "t-stat" based choice of maxlag.  Starts with maxlag and drops a
          lag until the t-statistic on the last lag length is significant
          using a 5%-sized test.
        * If None, then the number of included lags is set to maxlag.
    store : bool
        If True, then a result instance is returned additionally to
        the adf statistic. Default is False.
    regresults : bool, optional
        If True, the full regression results are returned. Default is False.

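3.3 Return Values

adf: float. The test statistic.
pvalue: float. MacKinnon's approximate p-value, based on MacKinnon (1994, 2010).
usedlag: int. The number of lags used.
nobs: int. The number of observations used for the ADF regression and for calculating the critical values.
critical values: dict. Critical values for the test statistic at the 1%, 5%, and 10% levels, based on MacKinnon (2010).
icbest: float. Returned when autolag is not None. The docstring calls it the "maximized information criterion"; it is the information criterion value of the selected lag length.
resstore: ResultStore, optional. A dummy class with the results attached as attributes.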

Returns
    -------
    adf : float
        The test statistic.
    pvalue : float
        MacKinnon's approximate p-value based on MacKinnon (1994, 2010).
    usedlag : int
        The number of lags used.
    nobs : int
        The number of observations used for the ADF regression and calculation
        of the critical values.
    critical values : dict
        Critical values for the test statistic at the 1 %, 5 %, and 10 %
        levels. Based on MacKinnon (2010).
    icbest : float
        The maximized information criterion if autolag is not None.
    resstore : ResultStore, optional
        A dummy class with results attached as attributes.
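Reading these return values comes down to one comparison: the unit root hypothesis is rejected when the test statistic is more negative than the critical value, or equivalently when the p-value is below the chosen level. The helper `reject_unit_root` below is a hypothetical name introduced for this sketch; the numbers are the ones reported in the run shown in section 4.2 of this article.

```python
def reject_unit_root(adf_stat, critical_values, level="5%"):
    """Return True when the statistic is below (more negative than) the
    critical value at the given level. Hypothetical helper for illustration."""
    return adf_stat < critical_values[level]


# Values copied from the run reported in section 4.2.
adf_stat = -0.012544765454616165
p_value = 0.9574822652420663
critical_values = {
    "1%": -4.473135048010974,
    "5%": -3.28988060356653,
    "10%": -2.7723823456790124,
}

for level in ("1%", "5%", "10%"):
    # prints False for all three levels
    print(level, reject_unit_root(adf_stat, critical_values, level))

print("p-value below 0.05:", p_value < 0.05)  # prints False
```

Since the statistic is well above every critical value and the p-value is about 0.96, the test does not reject the null hypothesis, so that example series is treated as non-stationary.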

4. A Worked Example with Concrete Parameters

4.1 Implementation

# -*- coding:UTF-8 -*-
from statsmodels.tsa.stattools import adfuller
import numpy as np
import pandas as pd

# Short illustrative series (15 observations)
seq = np.array([1, 2, 3, 4, 5, 7, 5, 1, 54, 3, 6, 87, 45, 14, 24])

# Default output: (adf, pvalue, usedlag, nobs, critical values, icbest)
result = adfuller(seq, autolag='AIC')
print("\nresult is\n{}".format(result))

# Collect the results into a labeled pandas Series for readability
result_format = pd.Series(result[0:4], index=['Test Statistic', 'p-value', 'Lags Used', 'Number of Observations Used'])
for k, v in result[4].items():
    result_format['Critical Value (%s)' % k] = v
result_format['The maximized information criterion if autolag is not None.'] = result[5]
print("\nresult_format is\n{}".format(result_format))

print("\n\n===== Regression model coefficients of adfuller() =====")
# With regresults=True the return value is (adf, pvalue, critical values, resstore)
t, p, c, r = adfuller(x=seq, regression='ctt', regresults=True)

# r.resols holds the fitted OLS results of the ADF regression
print("r.resols.summary() is")
print(r.resols.summary())

print("\nr.resols.params are")
print(r.resols.params)

4.2 Output

result is
(-0.012544765454616165, 0.9574822652420663, 5, 9, {'1%': -4.473135048010974, '5%': -3.28988060356653, '10%': -2.7723823456790124}, 84.80988245795896)

result_format is
Test Statistic                                                 -0.012545
p-value                                                         0.957482
Lags Used                                                       5.000000
Number of Observations Used                                     9.000000
Critical Value (1%)                                            -4.473135
Critical Value (5%)                                            -3.289881
Critical Value (10%)                                           -2.772382
The maximized information criterion if autolag is not None.    84.809882
dtype: float64


===== Regression model coefficients of adfuller() =====
C:\ProgramData\Anaconda3\envs\tsp\lib\site-packages\scipy\stats\stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
r.resols.summary() is
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.935
Model:                            OLS   Adj. R-squared:                  0.838
Method:                 Least Squares   F-statistic:                     9.598
Date:                Thu, 18 Mar 2021   Prob (F-statistic):             0.0232
Time:                        13:02:48   Log-Likelihood:                -40.193
No. Observations:                  11   AIC:                             94.39
Df Residuals:                       4   BIC:                             97.17
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           -13.5130      3.840     -3.519      0.024     -24.176      -2.850
x2            10.1172      3.266      3.098      0.036       1.049      19.185
x3             6.1048      2.185      2.794      0.049       0.039      12.170
x4             2.1014      0.991      2.120      0.101      -0.650       4.853
const         45.0063     25.406      1.771      0.151     -25.532     115.545
x5           -13.2599     10.376     -1.278      0.270     -42.068      15.549
x6             5.8638      2.057      2.850      0.046       0.152      11.575
==============================================================================
Omnibus:                        0.004   Durbin-Watson:                   1.371
Prob(Omnibus):                  0.998   Jarque-Bera (JB):                0.214
Skew:                           0.004   Prob(JB):                        0.899
Kurtosis:                       2.317   Cond. No.                         397.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

r.resols.params are
[-13.51297728  10.11721623   6.10477404   2.10143413  45.00633172
 -13.25988671   5.86376013]

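The results above were produced with statsmodels 0.12.2 and differ somewhat from the results in the original linked articles.
Ref: "Python ADF 单位根检验如何查看结果的实现" (how to read the results of the ADF unit root test in Python)
Ref: "如何查看adfuller()函数的模型拟合系数" (how to view the fitted model coefficients of the adfuller() function)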

5. References

References
    ----------
    .. [1] W. Greene.  "Econometric Analysis," 5th ed., Pearson, 2003.

    .. [2] Hamilton, J.D.  "Time Series Analysis".  Princeton, 1994.

    .. [3] MacKinnon, J.G. 1994.  "Approximate asymptotic distribution functions for
        unit-root and cointegration tests."  `Journal of Business and Economic
        Statistics` 12, 167-76.

    .. [4] MacKinnon, J.G. 2010. "Critical Values for Cointegration Tests."  Queen's
        University, Dept of Economics, Working Papers.  Available at
        http://ideas.repec.org/p/qed/wpaper/1227.html