Computing a covariance matrix in Python with numpy.cov


weixin_39842271 · published 2020-11-28 13:09:02

Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, $X = [x_1, x_2, \ldots, x_N]^T$, then the covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element $C_{ii}$ is the variance of $x_i$.
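A quick way to check the definitions of $C_{ij}$ and $C_{ii}$ is to compare np.cov against hand-computed variances and covariances. This is a minimal sketch; the data values are made up purely for illustration.

```python
import numpy as np

# Three variables (rows), five observations each (columns); illustrative data
m = np.array([[1.0, 2.0, 4.0, 7.0, 11.0],
              [2.0, 1.0, 0.0, 1.0, 2.0],
              [5.0, 3.0, 4.0, 6.0, 2.0]])

C = np.cov(m)  # default: rows are variables, normalization by N - 1

# C[i, i] is the variance of variable i (ddof=1 gives the same N - 1 normalization)
diag_matches_var = np.allclose(np.diag(C), np.var(m, axis=1, ddof=1))

# C[i, j] is the covariance of variables i and j, computed here by hand for i=0, j=1
xi, xj = m[0], m[1]
cov_01 = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (m.shape[1] - 1)
offdiag_matches = np.isclose(C[0, 1], cov_01)

# Covariance is symmetric, so C[i, j] == C[j, i]
is_symmetric = np.allclose(C, C.T)

print(C.shape, diag_matches_var, offdiag_matches, is_symmetric)
```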

See the notes for an outline of the algorithm.

Parameters:

m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation (sample) of all those variables. Also see rowvar below.
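To make the row/column convention concrete, here is a small sketch with two hypothetical variables (height and weight, values invented for illustration) observed four times. The result has one row and one column per variable; a 1-D input is treated as a single variable.

```python
import numpy as np

# Two variables (rows) observed four times (columns); illustrative values
height = [1.60, 1.72, 1.81, 1.68]
weight = [55.0, 70.0, 82.0, 64.0]
m = np.array([height, weight])   # shape (2, 4): (variables, observations)

C = np.cov(m)                     # shape (2, 2): one row/column per variable
print(m.shape, C.shape)

# A 1-D input is a single variable; np.cov returns a 0-d array holding its variance
v = np.cov(height)
print(v.ndim, np.isclose(v, np.var(height, ddof=1)))
```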

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.
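Passing y simply adds its variables to those in m, so the call should match stacking the rows yourself first. A minimal check, with illustrative data:

```python
import numpy as np

x = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 3.0, 2.0, 5.0]])   # two variables, four observations
y = np.array([2.0, 1.0, 0.0, -1.0])    # one extra variable, same four observations

# y's variables are appended after m's, giving a 3 x 3 result
C_xy = np.cov(x, y)
C_stacked = np.cov(np.vstack((x, y)))
print(C_xy.shape, np.allclose(C_xy, C_stacked))
```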

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
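Data loaded from a table usually has observations in rows and variables in columns, which is exactly the case for rowvar=False. The sketch below (illustrative data) shows that it is equivalent to transposing, and what happens if you forget it.

```python
import numpy as np

# Tabular layout: rows are observations, columns are variables (illustrative data)
table = np.array([[1.0, 10.0],
                  [2.0, 12.0],
                  [3.0, 15.0],
                  [4.0, 15.0],
                  [5.0, 18.0]])

C_cols = np.cov(table, rowvar=False)   # each column is a variable -> 2 x 2
C_transposed = np.cov(table.T)         # same result with the default rowvar=True
wrong_way = np.cov(table)              # 5 x 5: every observation treated as a variable

print(C_cols.shape, np.allclose(C_cols, C_transposed), wrong_way.shape)
```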

bias : bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.
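The two normalizations can be checked by hand on a single variable. Here the sum of squared deviations from the mean (5.0) is 32 and N = 8, so the unbiased estimate is 32/7 and the biased one is 32/8 = 4. The data are illustrative.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # one variable, N = 8
n = x.size
ss = np.sum((x - x.mean()) ** 2)   # sum of squared deviations (32 here)

unbiased = np.cov(x)               # default bias=False: divide by N - 1
biased = np.cov(x, bias=True)      # bias=True: divide by N
print(np.isclose(unbiased, ss / (n - 1)), np.isclose(biased, ss / n))
```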

ddof : int, optional

If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

New in version 1.5.
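Since ddof takes precedence over bias whenever it is given, the normalization becomes N - ddof regardless of bias. A small sketch using the same illustrative data as in the bias check (sum of squared deviations 32, N = 8):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

a = np.cov(x, bias=True, ddof=1)   # ddof wins: divide by N - 1, bias is ignored
b = np.cov(x)                      # default unbiased estimate, also N - 1
c = np.cov(x, ddof=0)              # divide by N, same as bias=True
print(np.isclose(a, b), np.isclose(c, np.cov(x, bias=True)))
```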

fweights : array_like, int, optional

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

New in version 1.10.
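The description of fweights says each observation vector is repeated that many times, so the weighted result should equal np.cov on a copy of the data with the columns actually repeated. A minimal check with illustrative weights:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 0.0]])   # two variables, three observations
f = np.array([1, 3, 2])           # integer frequency weights (illustrative)

weighted = np.cov(m, fweights=f)
repeated = np.cov(np.repeat(m, f, axis=1))   # column k repeated f[k] times
print(np.allclose(weighted, repeated))
```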

aweights : array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

New in version 1.10.
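With ddof=0 the aweights act like probabilities: the result is the weighted average of the outer products of the deviations from the weighted mean. This is a sketch with made-up probabilities; the manual version divides by the total weight, so it also works for weights that do not sum to 1.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 0.0, 1.0, 3.0]])
p = np.array([0.1, 0.2, 0.3, 0.4])   # illustrative probabilities, one per observation

C = np.cov(m, aweights=p, ddof=0)

total = p.sum()
mu = m @ p / total                   # probability-weighted mean of each variable
d = m - mu[:, None]                  # deviations, one column per observation
manual = (d * p) @ d.T / total       # sum_k p_k * d_k d_k^T, divided by total weight
print(np.allclose(C, manual))
```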

Returns:

out : ndarray

The covariance matrix of the variables.

See also

corrcoef : Normalized covariance matrix
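The "normalized" in that description means each covariance is divided by the product of the two standard deviations, which turns it into a correlation coefficient. A minimal sketch comparing numpy's corrcoef with that normalization done by hand (illustrative data):

```python
import numpy as np

m = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 1.0, 2.0, 0.0],
              [2.0, 2.0, 5.0, 6.0]])

C = np.cov(m)
std = np.sqrt(np.diag(C))            # standard deviation of each variable
R_manual = C / np.outer(std, std)    # R[i, j] = C[i, j] / (std_i * std_j)
R = np.corrcoef(m)
print(np.allclose(R, R_manual), np.allclose(np.diag(R), 1.0))
```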

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

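>>> import numpy as np
>>> m = np.array([[0., 1., 2., 4.], [3., 1., 1., 0.]])  # illustrative data: rows are variables
>>> f = np.array([1, 2, 1, 3])                            # illustrative fweights
>>> a = np.array([0.5, 1.0, 2.0, 1.5])                    # illustrative aweights
>>> ddof = 1
>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=1, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)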

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (np.sum(f) - ddof) as it should.
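The steps in the Notes can be wrapped into a small function and compared with np.cov directly. The function name weighted_cov and the data are hypothetical, chosen for illustration; it is a sketch of the documented steps, not numpy's actual source. When a is all ones, v1 and v2 both equal sum(f), which is why the factor collapses to 1 / (sum(f) - ddof).

```python
import numpy as np

def weighted_cov(m, f=None, a=None, ddof=1):
    """Hypothetical helper following the weighted-covariance steps in the Notes.

    m: 2-D array, rows are variables and columns are observations.
    f: integer frequency weights (all ones if omitted).
    a: observation weights (all ones if omitted).
    """
    m = np.array(m, dtype=np.float64)   # copy, so the caller's array is not modified
    n_obs = m.shape[1]
    f = np.ones(n_obs) if f is None else np.asarray(f, dtype=np.float64)
    a = np.ones(n_obs) if a is None else np.asarray(a, dtype=np.float64)

    w = f * a                            # combined weight of each observation
    v1 = np.sum(w)                       # total weight
    v2 = np.sum(w * a)                   # used in the bias correction
    m -= np.sum(m * w, axis=1, keepdims=True) / v1   # subtract the weighted mean
    return np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

data = np.array([[0.0, 1.0, 2.0, 4.0],
                 [3.0, 1.0, 1.0, 0.0]])
f = np.array([1, 2, 1, 3])
a = np.array([0.5, 1.0, 2.0, 1.5])

print(np.allclose(weighted_cov(data, f, a), np.cov(data, fweights=f, aweights=a)))
# With a == 1 this is the fweights-only case, normalized by sum(f) - ddof
print(np.allclose(weighted_cov(data, f), np.cov(data, fweights=f)))
```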

Examples

Consider two variables, $x_0$ and