Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, $X = [x_1, x_2, \ldots, x_N]^T$, then the covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element $C_{ii}$ is the variance of $x_i$.
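As a rough check of this definition, the sketch below builds each matrix element by hand and compares it with np.cov. The data values and the helper name cov_element are illustrative assumptions, not part of NumPy.

```python
import numpy as np

# Two variables (rows) observed four times (columns); values are illustrative.
X = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 1.0, 0.0, 2.0]])

def cov_element(xi, xj):
    """Sample covariance of two 1-D arrays, normalized by N - 1 (hypothetical helper)."""
    n = xi.size
    return np.sum((xi - xi.mean()) * (xj - xj.mean())) / (n - 1)

# Build the matrix element by element and compare with np.cov.
n_vars = X.shape[0]
manual = np.array([[cov_element(X[i], X[j]) for j in range(n_vars)]
                   for i in range(n_vars)])
C = np.cov(X)

print(np.allclose(C, manual))                        # element-wise agreement
print(np.isclose(C[0, 0], np.var(X[0], ddof=1)))     # diagonal holds the variance
```

The diagonal comparison uses np.var with ddof=1 because np.cov normalizes by N - 1 by default.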

See the notes for an outline of the algorithm.

Parameters:

m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof : int, optional

If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

New in version 1.5.

fweights : array_like, int, optional

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

New in version 1.10.

aweights : array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

New in version 1.10.
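To make the interaction between bias and ddof concrete, here is a small sketch. The input array is made up for illustration, and the variable names are arbitrary.

```python
import numpy as np

# Two variables (rows), four observations (columns); values are illustrative.
x = np.array([[0.0, 1.0, 2.0, 4.0],
              [2.0, 1.0, 0.5, 0.0]])
n = x.shape[1]

unbiased = np.cov(x)                      # default: normalize by N - 1
biased = np.cov(x, bias=True)             # normalize by N
ddof0 = np.cov(x, ddof=0)                 # same normalization as bias=True
override = np.cov(x, bias=True, ddof=1)   # ddof takes precedence over bias

# The two normalizations differ only by the factor (N - 1) / N.
print(np.allclose(biased, unbiased * (n - 1) / n))
print(np.allclose(ddof0, biased))
print(np.allclose(override, unbiased))
```

Passing ddof explicitly is usually the clearer choice, since it states the normalization directly instead of relying on the default implied by bias.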

Returns:

out : ndarray

The covariance matrix of the variables.

See also

Normalized covariance matrix

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> w = f * a

>>> v1 = np.sum(w)

>>> v2 = np.sum(w * a)

>>> m -= np.sum(m * w, axis=1, keepdims=True) / v1

>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) reduces to 1 / (np.sum(f) - ddof), as it should.
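The steps above can be wrapped in a function and compared against np.cov. This is only a sketch: the function name weighted_cov and the sample weights are assumptions made for illustration, and the function copies m so that the in-place subtraction does not alter the caller's array.

```python
import numpy as np

def weighted_cov(m, f, a, ddof=1):
    """Weighted covariance following the steps in the notes (rows are variables)."""
    m = np.array(m, dtype=float)          # copy, so the input is left untouched
    w = f * a
    v1 = np.sum(w)
    v2 = np.sum(w * a)
    m -= np.sum(m * w, axis=1, keepdims=True) / v1   # subtract the weighted mean
    return np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

# Illustrative data: 2 variables, 4 observations.
m = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 1.0, 0.0, 2.0]])
f = np.array([1, 2, 1, 3])                # integer frequency weights
a = np.array([0.5, 1.0, 2.0, 1.5])        # observation weights

ours = weighted_cov(m, f, a, ddof=1)
ref = np.cov(m, fweights=f, aweights=a, ddof=1)
print(np.allclose(ours, ref))

# With a == 1, frequency weights behave like repeating each column f[i] times.
repeated = np.repeat(m, f, axis=1)
freq_only = weighted_cov(m, f, np.ones_like(a), ddof=1)
print(np.allclose(freq_only, np.cov(repeated)))
```

The second comparison illustrates the remark about a == 1: the factor becomes 1 / (np.sum(f) - ddof), which is the ordinary normalization for np.sum(f) observations.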

Examples

Consider two variables, $x_0$ and $x_1$, which correlate perfectly, but in opposite directions:

>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T

>>> x

array([[0, 1, 2],
       [2, 1, 0]])

Note how $x_0$ increases while $x_1$ decreases. The covariance matrix shows this clearly:

>>> np.cov(x)

array([[ 1., -1.],
       [-1.,  1.]])

Note that element $C_{0,1}$, which shows the correlation between $x_0$ and $x_1$, is negative.

Further, note how x and y are combined:

>>> x = [-2.1, -1, 4.3]

>>> y = [3, 1.1, 0.12]

>>> X = np.stack((x, y), axis=0)

>>> print(np.cov(X))

[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]

>>> print(np.cov(x, y))

[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]

>>> print(np.cov(x))

11.71

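Summary

The key to understanding the covariance matrix is to remember that it holds the covariance between different dimensions (variables), not between different samples. Given a sample matrix, first establish whether each row is a sample or a dimension. Once that is clear, the rest of the computation follows naturally.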
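A common way to get this wrong is to pass a samples-by-features array without telling np.cov about its layout. The sketch below uses randomly generated data (an assumption for illustration) to show how the result's shape reveals which axis was treated as the variables.

```python
import numpy as np

rng = np.random.default_rng(0)
# 5 observations (rows) of 3 variables (columns): a samples-by-features layout.
samples = rng.normal(size=(5, 3))

wrong = np.cov(samples)                 # default rowvar=True: the 5 rows are taken as variables
right = np.cov(samples, rowvar=False)   # columns are taken as variables
also_right = np.cov(samples.T)          # equivalent: transpose so that variables are rows

print(wrong.shape)        # (5, 5)
print(right.shape)        # (3, 3)
print(np.allclose(right, also_right))
```

Checking that the output is square with side length equal to the number of variables is a quick sanity test before using the matrix further.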
