Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates X and Y, each with sample size N, is defined by the expectation value

    cov(X, Y) = \langle (X - \mu_X)(Y - \mu_Y) \rangle    (1)
              = \langle X Y \rangle - \mu_X \mu_Y    (2)

where \mu_X = \langle X \rangle and \mu_Y = \langle Y \rangle are the respective means, which can be written out explicitly as

    cov(X, Y) = \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y}).    (3)
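As a concrete illustration of equation (3), the sketch below (Python with NumPy; the data values are made up for illustration) computes the sample covariance directly and compares it with numpy.cov called with bias=True, which likewise divides by N rather than the default N - 1.

    import numpy as np

    x = np.array([2.1, 2.5, 3.6, 4.0])
    y = np.array([8.0, 10.0, 12.0, 14.0])

    # Equation (3): average of the products of deviations from the means
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    # np.cov with bias=True also divides by N (its default divides by N - 1)
    cov_matrix = np.cov(x, y, bias=True)

    print(cov_xy)            # hand-computed covariance
    print(cov_matrix[0, 1])  # same value read off the covariance matrix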
For uncorrelated variates,

    cov(X, Y) = \langle X Y \rangle - \mu_X \mu_Y = \langle X \rangle \langle Y \rangle - \mu_X \mu_Y = 0,    (4)

so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if cov(X, Y) > 0, then Y tends to increase as X increases, and if cov(X, Y) < 0, then Y tends to decrease as X increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.
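The last remark can be made concrete with a small sketch (the choice Y = X^2 with X symmetric about zero is an illustrative assumption, not from the original text): Y is completely determined by X, yet the covariance vanishes.

    import numpy as np

    # X symmetric about 0, Y a deterministic function of X
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = x ** 2

    # cov(X, Y) = <XY> - <X><Y>; here <XY> = 0 and <X> = 0, so the covariance is zero
    cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
    print(cov_xy)  # 0.0, even though Y depends entirely on X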
In the special case of Y = X,

    cov(X, X) = \langle X^2 \rangle - \langle X \rangle^2    (5)
              = \sigma_X^2,    (6)

so the covariance reduces to the usual variance \sigma_X^2 = var(X). This motivates the use of the symbol \sigma_{XY} = cov(X, Y), which then provides a consistent way of denoting the variance as \sigma_{XX} = \sigma_X^2, where \sigma_X is the standard deviation.
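A quick numerical check of the special case (5)-(6), again with made-up data (NumPy's var uses the same 1/N normalization as equation (3)):

    import numpy as np

    x = np.array([1.0, 3.0, 5.0, 9.0])

    # cov(X, X) = <X^2> - <X>^2 reduces to the variance sigma_X^2
    cov_xx = np.mean(x * x) - np.mean(x) ** 2
    print(cov_xx, np.var(x))  # identical values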
The derived quantity

    cor(X, Y) = \frac{cov(X, Y)}{\sigma_X \sigma_Y}    (7)
              = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}    (8)

is called the statistical correlation of X and Y.
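The correlation (7)-(8) is just the covariance rescaled by the two standard deviations, so it can be compared directly against numpy.corrcoef; the data below are illustrative, and the N versus N - 1 normalization cancels in the ratio.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

    # cor(X, Y) = cov(X, Y) / (sigma_X sigma_Y)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    cor_xy = cov_xy / (np.std(x) * np.std(y))

    print(cor_xy)
    print(np.corrcoef(x, y)[0, 1])  # same value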
The covariance is especially useful when looking at the variance of the sum of two random variates, since

    var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).    (9)
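Equation (9) holds exactly for the empirical (population) statistics of any pair of samples, so it can be verified directly; a minimal sketch with illustrative data:

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 8.0])
    y = np.array([3.0, 1.0, 5.0, 2.0])

    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    # var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)
    print(np.var(x + y))
    print(np.var(x) + np.var(y) + 2 * cov_xy)  # same value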
The covariance is symmetric by definition since

    cov(X, Y) = cov(Y, X).    (10)
Given n random variates denoted X_1, ..., X_n, the covariance \sigma_{ij} = cov(X_i, X_j) of X_i and X_j is defined by

    cov(X_i, X_j) = \langle (X_i - \mu_i)(X_j - \mu_j) \rangle    (11)
                  = \langle X_i X_j \rangle - \mu_i \mu_j,    (12)

where \mu_i = \langle X_i \rangle and \mu_j = \langle X_j \rangle are the means of X_i and X_j, respectively. The matrix V_{ij} of the quantities V_{ij} = cov(X_i, X_j) is called the covariance matrix.
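For several variates, the full matrix V_{ij} = cov(X_i, X_j) can be assembled in one call; the sketch below uses three made-up variates stored as rows, with bias=True so that NumPy matches the 1/N convention of equation (3).

    import numpy as np

    # Three variates X_1, X_2, X_3 stored as rows of a data matrix
    data = np.array([
        [1.0, 2.0, 3.0, 4.0],   # X_1
        [2.0, 1.0, 0.0, -1.0],  # X_2
        [0.5, 1.5, 1.0, 2.0],   # X_3
    ])

    V = np.cov(data, bias=True)  # V[i, j] = cov(X_i, X_j)
    print(V)                     # symmetric 3 x 3 covariance matrix
    print(np.allclose(V, V.T))   # True: V_ij = V_ji, and V_ii = var(X_i)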
The covariance obeys the identities

    cov(X + Z, Y) = \langle (X + Z) Y \rangle - \langle X + Z \rangle \langle Y \rangle    (13)
                  = \langle X Y \rangle + \langle Z Y \rangle - (\langle X \rangle + \langle Z \rangle) \langle Y \rangle    (14)
                  = \langle X Y \rangle - \langle X \rangle \langle Y \rangle + \langle Z Y \rangle - \langle Z \rangle \langle Y \rangle    (15)
                  = cov(X, Y) + cov(Z, Y).    (16)
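Because the empirical mean is linear, the additivity identity (13)-(16) holds exactly for sample (population) statistics and can be checked numerically; a short sketch with illustrative data and a small helper function defined here for convenience:

    import numpy as np

    def cov(a, b):
        # population covariance: <ab> - <a><b>
        return np.mean(a * b) - np.mean(a) * np.mean(b)

    x = np.array([1.0, 4.0, 2.0, 8.0])
    z = np.array([3.0, 0.0, 5.0, 1.0])
    y = np.array([2.0, 2.0, 6.0, 4.0])

    # cov(X + Z, Y) = cov(X, Y) + cov(Z, Y)
    print(cov(x + z, y))
    print(cov(x, y) + cov(z, y))  # same value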
By induction, it therefore follows that

    cov(\sum_{i=1}^n X_i, Y) = \sum_{i=1}^n cov(X_i, Y)    (17)

and, applying (17) together with the symmetry (10),

    cov(\sum_{i=1}^n X_i, \sum_{j=1}^m Y_j) = \sum_{i=1}^n cov(X_i, \sum_{j=1}^m Y_j)    (18)
                                            = \sum_{i=1}^n cov(\sum_{j=1}^m Y_j, X_i)    (19)
                                            = \sum_{i=1}^n \sum_{j=1}^m cov(Y_j, X_i)    (20)
                                            = \sum_{i=1}^n \sum_{j=1}^m cov(X_i, Y_j).    (21)
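The bilinearity in (17)-(21) can likewise be checked on arbitrary collections of variates; the sketch below uses n = 3 and m = 2 randomly generated variates purely as an illustration, reusing the same population-covariance helper as above.

    import numpy as np

    def cov(a, b):
        # population covariance: <ab> - <a><b>
        return np.mean(a * b) - np.mean(a) * np.mean(b)

    rng = np.random.default_rng(0)
    xs = [rng.normal(size=50) for _ in range(3)]  # X_1, ..., X_n with n = 3
    ys = [rng.normal(size=50) for _ in range(2)]  # Y_1, ..., Y_m with m = 2

    # cov(sum_i X_i, sum_j Y_j) = sum_i sum_j cov(X_i, Y_j)
    lhs = cov(sum(xs), sum(ys))
    rhs = sum(cov(x, y) for x in xs for y in ys)
    print(np.isclose(lhs, rhs))  # True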