Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates $X$ and $Y$, each with sample size $N$, is defined by the expectation value

$$\begin{aligned}
\operatorname{cov}(X,Y) &= \left\langle (X - \mu_X)(Y - \mu_Y) \right\rangle && (1) \\
&= \langle XY \rangle - \mu_X \mu_Y && (2)
\end{aligned}$$
where $\mu_X = \langle X \rangle$ and $\mu_Y = \langle Y \rangle$ are the respective means, which can be written out explicitly as

$$\operatorname{cov}(X,Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \qquad (3)$$
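As a quick numeric illustration of equation (3), the following is a minimal Python sketch (NumPy assumed; the arrays `x` and `y` are arbitrary sample values chosen for the example). It averages the products of deviations from the means and compares the result with `np.cov`, which defaults to the $N-1$ normalization unless `bias=True` is passed.

```python
import numpy as np

# Two small paired samples (illustrative values only).
x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

# Population covariance as in equation (3): mean of products of deviations.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov divides by N - 1 by default; bias=True uses the N normalization of (3).
assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])
print(cov_xy)
```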
For uncorrelated variates,
$$\operatorname{cov}(X,Y) = \langle XY \rangle - \mu_X \mu_Y = \langle X \rangle \langle Y \rangle - \mu_X \mu_Y = 0 \qquad (4)$$
so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if $\operatorname{cov}(X,Y) > 0$, then $Y$ tends to increase as $X$ increases, and if $\operatorname{cov}(X,Y) < 0$, then $Y$ tends to decrease as $X$ increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.
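A standard example of the failure of the converse takes $X$ symmetric about zero and $Y = X^2$: $Y$ is completely determined by $X$, yet $\operatorname{cov}(X,Y) = \langle X^3 \rangle - \langle X \rangle \langle X^2 \rangle = 0$. A minimal sketch, assuming NumPy and a standard normal sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # symmetric about zero
y = x**2                           # fully dependent on x

# For a symmetric distribution the covariance vanishes, so the sample
# covariance should be near zero despite the perfect dependence.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # close to 0
```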
In the special case of $Y = X$,

$$\begin{aligned}
\operatorname{cov}(X,X) &= \left\langle X^2 \right\rangle - \langle X \rangle^2 && (5) \\
&= \sigma_X^2 && (6)
\end{aligned}$$
so the covariance reduces to the usual variance $\sigma_X^2 = \operatorname{var}(X)$. This motivates the use of the symbol $\sigma_{XY} = \operatorname{cov}(X,Y)$, which then provides a consistent way of denoting the variance as $\sigma_{XX} = \sigma_X^2$, where $\sigma_X$ is the standard deviation.
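The special case (5)-(6) can be spot-checked numerically; a small NumPy sketch (sample values are arbitrary, and `np.var` is used with its default divide-by-$N$ normalization):

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 7.0, 10.0])

cov_xx = np.mean((x - x.mean()) * (x - x.mean()))  # cov(X, X)
var_x = np.var(x)                                  # sigma_X^2, population form

assert np.isclose(cov_xx, var_x)
```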
The derived quantity

$$\begin{aligned}
\operatorname{cor}(X,Y) &= \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} && (7) \\
&= \frac{\sigma_{XY}}{\sigma_X \sigma_Y} && (8)
\end{aligned}$$

is called the statistical correlation of $X$ and $Y$.
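Equations (7)-(8) amount to rescaling the covariance by the two standard deviations, and the normalization factor cancels in the ratio. A minimal NumPy sketch (illustrative data) comparing the direct formula with `np.corrcoef`:

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.5, 11.9, 14.2])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
cor_xy = cov_xy / (x.std() * y.std())   # cov(X, Y) / (sigma_X * sigma_Y)

# np.corrcoef returns the matrix of pairwise correlations.
assert np.isclose(cor_xy, np.corrcoef(x, y)[0, 1])
print(cor_xy)
```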
The covariance is especially useful when looking at the variance of the sum of two random variates, since
$$\operatorname{var}(X+Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X,Y) \qquad (9)$$
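A quick numeric check of equation (9), again a sketch assuming NumPy and the divide-by-$N$ normalization throughout (the generated data are arbitrary but deliberately correlated):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)   # correlated with x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) holds exactly for sample moments.
assert np.isclose(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)
```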
The covariance is symmetric by definition since
$$\operatorname{cov}(X,Y) = \operatorname{cov}(Y,X) \qquad (10)$$
Given $n$ random variates denoted $X_1$, ..., $X_n$, the covariance $\sigma_{ij} = \operatorname{cov}(X_i, X_j)$ of $X_i$ and $X_j$ is defined by

$$\begin{aligned}
\operatorname{cov}(X_i, X_j) &= \left\langle (X_i - \mu_i)(X_j - \mu_j) \right\rangle && (11) \\
&= \langle X_i X_j \rangle - \mu_i \mu_j && (12)
\end{aligned}$$
where $\mu_i = \langle X_i \rangle$ and $\mu_j = \langle X_j \rangle$ are the means of $X_i$ and $X_j$, respectively. The matrix $(V_{ij})$ of the quantities $V_{ij} = \operatorname{cov}(X_i, X_j)$ is called the covariance matrix.
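For several variates, `np.cov` assembles the matrix $V_{ij} = \operatorname{cov}(X_i, X_j)$ directly. The sketch below (NumPy assumed; three illustrative variates stacked as rows) checks the two properties established above: the matrix is symmetric, and its diagonal holds the variances.

```python
import numpy as np

rng = np.random.default_rng(2)
# Three random variates X_1, X_2, X_3 with 500 observations each, one per row.
data = rng.normal(size=(3, 500))
data[2] += 0.8 * data[0]           # make X_3 correlated with X_1

V = np.cov(data, bias=True)        # 3x3 covariance matrix, V[i, j] = cov(X_i, X_j)

assert np.allclose(V, V.T)                         # symmetry, equation (10)
assert np.allclose(np.diag(V), data.var(axis=1))   # diagonal = variances, (5)-(6)
```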
The covariance obeys the identities
$$\begin{aligned}
\operatorname{cov}(X+Z, Y) &= \langle (X+Z)\,Y \rangle - \langle X+Z \rangle \langle Y \rangle && (13) \\
&= \langle XY \rangle + \langle ZY \rangle - \langle X \rangle\langle Y \rangle - \langle Z \rangle\langle Y \rangle && (14) \\
&= \left( \langle XY \rangle - \langle X \rangle\langle Y \rangle \right) + \left( \langle ZY \rangle - \langle Z \rangle\langle Y \rangle \right) && (15) \\
&= \operatorname{cov}(X,Y) + \operatorname{cov}(Z,Y) && (16)
\end{aligned}$$
By induction, it therefore follows that
$$\operatorname{cov}\!\left(\sum_{i=1}^{n} X_i,\ Y\right) = \sum_{i=1}^{n} \operatorname{cov}(X_i, Y) \qquad (17)$$

and, more generally,

$$\begin{aligned}
\operatorname{cov}\!\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) &= \sum_{i=1}^{n} \operatorname{cov}\!\left(X_i,\ \sum_{j=1}^{m} Y_j\right) && (18) \\
&= \sum_{i=1}^{n} \operatorname{cov}\!\left(\sum_{j=1}^{m} Y_j,\ X_i\right) && (19) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{cov}(Y_j, X_i) && (20) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{cov}(X_i, Y_j) && (21)
\end{aligned}$$
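These summation identities can also be spot-checked numerically. The sketch below (NumPy assumed, divide-by-$N$ normalization, arbitrary generated data; `cov` is a small helper defined for the example) compares the covariance of the two sums with the double sum of pairwise covariances from equation (21).

```python
import numpy as np

def cov(a, b):
    """Population covariance, as in equation (3)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 200))   # X_1, ..., X_n with n = 3
Y = rng.normal(size=(2, 200))   # Y_1, ..., Y_m with m = 2

lhs = cov(X.sum(axis=0), Y.sum(axis=0))            # cov of the two sums
rhs = sum(cov(xi, yj) for xi in X for yj in Y)     # double sum of covariances

assert np.isclose(lhs, rhs)
```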