Correlation Coefficient
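The correlation coefficient, sometimes also called the cross-correlation coefficient, is a quantity that measures the quality of a least squares fit to the original data. To define the correlation coefficient, first consider the sums of squared values $ss_{xx}$, $ss_{yy}$, and $ss_{xy}$ of a set of $n$ data points $(x_i, y_i)$ about their respective means,

\begin{align}
ss_{xx} &\equiv \sum (x_i - \bar{x})^2 \tag{1} \\
&= \sum x^2 - 2\bar{x}\sum x + \sum \bar{x}^2 \tag{2} \\
&= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \tag{3} \\
&= \sum x^2 - n\bar{x}^2 \tag{4} \\
ss_{yy} &\equiv \sum (y_i - \bar{y})^2 \tag{5} \\
&= \sum y^2 - 2\bar{y}\sum y + \sum \bar{y}^2 \tag{6} \\
&= \sum y^2 - 2n\bar{y}^2 + n\bar{y}^2 \tag{7} \\
&= \sum y^2 - n\bar{y}^2 \tag{8} \\
ss_{xy} &\equiv \sum (x_i - \bar{x})(y_i - \bar{y}) \tag{9} \\
&= \sum (x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x}\bar{y}) \tag{10} \\
&= \sum xy - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \tag{11} \\
&= \sum xy - n\bar{x}\bar{y}. \tag{12}
\end{align}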
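As a quick numerical check of equations (1) to (12), the following Python sketch computes the three sums both from the deviations about the means and from the expanded shortcut forms. The function names and the sample data are assumptions made for illustration only.

```python
def sums_from_deviations(xs, ys):
    """Return (ss_xx, ss_yy, ss_xy) as sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


def sums_from_shortcut(xs, ys):
    """Return the same three sums using the expanded forms (4), (8), (12)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum(x * x for x in xs) - n * x_bar ** 2
    ss_yy = sum(y * y for y in ys) - n * y_bar ** 2
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return ss_xx, ss_yy, ss_xy


# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

direct = sums_from_deviations(xs, ys)
shortcut = sums_from_shortcut(xs, ys)
print(direct)
print(shortcut)
```

Both routes agree up to rounding. In floating point the deviation form is usually the safer choice, since the shortcut subtracts two nearly equal numbers when the mean is large compared with the spread of the data.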
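These quantities are simply unnormalized forms of the variances and covariance of $x$ and $y$ given by

\begin{align}
ss_{xx} &= n \operatorname{var}(x) \tag{13} \\
ss_{yy} &= n \operatorname{var}(y) \tag{14} \\
ss_{xy} &= n \operatorname{cov}(x, y). \tag{15}
\end{align}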
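The normalization matters when comparing against library routines. The sketch below assumes the population (divide by $n$) convention for the variance and covariance, and uses Python's standard `statistics` module; the data and variable names are illustrative assumptions.

```python
from statistics import pvariance, variance

# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares about the means.
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Population covariance written as mean(xy) - mean(x) * mean(y).
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar

print(ss_xx, n * pvariance(xs))       # population variance times n
print(ss_yy, n * pvariance(ys))
print(ss_xy, n * cov_xy)
print(ss_xx, (n - 1) * variance(xs))  # sample variance needs n - 1 instead
```

With the sample (divide by $n - 1$) convention the factor becomes $n - 1$ rather than $n$, as the last line shows.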
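For linear least squares fitting, the coefficient $b$ in

$$ y = a + bx \tag{16} $$

is given by

\begin{align}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \tag{17} \\
&= \frac{ss_{xy}}{ss_{xx}}, \tag{18}
\end{align}

and the coefficient $b'$ in

$$ x = a' + b'y \tag{19} $$

is given by

$$ b' = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}. \tag{20} $$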
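The two slopes can be computed from the raw sums alone. The following is a minimal sketch, assuming the standard least squares slope formulas for regressing $y$ on $x$ and $x$ on $y$; the function name `regression_slopes` and the data are hypothetical.

```python
def regression_slopes(xs, ys):
    """Return (b, b_prime): slope of y on x and slope of x on y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy           # shared by both slopes
    b = numerator / (n * sxx - sx * sx)     # y = a + b x
    b_prime = numerator / (n * syy - sy * sy)  # x = a' + b' y
    return b, b_prime


# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, b_prime = regression_slopes(xs, ys)
print(b, b_prime)
```

Note that $b'$ is not simply $1/b$ unless all points lie exactly on a line; the two fits minimize errors in different directions.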
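The correlation coefficient $r$ (sometimes also denoted $R$) is then defined by

\begin{align}
r &\equiv \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \tag{21} \\
&= \frac{ss_{xy}}{\sqrt{ss_{xx}\, ss_{yy}}}, \tag{22}
\end{align}

so that $r^2 = bb'$.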
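A direct implementation of the product-moment formula is short. The sketch below is one possible version using only the standard library; the function name `correlation_coefficient` and the data are assumptions, and the code does not guard against a constant input (which makes the denominator zero).

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r computed from raw sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator


# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_coefficient(xs, ys)
print(r, r ** 2)
```

For this data the two regression slopes are 0.6 and 1.0, and the printed value of `r ** 2` matches their product up to rounding.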
The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. The correlation coefficients for linear fits to increasingly noisy data are shown above.
The correlation coefficient has an important physical interpretation. To see this, define
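$$ A \equiv \left[\sum x^2 - n\bar{x}^2\right]^{-1} \tag{23} $$

and denote the "expected" value for $y_i$ as $\hat{y}_i$. Sums of $\hat{y}_i$ are then

\begin{align}
\hat{y}_i &= a + b x_i \tag{24} \\
&= \bar{y} - b\bar{x} + b x_i \tag{25} \\
&= \bar{y} + b(x_i - \bar{x}) \tag{26} \\
&= A\left(\bar{y}\sum x^2 - \bar{x}\sum xy + x_i \sum xy - n\bar{x}\bar{y}x_i\right) \tag{27} \\
&= A\left[\bar{y}\sum x^2 + (x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_i\right] \tag{28} \\
\sum \hat{y}_i^2 &= A^2 \sum \left[\bar{y}\sum x^2 + (x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_i\right]^2 \tag{29} \\
&= A^2\Big[n\bar{y}^2\left(\sum x^2\right)^2 - n^2\bar{x}^2\bar{y}^2\left(\sum x^2\right) - 2n\bar{x}\bar{y}\left(\sum xy\right)\left(\sum x^2\right) \\
&\qquad\; + 2n^2\bar{x}^3\bar{y}\left(\sum xy\right) + \left(\sum x^2\right)\left(\sum xy\right)^2 - n\bar{x}^2\left(\sum xy\right)^2\Big] \tag{30} \\
\sum y_i \hat{y}_i &= A \sum \left[y_i\bar{y}\sum x^2 + y_i(x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_i y_i\right] \tag{31} \\
&= A\left[n\bar{y}^2\sum x^2 + \left(\sum xy\right)^2 - n\bar{x}\bar{y}\sum xy - n\bar{x}\bar{y}\sum xy\right] \tag{32} \\
&= A\left[n\bar{y}^2\sum x^2 + \left(\sum xy\right)^2 - 2n\bar{x}\bar{y}\sum xy\right]. \tag{33}
\end{align}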
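Closed forms this long are easy to get wrong, so a numerical check is worthwhile. The sketch below compares the fitted values and the two sums computed directly with their closed forms in terms of $A = 1/(\sum x^2 - n\bar{x}^2)$, $\sum x^2$, and $\sum xy$. All variable names and the data are illustrative assumptions.

```python
# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

A = 1.0 / (sum_x2 - n * x_bar ** 2)
b = A * (sum_xy - n * x_bar * y_bar)   # slope of y on x
a = y_bar - b * x_bar                  # intercept of y on x

# Fitted values, directly and from the closed form.
y_hat = [a + b * x for x in xs]
y_hat_closed = [
    A * (y_bar * sum_x2 + (x - x_bar) * sum_xy - n * x_bar * y_bar * x)
    for x in xs
]

# Sum of squared fitted values, directly and from the closed form.
sum_yhat2 = sum(v * v for v in y_hat)
sum_yhat2_closed = A ** 2 * (
    n * y_bar ** 2 * sum_x2 ** 2
    - n ** 2 * x_bar ** 2 * y_bar ** 2 * sum_x2
    - 2 * n * x_bar * y_bar * sum_xy * sum_x2
    + 2 * n ** 2 * x_bar ** 3 * y_bar * sum_xy
    + sum_x2 * sum_xy ** 2
    - n * x_bar ** 2 * sum_xy ** 2
)

# Sum of y_i * y_hat_i, directly and from the closed form.
sum_y_yhat = sum(y * v for y, v in zip(ys, y_hat))
sum_y_yhat_closed = A * (
    n * y_bar ** 2 * sum_x2 + sum_xy ** 2 - 2 * n * x_bar * y_bar * sum_xy
)

print(sum_yhat2, sum_yhat2_closed)
print(sum_y_yhat, sum_y_yhat_closed)
```

The last term of the long closed form must carry $(\sum xy)^2$; with a bare $\sum xy$ the check above fails.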
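The sum of squared errors, taken here as the squared deviations of the fitted values $\hat{y}_i$ from the mean $\bar{y}$, is then

\begin{align}
\mathrm{SSE} &\equiv \sum (\hat{y}_i - \bar{y})^2 \tag{34} \\
&= \sum \left(\hat{y}_i^2 - 2\bar{y}\hat{y}_i + \bar{y}^2\right) \tag{35} \\
&= \sum \hat{y}_i^2 - n\bar{y}^2 \tag{36} \\
&= A^2\left(\sum xy - n\bar{x}\bar{y}\right)^2\left(\sum x^2 - n\bar{x}^2\right) \tag{37} \\
&= \frac{ss_{xy}^2}{ss_{xx}} \tag{38} \\
&= b\, ss_{xy} \tag{39} \\
&= b^2 ss_{xx} \tag{40} \\
&= r^2 ss_{yy}, \tag{41}
\end{align}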
and the sum of squared residuals is
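\begin{align}
\mathrm{SSR} &\equiv \sum (\hat{y}_i - y_i)^2 \tag{42} \\
&= \sum \left[\bar{y} + b(x_i - \bar{x}) - y_i\right]^2 \tag{43} \\
&= \sum \left[(y_i - \bar{y}) - b(x_i - \bar{x})\right]^2 \tag{44} \\
&= \sum \left[(y_i - \bar{y})^2 + b^2(x_i - \bar{x})^2 - 2b(x_i - \bar{x})(y_i - \bar{y})\right] \tag{45} \\
&= ss_{yy} + b^2 ss_{xx} - 2b\, ss_{xy}. \tag{46}
\end{align}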
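But

\begin{align}
b &= \frac{ss_{xy}}{ss_{xx}} \tag{47} \\
r^2 &= \frac{ss_{xy}^2}{ss_{xx}\, ss_{yy}}, \tag{48}
\end{align}

so

\begin{align}
\mathrm{SSR} &= ss_{yy} + \frac{ss_{xy}^2}{ss_{xx}^2}\, ss_{xx} - 2\,\frac{ss_{xy}}{ss_{xx}}\, ss_{xy} \tag{49} \\
&= ss_{yy} - \frac{ss_{xy}^2}{ss_{xx}} \tag{50} \\
&= ss_{yy}\left(1 - \frac{ss_{xy}^2}{ss_{xx}\, ss_{yy}}\right) \tag{51} \\
&= ss_{yy}\left(1 - r^2\right), \tag{52}
\end{align}

and

$$ \mathrm{SSE} + \mathrm{SSR} = ss_{yy}\, r^2 + ss_{yy}\left(1 - r^2\right) = ss_{yy}. \tag{53} $$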
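The decomposition of the total variation in $y$ can be checked numerically. The sketch below follows the labels used in this article, with SSE as the sum of squared deviations of the fitted values from the mean and SSR as the sum of squared residuals; many textbooks swap these two abbreviations. The function name and data are illustrative assumptions.

```python
def variation_split(xs, ys):
    """Return (sse, ssr, ss_yy) for the least squares line y = a + b x.

    sse: sum of (y_hat - y_bar)^2, the part explained by the fit
    ssr: sum of (y_hat - y)^2, the residual part
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    y_hat = [a + b * x for x in xs]
    sse = sum((v - y_bar) ** 2 for v in y_hat)
    ssr = sum((v - y) ** 2 for v, y in zip(y_hat, ys))
    return sse, ssr, ss_yy


# Illustrative data (an assumption made for this example only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sse, ssr, ss_yy = variation_split(xs, ys)
print(sse, ssr, sse + ssr, ss_yy)
```

The identity holds only for the least squares line; for any other line the cross term between fitted deviations and residuals does not vanish.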
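The square of the correlation coefficient $r^2$ is therefore given by

\begin{align}
r^2 &= \frac{\mathrm{SSE}}{ss_{yy}} \tag{54} \\
&= \frac{ss_{xy}^2}{ss_{xx}\, ss_{yy}} \tag{55} \\
&= \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}. \tag{56}
\end{align}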
In other words, $r^2$ is the proportion of $ss_{yy}$ which is accounted for by the regression.
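As a worked illustration with made-up data (an assumption for this example, not taken from the original), consider the five points $(1,2)$, $(2,4)$, $(3,5)$, $(4,4)$, $(5,5)$, for which $\bar{x} = 3$ and $\bar{y} = 4$. Then

\begin{align}
ss_{xx} &= 55 - 5 \cdot 3^2 = 10 \\
ss_{yy} &= 86 - 5 \cdot 4^2 = 6 \\
ss_{xy} &= 66 - 5 \cdot 3 \cdot 4 = 6 \\
r^2 &= \frac{6^2}{10 \cdot 6} = 0.6,
\end{align}

so the fitted line accounts for $0.6 \times 6 = 3.6$ of the total variation $ss_{yy} = 6$, and the remaining $2.4$ is left in the residuals.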
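If there is complete correlation, then the lines obtained by solving for best-fit $(a, b)$ and $(a', b')$ coincide (since all data points lie on them), so solving (19) for $y$ and equating to (16) gives

$$ y = -\frac{a'}{b'} + \frac{x}{b'} = a + bx. \tag{57} $$

Therefore, $a = -a'/b'$ and $b = 1/b'$, giving

$$ r^2 = bb' = 1. \tag{58} $$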
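The case of complete correlation is easy to reproduce. In the sketch below the data are placed exactly on the line $y = 2x + 1$ (an arbitrary choice for illustration), and a hypothetical helper `fit_line` is used to fit both $y$ on $x$ and $x$ on $y$.

```python
def fit_line(us, vs):
    """Least squares fit v = intercept + slope * u; returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    ss_uu = sum((u - u_bar) ** 2 for u in us)
    ss_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = ss_uv / ss_uu
    intercept = v_bar - slope * u_bar
    return intercept, slope


# Points lying exactly on y = 2x + 1 (illustrative assumption).
xs = [0, 1, 2, 3]
ys = [2 * x + 1 for x in xs]

a, b = fit_line(xs, ys)              # y = a + b x
a_prime, b_prime = fit_line(ys, xs)  # x = a' + b' y

print(a, b)
print(a_prime, b_prime)
print(b * b_prime)
```

Here the two fitted lines describe the same set of points, so the slope of one is the reciprocal of the other and their product is 1.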
The correlation coefficient is independent of both origin and scale, so
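$$ r(u, v) = r(x, y), \tag{59} $$

where

\begin{align}
u &\equiv \frac{x - x_0}{h} \tag{60} \\
v &\equiv \frac{y - y_0}{h}. \tag{61}
\end{align}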
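Independence of origin and scale can be confirmed by shifting and rescaling a data set and recomputing the coefficient. The sketch below uses a hypothetical helper `pearson_r`; the offsets `x0`, `y0` and the scale `h` are arbitrary values chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Illustrative data and transformation constants (assumptions).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x0, y0, h = 10.0, -3.0, 2.5

# Shift the origin to (x0, y0) and rescale both axes by h.
us = [(x - x0) / h for x in xs]
vs = [(y - y0) / h for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print(r_xy, r_uv)
```

The shift cancels because only deviations from the means enter, and the scale cancels because it multiplies the numerator and the denominator by the same factor.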