Correlation Coefficient

The correlation coefficient, sometimes also called the cross-correlation coefficient, Pearson correlation coefficient (PCC), Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a quantity that measures the quality of a least squares fit to the original data. To define the correlation coefficient, first consider the sums of squared values $\mathrm{ss}_{xx}$, $\mathrm{ss}_{xy}$, and $\mathrm{ss}_{yy}$ of a set of $n$ data points $(x_i, y_i)$ about their respective means,

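\begin{align}
\mathrm{ss}_{xx} &= \sum (x_i-\bar{x})^2 \tag{1}\\
&= \sum x^2 - 2\bar{x}\sum x + \sum \bar{x}^2 \tag{2}\\
&= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \tag{3}\\
&= \sum x^2 - n\bar{x}^2 \tag{4}\\
\mathrm{ss}_{yy} &= \sum (y_i-\bar{y})^2 \tag{5}\\
&= \sum y^2 - 2\bar{y}\sum y + \sum \bar{y}^2 \tag{6}\\
&= \sum y^2 - 2n\bar{y}^2 + n\bar{y}^2 \tag{7}\\
&= \sum y^2 - n\bar{y}^2 \tag{8}\\
\mathrm{ss}_{xy} &= \sum (x_i-\bar{x})(y_i-\bar{y}) \tag{9}\\
&= \sum (x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x}\bar{y}) \tag{10}\\
&= \sum xy - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \tag{11}\\
&= \sum xy - n\bar{x}\bar{y}. \tag{12}
\end{align}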

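These quantities are simply unnormalized forms of the variances and covariance of $X$ and $Y$ (taken here with divisor $n$), given by

\begin{align}
\mathrm{ss}_{xx} &= n\,\mathrm{var}(X) \tag{13}\\
\mathrm{ss}_{yy} &= n\,\mathrm{var}(Y) \tag{14}\\
\mathrm{ss}_{xy} &= n\,\mathrm{cov}(X,Y). \tag{15}
\end{align}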
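
The definitions (1), (5), and (9) and their expanded forms (4), (8), and (12) can be checked numerically. The following is a minimal sketch in plain Python; the function names and the four sample points are made up for illustration and are not part of the original text. It also compares $\mathrm{ss}_{xx}$ against $n$ times the population variance, as in (13).

```python
import math
from statistics import pvariance


def sums_of_squares(xs, ys):
    """Return (ss_xx, ss_yy, ss_xy) from the defining sums (1), (5), (9)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


def sums_of_squares_shortcut(xs, ys):
    """Return the same quantities from the expanded forms (4), (8), (12)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum(x * x for x in xs) - n * x_bar ** 2
    ss_yy = sum(y * y for y in ys) - n * y_bar ** 2
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return ss_xx, ss_yy, ss_xy


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

direct = sums_of_squares(xs, ys)
shortcut = sums_of_squares_shortcut(xs, ys)
print("direct:  ", direct)
print("shortcut:", shortcut)

# Equation (13): ss_xx equals n times the population variance (divisor n).
n_times_var = len(xs) * pvariance(xs)
print("n * var(X):", n_times_var)
```

The expanded forms need only running totals of $x$, $y$, $x^2$, $y^2$, and $xy$, but they subtract two potentially large numbers, so the defining sums about the means are usually the safer choice in floating point.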

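For linear least squares fitting, the coefficient $b$ in

\begin{equation}
y = a + bx \tag{16}
\end{equation}

is given by

\begin{align}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \tag{17}\\
&= \frac{\mathrm{ss}_{xy}}{\mathrm{ss}_{xx}}, \tag{18}
\end{align}

and the coefficient $b'$ in

\begin{equation}
x = a' + b'y \tag{19}
\end{equation}

is given by

\begin{equation}
b' = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}. \tag{20}
\end{equation}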
[Figure: correlation coefficients for linear fits to increasingly noisy data]

The correlation coefficient r (sometimes also denoted R) is then defined by

\begin{align}
r^2 &= bb' \tag{21}\\
&= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}}. \tag{22}
\end{align}

The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. The correlation coefficients for linear fits to increasingly noisy data are shown above.
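
As a rough illustration of (17), (20), and (21), the sketch below computes both slopes from the raw sums and multiplies them to obtain $r^2$. The function name and data are hypothetical. Since (21) fixes only $r^2$, the sketch assumes the usual convention that $r$ takes the sign of $b$ (equivalently, of $\mathrm{ss}_{xy}$); that convention is not stated in the text above.

```python
import math


def regression_slopes(xs, ys):
    """Slopes b (y on x) and b' (x on y) from the raw-sum forms (17) and (20)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b = numerator / (n * sum_xx - sum_x ** 2)
    b_prime = numerator / (n * sum_yy - sum_y ** 2)
    return b, b_prime


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

b, b_prime = regression_slopes(xs, ys)
r_squared = b * b_prime  # equation (21)

# Assumed convention (not from the text): r carries the sign of the slope b.
r = math.copysign(math.sqrt(r_squared), b)

print("b =", b, " b' =", b_prime)
print("r^2 =", r_squared, " r =", r)
```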

The correlation coefficient has an important physical interpretation. To see this, define

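\begin{equation}
A = \left[\sum x^2 - n\bar{x}^2\right]^{-1} \tag{23}
\end{equation}

and denote the "expected" value for $y_i$ as $\hat{y}_i$. The fitted values $\hat{y}_i$ and the sums involving them are then

\begin{align}
\hat{y}_i &= a + bx_i \tag{24}\\
&= \bar{y} - b\bar{x} + bx_i \tag{25}\\
&= \bar{y} + b(x_i - \bar{x}) \tag{26}\\
&= A\left(\bar{y}\sum x^2 - \bar{x}\sum xy + x_i\sum xy - n\bar{x}\bar{y}x_i\right) \tag{27}\\
&= A\left[\bar{y}\sum x^2 + (x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_i\right] \tag{28}\\
\sum \hat{y}_i &= A\left(n\bar{y}\sum x^2 - n^2\bar{x}^2\bar{y}\right) \tag{29}\\
\sum \hat{y}_i^2 &= A^2\Bigl[n\bar{y}^2\bigl(\sum x^2\bigr)^2 - n^2\bar{x}^2\bar{y}^2\bigl(\sum x^2\bigr) - 2n\bar{x}\bar{y}\bigl(\sum xy\bigr)\bigl(\sum x^2\bigr) + 2n^2\bar{x}^3\bar{y}\bigl(\sum xy\bigr) + \bigl(\sum x^2\bigr)\bigl(\sum xy\bigr)^2 - n\bar{x}^2\bigl(\sum xy\bigr)^2\Bigr] \tag{30}\\
\sum y_i\hat{y}_i &= A\sum\left[y_i\bar{y}\sum x^2 + y_i(x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_iy_i\right] \tag{31}\\
&= A\left[n\bar{y}^2\sum x^2 + \bigl(\sum xy\bigr)^2 - n\bar{x}\bar{y}\sum xy - n\bar{x}\bar{y}\bigl(\sum xy\bigr)\right] \tag{32}\\
&= A\left[n\bar{y}^2\sum x^2 + \bigl(\sum xy\bigr)^2 - 2n\bar{x}\bar{y}\sum xy\right]. \tag{33}
\end{align}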

The sum of squared errors, taken here as the squared deviations of the fitted values $\hat{y}_i$ from the mean $\bar{y}$, is then

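\begin{align}
\mathrm{SSE} &= \sum (\hat{y}_i - \bar{y})^2 \tag{34}\\
&= \sum \left(\hat{y}_i^2 - 2\bar{y}\hat{y}_i + \bar{y}^2\right) \tag{35}\\
&= A^2\left(\sum xy - n\bar{x}\bar{y}\right)^2\left(\sum x^2 - n\bar{x}^2\right) \tag{36}\\
&= \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\sum x^2 - n\bar{x}^2} \tag{37}\\
&= b\,\mathrm{ss}_{xy} \tag{38}\\
&= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}} \tag{39}\\
&= \mathrm{ss}_{yy}\,r^2 \tag{40}\\
&= b^2\,\mathrm{ss}_{xx}, \tag{41}
\end{align}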

and the sum of squared residuals is

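\begin{align}
\mathrm{SSR} &= \sum (y_i - \hat{y}_i)^2 \tag{42}\\
&= \sum \left(y_i - \bar{y} + b\bar{x} - bx_i\right)^2 \tag{43}\\
&= \sum \left[y_i - \bar{y} - b(x_i - \bar{x})\right]^2 \tag{44}\\
&= \sum (y_i - \bar{y})^2 + b^2\sum (x_i - \bar{x})^2 - 2b\sum (x_i - \bar{x})(y_i - \bar{y}) \tag{45}\\
&= \mathrm{ss}_{yy} + b^2\,\mathrm{ss}_{xx} - 2b\,\mathrm{ss}_{xy}. \tag{46}
\end{align}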

But

\begin{align}
b &= \frac{\mathrm{ss}_{xy}}{\mathrm{ss}_{xx}} \tag{47}\\
r^2 &= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}}, \tag{48}
\end{align}

so

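\begin{align}
\mathrm{SSR} &= \mathrm{ss}_{yy} + \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}^2}\,\mathrm{ss}_{xx} - 2\,\frac{\mathrm{ss}_{xy}}{\mathrm{ss}_{xx}}\,\mathrm{ss}_{xy} \tag{49}\\
&= \mathrm{ss}_{yy} - \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}} \tag{50}\\
&= \mathrm{ss}_{yy}\left(1 - \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}}\right) \tag{51}\\
&= \mathrm{ss}_{yy}\left(1 - r^2\right), \tag{52}
\end{align}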

and

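\begin{equation}
\mathrm{SSE} + \mathrm{SSR} = \mathrm{ss}_{yy}\left(1 - r^2\right) + \mathrm{ss}_{yy}\,r^2 = \mathrm{ss}_{yy}. \tag{53}
\end{equation}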
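
The decomposition in (53) can be verified on a small data set. The sketch below fits $y = a + bx$, forms the fitted values, and computes the two sums directly from (34) and (42); the function name and data are assumptions made for illustration only.

```python
import math


def decompose(xs, ys):
    """Return (sse, ssr, ss_yy) using the definitions (34), (42), and (5)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = ss_xy / ss_xx          # slope, equation (18)
    a = y_bar - b * x_bar      # intercept, implied by (24) and (25)
    y_hat = [a + b * x for x in xs]

    # Fitted values about the mean, equation (34).
    sse = sum((yh - y_bar) ** 2 for yh in y_hat)
    # Data about the fitted values, equation (42).
    ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return sse, ssr, ss_yy


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

sse, ssr, ss_yy = decompose(xs, ys)
print("SSE =", sse, " SSR =", ssr, " ss_yy =", ss_yy)
print("SSE + SSR == ss_yy:", math.isclose(sse + ssr, ss_yy))
```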

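The square of the correlation coefficient $r^2$ is therefore given by

\begin{align}
r^2 &= \frac{\mathrm{SSE}}{\mathrm{ss}_{yy}} \tag{54}\\
&= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}} \tag{55}\\
&= \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}. \tag{56}
\end{align}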

In other words, $r^2$ is the proportion of $\mathrm{ss}_{yy}$ that is accounted for by the regression.
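
As a small worked illustration (the four data points are made up for this example and do not come from the original text), take the points $(1,2)$, $(2,3)$, $(3,5)$, $(4,4)$ and apply (4), (8), (12), and (22):

```latex
\begin{align*}
n &= 4, \qquad \bar{x} = 2.5, \qquad \bar{y} = 3.5,\\
\mathrm{ss}_{xx} &= \sum x^2 - n\bar{x}^2 = 30 - 25 = 5,\\
\mathrm{ss}_{yy} &= \sum y^2 - n\bar{y}^2 = 54 - 49 = 5,\\
\mathrm{ss}_{xy} &= \sum xy - n\bar{x}\bar{y} = 39 - 35 = 4,\\
r^2 &= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}} = \frac{16}{25} = 0.64.
\end{align*}
```

For these points the regression accounts for $\mathrm{ss}_{yy}\,r^2 = 3.2$ of the total $\mathrm{ss}_{yy} = 5$, that is 64 percent, and the remaining $\mathrm{ss}_{yy}(1 - r^2) = 1.8$ is left in the residuals.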

If there is complete correlation, then the lines obtained by solving for best-fit $(a, b)$ and $(a', b')$ coincide (since all data points lie on them), so solving (19) for $y$ and equating to (16) gives

\begin{equation}
y = -\frac{a'}{b'} + \frac{x}{b'} = a + bx. \tag{57}
\end{equation}

Therefore, $a = -a'/b'$ and $b = 1/b'$, giving

\begin{equation}
r^2 = bb' = 1. \tag{58}
\end{equation}
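
The complete-correlation case can be sketched numerically by fitting both regressions to points that lie exactly on a line. The helper name `fit_line` and the line $y = 1 + 2x$ are assumptions chosen for illustration.

```python
import math


def fit_line(us, vs):
    """Least squares intercept and slope for v = intercept + slope * u."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    ss_uu = sum((u - u_bar) ** 2 for u in us)
    ss_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = ss_uv / ss_uu
    intercept = v_bar - slope * u_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]

a, b = fit_line(xs, ys)              # y = a + b x, equation (16)
a_prime, b_prime = fit_line(ys, xs)  # x = a' + b' y, equation (19)

print("a =", a, " b =", b)
print("a' =", a_prime, " b' =", b_prime)
print("b * b' =", b * b_prime)
```

Swapping the argument order is all that distinguishes the two fits, which is why a single helper suffices here.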

The correlation coefficient is independent of both origin and scale, so

\begin{equation}
r(u, v) = r(x, y), \tag{59}
\end{equation}

where

\begin{align}
u &= \frac{x - x_0}{h} \tag{60}\\
v &= \frac{y - y_0}{h}. \tag{61}
\end{align}
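
The invariance in (59) is easy to check numerically. The sketch below uses the signed form $r = \mathrm{ss}_{xy}/\sqrt{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}}$, which is consistent with (22) but whose sign convention is an assumption here; the function name, the data, and the particular values of $x_0$, $y_0$, and $h$ are likewise made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Signed correlation coefficient ss_xy / sqrt(ss_xx * ss_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical sample data and transformation parameters.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
x0, y0, h = 10.0, -3.0, 2.5  # h must be nonzero

# Shift the origin and rescale, as in (60) and (61).
us = [(x - x0) / h for x in xs]
vs = [(y - y0) / h for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print("r(x, y) =", r_xy)
print("r(u, v) =", r_uv)
```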
