Correlation Coefficient--Bivariate Normal Distribution


For a bivariate normal distribution, the distribution of correlation coefficients is given by

P(r)=1/pi(N-2)(1-r^2)^((N-4)/2)(1-rho^2)^((N-1)/2)int_0^infty(dbeta)/((coshbeta-rhor)^(N-1))
(1)
=1/pi(N-2)(1-r^2)^((N-4)/2)(1-rho^2)^((N-1)/2)sqrt(pi/2)(Gamma(N-1))/(Gamma(N-1/2))×(1-rhor)^(-(N-3/2))_2F_1(1/2,1/2;(2N-1)/2;(rhor+1)/2)
(2)
=((N-2)Gamma(N-1)(1-rho^2)^((N-1)/2)(1-r^2)^((N-4)/2))/(sqrt(2pi)Gamma(N-1/2)(1-rhor)^(N-3/2))×[1+1/4(rhor+1)/(2N-1)+9/(32)((rhor+1)^2)/((2N-1)(2N+1))+...],
(3)

where rho is the population correlation coefficient, _2F_1(a,b;c;x) is a hypergeometric function, and Gamma(z) is the gamma function (Kenney and Keeping 1951, pp. 217-221). The moments are

<r>=rho-(rho(1-rho^2))/(2n)
(4)
var(r)=((1-rho^2)^2)/n(1+(11rho^2)/(2n)+...)
(5)
gamma_1=(6rho)/(sqrt(n))(1+(77rho^2-30)/(12n)+...)
(6)
gamma_2=6/n(12rho^2-1)+...,
(7)

where n=N-1. If the variates are uncorrelated, then rho=0 and

_2F_1(1/2,1/2;(2N-1)/2;(rhor+1)/2)=_2F_1(1/2,1/2;(2N-1)/2;1/2)
(8)
=(Gamma(N-1/2)2^(3/2-N)sqrt(pi))/([Gamma(N/2)]^2),
(9)

so

P(r)=((N-2)Gamma(N-1))/(sqrt(2pi)Gamma(N-1/2))(1-r^2)^((N-4)/2)(Gamma(N-1/2)2^(3/2-N)sqrt(pi))/([Gamma(N/2)]^2)
(10)
=(2^(1-N)(N-2)Gamma(N-1))/([Gamma(N/2)]^2)(1-r^2)^((N-4)/2).
(11)

But from the Legendre duplication formula,

 sqrt(pi)Gamma(N-1)=2^(N-2)Gamma(N/2)Gamma((N-1)/2),
(12)

so, writing nu=N-2,

P(r)=((2^(1-N))(2^(N-2))(N-2)Gamma(N/2)Gamma((N-1)/2))/(sqrt(pi)[Gamma(N/2)]^2)(1-r^2)^((N-4)/2)
(13)
=((N-2)Gamma((N-1)/2))/(2sqrt(pi)Gamma(N/2))(1-r^2)^((N-4)/2)
(14)
=1/(sqrt(pi))(nu/2Gamma((nu+1)/2))/(Gamma(nu/2+1))(1-r^2)^((nu-2)/2)
(15)
=1/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))(1-r^2)^((nu-2)/2).
(16)
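
As a numerical sanity check on equations (2) and (16), the following Python sketch evaluates both densities with the standard library only. The function names (hyp2f1_half_half, density, density_null, simpson) are illustrative and do not come from the references. The hypergeometric function is summed directly from its power series, an assumption that is adequate here because the argument (rhor+1)/2 stays below 1 whenever |rho|<1.

```python
import math

def hyp2f1_half_half(c, x, terms=200):
    """Power series for 2F1(1/2, 1/2; c; x); assumes |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: (a+k)(b+k) x / ((c+k)(k+1)) with a = b = 1/2.
        term *= (0.5 + k) ** 2 / ((c + k) * (k + 1)) * x
    return total

def density(r, rho, N):
    """Equation (2): density of r for population correlation rho and N samples."""
    log_pref = (math.log(N - 2) + math.lgamma(N - 1) - math.lgamma(N - 0.5)
                - 0.5 * math.log(2 * math.pi))
    body = ((1 - rho ** 2) ** ((N - 1) / 2)
            * (1 - r ** 2) ** ((N - 4) / 2)
            * (1 - rho * r) ** (-(N - 1.5)))
    return math.exp(log_pref) * body * hyp2f1_half_half(N - 0.5, (rho * r + 1) / 2)

def density_null(r, N):
    """Equation (16): density of r when rho = 0, with nu = N - 2."""
    nu = N - 2
    log_pref = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi))
    return math.exp(log_pref) * (1 - r ** 2) ** ((nu - 2) / 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

N = 10  # chosen so that the density vanishes smoothly at r = -1 and r = 1
print(density(0.3, 0.0, N), density_null(0.3, N))        # the two values should agree
print(simpson(lambda r: density(r, 0.5, N), -1.0, 1.0))  # should be close to 1
```

For N<=4 the density does not vanish at the endpoints (and is singular there for N=3), so this simple quadrature check should only be trusted for N>=5.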

The uncorrelated case can be derived more simply by letting beta be the true slope, so that eta=alpha+betax. Then

 t=(b-beta)(s_x)/(s_y)sqrt((N-2)/(1-r^2))=((b-beta)r)/bsqrt((N-2)/(1-r^2))
(17)

is distributed as Student's t with nu=N-2 degrees of freedom. If the population correlation coefficient rho is 0, then beta=0, so

 t=rsqrt(nu/(1-r^2)),
(18)

and the distribution is

 P(t)dt=1/(sqrt(nupi))(Gamma((nu+1)/2))/(Gamma(nu/2)(1+(t^2)/nu)^((nu+1)/2))dt.
(19)

Plugging in for t and using

dt=sqrt(nu)[(sqrt(1-r^2)-r(1/2)(-2r)(1-r^2)^(-1/2))/(1-r^2)]dr
(20)
=sqrt(nu/(1-r^2))((1-r^2+r^2)/(1-r^2))dr
(21)
=sqrt(nu/((1-r^2)^3))dr
(22)

gives

P(t)dt=1/(sqrt(nupi))(Gamma((nu+1)/2))/(Gamma(nu/2)[1+(r^2nu)/((1-r^2)nu)]^((nu+1)/2))sqrt(nu/((1-r^2)^3))dr
(23)
=((1-r^2)^(-3/2))/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2)(1/(1-r^2))^((nu+1)/2))dr
(24)
=1/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))(1-r^2)^(-3/2)(1-r^2)^((nu+1)/2)dr
(25)
=1/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))(1-r^2)^((nu-2)/2)dr,
(26)

so

 P(r)=1/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))(1-r^2)^((nu-2)/2)
(27)

as before. See Bevington (1969, pp. 122-123) or Pugh and Winslow (1966, §12-8). If we are interested instead in the probability that uncorrelated variates would yield a correlation coefficient of magnitude >=|r|, where r is the observed coefficient, then

P_c(r,N)=2int_(|r|)^1P(r^',N)dr^'
(28)
=1-2int_0^(|r|)P(r^',N)dr^'
(29)
=1-2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))int_0^(|r|)(1-r^2)^((nu-2)/2)dr.
(30)

Let I=(nu-2)/2. For even nu, the exponent I is an integer, so by the binomial theorem,

 (1-r^2)^I=sum_(k=0)^I(I; k)(-r^2)^k
(31)

and

P_c(r)=1-2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))sum_(k=0)^(I)[(-1)^k(I!)/((I-k)!k!)int_0^(|r|)(r^')^(2k)dr^']
(32)
=1-2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))sum_(k=0)^(I)[(-1)^k(I!)/((I-k)!k!)(|r|^(2k+1))/(2k+1)].
(33)

For odd nu, the integral is

P_c(r)=1-2int_0^(|r|)P(r^')dr^'
(34)
=1-2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))int_0^(|r|)(sqrt(1-r^2))^(nu-2)dr.
(35)

Let r=sinx so dr=cosxdx, then

P_c(r)=1-2/(sqrt(pi))(Gamma[((nu+1)/2)])/(Gamma(nu/2))int_0^(sin^(-1)|r|)cos^(nu-2)xcosxdx
(36)
=1-2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))int_0^(sin^(-1)|r|)cos^(nu-1)xdx.
(37)

But nu is odd, so nu-1=2n is even, where n now denotes the integer (nu-1)/2 rather than N-1 as above. Therefore

 2/(sqrt(pi))(Gamma((nu+1)/2))/(Gamma(nu/2))=2/pi((2n)!!)/((2n-1)!!).
(38)

Combining with the result from the cosine integral gives

 P_c(r)=1-2/pi((2n)!!(2n-1)!!)/((2n-1)!!(2n)!!)[sinxsum_(k=0)^(n-1)((2k)!!)/((2k+1)!!)cos^(2k+1)x+x]_0^(sin^(-1)|r|).
(39)

Use

 cos^(2k+1)x=(1-r^2)^((2k+1)/2)=(1-r^2)^(k+1/2),
(40)

and define J=n-1=(nu-3)/2, then

 P_c(r)=1-2/pi[sin^(-1)|r|+|r|sum_(k=0)^J((2k)!!)/((2k+1)!!)(1-r^2)^(k+1/2)].
(41)

(In Bevington 1969, this is given incorrectly.) Combining the correct solutions for even and odd nu gives

 P_c(r)={1-2/(sqrt(pi))(Gamma[(nu+1)/2])/(Gamma(nu/2))sum_(k=0)^I[(-1)^k(I!)/((I-k)!k!)(|r|^(2k+1))/(2k+1)];   for nu even; 1-2/pi[sin^(-1)|r|+|r|sum_(k=0)^J((2k)!!)/((2k+1)!!)(1-r^2)^(k+1/2)];   for nu odd
(42)
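
The two branches of equation (42) can be checked against a direct numerical evaluation of equation (30). The sketch below is one way to do that in Python with the standard library only. The names p_c, p_c_numeric, prefactor and double_factorial are hypothetical helpers introduced for illustration, and the midpoint rule is just a convenient quadrature choice, not the method of any of the cited references.

```python
import math

def prefactor(nu):
    """2/sqrt(pi) * Gamma((nu+1)/2) / Gamma(nu/2), computed via log-gamma."""
    return 2 / math.sqrt(math.pi) * math.exp(
        math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))

def double_factorial(m):
    """m!! with the convention 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def p_c(r, N):
    """Equation (42): probability of a coefficient >= |r| in magnitude when rho = 0."""
    nu = N - 2
    a = abs(r)
    if nu % 2 == 0:
        # Even nu: finite binomial sum with I = (nu - 2)/2.
        I = (nu - 2) // 2
        total = sum((-1) ** k * math.comb(I, k) * a ** (2 * k + 1) / (2 * k + 1)
                    for k in range(I + 1))
        return 1 - prefactor(nu) * total
    # Odd nu: arcsine term plus a sum up to J = (nu - 3)/2 (empty when nu = 1).
    J = (nu - 3) // 2
    total = sum(double_factorial(2 * k) / double_factorial(2 * k + 1)
                * (1 - a * a) ** (k + 0.5) for k in range(J + 1))
    return 1 - 2 / math.pi * (math.asin(a) + a * total)

def p_c_numeric(r, N, steps=20000):
    """Midpoint-rule evaluation of equation (30), for comparison only."""
    nu = N - 2
    a = abs(r)
    h = a / steps
    integral = h * sum((1 - ((i + 0.5) * h) ** 2) ** ((nu - 2) / 2)
                       for i in range(steps))
    return 1 - prefactor(nu) * integral

for N in (9, 10):  # one odd-nu case and one even-nu case
    print(N, p_c(0.5, N), p_c_numeric(0.5, N))  # each pair should agree closely
```

Both branches should return 1 at r=0 and 0 at |r|=1, which is a quick way to catch a misplaced factor.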

If rho!=0, a skew distribution is obtained, but the variable z defined by

 z=tanh^(-1)r
(43)

is approximately normal with

mu_z=tanh^(-1)rho
(44)
sigma_z^2=1/(N-3)
(45)

(Kenney and Keeping 1962, p. 266).
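
One common use of equations (43) to (45), which the text above does not spell out, is an approximate confidence interval for rho: build a normal interval for z and map it back with tanh. The Python sketch below illustrates this under the stated normal approximation. The function name fisher_interval and the default 95% level are assumptions made for the example.

```python
import math
from statistics import NormalDist

def fisher_interval(r, N, level=0.95):
    """Approximate confidence interval for rho from an observed r and sample size N.

    Uses z = atanh(r) with standard deviation 1/sqrt(N - 3), equations (43)-(45).
    """
    if N <= 3:
        raise ValueError("the approximation needs N > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    sigma = 1 / math.sqrt(N - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    return math.tanh(z - q * sigma), math.tanh(z + q * sigma)

print(fisher_interval(0.6, 30))
print(fisher_interval(0.6, 300))  # a larger sample should give a narrower interval
```

Because tanh is monotonic, the interval always stays inside (-1, 1) and is asymmetric about r whenever r is nonzero, reflecting the skewness noted above.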

Let b_j be the slope of a best-fit line, then the multiple correlation coefficient is

 R^2=sum_(j=1)^n(b_j(s_(jy)^2)/(s_y^2))=sum_(j=1)^n(b_j(s_j)/(s_y)r_(jy)),
(46)

where s_(jy)^2 is the sample covariance of the jth independent variable and y.

On the surface of a sphere,

 r=(intfgdOmega)/(sqrt(intf^2dOmegaintg^2dOmega)),
(47)

where dOmega is a differential solid angle. By the Cauchy-Schwarz inequality, this definition guarantees that -1<=r<=1. If f and g are expanded in real spherical harmonics,

f(theta,phi)=sum_(l=0)^(infty)sum_(m=0)^(l)[C_l^mY_l^m^c(theta,phi)+S_l^mY_l^m^s(theta,phi)]
(48)
g(theta,phi)=sum_(l=0)^(infty)sum_(m=0)^(l)[A_l^mY_l^m^c(theta,phi)+B_l^mY_l^m^s(theta,phi)],
(49)

then

 r_l=(sum_(m=0)^(l)(C_l^mA_l^m+S_l^mB_l^m))/(sqrt(sum_(m=0)^(l)(C_l^m^2+S_l^m^2))sqrt(sum_(m=0)^(l)(A_l^m^2+B_l^m^2))).
(50)

The confidence levels for degrees l=1 to 4 are then given by

G_1(r)=r
(51)
G_2(r)=r(1+1/2s^2)=1/2r(3-r^2)
(52)
G_3(r)=r[1+1/2s^2(1+3/4s^2)]=1/8r(15-10r^2+3r^4)
(53)
G_4(r)=r{1+1/2s^2[1+3/4s^2(1+5/6s^2)]}
(54)
=1/(16)r(35-35r^2+21r^4-5r^6),
(55)

where

 s=sqrt(1-r^2)
(56)

(Eckhardt 1984).
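
The nested and polynomial forms in equations (51) to (55) can be cross-checked numerically. The Python sketch below does so, and also compares them with the normalized integral of (1-x^2)^(l-1) from 0 to r. That last comparison is an observation that holds for these four formulas, not a statement taken from Eckhardt (1984). The function names G_nested, G_poly, G_integral and simpson are illustrative.

```python
def G_nested(l, r):
    """Nested forms of equations (51)-(54), with s^2 = 1 - r^2 from equation (56)."""
    s2 = 1.0 - r * r
    if l == 1:
        return r
    if l == 2:
        return r * (1 + 0.5 * s2)
    if l == 3:
        return r * (1 + 0.5 * s2 * (1 + 0.75 * s2))
    if l == 4:
        return r * (1 + 0.5 * s2 * (1 + 0.75 * s2 * (1 + 5.0 / 6.0 * s2)))
    raise ValueError("only l = 1..4 are listed in the text")

def G_poly(l, r):
    """Polynomial forms of equations (51)-(53) and (55)."""
    if l == 1:
        return r
    if l == 2:
        return r * (3 - r ** 2) / 2
    if l == 3:
        return r * (15 - 10 * r ** 2 + 3 * r ** 4) / 8
    if l == 4:
        return r * (35 - 35 * r ** 2 + 21 * r ** 4 - 5 * r ** 6) / 16
    raise ValueError("only l = 1..4 are listed in the text")

def simpson(f, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def G_integral(l, r):
    """Normalized integral of (1 - x^2)^(l - 1) over [0, r]; a check, not a definition."""
    f = lambda x: (1 - x * x) ** (l - 1)
    return simpson(f, 0.0, r) / simpson(f, 0.0, 1.0)

for l in range(1, 5):
    print(l, G_nested(l, 0.5), G_poly(l, 0.5), G_integral(l, 0.5))
```

Each row should show three matching values, and every G_l should equal 1 at r=1.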


See also

Correlation Coefficient, Fisher's z'-Transformation, Spearman Rank Correlation Coefficient, Spherical Harmonic

References

Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences. New York: McGraw-Hill, 1969.
Eckhardt, D. H. "Correlations Between Global Features of Terrestrial Fields." Math. Geology 16, 155-171, 1984.
Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 2, 2nd ed. Princeton, NJ: Van Nostrand, 1951.
Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, 1962.
Pugh, E. M. and Winslow, G. H. The Analysis of Physical Measurements. Reading, MA: Addison-Wesley, 1966.

Cite this as:

Weisstein, Eric W. "Correlation Coefficient--Bivariate Normal Distribution." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/CorrelationCoefficientBivariateNormalDistribution.html