
Sample Variance Distribution


Let N samples be taken from a population with central moments mu_n. The sample variance m_2 is then given by

 m_2=(1/N)sum_(i=1)^N(x_i-m)^2,
(1)

where m=x̄ is the sample mean.

The expected value of m_2 for a sample size N is then given by

 <s^2>=<m_2>=((N-1)/N)mu_2.
(2)

Similarly, the expected variance of the sample variance is given by

<var(s^2)>=<var(m_2)>
(3)
=((N-1)^2)/(N^3)mu_4-((N-1)(N-3)mu_2^2)/(N^3)
(4)

(Kenney and Keeping 1951, p. 164; Rose and Smith 2002, p. 264).
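Equations (2) and (4) are easy to check by simulation. The following sketch (assuming Python with NumPy is available; the Exponential(1) population, with mu_2=1 and mu_4=9, and the sample size N=5 are purely illustrative choices) draws many samples, computes m_2 for each, and compares the empirical mean and variance of m_2 with the two formulas.

```python
import numpy as np

# Monte Carlo check of equations (2) and (4).  The Exponential(1) population
# (central moments mu_2 = 1, mu_4 = 9) and N = 5 are illustrative choices.
rng = np.random.default_rng(0)
N, trials = 5, 500_000
mu2, mu4 = 1.0, 9.0

x = rng.exponential(1.0, size=(trials, N))
m2 = x.var(axis=1)        # divisor-N sample variance, as in equation (1)

expected_mean = (N - 1) / N * mu2                                      # eq. (2)
expected_var = ((N - 1)**2 * mu4 - (N - 1) * (N - 3) * mu2**2) / N**3  # eq. (4)

print(m2.mean(), expected_mean)   # both ≈ 0.8
print(m2.var(), expected_var)     # both ≈ 1.088
```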

The algebra of deriving equation (4) by hand is rather tedious, but can be performed as follows. Begin by noting that

 var(x)=<x^2>-<x>^2,
(5)

so

 var(s^2)=<s^4>-<s^2>^2.
(6)

The value of <s^2> is already known from equation (2), so it remains only to find <s^4>. The algebra is simplified considerably by immediately transforming variables to x_i^'=x_i-mu and performing computations with respect to these central variables. Since the variance does not depend on the mean mu of the underlying distribution, the transformed variables give an identical result while immediately eliminating expectation values of sums of terms containing odd powers of x_i (which equal 0). To determine <s^4>, expand equation (6) to obtain

<s^4>=<(s^2)^2>
(7)
=<(<x^2>-<x>^2)^2>
(8)
=<[1/Nsumx_i^2-(1/Nsumx_i)^2]^2>
(9)
=1/(N^2)<(sumx_i^2)^2>-2/(N^3)<sumx_i^2(sumx_i)^2>+1/(N^4)<(sumx_i)^4>.
(10)
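Before evaluating the three terms of (10) individually, the expansion itself can be sanity-checked symbolically for a small concrete sample size. A sketch assuming SymPy is available (N=3 is an arbitrary illustrative choice):

```python
import sympy as sp

# Verify the expansion in equation (10) for a concrete small N (here N = 3)
# by expanding equation (9) symbolically over explicit variables.
Nval = 3
xs = sp.symbols('x1:4')                  # x1, x2, x3
S1 = sum(xs)                             # sum x_i
S2 = sum(x**2 for x in xs)               # sum x_i^2

lhs = sp.expand((S2 / Nval - (S1 / Nval)**2)**2)             # equation (9)
rhs = sp.expand(S2**2 / Nval**2 - 2 * S2 * S1**2 / Nval**3
                + S1**4 / Nval**4)                           # equation (10)
print(sp.simplify(lhs - rhs))   # 0
```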

Working on the first term of (10),

<(sumx_i^2)^2>=<sumx_i^4+sum_(i!=j)x_i^2x_j^2>
(11)
=<sumx_i^4>+<sum_(i!=j)x_i^2x_j^2>
(12)
=N<x_i^4>+N(N-1)<x_i^2><x_j^2>
(13)
=Nmu_4+N(N-1)mu_2^2.
(14)

The second term of (10) is given by

<sumx_i^2(sumx_j)^2>=<sumx_i^4+sum_(i!=j)x_i^2x_j^2+2sum_(i!=j)x_i^3x_j+sum_(i!=j!=k)x_i^2x_jx_k>
(15)
=Nmu_4+N(N-1)mu_2^2,
(16)

and the third term by

<(sumx_i)^4>=<sumx_i^4+3sum_(i!=j)x_i^2x_j^2+4sum_(i!=j)x_i^3x_j+6sum_(i!=j!=k)x_i^2x_jx_k+sum_(i!=j!=k!=l)x_ix_jx_kx_l>
(17)
=<sumx_i^4>+3<sum_(i!=j)x_i^2x_j^2>
(18)
=Nmu_4+3N(N-1)mu_2^2.
(19)
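The identities (14), (16), and (19) can be spot-checked by Monte Carlo simulation on the centered variables. A sketch assuming NumPy (the Exponential(1) population, with mu=1, mu_2=1, mu_4=9, and the sample size N=4 are illustrative choices only):

```python
import numpy as np

# Monte Carlo spot-check of identities (14), (16), and (19) using the
# centered variables x_i' = x_i - mu.  Exponential(1) (mu = 1, mu_2 = 1,
# mu_4 = 9) and N = 4 are illustrative choices.
rng = np.random.default_rng(1)
N, trials = 4, 500_000
mu2, mu4 = 1.0, 9.0

xp = rng.exponential(1.0, size=(trials, N)) - 1.0   # centered samples x_i'
S1 = xp.sum(axis=1)                                 # sum x_i'
S2 = (xp**2).sum(axis=1)                            # sum x_i'^2

print((S2**2).mean(),      N * mu4 + N * (N - 1) * mu2**2)      # eq. (14): 48
print((S2 * S1**2).mean(), N * mu4 + N * (N - 1) * mu2**2)      # eq. (16): 48
print((S1**4).mean(),      N * mu4 + 3 * N * (N - 1) * mu2**2)  # eq. (19): 72
```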

Plugging (14)-(19) into (10) then gives

<s^4>=1/(N^2)[Nmu_4+N(N-1)mu_2^2]-2/(N^3)[Nmu_4+N(N-1)mu_2^2]+1/(N^4)[Nmu_4+3N(N-1)mu_2^2]
(20)
=(1/N-2/(N^2)+1/(N^3))mu_4+[(N-1)/N-(2(N-1))/(N^2)+(3(N-1))/(N^3)]mu_2^2
(21)
=((N^2-2N+1)/(N^3))mu_4+((N-1)(N^2-2N+3))/(N^3)mu_2^2
(22)
=((N-1)[(N-1)mu_4+(N^2-2N+3)mu_2^2])/(N^3)
(23)

(Kenney and Keeping 1951, p. 164). Plugging (2) and (23) into (6) then gives

var(s^2)=<s^4>-<s^2>^2
(24)
=((N-1)[(N-1)mu_4-(N-3)mu_2^2])/(N^3),
(25)

as before.
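The passage from equation (20) to equation (25) is exactly the kind of algebra a computer algebra system handles well. A sketch assuming SymPy is available:

```python
import sympy as sp

# Symbolic check that equations (20) and (2) combine, via equation (6),
# into equation (25).
N, mu2, mu4 = sp.symbols('N mu_2 mu_4', positive=True)

# <s^4>, equation (20): the three bracketed terms are equations (14), (16), (19)
s4 = (1 / N**2) * (N * mu4 + N * (N - 1) * mu2**2) \
   - (2 / N**3) * (N * mu4 + N * (N - 1) * mu2**2) \
   + (1 / N**4) * (N * mu4 + 3 * N * (N - 1) * mu2**2)

# <s^2>^2, from equation (2)
s2_sq = ((N - 1) / N * mu2)**2

var_s2 = sp.simplify(s4 - s2_sq)                                 # equation (24)
target = (N - 1) * ((N - 1) * mu4 - (N - 3) * mu2**2) / N**3     # equation (25)
print(sp.simplify(var_s2 - target))   # 0
```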

[Figure: curves of the sample variance distribution f(s^2) for sigma=1 and N=1 to 10]

For a normal distribution, mu_2=sigma^2 and mu_4=3sigma^4, so

m_1(s_(Gaussian)^2)=((N-1)sigma^2)/N
(26)
m_2(s_(Gaussian)^2)=(2(N-1)sigma^4)/(N^2).
(27)

The third and fourth moments of s_(Gaussian)^2 are given by

m_3(s_(Gaussian)^2)=(8(N-1)sigma^6)/(N^3)
(28)
m_4(s_(Gaussian)^2)=(12(N-1)(N+3)sigma^8)/(N^4),
(29)

giving the skewness and kurtosis excess of the distribution of s_(Gaussian)^2 as

gamma_1(s_(Gaussian)^2)=sqrt(8/(N-1))
(30)
gamma_2(s_(Gaussian)^2)=(12)/(N-1),
(31)

as computed by Student. Student also conjectured that the underlying distribution is a Pearson type III distribution

 f(s^2)=((N/(2sigma^2))^((N-1)/2))/(Gamma((N-1)/2))(s^2)^((N-3)/2)e^(-Ns^2/(2sigma^2)),
(32)

where Gamma(z) is the gamma function, a conjecture that was subsequently proved by R. A. Fisher. Curves are illustrated above for sigma=1 and N varying from N=1 to 10.
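Equation (32) is equivalently the density of (sigma^2/N) times a chi-squared variate with N-1 degrees of freedom. The following sketch (assuming NumPy and SciPy are available; N=6 and sigma=1 are illustrative choices) checks this equivalence numerically and also confirms the skewness and kurtosis excess formulas (30) and (31) by Monte Carlo:

```python
import numpy as np
from scipy.stats import chi2, skew, kurtosis
from scipy.special import gamma as Gamma

# Equation (32) is the density of (sigma^2/N) times a chi-squared variate
# with N-1 degrees of freedom; check that, then check (30)-(31) by
# simulation.  N = 6 and sigma = 1 are illustrative choices.
N, sigma = 6, 1.0

def pearson_iii(v):
    # equation (32)
    c = (N / (2 * sigma**2))**((N - 1) / 2) / Gamma((N - 1) / 2)
    return c * v**((N - 3) / 2) * np.exp(-N * v / (2 * sigma**2))

v = np.linspace(0.05, 3.0, 50)
chi2_form = chi2.pdf(N * v / sigma**2, N - 1) * N / sigma**2
print(np.max(np.abs(pearson_iii(v) - chi2_form)))   # ≈ 0

rng = np.random.default_rng(2)
s2 = rng.normal(0.0, sigma, size=(500_000, N)).var(axis=1)
print(skew(s2), np.sqrt(8 / (N - 1)))   # both ≈ 1.26, equation (30)
print(kurtosis(s2), 12 / (N - 1))       # both ≈ 2.4, equation (31)
```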


See also

Mean Distribution, Sample, Sample Variance, Sample Variance Computation, Standard Deviation Distribution, Variance


References

Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 2, 2nd ed. Princeton, NJ: Van Nostrand, 1951.
Rose, C. and Smith, M. D. Mathematical Statistics with Mathematica. New York: Springer-Verlag, 2002.


Cite this as:

Weisstein, Eric W. "Sample Variance Distribution." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/SampleVarianceDistribution.html
