Sample Variance Computation


When computing the sample variance s^2 numerically, the mean must be computed before s^2 can be determined. This normally requires storing the entire set of sample values. However, it is possible to calculate s^2 using a recurrence relation involving only the latest sample, as follows. This means the mean mu itself need not be precomputed, and only a small set of running values need be stored at each step.

In the following, use the somewhat less than optimal notation mu_j to denote mu calculated from the first j samples (i.e., not the jth moment)

 mu_j=(sum_(i=1)^(j)x_i)/j,
(1)

and let s_j^2 denote the value of the bias-corrected sample variance s_(N-1)^2 calculated from the first j samples. The first few values calculated for the mean are

mu_1=x_1
(2)
mu_2=(1·mu_1+x_2)/2
(3)
mu_3=(2mu_2+x_3)/3.
(4)

Therefore, for j=2, 3 it is true that

 mu_j=((j-1)mu_(j-1)+x_j)/j.
(5)

Therefore, by induction,

mu_(j+1)=([(j+1)-1]mu_((j+1)-1)+x_(j+1))/(j+1)
(6)
=(jmu_j+x_(j+1))/(j+1)
(7)
mu_(j+1)(j+1)=(j+1)mu_j+(x_(j+1)-mu_j)
(8)
mu_(j+1)=mu_j+(x_(j+1)-mu_j)/(j+1).
(9)

By the definition of the sample variance,

 s_j^2=(sum_(i=1)^(j)(x_i-mu_j)^2)/(j-1)
(10)

for j>=2. Defining s_1^2=0, s_(j+1)^2 can then be computed from s_j^2 using the recurrence equation

js_(j+1)^2=j(sum_(i=1)^(j+1)(x_i-mu_(j+1))^2)/j
(11)
=sum_(i=1)^(j+1)(x_i-mu_(j+1))^2
(12)
=sum_(i=1)^(j+1)[(x_i-mu_j)+(mu_j-mu_(j+1))]^2
(13)
=sum_(i=1)^(j+1)(x_i-mu_j)^2+sum_(i=1)^(j+1)(mu_j-mu_(j+1))^2+2sum_(i=1)^(j+1)(x_i-mu_j)(mu_j-mu_(j+1)).
(14)
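The three-term expansion in (14) can be spot-checked numerically before simplifying each term; the sample values below are arbitrary illustrative data, not from the source.

```python
# Spot-check of the expansion of sum_(i=1)^(j+1) (x_i - mu_(j+1))^2
# into three terms, using j + 1 = 4 arbitrary sample values.
xs = [2.0, 4.0, 5.0, 9.0]
j = len(xs) - 1
mu_j = sum(xs[:j]) / j        # mean of the first j samples
mu_j1 = sum(xs) / (j + 1)     # mean of all j + 1 samples

lhs = sum((x - mu_j1) ** 2 for x in xs)
rhs = (sum((x - mu_j) ** 2 for x in xs)
       + (j + 1) * (mu_j - mu_j1) ** 2
       + 2 * (mu_j - mu_j1) * sum(x - mu_j for x in xs))
```

The two sides agree exactly (up to floating-point rounding), since the expansion is an algebraic identity.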

Working on the first term,

sum_(i=1)^(j+1)(x_i-mu_j)^2=sum_(i=1)^(j)(x_i-mu_j)^2+(x_(j+1)-mu_j)^2
(15)
=(j-1)s_j^2+(x_(j+1)-mu_j)^2.
(16)

Use (9) to write

 x_(j+1)-mu_j=(j+1)(mu_(j+1)-mu_j),
(17)

so

 sum_(i=1)^(j+1)(x_i-mu_j)^2=(j-1)s_j^2+(j+1)^2(mu_(j+1)-mu_j)^2.
(18)

Now work on the second term in (14),

 sum_(i=1)^(j+1)(mu_j-mu_(j+1))^2=(j+1)(mu_j-mu_(j+1))^2.
(19)

Considering the third term in (14),

sum_(i=1)^(j+1)(x_i-mu_j)(mu_j-mu_(j+1))=(mu_j-mu_(j+1))sum_(i=1)^(j+1)(x_i-mu_j)
(20)
=(mu_j-mu_(j+1))[sum_(i=1)^(j)(x_i-mu_j)+(x_(j+1)-mu_j)]
(21)
=(mu_j-mu_(j+1))(x_(j+1)-mu_j-jmu_j+sum_(i=1)^(j)x_i).
(22)

But

 sum_(i=1)^jx_i=jmu_j,
(23)

so

(mu_j-mu_(j+1))(x_(j+1)-mu_j)=(mu_j-mu_(j+1))(j+1)(mu_(j+1)-mu_j)
(24)
=-(j+1)(mu_j-mu_(j+1))^2.
(25)

Finally, plugging (18), (19), and (25) into (14),

js_(j+1)^2=[(j-1)s_j^2+(j+1)^2(mu_(j+1)-mu_j)^2]+[(j+1)(mu_j-mu_(j+1))^2]+2[-(j+1)(mu_j-mu_(j+1))^2]
(26)
=(j-1)s_j^2+(j+1)^2(mu_(j+1)-mu_j)^2-(j+1)(mu_j-mu_(j+1))^2
(27)
=(j-1)s_j^2+(j+1)[(j+1)-1](mu_(j+1)-mu_j)^2
(28)
=(j-1)s_j^2+j(j+1)(mu_(j+1)-mu_j)^2,
(29)

gives the desired expression for s_(j+1)^2 in terms of s_j^2, mu_(j+1), and mu_j,

 s_(j+1)^2=(1-1/j)s_j^2+(j+1)(mu_(j+1)-mu_j)^2.
(30)
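Combining the recurrences (9) and (30) gives a one-pass variance computation. Below is a minimal Python sketch under the assumptions stated in its comments; the function name online_variance is chosen here for illustration.

```python
def online_variance(samples):
    """Bias-corrected sample variance in one pass, via eqs. (9) and (30).

    The loop index j counts the samples seen so far, so eq. (30) appears
    with its indices shifted down by one: the previous count is j - 1.
    """
    mu = 0.0   # running mean mu_j
    s2 = 0.0   # running variance s_j^2, with s_1^2 = 0 by definition
    for j, x in enumerate(samples, start=1):
        mu_new = mu + (x - mu) / j                              # eq. (9)
        if j >= 2:
            s2 = (1.0 - 1.0 / (j - 1)) * s2 + j * (mu_new - mu) ** 2  # eq. (30)
        mu = mu_new
    return s2
```

Each step stores only j, mu, and s2, confirming that the full sample list need not be retained. (In floating-point practice, accumulating the sum of squared deviations directly, as in Welford's algorithm, is the more numerically robust variant of the same recurrence.)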

See also

Sample, Sample Variance, Sample Variance Distribution, Variance


Cite this as:

Weisstein, Eric W. "Sample Variance Computation." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/SampleVarianceComputation.html
