
Hypergeometric Distribution


Let there be n ways for a "good" selection and m ways for a "bad" selection out of a total of n+m possibilities. Take N samples without replacement, and let x_i equal 1 if selection i is successful and 0 if it is not. Let x be the total number of successful selections,

$$x = \sum_{i=1}^{N} x_i. \tag{1}$$

The probability of i successful selections is then

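$$
\begin{aligned}
P(x=i) &= \frac{[\text{\# ways for } i \text{ successes}]\,[\text{\# ways for } N-i \text{ failures}]}{[\text{total number of ways to select}]} && (2)\\
&= \frac{\binom{n}{i}\binom{m}{N-i}}{\binom{m+n}{N}} && (3)\\
&= \frac{m!\,n!\,N!\,(m+n-N)!}{i!\,(n-i)!\,(m+i-N)!\,(N-i)!\,(m+n)!}. && (4)
\end{aligned}
$$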
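As a sanity check on equations (3) and (4), here is a minimal Python sketch that evaluates the probability exactly with rational arithmetic. The function names are illustrative, not part of any library; it assumes 0 ≤ N ≤ m+n and uses only the standard-library math.comb, math.factorial, and fractions.Fraction.

```python
from fractions import Fraction
from math import comb, factorial


def hypergeom_pmf(i: int, n: int, m: int, N: int) -> Fraction:
    """Exact P(x = i) from equation (3): C(n, i) C(m, N-i) / C(m+n, N).

    n = number of "good" items, m = number of "bad" items, N = number drawn
    (assumed 0 <= N <= m + n). math.comb returns 0 when its second argument
    exceeds the first, so impossible values of i get probability 0.
    """
    if i < 0 or i > N:
        return Fraction(0)  # outside 0..N; comb() would reject a negative k
    return Fraction(comb(n, i) * comb(m, N - i), comb(m + n, N))


def hypergeom_pmf_factorial(i: int, n: int, m: int, N: int) -> Fraction:
    """The same probability in the factorial form of equation (4).

    Only valid on the support max(0, N - m) <= i <= min(n, N), where every
    factorial argument is nonnegative.
    """
    num = factorial(m) * factorial(n) * factorial(N) * factorial(m + n - N)
    den = (factorial(i) * factorial(n - i) * factorial(m + i - N)
           * factorial(N - i) * factorial(m + n))
    return Fraction(num, den)


# Example: 5 good and 7 bad items, 4 drawn.
dist = [hypergeom_pmf(i, 5, 7, 4) for i in range(5)]
print(dist, sum(dist))
```

The probabilities sum to 1 by Vandermonde's identity, and exact fractions avoid rounding when comparing the two forms.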

The hypergeometric distribution is implemented in the Wolfram Language as HypergeometricDistribution[N, n, m+n].

Finding such a probability is sometimes called the "urn problem," since it asks for the probability that i of N balls drawn from an urn containing n "good" balls and m "bad" balls are "good." It therefore also describes the probability of matching exactly i balls in a pick-N lottery with a reservoir of r balls (of which n=N are "good" and m=r-N are "bad"). For example, for N=6 and r=36, the probabilities of matching i balls are given in the following table.

number correct (i)   probability P(x=i)   odds against
0                    0.3048               2.280:1
1                    0.4390               1.278:1
2                    0.2110               3.738:1
3                    0.04169              22.99:1
4                    0.003350             297.5:1
5                    9.241×10^(-5)        10820:1
6                    5.134×10^(-7)        1.948×10^6:1
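The table can be regenerated with a short Python sketch. Here "odds against" is taken to mean (1-P):P, an interpretation that reproduces the odds column above; the function name lottery_table is hypothetical. It assumes r ≥ 2N so that every row has a nonzero probability.

```python
from math import comb


def lottery_table(N: int = 6, r: int = 36) -> list[tuple[int, float, float]]:
    """Rows (i, P(x = i), odds against) for a pick-N lottery with r balls.

    Following the text, n = N balls are "good" and m = r - N are "bad".
    Odds against are computed as (1 - P) / P; assumes r >= 2N so no row
    has zero probability.
    """
    n, m = N, r - N
    total = comb(r, N)
    rows = []
    for i in range(N + 1):
        ways = comb(n, i) * comb(m, N - i)  # favorable draws with i matches
        rows.append((i, ways / total, (total - ways) / ways))
    return rows


for i, p, odds in lottery_table():
    print(f"{i}  {p:.4g}  {odds:.4g}:1")
```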

By symmetry, the ith draw is equally likely to be any of the n+m items, so the probability p that any particular draw is successful is

$$p = \frac{n}{m+n}, \tag{5}$$

i.e.,

$$P(x_i = 1) = \frac{n}{m+n}. \tag{6}$$
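The symmetry argument behind (5) and (6) can be checked by brute force for tiny urns. This Python sketch (prob_draw_is_good is an illustrative name) enumerates every ordered sample drawn without replacement, treating items 0..n-1 as "good," and measures how often a given draw position is good.

```python
from fractions import Fraction
from itertools import permutations


def prob_draw_is_good(n: int, m: int, N: int, k: int) -> Fraction:
    """P(x_k = 1) by brute force over all ordered draws of N distinct items.

    Items 0..n-1 are "good" and n..n+m-1 are "bad"; every ordered sample of
    size N drawn without replacement is equally likely. k is 0-based.
    Only practical for very small n + m.
    """
    hits = total = 0
    for draw in permutations(range(n + m), N):
        total += 1
        hits += draw[k] < n  # a True comparison counts as 1
    return Fraction(hits, total)


# Every position in the draw has the same success probability n/(m+n).
n, m, N = 3, 4, 3
for k in range(N):
    assert prob_draw_is_good(n, m, N, k) == Fraction(n, n + m)
```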

By linearity of expectation, the expectation value of x is therefore simply

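$$
\begin{aligned}
\mu &= \left\langle \sum_{i=1}^{N} x_i \right\rangle && (7)\\
&= \sum_{i=1}^{N} \langle x_i \rangle && (8)\\
&= \sum_{i=1}^{N} \frac{n}{m+n} && (9)\\
&= \frac{nN}{m+n}. && (10)
\end{aligned}
$$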

This can also be computed by direct summation as

$$
\begin{aligned}
\mu &= \sum_{i=0}^{N} i\,\frac{\binom{n}{i}\binom{m}{N-i}}{\binom{n+m}{N}} && (11)\\
&= \frac{nN}{m+n}. && (12)
\end{aligned}
$$
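A minimal Python sketch comparing the direct summation (11) with the closed form (10)/(12), using exact fractions. The helper names are illustrative; pmf assumes 0 ≤ i ≤ N ≤ m+n.

```python
from fractions import Fraction
from math import comb


def pmf(i: int, n: int, m: int, N: int) -> Fraction:
    """P(x = i) from equation (3); assumes 0 <= i <= N <= m + n."""
    return Fraction(comb(n, i) * comb(m, N - i), comb(m + n, N))


def mean_by_summation(n: int, m: int, N: int) -> Fraction:
    """Direct summation of equation (11): sum over i of i * P(x = i)."""
    return sum((i * pmf(i, n, m, N) for i in range(N + 1)), Fraction(0))


def mean_closed_form(n: int, m: int, N: int) -> Fraction:
    """Closed form of equations (10) and (12): nN / (m + n)."""
    return Fraction(n * N, m + n)


for n, m, N in [(5, 7, 4), (6, 2, 5), (1, 9, 3)]:
    assert mean_by_summation(n, m, N) == mean_closed_form(n, m, N)
```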

The variance is

$$\operatorname{var}(x) = \sum_{i=1}^{N} \operatorname{var}(x_i) + \sum_{i=1}^{N} \sum_{\substack{j=1\\ j\neq i}}^{N} \operatorname{cov}(x_i, x_j). \tag{13}$$

Since x_i is a Bernoulli variable,

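$$
\begin{aligned}
\operatorname{var}(x_i) &= p(1-p) && (14)\\
&= \frac{n}{n+m}\left(1 - \frac{n}{n+m}\right) && (15)\\
&= \frac{n}{n+m}\cdot\frac{n+m-n}{n+m} && (16)\\
&= \frac{n}{n+m}\cdot\frac{m}{n+m} && (17)\\
&= \frac{nm}{(n+m)^2}, && (18)
\end{aligned}
$$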

so

$$\sum_{i=1}^{N} \operatorname{var}(x_i) = \frac{Nnm}{(n+m)^2}. \tag{19}$$

For i≠j, the covariance is

$$\operatorname{cov}(x_i, x_j) = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle. \tag{20}$$

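For i≠j, the probability that draws i and j are both successful is the probability that draw i is successful times the probability that draw j is successful given that draw i was; in the latter case, n-1 of the remaining n+m-1 items are "good":

$$
\begin{aligned}
P(x_i=1, x_j=1) &= P(x_i=1)\,P(x_j=1 \mid x_i=1) && (21)\\
&= \frac{n}{n+m}\cdot\frac{n-1}{n+m-1} && (22)\\
&= \frac{n(n-1)}{(n+m)(n+m-1)}. && (23)
\end{aligned}
$$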

Since x_i and x_j are Bernoulli variables (each 0 or 1), their product is also a Bernoulli variable, and x_ix_j=1 exactly when both x_i and x_j equal 1. Therefore

$$
\begin{aligned}
\langle x_i x_j \rangle &= P(x_i x_j = 1) = P(x_i=1, x_j=1) && (24)\\
&= \frac{n}{n+m}\cdot\frac{n-1}{n+m-1} && (25)\\
&= \frac{n(n-1)}{(n+m)(n+m-1)}. && (26)
\end{aligned}
$$

Combining (26) with

$$
\begin{aligned}
\langle x_i \rangle \langle x_j \rangle &= \frac{n}{n+m}\cdot\frac{n}{n+m} && (27)\\
&= \frac{n^2}{(n+m)^2}, && (28)
\end{aligned}
$$

gives

$$
\begin{aligned}
\operatorname{cov}(x_i, x_j) &= \frac{(n+m)(n^2-n) - n^2(n+m-1)}{(n+m)^2(n+m-1)} && (29)\\
&= -\frac{mn}{(n+m)^2(n+m-1)}. && (30)
\end{aligned}
$$
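The negative covariance (30) can be confirmed by enumeration. This Python sketch (covariance_of_two_draws is an illustrative name) looks only at the first two draws, relying on the symmetry argument above that every pair i≠j behaves the same, and computes the expectations in (20) exactly.

```python
from fractions import Fraction
from itertools import permutations


def covariance_of_two_draws(n: int, m: int) -> Fraction:
    """cov(x_1, x_2) by brute force over all ordered pairs of distinct items.

    Items 0..n-1 are "good"; x_1 and x_2 indicate whether the first and
    second draws are good. Only the first two draws matter, so enumerating
    ordered pairs drawn without replacement suffices.
    """
    pairs = list(permutations(range(n + m), 2))
    total = len(pairs)
    e_x1 = Fraction(sum(a < n for a, _ in pairs), total)
    e_x2 = Fraction(sum(b < n for _, b in pairs), total)
    e_x1x2 = Fraction(sum(a < n and b < n for a, b in pairs), total)
    return e_x1x2 - e_x1 * e_x2  # equation (20)


print(covariance_of_two_draws(3, 4))  # prints -2/49
```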

A double sum with i and j each running from 1 to N has N^2 terms. Of these, N have i=j, so the covariance sum in (13) has N^2-N=N(N-1) terms, each equal to (30):

$$\sum_{i=1}^{N} \sum_{\substack{j=1\\ j\neq i}}^{N} \operatorname{cov}(x_i, x_j) = -\frac{N(N-1)mn}{(n+m)^2(n+m-1)}. \tag{31}$$

Substituting (19) and (31) into (13) gives the variance

$$
\begin{aligned}
\operatorname{var}(x) &= \frac{Nmn}{(n+m)^2} - \frac{N(N-1)mn}{(n+m)^2(n+m-1)} && (32)\\
&= \frac{Nmn(n+m-N)}{(n+m)^2(n+m-1)}, && (33)
\end{aligned}
$$

so the final result is

$$\langle x \rangle = Np \tag{34}$$

and, since

$$1 - p = \frac{m}{n+m} \tag{35}$$

and

$$p(1-p) = \frac{mn}{(n+m)^2}, \tag{36}$$

we have

$$
\begin{aligned}
\sigma^2 &= \operatorname{var}(x) && (37)\\
&= Np(1-p)\left(1 - \frac{N-1}{n+m-1}\right) && (38)\\
&= \frac{mnN(m+n-N)}{(m+n)^2(m+n-1)}. && (39)
\end{aligned}
$$

This can also be computed directly from the sum

$$
\begin{aligned}
\sigma^2 &= \sum_{i=0}^{N} \frac{\binom{n}{i}\binom{m}{N-i}}{\binom{n+m}{N}}\,(i-\mu)^2 && (40)\\
&= \frac{mnN(m+n-N)}{(m+n)^2(m+n-1)}. && (41)
\end{aligned}
$$
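A minimal Python sketch checking that the direct sum (40), the closed form (39), and the Np(1-p) form (38) agree exactly. Function names are illustrative; it assumes m+n ≥ 2 and 0 ≤ N ≤ m+n.

```python
from fractions import Fraction
from math import comb


def pmf(i: int, n: int, m: int, N: int) -> Fraction:
    """P(x = i) from equation (3); assumes 0 <= i <= N <= m + n."""
    return Fraction(comb(n, i) * comb(m, N - i), comb(m + n, N))


def variance_by_summation(n: int, m: int, N: int) -> Fraction:
    """Equation (40): sum of P(x = i) (i - mu)^2, with mu itself summed."""
    probs = [pmf(i, n, m, N) for i in range(N + 1)]
    mu = sum((i * p for i, p in enumerate(probs)), Fraction(0))
    return sum(((i - mu) ** 2 * p for i, p in enumerate(probs)), Fraction(0))


def variance_closed_form(n: int, m: int, N: int) -> Fraction:
    """Equation (39); assumes m + n >= 2."""
    return Fraction(m * n * N * (m + n - N), (m + n) ** 2 * (m + n - 1))


def variance_finite_population_form(n: int, m: int, N: int) -> Fraction:
    """Equation (38): Np(1-p) times the factor 1 - (N-1)/(n+m-1)."""
    p = Fraction(n, n + m)
    return N * p * (1 - p) * (1 - Fraction(N - 1, n + m - 1))
```

Written as in (38), the variance is the binomial value Np(1-p) multiplied by a correction factor that reflects sampling without replacement.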

The skewness, writing q=1-p, is

$$
\begin{aligned}
\gamma_1 &= \frac{q-p}{\sqrt{Npq}}\sqrt{\frac{m+n-1}{m+n-N}}\;\frac{m+n-2N}{m+n-2} && (42)\\
&= \frac{(m-n)(m+n-2N)}{m+n-2}\sqrt{\frac{m+n-1}{mnN(m+n-N)}}, && (43)
\end{aligned}
$$

and the kurtosis excess is given by a complicated expression.
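The skewness formulas can be checked numerically against the standardized third central moment. This floating-point Python sketch uses illustrative function names and assumes n, m ≥ 1, 1 ≤ N ≤ m+n-1, and m+n ≥ 3, so that no denominator vanishes.

```python
from math import comb, sqrt


def skewness_by_summation(n: int, m: int, N: int) -> float:
    """Third central moment divided by sigma^3, computed from the pmf."""
    probs = [comb(n, i) * comb(m, N - i) / comb(m + n, N) for i in range(N + 1)]
    mu = sum(i * p for i, p in enumerate(probs))
    var = sum((i - mu) ** 2 * p for i, p in enumerate(probs))
    mu3 = sum((i - mu) ** 3 * p for i, p in enumerate(probs))
    return mu3 / var ** 1.5


def skewness_pq_form(n: int, m: int, N: int) -> float:
    """Equation (42), with p = n/(n+m) and q = 1 - p."""
    p = n / (n + m)
    q = 1 - p
    return ((q - p) / sqrt(N * p * q) * sqrt((m + n - 1) / (m + n - N))
            * (m + n - 2 * N) / (m + n - 2))


def skewness_closed_form(n: int, m: int, N: int) -> float:
    """Equation (43)."""
    return ((m - n) * (m + n - 2 * N) / (m + n - 2)
            * sqrt((m + n - 1) / (m * n * N * (m + n - N))))
```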

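The characteristic function is

$$\phi(t) = \frac{\binom{m}{N}}{\binom{n+m}{N}}\,{}_2F_1\!\left(-N, -n; m-N+1; e^{it}\right), \tag{44}$$

where ${}_2F_1(a,b;c;z)$ is the hypergeometric function. Replacing e^{it} by u gives the probability generating function. As written, (44) assumes m≥N; otherwise $\binom{m}{N}=0$ and the lower parameter m-N+1 is a nonpositive integer.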
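Because the first parameter -N is a nonpositive integer, the series in (44) terminates, so the identity can be checked exactly. This Python sketch uses a hand-written helper hyp2f1_terminating (not a library function) and evaluates the probability generating function at rational u instead of e^{it}, to keep the arithmetic exact. It assumes m ≥ N.

```python
from fractions import Fraction
from math import comb


def hyp2f1_terminating(a: int, b: int, c: int, z: Fraction) -> Fraction:
    """Gauss series 2F1(a, b; c; z) for a nonpositive integer a.

    The series stops after -a + 1 terms because (a)_k = 0 for k > -a.
    Uses term_{k+1} = term_k * (a+k)(b+k) / ((c+k)(k+1)) * z and assumes
    c + k != 0 for k = 0..-a-1.
    """
    term = Fraction(1)
    total = Fraction(1)
    for k in range(-a):
        term *= Fraction((a + k) * (b + k), (c + k) * (k + 1)) * z
        total += term
    return total


def pgf_direct(n: int, m: int, N: int, u: Fraction) -> Fraction:
    """Sum over i of P(x = i) u^i, using equation (3)."""
    return sum(Fraction(comb(n, i) * comb(m, N - i), comb(m + n, N)) * u ** i
               for i in range(N + 1))


def pgf_hypergeometric(n: int, m: int, N: int, u: Fraction) -> Fraction:
    """Right-hand side of (44) with e^{it} replaced by u; assumes m >= N."""
    prefactor = Fraction(comb(m, N), comb(m + n, N))
    return prefactor * hyp2f1_terminating(-N, -n, m - N + 1, u)
```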

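If the hypergeometric distribution is instead written for a population of n items, a fraction p of which are "good" (with q=1-p), and a sample of size s, as

$$h_n(x, s) = \frac{\binom{np}{x}\binom{nq}{s-x}}{\binom{n}{s}}, \tag{45}$$

then

$$\sum_{x=0}^{s} h_n(x, s)\,u^x = A\,{}_2F_1(-s, -np; nq-s+1; u), \tag{46}$$

where $A=\binom{nq}{s}/\binom{n}{s}$ is a constant (the value of the sum at u=0).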


See also

Multichoose




Cite this as:

Weisstein, Eric W. "Hypergeometric Distribution." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/HypergeometricDistribution.html
