Zipf's Law


In the English language, the probability of encountering the $r$th most common word is given roughly by $P(r) = 0.1/r$ for $r$ up to 1000 or so. The law breaks down for less frequent words, since the harmonic series diverges and the probabilities would eventually sum to more than 1. Pierce's (1980, p. 87) statement that $\sum P(r) > 1$ for $r = 8727$ is incorrect. Goetz states the law as follows: the frequency of a word is inversely proportional to its statistical rank $r$ such that

$$P(r) \approx \frac{1}{r \ln(1.78 R)},$$

where R is the number of different words.
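
Both statements above can be checked numerically. The following is a minimal sketch rather than part of the original entry: the function names `first_rank_where_sum_exceeds_one` and `goetz_total` are hypothetical, and the code assumes the two forms of the law exactly as written, $P(r) = 0.1/r$ and $P(r) \approx 1/(r \ln(1.78 R))$.

```python
import math


def first_rank_where_sum_exceeds_one(c=0.1):
    """Return the smallest n for which sum_{r=1}^{n} c/r exceeds 1.

    With c = 0.1 this is the rank at which P(r) = 0.1/r stops being
    a valid probability distribution.
    """
    total = 0.0
    r = 0
    while total <= 1.0:
        r += 1
        total += c / r
    return r


def goetz_total(R):
    """Sum Goetz's P(r) = 1/(r ln(1.78 R)) over ranks r = 1..R.

    R is the number of different words, as in the text.
    """
    norm = math.log(1.78 * R)
    return sum(1.0 / (r * norm) for r in range(1, R + 1))


if __name__ == "__main__":
    # Rank at which the simple 0.1/r law first sums to more than 1.
    print(first_rank_where_sum_exceeds_one())
    # Total probability under Goetz's form for a 10,000-word vocabulary.
    print(goetz_total(10_000))
```

Under these assumptions the first function should return 12367 rather than 8727, which is consistent with the remark about Pierce. The second sum stays close to 1 because the harmonic sum up to $R$ is approximately $\ln R + \gamma$, and $e^{\gamma} \approx 1.78$, so the factor $\ln(1.78 R)$ acts as a normalizing constant.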


See also

Harmonic Series, Statistical Rank, Zipf Distribution

References

Bogomolny, A. "Benford's Law and Zipf's Law." http://www.cut-the-knot.org/do_you_know/zipfLaw.shtml.

Goetz, P. "Phil Goetz's Complexity Dictionary." http://www.cs.buffalo.edu/~goetz/dict.html.

Li, W. "Zipf's Law." http://linkage.rockefeller.edu/wli/zipf/.

Pierce, J. R. Introduction to Information Theory: Symbols, Signals, and Noise, 2nd rev. ed. New York: Dover, pp. 86-87 and 238-239, 1980.

Cite this as:

Weisstein, Eric W. "Zipf's Law." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/ZipfsLaw.html
