K-Means Clustering Algorithm

An algorithm for partitioning (or clustering) N data points into K disjoint subsets S_j containing N_j data points so as to minimize the sum-of-squares criterion

 J=sum_(j=1)^Ksum_(n in S_j)|x_n-mu_j|^2,

where x_n is a vector representing the nth data point and mu_j is the geometric centroid of the data points in S_j. In general, the algorithm does not achieve a global minimum of J over the assignments. In fact, since the algorithm uses discrete assignment rather than a set of continuous parameters, the "minimum" it reaches cannot even be properly called a local minimum. Despite these limitations, the algorithm is used fairly frequently as a result of its ease of implementation.

The algorithm consists of a simple re-estimation procedure as follows. Initially, the data points are assigned at random to the K sets. For step 1, the centroid is computed for each set. In step 2, every point is assigned to the cluster whose centroid is closest to that point. These two steps are alternated until a stopping criterion is met, i.e., when there is no further change in the assignment of the data points.

See also

Global Minimum, Local Minimum, Minimum

Explore with Wolfram|Alpha


Bishop, C. M. Neural Networks for Pattern Recognition. Oxford, England: Oxford University Press, 1995.

Referenced on Wolfram|Alpha

K-Means Clustering Algorithm

Cite this as:

Weisstein, Eric W. "K-Means Clustering Algorithm." From MathWorld--A Wolfram Web Resource.

Subject classifications