An algorithm for partitioning (or clustering) $N$ data points into $K$ disjoint subsets $S_j$, each containing $N_j$ data points, so as to minimize the sum-of-squares criterion

$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2,$$

where $x_n$ is a vector representing the $n$th data point and $\mu_j$ is the geometric centroid of the data points in $S_j$. In general, the algorithm does not achieve a global minimum of $J$ over the assignments. In fact, since the algorithm uses discrete assignment rather than a set of continuous parameters, the "minimum" it reaches cannot even be properly called a local minimum. Despite these limitations, the algorithm is used fairly frequently because it is easy to implement.
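
To make the criterion concrete, the following minimal sketch evaluates it for a given partition: for each subset it computes the centroid, then adds up the squared Euclidean distances from each point to that centroid. The function names `centroid` and `sum_of_squares` and the sample points are illustrative assumptions, not part of the original description.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sum_of_squares(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        for p in cluster:
            total += sum((a - b) ** 2 for a, b in zip(p, mu))
    return total


# Two candidate partitions of the same four (hypothetical) points.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(sum_of_squares(tight))
print(sum_of_squares(loose))
```

The partition that groups nearby points together yields the smaller value, which is the sense in which the criterion rewards compact clusters.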

The algorithm consists of a simple re-estimation procedure as follows. Initially, the data points are assigned at random to the $K$ sets. In step 1, the centroid is computed for each set. In step 2, every point is assigned to the cluster whose centroid is closest to that point. These two steps are alternated until a stopping criterion is met, i.e., when there is no further change in the assignment of the data points.
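
The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation. The names `k_means`, `centroid`, and `squared_distance`, the `seed` and `max_iter` parameters, and the sample points are all assumptions introduced here. Two details the description leaves open are also handled by assumption: the random initial assignment is balanced so that no set starts empty, and a set that becomes empty later keeps its previous centroid.

```python
import random


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    """Alternate centroid computation and reassignment until labels stop changing."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Random initial assignment, balanced so every set starts non-empty
    # (an assumption; the description only says "at random").
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 1: compute the centroid of each set.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty set keeps its previous centroid (assumption)
                centroids[j] = centroid(members)
        # Step 2: assign every point to the set with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion: no change in the assignment.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random initial assignment, so only the grouping itself is meaningful. On less cleanly separated data, different seeds can lead to different final partitions, consistent with the remark that the algorithm does not in general reach a global minimum.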