# Minimum sum-of-squares clustering

See the PDF, page 4, for the problem description. Find $C$ cluster centers $z_j$ to minimize the sum of distances from each point to its cluster center.
MSSC is similar to the problem we are trying to solve in the $kn$ to $k$ groups of $n$ matching problem.  Given $kn$ points, we would like to find the $n$ cluster centers that minimize the distance of any point $p_i$ to its nearest center $z_j$ (the centroid of that cluster).   For fixed $n$ and dimensionality $d$ , it has been shown MSSC is solvable in $O((N)^{nd+1})$ , for $N$ input points.  For general $d$ and input parameter $n$ (as in our case), the problem is shown to be NP-hard.
MSSC, however, differs from our problem statement, since we require the size of any cluster to be size $k$ such that the cluster center $z_i$ provides a good representation for $k$ points.  The points in each cluster may be divided up to form $n$ groups, where each member comes from a unique cluster, thereby removing bias between the groups.  Also, our initial problem statement proposes a greedy method, such that the smallest clusters are found first, whereas MSSC minimizes the sizes of all clusters.