
HP Labs Technical Reports
Click here for full text:
KHarmonic Means  A Data Clustering Algorithm
Zhang, Bin; Hsu, Meichun; Dayal, Umeshwar
HPL1999124
Keyword(s): clustering; KMeans; KHarmonic Means; data mining
Abstract: Data clustering is one of the common techniques used in data mining. A popular performance function for measuring goodness of data clustering is the total withincluster variance, or the total meansquare quantization error (MSE). The KMeans (KM) algorithm is a popular algorithm which attempts to find a K clustering which minimizes MSE. The KMeans algorithm is a centerbased clustering algorithm. The dependency of the KMeans performance on the initialization of the centers is a major problem; a similiar issue exists for an alternative algorithm, Expectation Maximization(EM), although to a lesser extent. In this paper, we propose a new clustering method called the KHarmonic Means algorithm (KHM). KHM is a center based clustering algorithm which uses the Harmonic Averages of the distances from each data point to the centers as components to its performance function. It is demonstrated that KHarmonic Means is essentially insensitive to the initialization of the centers. In certain cases, KHarmonic Means significantly improves the quality of clustering results comparing with both KMeans and EM, which are the two most popular clustering algorithms used in data exploration and data compression. A unified view of the three performance functions, KMeans', KHarmonic Means' and EM's, are given for comparison. Experimental results of KHM comparing with KM on high dimensional data and visualization of the animation of the convergence of all three algorithms using 2dimensional data are given.
25 Pages
Back to Index
