HP Labs Technical Reports

Generalized K-Harmonic Means -- Boosting in Unsupervised Learning

Zhang, Bin


Keyword(s): clustering; K-Means; K-Harmonic Means; EM; data mining; data compression

Abstract: We propose a new class of center-based iterative clustering algorithms, K-Harmonic Means (KHM_p), which is essentially insensitive to the initialization of the centers, as demonstrated in many experiments. The insensitivity to initialization is attributed to a boosting function, which increases the importance, in the next iteration, of the data points that are far from every center. The dependency of K-Means' and EM's performance on the initialization of the centers has been a major problem, and many have tried to work around this sensitivity by generating good initializations. KHM_p addresses the intrinsic problem by replacing the minimum distance from a data point to the centers, used in K-Means, with the harmonic average of the distances from the data point to all centers. KHM_p significantly improves the quality of clustering results compared with both K-Means and EM. The KHM_p algorithms have been implemented in both sequential and parallel languages and tested on hundreds of randomly generated datasets with different data distributions and clustering characteristics.
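The contrast between the two objectives described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formula, so the form used here is an assumption based on the commonly cited definition of the KHM_p performance function: for each data point, the harmonic average over all K centers of the distance raised to the power p. The function names `kmeans_cost` and `khm_cost` and the guard `eps` are hypothetical, chosen only for illustration, and the sketch covers only the objective, not the center-update iteration.

```python
import math

def kmeans_cost(points, centers):
    """K-Means objective: squared distance from each point to its nearest center."""
    return sum(min(math.dist(x, m) ** 2 for m in centers) for x in points)

def khm_cost(points, centers, p=2.0, eps=1e-12):
    """Assumed KHM_p objective: harmonic average of d(x, m_k)**p over all centers.

    eps is an illustrative guard against division by zero when a point
    coincides with a center; it is not taken from the report.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = sum(1.0 / max(math.dist(x, m) ** p, eps) for m in centers)
        total += k / inv_sum  # harmonic average of the K powered distances
    return total

# Two well-separated pairs of points, with one center placed in each pair.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 0.0)]
km = kmeans_cost(points, centers)
khm = khm_cost(points, centers, p=2.0)
```

The harmonic average of K positive numbers always lies between their minimum and K times their minimum, so with p = 2 the sketched KHM cost behaves like a smooth stand-in for the K-Means cost while still depending on every center, not only the nearest one.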

12 Pages

© 1994-2000 Hewlett-Packard Company