Thumbnail
Access Restriction
Subscribed

Author Ostrovsky, Rafail ♦ Rabani, Yuval ♦ Schulman, Leonard J. ♦ Swamy, Chaitanya
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Randomized algorithms ♦ Approximation algorithms
Abstract We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a $\textit{clusterability}$ criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-01-09
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 59
Issue Number 6
Page Count 22
Starting Page 1
Ending Page 22


Open content in new tab

   Open content in new tab
Source: ACM Digital Library