Access Restriction

Author Arthur, David ♦ Manthey, Bodo ♦ Rglin, Heiko
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2011
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data clustering ♦ K-means method ♦ Smoothed analysis
Abstract The $\textit{k}-means$ method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the $\textit{k}-means$ method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number $\textit{n}$ of data points. In this article, we settle the smoothed running time of the $\textit{k}-means$ method. We show that the smoothed number of iterations is bounded by a polynomial in $\textit{n}$ and $1/\textit{σ},$ where $\textit{σ}$ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the $\textit{k}-means$ method will run in expected polynomial time on that input set.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2011-10-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 58
Issue Number 5
Page Count 31
Starting Page 1
Ending Page 31

Open content in new tab

   Open content in new tab
Source: ACM Digital Library