Access Restriction Subscribed

Author Balcan, Maria-Florina ♦ Blum, Avrim ♦ Gupta, Anupam
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword k-Means ♦ k-Median ♦ Approximation Algorithms ♦ Clustering ♦ Clustering Accuracy ♦ Min-Sum
Abstract A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired “target” clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those mentioned here, are NP-hard to optimize. So, this assumption by itself is not sufficient, assuming P ≠ NP, to achieve clusterings of low error via polynomial-time algorithms. In this article, we show that we can bypass this barrier if we slightly extend this assumption to ask that for some small constant c, not only the optimal solution, but also all c-approximations to the optimal solution, differ from the target on at most some ε fraction of points; we call this (c,ε)-approximation-stability. We show that under this condition, it is possible to efficiently obtain low-error clusterings even if the property holds only for values c for which the objective is known to be NP-hard to approximate. Specifically, for any constant c > 1, (c,ε)-approximation-stability of k-median or k-means objectives can be used to efficiently produce a clustering of error O(ε) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-05-03
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 60
Issue Number 2
Page Count 34
Starting Page 1
Ending Page 34

