### Clustering under approximation stability

Access Restriction: Subscribed

| Field | Value |
| --- | --- |
| Author | Balcan, Maria-Florina ♦ Blum, Avrim ♦ Gupta, Anupam |
| Source | ACM Digital Library |
| Content type | Text |
| Publisher | Association for Computing Machinery (ACM) |
| File Format | PDF |
| Copyright Year | ©2013 |
| Language | English |
| Subject Domain (in DDC) | Computer science, information & general works ♦ Data processing & computer science |
| Subject Keyword | k-Means ♦ k-Median ♦ Approximation Algorithms ♦ Clustering ♦ Clustering Accuracy ♦ Min-Sum |
| ISSN | 00045411 |
| Age Range | 18 to 22 years ♦ above 22 years |
| Educational Use | Research |
| Education Level | UG and PG |
| Learning Resource Type | Article |
| Publisher Date | 2013-05-03 |
| Publisher Place | New York |
| e-ISSN | 1557735X |
| Journal | Journal of the ACM (JACM) |
| Volume Number | 60 |
| Issue Number | 2 |
| Page Count | 34 |
| Starting Page | 1 |
| Ending Page | 34 |

**Abstract:** A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the $k$-median, $k$-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired "target" clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those mentioned here, are NP-hard to optimize. So, this assumption by itself is not sufficient, assuming P ≠ NP, to achieve clusterings of low-error via polynomial time algorithms. In this article, we show that we can bypass this barrier if we slightly extend this assumption to ask that for some small constant $c$, not only the optimal solution, but also all $c$-approximations to the optimal solution, differ from the target on at most some $\epsilon$ fraction of points; we call this $(c,\epsilon)$-approximation-stability. We show that under this condition, it is possible to efficiently obtain low-error clusterings even if the property holds only for values $c$ for which the objective is known to be NP-hard to approximate. Specifically, for any constant $c > 1$, $(c,\epsilon)$-approximation-stability of $k$-median or $k$-means objectives can be used to efficiently produce a clustering of error $O(\epsilon)$ with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.
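The stability condition described in the abstract can be made concrete with a small brute-force check. The sketch below is illustrative only and is not the article's algorithm: it enumerates every labelling of a tiny point set, so it runs in exponential time. The function names (`kmedian_cost`, `clustering_distance`, `is_approximation_stable`) are hypothetical, and it assumes points on the real line with absolute difference as the metric, centers restricted to the input points, and "differ on at most an ε fraction" measured under the best matching of cluster labels.

```python
from itertools import permutations, product


def kmedian_cost(points, labels, k):
    """k-median cost of a labelling. Assumption: each cluster's center is
    the input point minimizing the cluster's summed distances."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            return float("inf")  # require exactly k non-empty clusters
        total += min(sum(abs(p - c) for p in members) for c in points)
    return total


def clustering_distance(a, b, k):
    """Fraction of points labelled differently, minimized over all
    renamings of the k cluster labels."""
    n = len(a)
    return min(
        sum(1 for x, y in zip(a, b) if perm[x] != y) / n
        for perm in permutations(range(k))
    )


def is_approximation_stable(points, target, k, c, eps):
    """True if every labelling whose cost is within a factor c of the
    optimum differs from `target` on at most an eps fraction of points."""
    costs = {
        lab: kmedian_cost(points, lab, k)
        for lab in product(range(k), repeat=len(points))
    }
    opt = min(costs.values())
    return all(
        clustering_distance(lab, target, k) <= eps
        for lab, cost in costs.items()
        if cost <= c * opt
    )


# Two well-separated groups on the line; the target matches the groups.
points = [0, 1, 2, 10, 11, 12]
target = (0, 0, 0, 1, 1, 1)
print(is_approximation_stable(points, target, 2, 1.5, 0.0))  # True
print(is_approximation_stable(points, target, 2, 3.0, 0.1))  # False
```

In this toy instance the optimal cost is 4, and any other 2-clustering must mix the two groups and so costs at least 8, which is why the check passes for c = 1.5. At c = 3 a clustering that misplaces one of the six points has cost 12 and qualifies as a 3-approximation, so the condition fails for ε = 0.1.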
