### Polynomial-time approximation schemes for geometric min-sum median clusteringPolynomial-time approximation schemes for geometric min-sum median clustering

Access Restriction
Subscribed

 Author Ostrovsky, Rafail ♦ Rabani, Yuval Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©2002 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science Subject Keyword Clustering ♦ High-dimensional data ♦ Polynomial-time approximation schemes Abstract The Johnson--Lindenstrauss lemma states that $\textit{n}$ points in ahigh-dimensional Hilbert space can be embedded with smalldistortion of the distances into an $\textit{O}(log$ $\textit{n})dimensional$ space by applying a random linear transformation. Weshow that similar (though weaker) properties hold for certainrandom linear transformations over the Hamming cube. We use thesetransformations to solve NP-hard clustering problems in the cube aswell as in geometric settings.More specifically, we address thefollowing clustering problem. Given $\textit{n}$ points in a larger set(e.g., $ℝ^{d})$ endowed with a distance function $(e.g.,L^{2}$ distance), we would like to partition the dataset into $\textit{k}$ disjoint clusters, each with a "cluster center,"so as to minimize the sum over all data points of the distancebetween the point and the center of the cluster containing thepoint. The problem is provably NP-hard in some high-dimensionalgeometric settings, even for $\textit{k}$ = 2. We give polynomial-timeapproximation schemes for this problem in several settings,including the binary cube ${0,1}^{d}$ with Hamming distance,and $ℝ^{d}$ either with $L^{1}$ distance,or with $L^{2}$ distance, or with the square $ofL^{2}$ distance. In all these settings, the bestprevious results were constant factor approximation guarantees.Wenote that our problem is similar in flavor to the $\textit{k}-medianproblem$ (and the related facility location problem), which has beenconsidered in graph-theoretic and fixed dimensional geometricsettings, where it becomes hard when $\textit{k}$ is part of the input.In contrast, we study the problem when $\textit{k}$ is fixed, but thedimension is part of the input. ISSN 00045411 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 2002-03-01 Publisher Place New York e-ISSN 1557735X Journal Journal of the ACM (JACM) Volume Number 49 Issue Number 2 Page Count 18 Starting Page 139 Ending Page 156

#### Open content in new tab

Source: ACM Digital Library