Thumbnail
Access Restriction
Subscribed

Author Ostrovsky, Rafail ♦ Rabani, Yuval
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2002
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Clustering ♦ High-dimensional data ♦ Polynomial-time approximation schemes
Abstract The Johnson--Lindenstrauss lemma states that $\textit{n}$ points in ahigh-dimensional Hilbert space can be embedded with smalldistortion of the distances into an $\textit{O}(log$ $\textit{n})dimensional$ space by applying a random linear transformation. Weshow that similar (though weaker) properties hold for certainrandom linear transformations over the Hamming cube. We use thesetransformations to solve NP-hard clustering problems in the cube aswell as in geometric settings.More specifically, we address thefollowing clustering problem. Given $\textit{n}$ points in a larger set(e.g., $ℝ^{d})$ endowed with a distance function $(e.g.,L^{2}$ distance), we would like to partition the dataset into $\textit{k}$ disjoint clusters, each with a "cluster center,"so as to minimize the sum over all data points of the distancebetween the point and the center of the cluster containing thepoint. The problem is provably NP-hard in some high-dimensionalgeometric settings, even for $\textit{k}$ = 2. We give polynomial-timeapproximation schemes for this problem in several settings,including the binary cube ${0,1}^{d}$ with Hamming distance,and $ℝ^{d}$ either with $L^{1}$ distance,or with $L^{2}$ distance, or with the square $ofL^{2}$ distance. In all these settings, the bestprevious results were constant factor approximation guarantees.Wenote that our problem is similar in flavor to the $\textit{k}-medianproblem$ (and the related facility location problem), which has beenconsidered in graph-theoretic and fixed dimensional geometricsettings, where it becomes hard when $\textit{k}$ is part of the input.In contrast, we study the problem when $\textit{k}$ is fixed, but thedimension is part of the input.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2002-03-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 49
Issue Number 2
Page Count 18
Starting Page 139
Ending Page 156


Open content in new tab

   Open content in new tab
Source: ACM Digital Library