Access Restriction Subscribed
Author Valiant, Gregory
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2015
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Correlations ♦ Approximate closest pair ♦ Asymmetric embeddings ♦ Learning juntas ♦ Locality sensitive hashing ♦ Metric embedding ♦ Nearest neighbor ♦ Parity with noise
Abstract Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation coefficient $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants $\epsilon > 0$ and $\rho > 0$, runs in expected time $O(n^{\frac{5-\omega}{4-\omega}+\epsilon} + nd) < O(n^{1.62} + nd)$, where $\omega < 2.4$ is the exponent of matrix multiplication. This is the first subquadratic-time algorithm for this problem for which $\rho$ does not appear in the exponent of $n$, and improves upon $O(n^{2-O(\rho)})$, given by Paturi et al. [1989], the Locality Sensitive Hashing approach of Indyk and Motwani [1998], and the Bucketing Codes approach of Dubiner [2008]. Applications and extensions of this basic algorithm yield significantly improved algorithms for several other problems.

Approximate Closest Pair. For any sufficiently small constant $\epsilon > 0$, given $n$ $d$-dimensional vectors, there exists an algorithm that returns a pair of vectors whose Euclidean (or Hamming) distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including Locality Sensitive Hashing) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parities with Noise. Given samples from an instance of the learning parities with noise problem where each example has length $n$, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, there exists an algorithm that identifies the set of $k$ indices in time $n^{\frac{\omega+\epsilon}{3}k} \cdot poly(\frac{1}{1-2\eta}) < n^{0.8k} \cdot poly(\frac{1}{1-2\eta})$. This is the first algorithm with no dependence on $\eta$ in the exponent of $n$, aside from the trivial $O\big(\binom{n}{k}\big) \approx O(n^k)$ brute-force algorithm, and for large noise rates ($\eta > 0.4$) it improves upon the results of Grigorescu et al. [2011], which give a runtime of $n^{(1+(2\eta)^2+o(1))\frac{k}{2}} \cdot poly(\frac{1}{1-2\eta})$.

Learning $k$-Juntas with Noise. Given uniformly random length-$n$ Boolean vectors, together with a label which is some function of just $k \ll n$ of the bits, perturbed by noise rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al. [2009], our result for learning $k$-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k} \cdot poly(\frac{1}{1-2\eta}) < n^{0.8k} \cdot poly(\frac{1}{1-2\eta})$, which is the first runtime for this problem of the form $n^{ck}$ with an absolute constant $c < 1$.

Learning $k$-Juntas without Noise. Given uniformly random length-$n$ Boolean vectors, together with a label which is some function of $k \ll n$ of the bits, return the set of relevant indices. Using a modification of the algorithm of Mossel et al. [2004], and employing our algorithm for learning sparse parities with noise via the reduction of Feldman et al. [2009], we obtain an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{4}k} \cdot poly(n) < n^{0.6k} \cdot poly(n)$, which improves on the previous best of $n^{\frac{\omega}{\omega+1}k} \approx n^{0.7k} \cdot poly(n)$ of Mossel et al. [2004].
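
The following is a minimal sketch (in Python with NumPy) of the problem setup described in the abstract, not of the paper's subquadratic algorithm: it generates a planted-correlation instance and recovers the hidden pair with the naive O(n^2 d) scan that Valiant's algorithm improves upon. The helper names planted_instance and closest_pair_bruteforce are illustrative, not from the paper.

import numpy as np

def planted_instance(n, d, rho, seed=0):
    """Generate n random Boolean vectors in which one hidden pair agrees on
    each coordinate with probability (1 + rho)/2, i.e. has expected Hamming
    distance d*(1 - rho)/2, matching the abstract's setup."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, d), dtype=np.int8)
    flip = rng.random(d) > (1 + rho) / 2          # flip each bit w.p. (1 - rho)/2
    X[1] = np.where(flip, 1 - X[0], X[0])
    perm = rng.permutation(n)                     # hide the planted pair
    return X[perm], tuple(np.sort(np.argsort(perm)[:2]))

def closest_pair_bruteforce(X):
    """Naive O(n^2 d) baseline: return the pair of rows at minimum Hamming
    distance; the paper's algorithm avoids this quadratic scan."""
    n, d = X.shape
    best, best_pair = d + 1, None
    for i in range(n - 1):
        dists = np.count_nonzero(X[i + 1:] != X[i], axis=1)
        j = int(np.argmin(dists))
        if dists[j] < best:
            best, best_pair = int(dists[j]), (i, i + 1 + j)
    return best_pair, best

X, planted = planted_instance(n=500, d=2000, rho=0.2)
found, dist = closest_pair_bruteforce(X)
print("planted:", planted, "found:", found, "Hamming distance:", dist)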
ISSN 0004-5411
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2015-05-06
Publisher Place New York
e-ISSN 1557-735X
Journal Journal of the ACM (JACM)
Volume Number 62
Issue Number 2
Page Count 45
Starting Page 1
Ending Page 45