Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem

Author: Valiant, Gregory
Source: ACM Digital Library
Content Type: Text
Publisher: Association for Computing Machinery (ACM)
File Format: PDF
Copyright Year: ©2015
Language: English
Subject Domain (in DDC): Computer science, information & general works ♦ Data processing & computer science
Subject Keywords: Correlations ♦ Approximate closest pair ♦ Asymmetric embeddings ♦ Learning juntas ♦ Locality sensitive hashing ♦ Metric embedding ♦ Nearest neighbor ♦ Parity with noise

Abstract: Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random, with the exception of two vectors that have Pearson correlation coefficient $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm that, for any constants $\epsilon > 0$ and $\rho > 0$, runs in expected time $O(n^{\frac{5-\omega}{4-\omega}+\epsilon} + nd) < O(n^{1.62} + nd)$, where $\omega < 2.4$ is the exponent of matrix multiplication. This is the first subquadratic-time algorithm for this problem in which $\rho$ does not appear in the exponent of $n$, and it improves upon the $O(n^{2-O(\rho)})$ runtime of Paturi et al. [1989], the Locality Sensitive Hashing approach of Indyk and Motwani [1998], and the Bucketing Codes approach of Dubiner [2008]. Applications and extensions of this basic algorithm yield significantly improved algorithms for several other problems.

Approximate Closest Pair. For any sufficiently small constant $\epsilon > 0$, given $n$ $d$-dimensional vectors, there exists an algorithm that returns a pair of vectors whose Euclidean (or Hamming) distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including Locality Sensitive Hashing) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parities with Noise. Given samples from an instance of the learning parities with noise problem in which each example has length $n$, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, there exists an algorithm that identifies the set of $k$ indices in time $n^{\frac{\omega+\epsilon}{3}k} \cdot \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$. This is the first algorithm with no dependence on $\eta$ in the exponent of $n$, aside from the trivial $O\binom{n}{k} \approx O(n^k)$ brute-force algorithm, and for large noise rates ($\eta > 0.4$) it improves upon the results of Grigorescu et al. [2011], which give a runtime of $n^{(1+(2\eta)^2+o(1))\frac{k}{2}} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$.

Learning $k$-Juntas with Noise. Given uniformly random length-$n$ Boolean vectors, together with a label that is some function of just $k \ll n$ of the bits, perturbed by noise rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al. [2009], our result for learning $k$-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k} \cdot \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$, which is the first runtime for this problem of the form $n^{ck}$ with an absolute constant $c < 1$.

Learning $k$-Juntas without Noise. Given uniformly random length-$n$ Boolean vectors, together with a label that is some function of $k \ll n$ of the bits, return the set of relevant indices. Using a modification of the algorithm of Mossel et al. [2004], and employing our algorithm for learning sparse parities with noise via the reduction of Feldman et al. [2009], we obtain an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{4}k} \cdot \mathrm{poly}(n) < n^{0.6k} \cdot \mathrm{poly}(n)$, which improves on the previous best of $n^{\frac{\omega}{\omega+1}k} \approx n^{0.7k} \cdot \mathrm{poly}(n)$ of Mossel et al. [2004].

ISSN: 0004-5411
e-ISSN: 1557-735X
Journal: Journal of the ACM (JACM)
Volume Number: 62
Issue Number: 2
Publisher Date: 2015-05-06
Publisher Place: New York
Page Count: 45
Starting Page: 1
Ending Page: 45
Age Range: 18 to 22 years ♦ above 22 years
Educational Use: Research
Education Level: UG and PG
Learning Resource Type: Article
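For concreteness, the problem the abstract opens with can be illustrated with a minimal sketch of the naive $O(n^2 d)$ baseline that the paper's subquadratic algorithm improves upon. This is not code from the paper; the generator `plant_correlated_pair` and all parameter values are illustrative assumptions.

```python
import random

def plant_correlated_pair(n, d, rho, seed=0):
    """Generate n uniformly random d-dimensional Boolean vectors, then
    overwrite the last one so it agrees with the first in each coordinate
    with probability (1 + rho) / 2, i.e. expected Hamming distance
    d * (1 - rho) / 2, matching the abstract's problem setup."""
    rng = random.Random(seed)
    vecs = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    vecs[-1] = [b if rng.random() < (1 + rho) / 2 else 1 - b for b in vecs[0]]
    return vecs

def closest_pair_brute_force(vecs):
    """Naive O(n^2 * d) scan: return the index pair with minimum Hamming
    distance. This is the quadratic baseline the paper beats."""
    best, best_dist = None, float("inf")
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            dist = sum(a != b for a, b in zip(vecs[i], vecs[j]))
            if dist < best_dist:
                best_dist, best = dist, (i, j)
    return best

vecs = plant_correlated_pair(n=60, d=400, rho=0.5)
print(closest_pair_brute_force(vecs))  # the planted pair (0, 59), w.h.p.
```

With $d = 400$ and $\rho = 0.5$, the planted pair sits at expected Hamming distance 100 while random pairs concentrate around 200, so the scan recovers it with overwhelming probability; the point of the paper is achieving this in $O(n^{1.62} + nd)$ time rather than the $O(n^2 d)$ above.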

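The "trivial $O\binom{n}{k}$ brute-force algorithm" that the abstract uses as its point of comparison for learning sparse parities with noise can be sketched as follows. This is a hypothetical illustration, not code from the paper; the example generator, noise rate, and function names are assumptions.

```python
import random
from itertools import combinations

def learn_parity_brute_force(examples, k):
    """Trivial O(C(n, k)) baseline from the abstract: try every index set S
    of size at most k and keep the one whose parity agrees with the labels
    most often. With noise rate eta < 1/2 and enough samples, the true
    parity set wins because wrong parities agree only about half the time."""
    n = len(examples[0][0])
    best_set, best_agree = None, -1
    for size in range(1, k + 1):
        for S in combinations(range(n), size):
            agree = sum((sum(x[i] for i in S) % 2) == y for x, y in examples)
            if agree > best_agree:
                best_agree, best_set = agree, set(S)
    return best_set

# Illustrative instance: n = 8, true parity set {1, 3}, noise rate 0.1.
rng = random.Random(1)
true_S = {1, 3}
examples = []
for _ in range(300):
    x = [rng.randint(0, 1) for _ in range(8)]
    y = sum(x[i] for i in true_S) % 2
    if rng.random() < 0.1:  # flip the label with probability eta = 0.1
        y = 1 - y
    examples.append((x, y))
print(learn_parity_brute_force(examples, k=2))  # recovers {1, 3}
```

The exhaustive loop is what costs $O(n^k)$; the paper's contribution is replacing it with an $n^{\frac{\omega+\epsilon}{3}k} < n^{0.8k}$ procedure whose exponent is independent of the noise rate $\eta$.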