Thumbnail
Access Restriction
Subscribed

Author Moraru, Iulian ♦ Andersen, David G.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2012
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Bloom filters ♦ CPU caches ♦ Pattern matching ♦ Memory efficiency
Abstract This article presents a new, memory efficient and cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus. This algorithm builds upon the Rabin-Karp string search algorithm and incorporates a new type of Bloom filter that we call a feed-forward Bloom filter. While it retains the asymptotic time complexity of previous multiple pattern matching algorithms, we show that this technique, along with a CPU architecture-aware design of the Bloom filter, can provide speed-ups between 2× and 30×, and memory consumption reductions as large as 50× when compared with grep. Our algorithm is also well suited for implementations on GPUs: A modern GPU can search for 3 million patterns at a rate of 580MB/s, and for 100 million patterns (a prohibitive number for traditional algorithms) at a rate of 170MB/s.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2012-09-01
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 17
Page Count 18
Starting Page 3.1
Ending Page 3.18


Open content in new tab

   Open content in new tab
Source: ACM Digital Library