Access Restriction

Author Hyyr, Heikki ♦ Fredriksson, Kimmo ♦ Navarro, Gonzalo
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2005
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Approximate string matching ♦ Bit-parallelism ♦ Multiple string matching
Abstract Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length $\textit{m}$ in a text of length $\textit{n}$ in time $\textit{O}(⌈\textit{m}/\textit{w}⌉\textit{n}),$ where $\textit{w}$ is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic $\textit{O}(\textit{mn})$ time algorithm, it wastes bit-parallelism's power in the common case where $\textit{m}$ is much smaller than $\textit{w},$ since $\textit{w}™\textit{m}$ bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending $\textit{O}(\textit{rn})$ time to search for $\textit{r}$ patterns of length $\textit{m}≤\textit{w}/2,$ we need $\textit{O}(⌈\textit{rm}/\textit{w}⌉\textit{n})$ time. Second, we show how the mechanism permits boosting the search for a single pattern of length $\textit{m}≤\textit{w}/2,$ which can be searched for in $\textit{O}(⌈\textit{n}/⌊\textit{w}/\textit{m}⌋⌉)$ bit-parallel steps instead of $\textit{O}(\textit{n}).$ Third, we show how to extend these algorithms so that the time bounds essentially depend on $\textit{k}$ instead of $\textit{m},$ where $\textit{k}$ is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2005-12-01
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 10

Open content in new tab

   Open content in new tab
Source: ACM Digital Library