### Increased bit-parallelism for approximate and multiple string matchingIncreased bit-parallelism for approximate and multiple string matching

Access Restriction
Subscribed

 Author Hyyr, Heikki ♦ Fredriksson, Kimmo ♦ Navarro, Gonzalo Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©2005 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data Subject Keyword Approximate string matching ♦ Bit-parallelism ♦ Multiple string matching Abstract Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length $\textit{m}$ in a text of length $\textit{n}$ in time $\textit{O}(⌈\textit{m}/\textit{w}⌉\textit{n}),$ where $\textit{w}$ is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic $\textit{O}(\textit{mn})$ time algorithm, it wastes bit-parallelism's power in the common case where $\textit{m}$ is much smaller than $\textit{w},$ since $\textit{w}™\textit{m}$ bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending $\textit{O}(\textit{rn})$ time to search for $\textit{r}$ patterns of length $\textit{m}≤\textit{w}/2,$ we need $\textit{O}(⌈\textit{rm}/\textit{w}⌉\textit{n})$ time. Second, we show how the mechanism permits boosting the search for a single pattern of length $\textit{m}≤\textit{w}/2,$ which can be searched for in $\textit{O}(⌈\textit{n}/⌊\textit{w}/\textit{m}⌋⌉)$ bit-parallel steps instead of $\textit{O}(\textit{n}).$ Third, we show how to extend these algorithms so that the time bounds essentially depend on $\textit{k}$ instead of $\textit{m},$ where $\textit{k}$ is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns. ISSN 10846654 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 2005-12-01 Publisher Place New York e-ISSN 10846654 Journal Journal of Experimental Algorithmics (JEA) Volume Number 10

#### Open content in new tab

Source: ACM Digital Library