Access Restriction Subscribed

Author Boytsov, Leonid
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2011
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword k-errata tree ♦ q-gram ♦ q-sample ♦ Damerau-Levenshtein distance ♦ Levenshtein distance ♦ NR-grep ♦ Agrep ♦ Approximate searching ♦ Frequency distance ♦ Frequency vector trie ♦ Metric trees ♦ Neighborhood generation ♦ Trie
Abstract The primary goal of this article is to survey state-of-the-art indexing methods for approximate dictionary searching. To improve understanding of the field, we introduce a taxonomy that classifies all methods into direct methods and sequence-based filtering methods. We focus on infrequently updated dictionaries, which are used primarily for retrieval. Therefore, we consider indices that are optimized for retrieval rather than for update. The indices are assumed to be associative, that is, capable of storing and retrieving auxiliary information, such as string identifiers. All solutions are lossless and guarantee retrieval of strings within a specified edit distance $k$. Benchmark results are presented for the practically important cases of $k = 1$, 2, and 3. We concentrate on natural language datasets, which include synthetic English and Russian dictionaries, as well as dictionaries of frequent words extracted from the ClueWeb09 collection. In addition, we carry out experiments with dictionaries containing DNA sequences. The article is concluded with a discussion of benchmark results and directions for future research.
ISSN 1084-6654
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2011-05-01
Publisher Place New York
e-ISSN 1084-6654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 16
Page Count 91
Starting Page 1.1
Ending Page 1.91

