On the sorting-complexity of suffix tree constructionOn the sorting-complexity of suffix tree construction

Access Restriction
Subscribed

 Author Farach-Colton, Martin ♦ Ferragina, Paolo ♦ Muthukrishnan, S. Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©2000 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science Subject Keyword DAM model ♦ RAM model ♦ External-memory data structures ♦ Sorting complexity ♦ Suffix array ♦ Suffix tree Abstract The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal $\textit{0}(\textit{n})-time$ algorithm for building the suffix tree of an $\textit{n}-character$ string drawn from a constant-size alphabet. In the comparison model, there is a trivial $&Ogr;(\textit{n}$ log $\textit{n})-time$ lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the $\textit{O}(\textit{n}$ log $\textit{n})time$ comparison-based algorithm, but no super-linear lower bound is known. Closing this gap is the main open question in stringology. We settle this open problem by giving a linear time reduction to sorting for building suffix trees. Since sorting is a lower-bound for building suffix trees, this algorithm is time-optimal in every alphabet mode. In particular, for an alphabet consisting of integers in a polynomial range we get the first known linear-time algorithm. (2) All previously known algorithms for building suffix trees exhibit a marked absence of locality of reference, and thus they tend to elicit many page faults (I/Os) when indexing very long strings. They are therefore unsuitable for building suffix trees in secondary storage devices, where I/Os dominate the overall computational cost. We give a linear-I/O reduction to sorting for suffix tree construction. Since sorting is a trivial I/O-lower bound for building suffix trees, our algorithm is I/O-optimal. ISSN 00045411 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 2000-11-01 Publisher Place New York e-ISSN 1557735X Journal Journal of the ACM (JACM) Volume Number 47 Issue Number 6 Page Count 25 Starting Page 987 Ending Page 1011

Open content in new tab

Source: ACM Digital Library