Thumbnail
Access Restriction
Subscribed

Author Farach-Colton, Martin ♦ Ferragina, Paolo ♦ Muthukrishnan, S.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2000
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword DAM model ♦ RAM model ♦ External-memory data structures ♦ Sorting complexity ♦ Suffix array ♦ Suffix tree
Abstract The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal $\textit{0}(\textit{n})-time$ algorithm for building the suffix tree of an $\textit{n}-character$ string drawn from a constant-size alphabet. In the comparison model, there is a trivial $&Ogr;(\textit{n}$ log $\textit{n})-time$ lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the $\textit{O}(\textit{n}$ log $\textit{n})time$ comparison-based algorithm, but no super-linear lower bound is known. Closing this gap is the main open question in stringology. We settle this open problem by giving a linear time reduction to sorting for building suffix trees. Since sorting is a lower-bound for building suffix trees, this algorithm is time-optimal in every alphabet mode. In particular, for an alphabet consisting of integers in a polynomial range we get the first known linear-time algorithm. (2) All previously known algorithms for building suffix trees exhibit a marked absence of locality of reference, and thus they tend to elicit many page faults (I/Os) when indexing very long strings. They are therefore unsuitable for building suffix trees in secondary storage devices, where I/Os dominate the overall computational cost. We give a linear-I/O reduction to sorting for suffix tree construction. Since sorting is a trivial I/O-lower bound for building suffix trees, our algorithm is I/O-optimal.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2000-11-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 47
Issue Number 6
Page Count 25
Starting Page 987
Ending Page 1011


Open content in new tab

   Open content in new tab
Source: ACM Digital Library