Thumbnail
Access Restriction
Subscribed

Author Gonzlez, Rodrigo ♦ Navarro, Gonzalo ♦ Ferrada, Hctor
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2014
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Compressed suffix arrays ♦ Re-Pair ♦ Text databases
Abstract We introduce a compression technique for suffix arrays. It is sensitive to the compressibility of the text and $\textit{local},$ meaning that random portions of the suffix array can be decompressed by accessing mostly contiguous memory areas. This makes decompression very fast, especially when various contiguous cells must be accessed. Our main technical contributions are the following. First, we show that runs of consecutive values that are known to appear in function $Ψ(\textit{i})$ = $A^{™1}[A[i]$ + 1] of suffix arrays $\textit{A}$ of compressible texts also show up as repetitions in the differential suffix array $\textit{A}'[\textit{i}]$ = $\textit{A}[\textit{i}]$ ™ $\textit{A}[\textit{i}™1].$ Second, we use Re-Pair, a grammar-based compressor, to compress the differential suffix array, and upper bound its compression ratio in terms of the number of runs. Third, we show how to compact the space used by the grammar rules by up to 50%, while still permitting direct access to the rules. Fourth, we develop specific variants of Re-Pair that work using knowledge of Ψ, and use much less space than the general Re-Pair compressor, while achieving almost the same compression ratios. Fifth, we implement the scheme and compare it exhaustively with previous work, including the first implementations of previous theoretical proposals.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2015-01-07
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 19
Page Count 30
Starting Page 1.1
Ending Page 1.30


Open content in new tab

   Open content in new tab
Source: ACM Digital Library