Thumbnail
Access Restriction
Subscribed

Author Ferragina, Paolo ♦ Luccio, Fabrizio ♦ Manzini, Giovanni ♦ Muthukrishnan, S.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2009
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Burrows-Wheeler transform ♦ XML compression ♦ XML indexing ♦ Labeled tree compression ♦ Labeled tree indexing ♦ Path searching ♦ Tree navigation
Abstract Consider an ordered, static tree $\textit{T}$ where each node has a label from alphabet Σ. Tree $\textit{T}$ may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of $\textit{T}$ that supports basic $\textit{navigational}$ operations among the immediate neighbors of a node (i.e. parent, $\textit{i}th$ child, or any child with some label,…) as well as more sophisticated $\textit{path}-based$ search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-known Burrows-Wheeler transform for strings [1994]. The XBW-transform uses path-sorting to linearize the labeled tree $\textit{T}$ into $\textit{two}$ coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and path-search operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2009-11-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 57
Issue Number 1
Page Count 33
Starting Page 1
Ending Page 33


Open content in new tab

   Open content in new tab
Source: ACM Digital Library