Thumbnail
Access Restriction
Subscribed

Author Matias, Yossi ♦ Rajpoot, Nasir ♦ Sahinalp, Cenk
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2001
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Abstract We report on the performance evaluation of greedy parsing with asingle step lookahead (which we call flexible Parsing or $\textit{FP}as$ an alternative to the commonly used greedy parsing (withno-lookaheads) scheme. Greedy parsing is the basis of most popularcompression programs including $\textbf{UNIX$ $compress}$ $and\textbf{gzip},$ however it usually results in far from optimalparsing/compression with regard to the dictionary constructionscheme in use. Flexible parsing, however, is optimal [MS99], i.e.partitions any given input to the smallest number of phrasespossible, for dictionary construction schemes which satisfy theprefix property throughout their execution.We focus on the application of $\textit{FP}$ in the context of theLZW variant of the Lempel-Ziv'78 dictionary construction method[Wel84, ZL78], which is of considerable practical interest. Weimplement two compression algorithms which use (1) $\textit{FP}$ withLZW dictionary $(LZW-\textit{FP}),$ and (2) $\textit{FP}$ with analternative flexible dictionary (FPA as introduced in [Hor95]). Ourimplementations are based on novel on-line data structures enablingus to use linear time and space. We test our implementations on acollection of input sequences which includes textual files, DNAsequences, medical images, and pseudorandom binary files, andcompare our results with two of the most popular compressionprograms $\textbf{UNIX$ $compress}$ and $\textbf{gzip}.$ Our resultsdemonstrate that flexible parsing is especially useful fornon-textual data, on which it improves over the compression ratesof $\textbf{compress}$ and $\textbf{gzip}$ by up to 20% and 35%,respectively.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2001-12-01
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 6


Open content in new tab

   Open content in new tab
Source: ACM Digital Library