The relationship between greedy parsing and symbolwise text compressionThe relationship between greedy parsing and symbolwise text compression

Access Restriction
Subscribed

 Author Bell, Timothy C. ♦ Witten, Ian H. Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©1994 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science Subject Keyword Ziv-Lempel compression ♦ Adaptive modelling ♦ Context modeling Abstract Text compression methods can be divided into two classes: $\textit{symbolwise}$ and $\textit{parsing}.$ Symbolwise methods assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrases). The set of phrases available to a parsing method is referred to as a $\textit{dictionary}.$ The vast majority of parsing methods in the literature use $\textit{greedy}$ parsing (including nearly all variations of the popular Ziv-Lempel methods). When greedy parsing is used, the coder processes a string from left to right, at each step encoding as many symbols as possible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay.An important problem in compression research has been to establish the relationship between symbolwise methods and parsing methods. This paper extends prior work that shows that there are symbolwise methods that simulate a subset of greedy parsing methods. We provide a more general algorithm that takes any $\textit{nonadaptive}$ greedy parsing method and constructs a symbolwise method that achieves exactly the same compression. Combined with the existence of symbolwise equivalents for two of the most significant $\textit{adaptive}$ parsing methods, this result gives added weight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be chosen for speed or temporary storage considerations. ISSN 00045411 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 1994-07-01 Publisher Place New York e-ISSN 1557735X Journal Journal of the ACM (JACM) Volume Number 41 Issue Number 4 Page Count 17 Starting Page 708 Ending Page 724

Open content in new tab

Source: ACM Digital Library