Thumbnail
Access Restriction
Subscribed

Author Buchsbaum, Adam L. ♦ Fowler, Glenn S. ♦ Giancarlo, Raffaele
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2003
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Asymmetric traveling salesman problem ♦ Dynamic programming ♦ Experimental algorithmics ♦ Table compression
Abstract We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [2000], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on this theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35--55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2003-11-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 50
Issue Number 6
Page Count 27
Starting Page 825
Ending Page 851


Open content in new tab

   Open content in new tab
Source: ACM Digital Library