Thumbnail
Access Restriction
Subscribed

Author Sinha, Ranjan ♦ Zobel, Justin
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2005
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Sorting ♦ Cache-aware ♦ Cache-conscious ♦ Data structure ♦ In-memory ♦ String
Abstract Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string-sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache efficient. Burstsort dynamically builds a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce, by up to 37%, cache misses further than did the original burstsort, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2005-12-01
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 10


Open content in new tab

   Open content in new tab
Source: ACM Digital Library