Author Thorup, Mikkel
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Abstract Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters and we have precomputed character tables $h_{1}, \ldots, h_{c}$ mapping characters to random hash values. A key $x = (x_{1}, \ldots, x_{c})$ is hashed to $h_{1}[x_{1}] \oplus h_{2}[x_{2}] \oplus \cdots \oplus h_{c}[x_{c}]$. This scheme is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, in linear probing and Cuckoo hashing. Next, we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-style tail bounds and a very small bias for minwise hashing. This also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.
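Illustration The simple tabulation scheme described in the abstract can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: the function name `make_simple_tabulation`, the 8-bit character width, the 32-bit hash width, and the use of Python's `random` module to fill the tables are all assumptions made for illustration.

```python
import random


def make_simple_tabulation(c, char_bits=8, hash_bits=32, seed=None):
    """Build a simple tabulation hash for keys made of c characters.

    Hypothetical helper: the name and default widths are illustrative
    assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    table_size = 1 << char_bits
    # One table h_i per character position, filled with random hash values.
    tables = [
        [rng.getrandbits(hash_bits) for _ in range(table_size)]
        for _ in range(c)
    ]
    mask = table_size - 1

    def h(key):
        if not 0 <= key < (1 << (c * char_bits)):
            raise ValueError("key does not fit in c characters")
        value = 0
        for i in range(c):
            # Extract character x_i (least significant character first)
            # and XOR its table entry into the result.
            char = (key >> (i * char_bits)) & mask
            value ^= tables[i][char]
        return value

    return h, tables


# Usage: hash a 32-bit key viewed as four 8-bit characters.
h, tables = make_simple_tabulation(c=4, seed=1)
digest = h(0x01020304)
```

With two characters, the four keys (a, b), (a, d), (e, b), (e, d) always have hash values that XOR to zero, because every table entry appears exactly twice. This dependence among four keys is why the scheme is not four-independent.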
Description Affiliation: University of Copenhagen, Denmark (Thorup, Mikkel)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2005-08-01
Publisher Place New York
Journal Communications of the ACM (CACM)
Volume Number 60
Issue Number 7
Page Count 8
Starting Page 94
Ending Page 101
