Thumbnail
Access Restriction
Open

Author Keith H., R. ♦ Stata, Raymie ♦ Wickremesinghe, Rajiv ♦ Wiener, Janet L.
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Link Database ♦ Fast Access ♦ Web Page ♦ Billion-edge Graph ♦ Wide Range ♦ Well-known Compression Method ♦ Schema Model ♦ Connectivity Server ♦ Connected Component ♦ Many Page ♦ Hyperlink Access Time ♦ Kleinberg Hit ♦ Decompression Speed ♦ Common Set ♦ Real Time ♦ Compression Ratio ♦ Special-purpose Database ♦ Graph Graph ♦ Graph Algorithm ♦ Second Technique ♦ First Version ♦ Host Share Hyperlink ♦ First Compression Technique ♦ Web Graph ♦ Space Requirement ♦ Performance Number
Abstract The Connectivity Server is a special-purpose database whose schema models the Web as a graph graph where URLs are nodes and hyperlinks are directed edges. The Link Database provides fast access to the hyperlinks. To support a wide range of graph algorithms, we find it important to fit the Link Database into memory. In the first version of the Link Database, we achieved this fit by using machines with lots of memory (8GB), and storing each hyperlink in 32 bits. However, this approach was limited to roughly 100 million Web pages. This paper presents techniques to compress the links to accommodate larger graphs. Our techniques combine well-known compression methods with methods that depend on the properties of the web graph. The first compression technique takes advantage of the fact that most hyperlinks on most Web pages point to other pages on the same host as the page itself. The second technique takes advantage of the fact that many pages on the same host share hyperlinks, that is, they tend to point to a common set of pages. Together, these techniques reduce space requirements to under 6 bits per link. While (de)compression adds latency to the hyperlink access time, we can still compute the strongly connected components of a 6 billion-edge graph in under 20 minutes and run applications such as Kleinberg’s HITS in real time. This paper describes our techniques for compressing the Link Database, and provides performance numbers for compression ratios and decompression speed. 2 1
Educational Role Student ♦ Teacher
Age Range above 22 year
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2001-01-01