Access Restriction

Author Anand, Avishek ♦ Bedathur, Srikanta ♦ Berberich, Klaus ♦ Schenkel, Ralf
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Time-travel Text Search ♦ Index Maintenance ♦ Web Archive ♦ New Document Version ♦ Novel Index Structure ♦ Read Index Entry ♦ Web Evolves ♦ Append-only Operation ♦ Sharded Index Organization ♦ Temporal Predicate ♦ Time Interval ♦ Index Structure ♦ Document Version ♦ Present Experiment ♦ Keyword Query ♦ Small In-memory Buffer ♦ Large-scale Real-world Datasets ♦ Query-processing Performance ♦ Different Index Structure
Abstract Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
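Illustration The query model described in the abstract (keyword query plus a time interval over document versions) can be sketched in a few lines. This is a minimal toy sketch, not the index structure proposed by the authors: it has no sharding, no bound on spuriously read entries, and no ranking. The names Posting, TimeTravelIndex, add_version and query are hypothetical, and Boolean AND matching over half-open validity intervals [begin, end) is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

INF = float("inf")  # open-ended validity: the version is still current


@dataclass(frozen=True)
class Posting:
    doc_id: str
    begin: float  # timestamp at which this version came into existence
    end: float    # timestamp at which it was superseded (exclusive)


class TimeTravelIndex:
    """Toy inverted index whose postings carry a validity interval (assumed design)."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of Posting

    def add_version(self, doc_id, begin, end, text):
        # Appending is the only write operation used in this sketch.
        for term in set(text.lower().split()):
            self.postings[term].append(Posting(doc_id, begin, end))

    def query(self, keywords, t_begin, t_end):
        """Return sorted (doc_id, begin) pairs for versions that contain every
        keyword and whose validity interval overlaps [t_begin, t_end]."""
        result = None
        for term in keywords:
            hits = {
                (p.doc_id, p.begin)
                for p in self.postings.get(term.lower(), [])
                if p.begin <= t_end and p.end > t_begin  # interval overlap test
            }
            result = hits if result is None else result & hits
        return sorted(result or [])
```

For example, after adding a version of document "d1" valid over [1, 5) and a later version valid from 5 onward, a query for a keyword with the interval [10, 20] would only consider the later version, because the earlier one had already been superseded.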
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study