Thumbnail
Access Restriction
Subscribed

Author Geck, Gaetano ♦ Neven, Frank ♦ Schwentick, Thomas ♦ Ameloot, Tom J. ♦ Ketsman, Bas
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Abstract Evaluating queries over massive amounts of data is a major challenge in the big data era. Modern massively parallel systems, such as, Spark, organize query answering as a sequence of rounds each consisting of a distinct communication phase followed by a computation phase. The communication phase redistributes data over the available servers, while in the subsequent computation phase each server performs the actual computation on its local data. There is a growing interest in single-round algorithms for evaluating multiway joins where data is first reshuffled over the servers and then evaluated in a parallel but communication-free way. As the amount of communication induced by a reshuffling of the data is a dominating cost in such systems, we introduce a framework for reasoning about data partitioning to detect when we can avoid the data reshuffling step. Specifically, we formalize the decision problems parallel-correctness and transfer of parallel-correctness, provide semantical characterizations, and obtain tight complexity bounds.
Description Affiliation: Hasselt University and Transnational University of Limburg, Hasselt, Belgium (Ameloot, Tom J.; Ketsman, Bas; Neven, Frank) || TU Dortmund University, Dortmund, Germany (Geck, Gaetano; Schwentick, Thomas)
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2005-08-01
Publisher Place New York
Journal Communications of the ACM (CACM)
Volume Number 60
Issue Number 3
Page Count 8
Starting Page 93
Ending Page 100


Open content in new tab

   Open content in new tab
Source: ACM Digital Library