Access Restriction

Author Stricker, T. ♦ Gross, T.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Abstract Communication in a parallel system frequently involves moving data from the memory of one node to the memory of another; this is the standard communication model employed in message passing systems. Depending on the application, we observe a variety of patterns as part of communication steps, e.g., regular (i.e. blocks of data), strided, or irregular (indexed) memory accesses. The effective speed of these communication steps is determined by the network bandwidth and the memory bandwidth, and measurements on current parallel supercomputers indicate that the performance is limited by the memory bandwidth rather than the network bandwidth.Current systems provide a wealth of options to perform communication, and a compiler or user is faced with the difficulty of finding the communication operations that best use the available memory and network bandwidth. This paper provides a framework to evaluate different solutions for inter-node communication and presents the copy-transfer model; this model captures the contributions of the memory system to inter-node communication. We demonstrate the usefulness of this simple model by applying it to two commercial parallel systems, the Cray T3D and the Intel Paragon.In particular we identify two methods to transfer data between nodes in these two machines. In buffer-packing transfers, a contiguous block of data is transferred across the network. If the data are not stored contiguously, they are copied to (gathering) or from (scattering) buffers in local memory before and after the transfer. Chained transfers perform gathering, transfer and scattering in one step, reading the data elements with some non-sequential pattern and immediately transferring them on to the destination.Our model and measurements indicate that chaining of the gather, transfer, and scatter operations results in better performance than buffer packing for many important access patterns. Most standard message passing libraries (like MPI, PVM or NX) force the parallelizing compiler (or the programmer) to employ the buffer-packing communication operations. However, the addition of hardware support dedicated to communication (e.g., DMAs, line-transfer units) now gives the compiler a wider range of options.
Description Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA and Institut fuer Computer Systeme, ETH Zuerich, CH 8092 Zuerich, Switzerland (Gross, T.) || School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (Stricker, T.)
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 1981-04-01
Publisher Place New York
Journal ACM SIGARCH Computer Architecture News (CARN)
Volume Number 23
Issue Number 2
Page Count 12
Starting Page 308
Ending Page 319

Open content in new tab

   Open content in new tab
Source: ACM Digital Library