Access Restriction

Author Subramoni, Hari ♦ Kandalla, Krishna ♦ Sur, Sayantan ♦ Panda, Dhabaleswar K.
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Generalized Collective Communication Primitive ♦ Using ConnectX-2 Offload Engine ♦ Collective Communication Call ♦ Simultaneous Computation ♦ Collective Operation ♦ Offload Feature ♦ Current Generation MPI ♦ Non-blocking Collective Communication ♦ Reduction Operation ♦ Collective Communication Primitive ♦ Point-to-point Operation ♦ Generalized Primitive ♦ ConnectX-2 Offload Feature ♦ Good Overlap Characteristic ♦ Offload Capability ♦ Evaluation Reveals ♦ Various Collective Algorithm ♦ Network Interface Card ♦ Host Processor ♦ Thread-based Design ♦ Offload Mechanism ♦ Message Passing Interface ♦ Collective Communication Operation ♦ Current MPI Standard ♦ Large Scale ♦ Varying Collective Group Sizes and Communication Message Size ♦ Communication Primitive ♦ Scientific Application ♦ ConnectX-2 InfiniBand Adapter
Abstract Collective communication operations provided by the Message Passing Interface (MPI) are heavily used by scientific applications at large scale. The current MPI standard, MPI-2.2, defines only blocking collective communication calls, and MPI-3 is expected to allow non-blocking collective communication. While it is possible to allow simultaneous computation and communication through thread-based designs, resource sharing across the threads is always a concern. The newly introduced ConnectX-2 InfiniBand adapter from Mellanox features an offload mechanism that enables the Network Interface Card (NIC) to perform a series of communication and reduction operations without the involvement of the host processor. Current generation MPI stacks implement each collective operation using point-to-point operations. To take advantage of the offload feature in a rapidly changing architectural environment for all MPI collectives, they must be re-designed using flexible and generalized primitives. The primitives can then be used to compose various collective algorithms. The primitives must provide increased overlap with adapters supporting offload capabilities, with varying collective group sizes and communication message sizes. In this paper, we take on the challenge of designing collective communication primitives with good overlap characteristics and evaluate their performance using the ConnectX-2 offload feature. We also show how collectives such as Barrier can be designed using our communication primitives. Our evaluation reveals that we can achieve near perfect (94%-100%) overlap of computation and communication by using our primitives. Additionally, we
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study