Access Restriction

Author Langston, Harper ♦ Ying, Lexing ♦ Biros, George ♦ Shringarpure, Aashay ♦ Vuduc, Richard ♦ Zorin, Denis ♦ Chandramowlishwaran, Aparna ♦ Lashuk, Ilya ♦ Nguyen, Tuan-Anh ♦ Sampath, Rahul
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Abstract We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30× speedup over a single core CPU and 7× speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.
Description Affiliation: The University of Texas at Austin, TX (Biros, George; Ying, Lexing) || College of Computing, Atlanta, GA (Chandramowlishwaran, Aparna; Langston, Harper; Nguyen, Tuan-Anh; Vuduc, Richard) || Lawrence Livermore National Laboratory, Livermore, CA (Lashuk, Ilya) || Oak Ridge National Laboratory, Oak Ridge, TN (Sampath, Rahul) || New York University, New York, NY (Zorin, Denis)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2012-05-01
Publisher Place New York
Journal Communications of the ACM (CACM)
Volume Number 55
Issue Number 5
Page Count 9
Starting Page 101
Ending Page 109
