Thumbnail
Access Restriction
Subscribed

Author Penner, Michael ♦ Prasanna, Viktor K.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2007
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Data structures ♦ Floyd--Warshall algorithm ♦ Systolic array algorithms
Abstract The topic of cache performance has been well studied in recent years. Compiler optimizations exist and optimizations have been done for many problems. Much of this work has focused on dense linear algebra problems. At first glance, the Floyd--Warshall algorithm appears to fall into this category. In this paper, we begin by applying two standard cache-friendly optimizations to the Floyd--Warshall algorithm and show limited performance improvements. We then discuss the unidirectional space time representation (USTR). We show analytically that the USTR can be used to reduce the amount of processor-memory traffic by a factor of O(&sqrt;C), where C is the cache size, for a large class of algorithms. Since the USTR leads to a tiled implementation, we develop a tile size selection heuristic to intelligently narrow the search space for the tile size that minimizes total execution time. Using the USTR, we develop a cache-friendly implementation of the Floyd--Warshall algorithm. We show experimentally that this implementation minimizes the level-1 and level-2 cache misses and TLB misses and, therefore, exhibits the best overall performance. Using this implementation, we show a 2x improvement in performance over the best compiler optimized implementation on three different architectures. Finally, we show analytically that our implementation of the Floyd--Warshall algorithm is asymptotically optimal with respect to processor-memory traffic. We show experimental results for the Pentium III, Alpha, and MIPS R12000 machines using problem sizes between 1024 and 2048 vertices. We demonstrate improved cache performance using the Simplescalar simulator.
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2007-02-09
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 11


Open content in new tab

   Open content in new tab
Source: ACM Digital Library