Thumbnail
Access Restriction
Subscribed

Author Soderquist, Peter ♦ Leeser, Miriam
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©1996
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword FPU ♦ SRT ♦ Area and performance tradeoffs ♦ Division ♦ Floating-point ♦ Square root
Abstract Floating-point divide and square-root operations are essential to many scientific and engineering applications, and are required in all computer systems that support the IEEE floating-point standard. Yet many current microprocessors provide only weak support for these operations. The latency and throughput of division are typically far inferior to those of floating-point addition and multiplication, and square-root performance is often even lower. This article argues the case for high-performance division and square root. It also explains the algorithms and implementations of the primary techniques, subtractive and multiplicative methods, employed in microprocessor floating-point units with their associated area/performance tradeoffs. Case studies of representative floating-point unit configurations are presented, supported by simulation results using a carefully selected benchmark, Givens rotation, to show the dynamic performance impact of the various implementation alternatives. The topology of the implementation is found to be an important performance factor. Multiplicative algorithms, such as the Newton-Raphson method and Goldschmidt's algorithm, can achieve low latencies. However, these implementations serialize multiply, divide, and square root operations through a single pipeline, which can lead to low throughput. While this hardware sharing yields low size requirements for baseline implementations, lower-latency versions require many times more area. For these reasons, multiplicative implementations are best suited to cases where subtractive methods are precluded by area constraints, and modest performance on divide and square root operations is tolerable. Subtractive algorithms, exemplified by radix-4 SRT and radix-16 SRT, can be made to execute in parallel with other floating-point operations.
ISSN 03600300
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 1996-09-01
Publisher Place New York
e-ISSN 15577341
Journal ACM Computing Surveys (CSUR)
Volume Number 28
Issue Number 3
Page Count 47
Starting Page 518
Ending Page 564


Open content in new tab

   Open content in new tab
Source: ACM Digital Library