Access Restriction Open
Author Sassone, Peter G. ♦ Wills, D. Scott
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Pipeline Communication ♦ Speculative Dependence Chain ♦ Dynamic Strand ♦ Critical Path ♦ Reorder Buffer ♦ Dynamic ALU Instruction ♦ Wire-dominated Architecture ♦ Modern Era ♦ Speculative Execution ♦ Issue Queue ♦ Out-of-order Pipeline ♦ Normal ALUs ♦ Issue Queue Occupancy ♦ Dynamic Detection ♦ Simple Performance Optimization ♦ IPC Penalty ♦ Binary Compatibility ♦ Average IPC Increase ♦ Specific Effort ♦ Highly-connected Element ♦ Average Reorder Buffer Occupancy ♦ Instruction Strand ♦ Dependency Pressure ♦ Execution Target ♦ Issue Logic ♦ Multicycle Issue ♦ Several Property ♦ Bypass Network ♦ Intermediate Fan-out ♦ Atomic Macro-instructions ♦ Mediabench Application ♦ Eight-wide Machine ♦ Self-bypass Mode ♦ Clock Frequency Gain ♦ Four-wide Machine ♦ Needless Communication ♦ Dependent Instruction
Abstract In the modern era of wire-dominated architectures, specific effort must be made to reduce needless communication within out-of-order pipelines while still maintaining binary compatibility. To ease pressure on highly-connected elements such as the issue logic and bypass network, we propose the dynamic detection and speculative execution of instruction strands: linear chains of dependent instructions without intermediate fan-out. The hardware required for detecting these chains is simple and resides off the critical path of the pipeline, and the execution targets are the normal ALUs with a self-bypass mode. By collapsing these strings of dependencies into atomic macro-instructions, the efficiency of the issue queue and reorder buffer can be increased. Our results show that over 25% of all dynamic ALU instructions can be grouped, decreasing both the average reorder buffer occupancy and issue queue occupancy by over a third. Additionally, these strands have several properties which make them amenable to simple performance optimizations. Our experiments show average IPC increases of 17% on a four-wide machine and 20% on an eight-wide machine in Spec2000int and Mediabench applications. Finally, strands ease the IPC penalties of multicycle issue and bypass by reducing dependency pressures, providing opportunity for clock frequency gains as well.
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Publisher Date 2004-01-01
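
Illustrative note. The abstract defines a strand as a linear chain of dependent instructions without intermediate fan-out. The Python sketch below is only a software illustration of that definition, not the detection hardware described in the paper. It assumes a straight-line window of register-to-register instructions and assumes a value is dead once its single consumer inside the window has read it. The names `Instr` and `find_strands` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical ALU instruction: one destination, some source registers."""
    dest: str
    srcs: tuple


def find_strands(instrs):
    """Group instructions into linear dependence chains with no intermediate fan-out.

    Assumption: the window is closed, so a value with exactly one consumer
    inside the window is treated as having no other readers.
    """
    last_writer = {}                      # register -> index of latest producer
    consumers = [[] for _ in instrs]      # producer index -> consumer indices
    for i, ins in enumerate(instrs):
        for reg in set(ins.srcs):         # a register read twice counts once
            producer = last_writer.get(reg)
            if producer is not None:
                consumers[producer].append(i)
        last_writer[ins.dest] = i

    # Link a producer to its consumer only if that consumer is its sole reader
    # and has not already been claimed by another producer (chains stay linear).
    next_in_strand = {}
    has_pred = set()
    for p, cons in enumerate(consumers):
        if len(cons) == 1 and cons[0] not in has_pred:
            next_in_strand[p] = cons[0]
            has_pred.add(cons[0])

    # Walk each chain from its head; single instructions are not strands.
    strands = []
    for head in range(len(instrs)):
        if head in has_pred or head not in next_in_strand:
            continue
        chain = [head]
        while chain[-1] in next_in_strand:
            chain.append(next_in_strand[chain[-1]])
        strands.append(chain)
    return strands


program = [
    Instr("r1", ("r8", "r9")),    # 0
    Instr("r2", ("r1", "r10")),   # 1: only reader of r1
    Instr("r3", ("r2", "r2")),    # 2: only reader of r2
    Instr("r4", ("r3", "r11")),   # 3: reads r3
    Instr("r5", ("r3", "r12")),   # 4: also reads r3, so r3 fans out
]
print(find_strands(program))      # [[0, 1, 2]]
```

In this example instructions 0 to 2 form one strand. The chain stops at instruction 2 because its result fans out to two readers, which matches the "without intermediate fan-out" condition: only the last value of a strand may have several consumers.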