ICCD’03
Distributed Reorder Buffer Schemes for Low Power *
*supported in part by DARPA through the PAC-C program and NSF
Gurhan Kucuk, Oguz Ergin, Dmitry Ponomarev, Kanad Ghose
Department of Computer Science
State University of New York
Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower
21st International Conference on Computer Design (ICCD’03), October 14th 2003
Outline
– Reorder Buffer (ROB) complexities
– Motivation for the low-complexity ROB
– Low-complexity ROB designs
  – Fully Distributed ROB
  – Retention Latches (RLs) revisited (ICS’02)
  – Combined Scheme
– Results
– Concluding remarks
P6-style Superscalar Datapath
[Figure: pipeline with Fetch (F1, F2), Decode/Dispatch (D1, D2), the issue queue (IQ), function units FU1, FU2, ..., FUm (EX), the ROB, and the Architectural Register File (ARF), connected by the instruction dispatch and instruction issue paths and the result/status forwarding buses]
PPC 620-style Superscalar Datapath
[Figure: pipeline with Fetch (F1, F2), Decode/Dispatch (D1, D2), the issue queue (IQ), function units FU1, FU2, ..., FUm (EX), the ROB, a separate RB, and the Architectural Register File (ARF), connected by the instruction dispatch and instruction issue paths and the result/status forwarding buses]
ROB Port Requirements for a W-way CPU
– Decode/Dispatch: W write ports to set up entries
– Dispatch/Issue: 2W read ports to read the source operands
– Writeback: W write ports to write results
– Commit: W read ports for instruction commitment
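The port counts above can be tallied for a concrete machine width. The sketch below is only an illustration: the function name and dictionary keys are made up for this note, and the comment explaining 2W as two source operands per instruction is our reading of the slide rather than something it states.

```python
def rob_port_requirements(width: int) -> dict:
    """Port counts of a centralized ROB in a W-way machine, per the slide."""
    return {
        "decode_dispatch_write": width,    # set up entries
        "dispatch_issue_read": 2 * width,  # read source operands (assumed: two per instruction)
        "writeback_write": width,          # write results
        "commit_read": width,              # instruction commitment
    }


ports = rob_port_requirements(4)   # 4-way machine, as in the simulated system
total_ports = sum(ports.values())  # 5W ports in total
print(ports, total_ports)
```

For the 4-way configuration this gives 8 source read ports out of 20 ports overall, which is why the source read ports are the first target for removal.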
What This Work is All About
– ROB complexity reduction is important for reducing power and improving performance
  – ROB dissipates a non-trivial fraction of the total chip power
  – ROB accesses stretch over several cycles
– Goal of this work: Reduce the complexity and power dissipation of the ROB without sacrificing performance
Comparison of ROB Bitcells (0.18µ, TSMC)
Layout of a 32-ported SRAM bitcell
Layout of a 16-ported SRAM bitcell
Area Reduction – 71%
Shorter bit and wordlines
Reorder Buffer Distribution
[Figure: the superscalar datapath (Fetch F1, F2; Decode/Dispatch D1, D2; IQ; function units FU1, FU2, ..., FUm; ARF; result/status forwarding buses) with one ROB Component (ROBC) per function unit: ROBC 1, ROBC 2, ..., ROBC m. The ROB holds pointers to entries within the ROBCs.]
Impact of Distributing the ROB
– Each ROBC is effectively a small Rename Buffer
  – Smaller read/write access energy
  – Faster access time
– Distributing physical storage in this manner allows FUs to use shorter buses to write their respective ROBCs
  – Lower energy dissipation on the wires (we have NOT accounted for energy savings from using shorter wires)
– Fits in naturally with a multi-clustered datapath design
Problems with the earlier Multi-banked RF Schemes (and some good news!)
– Port conflicts result in performance penalty
  – Totally avoid write port conflicts
  – Minimize read port conflicts at commitment
  – Totally avoid source read port conflicts
– Interconnection network is more complex
  – Completely remove source read ports
ROBCs Assigned to Each Function Unit
[Figure: Centralized ROB (entries 1, 2, 3, 4, ..., n) next to the Distributed ROBCs. Each centralized entry holds an FU_id and an offset that point to an entry in the ROBC of one function unit: ROBC #1 for FU #1, ROBC #2 for FU #2, ..., ROBC #m for FU #m.]
Good News: Write port conflicts are avoided
[Figure: Centralized ROB (FU_id, offset per entry) next to the Distributed ROBCs. Each function unit (FU #1, FU #2, ..., FU #m) writes into its own ROBC through 1 write port.]
Round Robin Scheduling at Dispatch Time
[Figure sequence: Centralized ROB (FU_id, offset per entry) next to four Int ADD ROBCs (#1 to #4), each drawn with entries 1 and 2. ADD, SUB, and AND are dispatched in turn. Each reserves an entry in the next Int ADD ROBC, and its centralized ROB entry then records the pointer:]
– ADD: FU_id 1, offset 1
– SUB: FU_id 2, offset 1
– AND: FU_id 3, offset 1
Good News: Avoiding Read Port Conflicts
[Figure: ADD (1, 1), SUB (2, 1), and AND (3, 1) occupy reserved entries in different ROBCs. Each ROBC has 1 read port to commitment.]
Round Robin Scheduling at Dispatch Time
[Figure sequence: with ADD (1, 1), SUB (2, 1), and AND (3, 1) already in the centralized ROB, MUL and then DIV are dispatched. Both reserve entries in the single Int MUL/DIV ROBC #5, and their centralized ROB entries record the pointers:]
– MUL: FU_id 5, offset 1
– DIV: FU_id 5, offset 2
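The dispatch sequence in these slides can be reproduced with a small model. This is a minimal sketch, not the authors' hardware: the class names, the capacity of 2 entries per ROBC (taken from how the figure is drawn), and the policy of skipping a full ROBC are assumptions, and entries are never freed here because commitment is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ROBC:
    fu_id: int
    capacity: int
    used: int = 0

    def reserve(self):
        """Reserve the next entry; return its 1-based offset, or None if full."""
        if self.used >= self.capacity:
            return None
        self.used += 1
        return self.used


class Dispatcher:
    def __init__(self, groups):
        self.groups = groups                      # FU type -> ROBCs serving that type
        self.next_index = {t: 0 for t in groups}  # round-robin position per FU type
        self.central_rob = []                     # (instruction, FU_id, offset), program order

    def dispatch(self, instruction, fu_type):
        robcs = self.groups[fu_type]
        start = self.next_index[fu_type]
        for step in range(len(robcs)):
            idx = (start + step) % len(robcs)
            offset = robcs[idx].reserve()
            if offset is not None:
                self.next_index[fu_type] = (idx + 1) % len(robcs)
                self.central_rob.append((instruction, robcs[idx].fu_id, offset))
                return robcs[idx].fu_id, offset
        return None  # every ROBC of this type is full: dispatch blocks


groups = {
    "int_add": [ROBC(fu_id=i, capacity=2) for i in (1, 2, 3, 4)],
    "int_muldiv": [ROBC(fu_id=5, capacity=2)],
}
dispatcher = Dispatcher(groups)
stream = [("ADD", "int_add"), ("SUB", "int_add"), ("AND", "int_add"),
          ("MUL", "int_muldiv"), ("DIV", "int_muldiv")]
pointers = [dispatcher.dispatch(op, fu_type) for op, fu_type in stream]
print(pointers)
```

The three integer ALU operations land in three different ROBCs, while MUL and DIV have no alternative to ROBC #5.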
Read Port Conflicts at Commitment
[Figure: MUL (5, 1) and DIV (5, 2) both occupy entries in Int MUL/DIV ROBC #5, which has 1 read port to commitment.]
CONFLICT: if MUL and DIV want to commit in the same cycle
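The conflict can be made concrete with a one-cycle commit model. This is a hedged sketch: the function and parameter names are hypothetical, all listed instructions are assumed to have their results ready, and we assume that an instruction that loses the port simply waits for the next cycle, along with everything younger, since commitment is in program order.

```python
def commit_one_cycle(rob_head, commit_width, read_ports_per_robc=1):
    """Return the instructions committed this cycle.

    rob_head: (instruction, FU_id, offset) tuples in program order,
    oldest first, all assumed ready to commit.
    """
    reads = {}       # FU_id -> ROBC read ports used so far this cycle
    committed = []
    for instruction, fu_id, _offset in rob_head[:commit_width]:
        if reads.get(fu_id, 0) >= read_ports_per_robc:
            break    # port conflict: this and all younger instructions wait
        reads[fu_id] = reads.get(fu_id, 0) + 1
        committed.append(instruction)
    return committed


no_conflict = commit_one_cycle(
    [("ADD", 1, 1), ("SUB", 2, 1), ("AND", 3, 1), ("MUL", 5, 1)], commit_width=4)
conflict = commit_one_cycle([("MUL", 5, 1), ("DIV", 5, 2)], commit_width=4)
print(no_conflict, conflict)
```

In the second call only MUL commits, because DIV needs the same single read port of ROBC #5.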
Distributed ROB Design 1: with source read ports
Ports on each ROBC:
– Writeback: 1 write port to write results
– Dispatch/Issue: 1 read port to read the source operands
– Commit: 1 read port for instruction commitment
Experimental Setup: the AccuPower (DATE’02)
[Figure: tool flow. Compiled SPEC benchmarks and datapath specs feed the microarchitectural simulator (rooted in SimpleScalar), which produces performance stats as well as transition counts and context information. VLSI layout data gives a SPICE deck, and SPICE gives measures of energy per transition. The energy/power estimator takes the transition counts and the SPICE measures and produces power/energy stats.]
Configuration of the Simulated System
– Machine width: 4-way
– Issue Queue: 32 entries
– Reorder Buffer: 96 entries
– Load/Store Queue: 32 entries
Simulated the execution of SPEC2000 benchmarks
Peak/Average demands on the number of ROBC entries

| ROBC type | IntADD #1, #2, #3, #4 (peak / avg.) | IntMUL/DIV (peak / avg.) | FPADD #1, #2, #3, #4 (peak / avg.) | FPMUL/DIV (peak / avg.) | Load (peak / avg.) |
|---|---|---|---|---|---|
| SPEC 2000 Integer Average | 16.9 / 4.4 | 4.1 / 0.1 | 1.6 / 0.04 | 3.8 / 0.04 | 28.6 / 9.3 |
| SPEC 2000 FP Average | 14.2 / 4.9 | 3.2 / 0.8 | 3.8 / 0.6 | 6.7 / 1.1 | 23.5 / 7.5 |
| SPEC 2000 Average | 15.7 / 4.6 | 3.7 / 0.4 | 2.6 / 0.3 | 5.0 / 0.5 | 26.4 / 8.5 |
| Number of entries assigned to each ROBC | 8 each | 4 | 4 each | 4 | 16 |

8 + 8 + 8 + 8 + 4 + 4 + 4 + 4 + 4 + 4 + 16 = 72 entries: the 8_4_4_4_16 configuration
Percentage of cycles when dispatch blocks for 8_4_4_4_16

| ROBC type | IntADD #1, #2, #3, #4 | IntMUL/DIV | FPADD #1, #2, #3, #4 | FPMUL/DIV | Load |
|---|---|---|---|---|---|
| SPEC 2000 Integer Average | 0.9 | 0.1 | 0 | 0 | 5.2 |
| SPEC 2000 FP Average | 1.5 | 1.0 | 0.1 | 0.8 | 1.9 |
| SPEC 2000 Average | 1.2 | 0.5 | 0 | 0.4 | 3.8 |
| Number of entries assigned to each ROBC | 8 each | 4 | 4 each | 4 | 16 |

Average IPC drop with the 8_4_4_4_16 configuration (72 entries) = 4.8%
Reducing performance penalty: 12_6_4_6_20 Configuration
Number of entries assigned to each ROBC:
– IntADD #1, #2, #3, #4: 12 each
– IntMUL/DIV: 6
– FPADD #1, #2, #3, #4: 4 each
– FPMUL/DIV: 6
– Load: 20
12 + 12 + 12 + 12 + 6 + 4 + 4 + 4 + 4 + 6 + 20 = 96 entries: the 12_6_4_6_20 configuration
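The configuration names encode the per-ROBC sizes, so the totals quoted on the slides can be checked mechanically. The helper below is illustrative only; its name is made up, and the field order (IntADD, IntMUL/DIV, FPADD, FPMUL/DIV, Load) and the count of four IntADD and four FPADD units are read off the tables in these slides.

```python
def total_robc_entries(config: str, int_add_units: int = 4, fp_add_units: int = 4) -> int:
    """Total ROBC entries for a name such as '8_4_4_4_16'."""
    int_add, int_muldiv, fp_add, fp_muldiv, load = (int(x) for x in config.split("_"))
    return (int_add_units * int_add + int_muldiv
            + fp_add_units * fp_add + fp_muldiv + load)


small = total_robc_entries("8_4_4_4_16")
large = total_robc_entries("12_6_4_6_20")
print(small, large)
```

The larger configuration matches the 96-entry centralized ROB of the simulated baseline, so it trades no storage for the simpler ports.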
Performance Results for 12_6_4_6_20 Configuration
[Figure: bar charts of IPC (axis 0 to 3) for two configurations, "Base, 2-cycle ROB access and full bypass" and "2 read ports, 12_6_4_6_20". Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. FP benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Average IPC drop with the 12_6_4_6_20 configuration = 2.4%
Eliminating All Source Read Ports
Ports on each ROBC once the Dispatch/Issue read port for source operands is removed:
– Writeback: 1 write port to write results
– Commit: 1 read port for instruction commitment
Where are the Source Values Coming From?
[Figure: the P6-style datapath (Fetch F1, F2; Decode/Dispatch D1, D2; IQ; function units FU1, FU2, ..., FUm; ROB; ARF; result/status forwarding buses) with three operand sources numbered 1, 2, and 3]
Where are the Source Values Coming From?
[Figure: stacked bar chart, 0% to 100%; 96-entry ROB, 4-way processor, SPEC2K benchmarks]
– Forwarding: 62%
– ARF: 32%
– ROB: 6%
How Efficiently are the Ports Used?
– Decode/Dispatch: W write ports to set up entries
– Dispatch/Issue: 2W read ports to read the source operands (annotated 6% in the figure)
– Writeback: W write ports to write results
– Commit: W read ports for instruction commitment
Our Solution: Elimination of Read Ports
[Figure sequence: the P6-style datapath with the three numbered operand sources 1, 2, and 3. In the last step, source 2 is removed and only sources 1 and 3 remain.]
Distributed Reorder Buffer Scheme
[Figure: the datapath with one ROBC per function unit (ROBC 1, ROBC 2, ..., ROBC m). The ROB holds pointers to entries within the ROBCs.]
Elimination of Source Read Ports
[Figure sequence: the distributed datapath with ROBC 1, ROBC 2, ..., ROBC m and the ROB that holds pointers to entries within the ROBCs]
Completely Eliminating the Source Read Ports on the ROBCs
– The Problem: Issue of instructions that require a value stored in a ROBC will stall
– Solutions: Forward the value to the waiting instruction at the time of committing the value: LATE FORWARDING
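The idea of late forwarding can be sketched as a tiny event model. Everything named here is hypothetical (the class, its fields, and the use of an (FU_id, offset) pair as the result tag are assumptions for illustration); the only behavior taken from the slide is that a waiting instruction receives the value when that value is committed.

```python
class LateForwardingModel:
    def __init__(self):
        self.arf = {}      # architectural register -> committed value
        self.waiting = {}  # result tag -> instructions stalled on that value
        self.woken = []    # (instruction, value) pairs delivered at commit time

    def wait_for(self, instruction, tag):
        """The instruction needs a value that currently sits in a ROBC."""
        self.waiting.setdefault(tag, []).append(instruction)

    def commit(self, tag, dest_reg, value):
        """Write the value to the ARF and forward it to any waiting instructions."""
        self.arf[dest_reg] = value
        for instruction in self.waiting.pop(tag, []):
            self.woken.append((instruction, value))


model = LateForwardingModel()
model.wait_for("SUB", tag=(1, 1))       # SUB needs the result held at ROBC #1, offset 1
stalled_before_commit = list(model.woken)
model.commit(tag=(1, 1), dest_reg="r3", value=42)
print(stalled_before_commit, model.woken, model.arf)
```

Until the commit happens the consumer cannot issue, which is the source of the IPC penalty this scheme incurs.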
Late Forwarding: Use the Normal Forwarding Buses!
[Figure sequence: the distributed datapath with ROBC 1, ROBC 2, ..., ROBC m and the ROB that holds pointers to entries within the ROBCs; the final step adds a path labeled "Late Forwarding"]
![Page 62: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/62.jpg)
Performance Drop of Simplified ROBC Design
[Figure: bar chart of performance drop (%) per benchmark for the design with no ROBC source read ports and Late Forwarding. Integer benchmarks: bzip2, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, apsi, art, equake, mesa, mgrid, swim, wupwise, FP Avg. Bar callouts: 37%, 17%.]
Average IPC Drop: 9.6%
![Page 63: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/63.jpg)
IPC Penalty: Source Value Not Accessible within the ROBC
[Figure: timeline of the lifetime of a result value. Events along the time axis: Result Generation, Forwarding, Late Forwarding/Commitment. Intervals: value within a ROBC, value within ARF.]
![Page 64: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/64.jpg)
Improving IPC with No Read Ports
– Cache recently generated values in a set of RETENTION LATCHES (RLs)
– Retention Latches are SMALL and FAST
   Only 8 to 16 latches are needed in the set
   The entire set has 1 or 2 read ports
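A FIFO-managed set of retention latches might be modeled as below. This is a sketch under assumptions: the class name `RetentionLatches`, the lookup by result tag, and the default size of 8 (taken from the 8 to 16 range on the slide) are illustrative choices, not details confirmed by the slides.

```python
from collections import OrderedDict


class RetentionLatches:
    """Small FIFO cache of recently produced result values (hypothetical model)."""

    def __init__(self, size=8):
        self.size = size
        self.latches = OrderedDict()   # tag -> value, oldest entry first

    def write(self, tag, value):
        """Insert a newly generated result, evicting the oldest one if full."""
        if tag not in self.latches and len(self.latches) >= self.size:
            self.latches.popitem(last=False)   # FIFO replacement
        self.latches[tag] = value

    def read(self, tag):
        """Return the cached value, or None on a miss.

        On a miss the consumer has to wait for late forwarding at commit.
        """
        return self.latches.get(tag)


rl = RetentionLatches(size=2)
rl.write(1, 10)
rl.write(2, 20)
rl.write(3, 30)      # evicts tag 1, the oldest entry
```

With only two latches in this toy run, the third write pushes out the oldest result, so a later read of tag 1 misses while tags 2 and 3 still hit.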
![Page 65: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/65.jpg)
Adding Retention Latches into the Picture
[Figure: datapath with ROBC 1..ROBC m, the centralized ROB holding pointers to entries within the ROBCs, and the Late Forwarding path.]
![Page 66: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/66.jpg)
Adding Retention Latches into the Picture
[Figure: datapath with ROBC 1..ROBC m, the centralized ROB holding pointers to entries within the ROBCs, the Late Forwarding path, and a RETENTION LATCHES block.]
![Page 67: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/67.jpg)
Eliminating All Source Read Ports
ROBC ports:
– Writeback: 1 write port to write results
– Commit: 1 read port for instruction commitment
![Page 68: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/68.jpg)
Distributed ROB Design 2: with Retention Latches
ROBC ports:
– Writeback: 1 write port to write results
– Commit: 1 read port for instruction commitment
Eight 2-ported FIFO RLs
![Page 69: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/69.jpg)
Performance Results for 12_6_4_6_20 Configuration
[Figure: bar chart of IPC per benchmark. Series: Base, 2-cycle ROB access and full bypass; 2 read ports, 12_6_4_6_20. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Average IPC drop with 12_6_4_6_20 configuration = 2.4%
![Page 70: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/70.jpg)
Performance Results for 12_6_4_6_20 Configuration
[Figure: bar chart of IPC per benchmark. Series: Base, 2-cycle ROB access and full bypass; Design 1: 2 read ports, 12_6_4_6_20; Design 2: Eight 2-ported FIFO RLs, 12_6_4_6_20 with 1 read port. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Average IPC drop with 12_6_4_6_20 configuration = 1.7%
![Page 71: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/71.jpg)
Performance Results for 12_6_4_6_20 Configuration
[Figure: bar chart of IPC per benchmark. Series: Base, 1-cycle ROB access and full bypass; Design 1: 2 read ports, 12_6_4_6_20; Design 2: Eight 2-ported FIFO RLs, 12_6_4_6_20 with 1 read port. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Average IPC drop with 12_6_4_6_20 configuration = 3.8%
![Page 72: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/72.jpg)
Power Results for 12_6_4_6_20 Configuration
[Figure: bar chart of power savings (%) per benchmark. Series: Eight 2-ported FIFO latches; Design 1: 2 read ports, 12_6_4_6_20; Design 2: Eight 2-ported FIFO RLs, 12_6_4_6_20 with 1 read port. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Power savings%: 49%, 47%, 23%
![Page 73: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/73.jpg)
Power Results for 12_6_4_6_20 Configuration (Compared to Baseline Case with 64-entry Rename Buffers)
[Figure: bar chart of power savings (%) per benchmark. Series: Eight 2-ported FIFO latches; Design 1: 2 read ports, 12_6_4_6_20; Design 2: Eight 2-ported FIFO RLs, 12_6_4_6_20 with 1 read port. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Power savings%: 39%, 37%, 20%
![Page 74: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/74.jpg)
Summary of Results
– Low performance degradation:
   1.7% IPC drop on average (compared to 2-cycle ROB)
   3.8% IPC drop on average (compared to 1-cycle ROB)
– ROB power savings:
   As high as 49% (compared to P6-style datapath: 96-entry ROB)
   As high as 39% (compared to Rename Buffer design: 96-entry ROB, 64-entry RB)
![Page 75: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/75.jpg)
Conclusions
– We introduced a conflict-free distributed Reorder Buffer design
– ROB power savings as high as 49% are realized with only a small (1.7%) performance penalty
– ROB complexity is drastically reduced by
   Distributing the ROB into multiple banks
   Reducing the port requirements to no more than 2 ports for each ROB component
![Page 76: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/76.jpg)
~ Thank You~
![Page 78: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/78.jpg)
Related Work
– Replicated (Kessler, IEEE Micro) and distributed (Canal et al., HPCA’00 and Farkas et al., MICRO’97) RFs in a clustered organization
– Multiple Register Banks (Cruz et al., ISCA’00 and Balasubramonian et al., MICRO’01)
– Multiple Register Banks with an additional pipeline stage to avoid complex arbitration logic (Tseng et al., ISCA’03)
– Multiple Register Banks without write port conflicts (Wallace et al., PACT’96)
![Page 79: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/79.jpg)
ROB Port Requirements for a W-way CPU
ROB ports:
– Decode/Dispatch: W write ports to set up entries
– Dispatch/Issue: 2W read ports to read the source operands
– Writeback: W write ports to write results
– Commit: W read ports for instruction commitment
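As a quick worked example of the port counts listed on this slide, the sketch below tallies them for a given issue width. The function name `centralized_rob_ports` and the choice of W = 4 are assumptions made only for illustration.

```python
def centralized_rob_ports(w):
    """Port counts of a fully multiported ROB in a W-way machine."""
    return {
        "dispatch_write": w,       # set up entries at decode/dispatch
        "source_read": 2 * w,      # two source operands per instruction
        "writeback_write": w,      # write results
        "commit_read": w,          # read values at commitment
    }


ports = centralized_rob_ports(4)
total = sum(ports.values())        # 5W ports in total
```

For a 4-way machine this gives 20 ports, 8 of which are source read ports, which is the component the distributed schemes aim to shrink or remove.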
![Page 80: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/80.jpg)
ROB Port Requirements for a W-way CPU
ROB ports:
– Decode/Dispatch: 1 W-wide write port to set up entries
– Dispatch/Issue: 2W read ports to read the source operands
– Writeback: W write ports to write results
– Commit: 1 W-wide read port for instruction commitment
![Page 81: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/81.jpg)
Fully Distributed Reorder Buffer Scheme
![Page 82: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/82.jpg)
Fully Distributed Reorder Buffer Scheme
– Distributed ROB Components (ROBCs) are assigned to each Function Unit
   No write port conflicts at the writeback stage and minimal read port conflicts at commitment: negligible performance penalty
   Each ROBC can be tailored to the needs of its FU: no overcommitment of resources, less complexity
– The FIFO structure that maintains pointers to the ROBCs remains centralized
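The organization described above, per-FU ROBCs plus a centralized FIFO of pointers, can be sketched as follows. This is an illustrative model only: the class `DistributedROB`, its method names, and the free-list allocation policy are assumptions, while the `(fu_id, offset)` pointer format follows the FU_id/offset fields shown in the slides.

```python
from collections import deque


class DistributedROB:
    """Centralized FIFO of (fu_id, offset) pointers into per-FU ROBCs."""

    def __init__(self, robc_sizes):
        # robc_sizes: hypothetical mapping of FU id -> number of ROBC entries,
        # so each ROBC can be sized to the needs of its function unit.
        self.robcs = {fu: [None] * n for fu, n in robc_sizes.items()}
        self.free = {fu: deque(range(n)) for fu, n in robc_sizes.items()}
        self.fifo = deque()            # program order is kept only here

    def dispatch(self, fu_id):
        """Allocate a ROBC entry for an instruction bound to fu_id."""
        if not self.free[fu_id]:
            return None                # ROBC full: dispatch stalls
        offset = self.free[fu_id].popleft()
        self.robcs[fu_id][offset] = {"done": False, "value": None}
        self.fifo.append((fu_id, offset))
        return offset

    def writeback(self, fu_id, offset, value):
        """Each FU writes only its own ROBC, so write ports never conflict."""
        entry = self.robcs[fu_id][offset]
        entry["value"] = value
        entry["done"] = True

    def commit(self):
        """Commit the oldest instruction if its result is available."""
        if not self.fifo:
            return None
        fu_id, offset = self.fifo[0]
        entry = self.robcs[fu_id][offset]
        if not entry["done"]:
            return None                # head not finished: commit waits
        self.fifo.popleft()
        self.robcs[fu_id][offset] = None
        self.free[fu_id].append(offset)
        return entry["value"]


rob = DistributedROB({0: 2, 1: 2})
a = rob.dispatch(0)                    # older instruction, on FU 0
b = rob.dispatch(1)                    # younger instruction, on FU 1
rob.writeback(1, b, 5)                 # younger one finishes first
blocked = rob.commit()                 # None: the head has not written back
rob.writeback(0, a, 3)
first, second = rob.commit(), rob.commit()
```

Even though the younger instruction completes first, results leave in program order (3, then 5) because commitment follows the centralized FIFO rather than the order of writes into the ROBCs.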
![Page 83: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/83.jpg)
Fully Distributed Reorder Buffer Scheme
[Figure: a centralized ROB with entries 1..n, each holding an (FU_id, offset) pair that points to an entry in one of the distributed ROBCs #1..#m.]
![Page 85: ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d5e5503460f94a3e92a/html5/thumbnails/85.jpg)
Results for the Scheme with Retention Latches
[Figure: bar chart of power savings (%) per benchmark for the centralized ROB with eight 2-ported FIFO Retention Latches. Integer benchmarks: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating-point benchmarks: applu, art, mesa, mgrid, swim, wupwise, FP Avg.]
Power savings%: 23%