NC STATE UNIVERSITY
Transparent Control Independence (TCI)
Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*
Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC
*Digital Enterprise Group, Intel Corporation, Hillsboro, OR
Effect of branch mispredictions
Branch misprediction rate of 5%-10% is still a problem
Each misprediction squashes 100s of inst.
  Reduces performance: limits window size
  Increases power: useless speculative work
© 2007 Ahmed S. Al-Zawawi, ISCA 34
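Control independence basics
[Figure: code example with a branch, a reconvergent point (reconv.), and instructions that reference R5, with regions labeled as follows]
control-dependent (CD)
control-independent (CI)
  control-independent data-dependent (CIDD)
  control-independent data-independent (CIDI)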
Four steps for exploiting CI
1. Identify reconv. point
2. Remove/Insert CD inst.
3. Identify CIDD inst.
4. Repair CIDD inst.
   a) Fix data dependencies
   b) Re-execute CIDD inst.
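Step 3 can be illustrated in software. The following is a minimal sketch, not the hardware mechanism from the talk: it assumes the set of registers written on the control-dependent path is known up front, and the names `Inst` and `classify_ci` are hypothetical. A CI instruction that reads an influenced register is labeled CIDD and its destination becomes influenced. A CIDI instruction that overwrites a register clears the influence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Inst:
    name: str
    dst: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...] = ()  # source registers


def classify_ci(cd_written: Set[str], ci: List[Inst]) -> Dict[str, str]:
    """Label each control-independent instruction as CIDD or CIDI.

    cd_written: registers a control-dependent path may write (assumed known).
    ci: control-independent instructions in program order.
    """
    influenced = set(cd_written)
    labels: Dict[str, str] = {}
    for inst in ci:
        if any(src in influenced for src in inst.srcs):
            labels[inst.name] = "CIDD"
            if inst.dst:
                influenced.add(inst.dst)      # the dependence propagates
        else:
            labels[inst.name] = "CIDI"
            if inst.dst:
                influenced.discard(inst.dst)  # clean overwrite ends the influence
    return labels


# Hypothetical CI region after a branch whose CD paths write R5.
ci_region = [
    Inst("i1", "R6", ("R5",)),  # reads R5 directly
    Inst("i2", "R7", ("R1",)),  # unrelated to the branch
    Inst("i3", "R8", ("R6",)),  # reads R6, which i1 made influenced
    Inst("i4", "R5", ("R2",)),  # overwrites R5 with a clean value
    Inst("i5", "R9", ("R5",)),  # reads the clean R5
]
labels = classify_ci({"R5"}, ci_region)
print(labels)
```

Only `i1` and `i3` would need repair after a misprediction in this example. The other three instructions keep their results.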
Conventional CI misprediction recovery
[Figure: instruction window containing the mispredicted branch (Br), the wrong CD instructions, the reconvergent point (R), and the CI instructions; a CIDD instruction takes a CIDI-supplied source value]
Identify wrong CD inst. and CIDD inst.
Squash wrong CD instructions
Insert correct CD instructions in middle of the window: repair program order
Re-execute CIDD instructions: re-reference values from CIDI instructions
Conventional CI limitations
1. Program order between CD & CI inst.:
   Fine-grain retirement using ROB requires reordering the correct CD inst. with the CI inst.
2. Dependence order between CIDD & CIDI inst.:
   Re-executing CIDD instructions requires preserving referenced CIDI instructions
Goal of selective misprediction recovery:
   Fully decouple CIDI instructions from CD & CIDD instructions
TCI misprediction recovery
[Figure: instruction window (Br, CD inst., R, CI inst.) and the recovery program (correct CD inst., duplicate CIDD inst.)]
Repair program state using self-sufficient recovery program while relaxing program order
  No need to identify wrong CD and CIDD instructions
  Insert correct CD instructions like any new instructions
  Insert duplicate CIDD instructions like any new instructions
TCI misprediction recovery
[Figure: instruction window between Checkpoint 1 and Checkpoint 2, showing Br with its branch checkpoint, R, CIDD instructions with a CIDI-supplied source value, and the recovery program (correct CD inst., duplicate CIDD inst.)]
Exploit coarse-grain checkpoint-based retirement to relax ordering constraints
  In-order retirement is not possible when instructions are out of program order
  Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR): completed instructions free their resources
Leverage checkpointed source values to mimic the effect of program order
  Leverage branch checkpoint for correct CD instructions
  Checkpoint CIDI-supplied source values
Transparent Control Independence
TCI repairs program state, not program order
TCI pipeline is recovery-free
  Transparent recovery by fetching additional instructions with checkpointed source values
TCI pipeline is free-flowing
  Leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently
  Completed instructions free their resources
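The idea of repairing state rather than order can be illustrated with a toy register-level model. This is a sketch under stated assumptions, not the TCI hardware: registers are a Python dict, instructions are hypothetical `(dst, srcs, fn)` tuples, and the correct path is assumed to write only registers that the predicted path also wrote. The speculative pass saves each CIDD instruction together with its CIDI-supplied source values. The recovery program then runs the correct CD instructions from the branch checkpoint and replays the saved CIDD instructions, and only the influenced registers are merged back.

```python
from operator import add


def run(regs, prog):
    """Execute (dst, srcs, fn) instructions in order on a register dict."""
    for dst, srcs, fn in prog:
        regs[dst] = fn(*(regs[s] for s in srcs))
    return regs


def tci_execute(checkpoint, wrong_cd, correct_cd, ci):
    influenced = {dst for dst, _, _ in wrong_cd}
    # Assumption of this sketch: the correct path writes no register
    # that the predicted path left untouched.
    assert {dst for dst, _, _ in correct_cd} <= influenced

    # Speculative pass: predicted (wrong) CD, then all CI instructions.
    spec = run(dict(checkpoint), wrong_cd)
    rxb = []  # duplicate CIDD inst. plus their CIDI-supplied source values
    for dst, srcs, fn in ci:
        if any(s in influenced for s in srcs):
            captured = {s: spec[s] for s in srcs if s not in influenced}
            rxb.append((dst, srcs, fn, captured))
            influenced.add(dst)
        else:
            influenced.discard(dst)
        spec[dst] = fn(*(spec[s] for s in srcs))

    # Recovery program: correct CD from the branch checkpoint, then the
    # duplicate CIDD inst. using captured values for their CIDI sources.
    repair = run(dict(checkpoint), correct_cd)
    for dst, srcs, fn, captured in rxb:
        repair[dst] = fn(*(captured[s] if s in captured else repair[s]
                           for s in srcs))

    # Merge: only registers last written by CD or CIDD inst. are corrected.
    for reg in influenced:
        spec[reg] = repair[reg]
    return spec


checkpoint = {"R1": 1, "R2": 2, "R3": 3, "R4": 0, "R5": 0, "R6": 0, "R7": 0}
wrong_cd = [("R5", ("R1",), lambda a: a + 10)]
correct_cd = [("R5", ("R1",), lambda a: a + 100)]
ci = [
    ("R6", ("R5", "R2"), add),           # CIDD, captures the old R2
    ("R2", ("R3",), lambda a: a * 2),    # CIDI, later overwrites R2
    ("R7", ("R6", "R2"), add),           # CIDD, captures the new R2
    ("R3", ("R3",), lambda a: a + 1),    # CIDI
    ("R4", ("R2", "R3"), add),           # CIDI
]
result = tci_execute(checkpoint, wrong_cd, correct_cd, ci)
reference = run(dict(checkpoint), correct_cd + ci)  # in-order, correct path
print(result == reference)
```

The first CIDD instruction reads `R2` before a later CIDI instruction overwrites it. Replaying it with the captured value, rather than the current register contents, is what lets the recovery program run out of program order and still match in-order execution.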
TCI microarchitecture
Add repair rename map
Add selective re-execution buffer (RXB)
[Figure: pipeline with I$, Spec. Map, Repair Map, Checkpoints, RXB, IQ, RF, and FU. CIDD instructions and CIDD source values are sent to the RXB while instructions drain. Numbered flows: 1 predicted CD, 2 CI, 3 correct CD, 4 re-execute CIDD]
Predict the branch
Instructions execute and leave the pipeline when done
[Figure: example code (branch with predicted and actual paths, CD, reconv., CI, CIDD, R5) beside the TCI pipeline diagram with flows 1 predicted CD, 2 CI, 3 correct CD, 4 re-execute CIDD]
Construct recovery program
Copy duplicate of CIDD inst. with their source values into RXB
[Figure: example code (branch with predicted and actual paths, CD, reconv., CI, CIDD, R5) beside the TCI pipeline diagram with flows 1 predicted CD, 2 CI, 3 correct CD, 4 re-execute CIDD]
Insert correct CD instructions
Load branch checkpoint into repair rename map, then fetch correct CD inst.
[Figure: example code (branch with predicted and actual paths, CD, reconv., CI, CIDD, R5) beside the TCI pipeline diagram with flows 1 predicted CD, 2 CI, 3 correct CD, 4 re-execute CIDD]
Repair & re-execute CIDD instructions
Inject duplicate CIDD inst. with their checkpointed source values
[Figure: example code (branch with predicted and actual paths, CD, reconv., CI, CIDD, R5) beside the TCI pipeline diagram with flows 1 predicted CD, 2 CI, 3 correct CD, 4 re-execute CIDD]
Merge repair & spec. rename maps
Copy corrected register mappings from repair map to spec. map
[Figure: example code (branch with predicted and actual paths, CD, reconv., CI, CIDD, R5) beside the TCI pipeline diagram with flows 1 predicted CD, 2 CI, 4 re-execute CIDD, 5 merge map]
TCI implementation details
1. Identifying CIDD instructions:
   Control-flow stack (CFS) detects nested reconv. points
   Influenced register set (IRS) and branch-sets
2. RXB reconstruction:
   CIDD inst. of multiple branches are co-mingled
   A misprediction may require repairing RXB
3. Renaming partial programs:
   Re-rename recovery program despite its CIDI gaps
4. Merging repair/speculative rename maps
Example: construct the RXB
B1 & B2 are branches
R1 & R2 are reconvergent points
Rectangular inst. are CIDD on B1
Oval inst. are CIDD on B2
[Figure: control-flow graphs for B1 (reconvergent point R1) and B2 (reconvergent point R2) with predicted and actual paths, CD and CI regions, and numbered instructions. Front end: I$, ID, Temporary Buffer (TB), with paths to IQ and to RXB. Selective Re-execution Buffer (RXB) contents: 9, x, 16, 18, 20, with B1/R1 and B2/R2 markers and the RXB tail]
Example: reconstructing the RXB
Objective of this example:
  Inject recovery program for B2
  Reconstruct RXB for B1
Rollback RXB tail, like complete squash
Initiate RXB pre-read pointer
Start fetching correct CD
Fetch correct CD: 11 and 12
Meanwhile pre-read 16 to Temp Buffer
Dispatch 11
  Don’t insert 11 into the RXB: CIDI w.r.t. B1 & B2
Dispatch 12
  Insert 12 into the RXB: CIDD w.r.t. B1
[Figure: control-flow graph for B2 (predicted and actual paths, CD, R2, CI, numbered instructions); front end with I$, ID, Temporary Buffer (TB), paths to IQ and to RXB; RXB with B1/R1 and B2/R2 markers, pre-read pointer, and RXB tail]
Example: reconstructing the RXB
Fetch correct CD: 13 and 14
Meanwhile pre-read 18 to Temp Buffer
Dispatch 13
  Don’t insert 13 into the RXB: CIDI w.r.t. B1 & B2
Dispatch 14
  Insert 14 into the RXB: CIDD w.r.t. B1
Reconvergence point detected
Correct CD complete
[Figure: control-flow graph for B2 (predicted and actual paths, CD, R2, CI, numbered instructions); front end with I$, ID, Temporary Buffer (TB), paths to IQ and to RXB; RXB with B1/R1 and B2/R2 markers, pre-read pointer, and RXB tail]
Example: reconstructing the RXB
Begin renaming CIDD instructions from Temp Buffer
Meanwhile pre-read 20 into Temp Buffer
Don’t dispatch 16: not CIDD w.r.t. B2
  Insert 16 into the RXB: CIDD w.r.t. B1
Dispatch 18: CIDD w.r.t. B2
  Don’t insert 18 into the RXB: not CIDD w.r.t. B1
Dispatch 20: CIDD w.r.t. B2
  Insert 20 into the RXB: CIDD w.r.t. B1
B2 recovery program injection complete
B1 recovery program is maintained and compressed
[Figure: control-flow graph for B2 (predicted and actual paths, CD, R2, CI, numbered instructions); front end with I$, ID, Temporary Buffer (TB), paths to IQ and to RXB; RXB with B1/R1 and B2/R2 markers, pre-read pointer, and RXB tail]
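The dispatch and insert decisions in this example can be replayed in a few lines. This is a minimal sketch, assuming each RXB entry carries a branch-set naming the unresolved branches it is CIDD on. The names `Entry`, `E`, and `reconstruct_rxb`, the position arguments, and the branch-set assigned to `x` are hypothetical, and the Temporary Buffer is modeled as a simple list slice.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Entry:
    name: str
    branch_set: FrozenSet[str]  # unresolved branches this inst. is CIDD on


def E(name: str, *branches: str) -> Entry:
    return Entry(name, frozenset(branches))


def reconstruct_rxb(rxb: List[Entry], branch: str, branch_pos: int,
                    reconv_pos: int, correct_cd: List[Entry]
                    ) -> Tuple[List[Entry], List[str]]:
    """Inject the recovery program for `branch` and rebuild the RXB.

    branch_pos: RXB index where entries younger than the branch begin.
    reconv_pos: RXB index of the first entry past the reconvergent point.
    """
    kept = list(rxb[:branch_pos])  # roll the RXB tail back to the branch
    pre_read = rxb[reconv_pos:]    # old CI entries, read out before reuse
    dispatched: List[str] = []

    # Correct CD inst. are dispatched like any new instructions and enter
    # the RXB only if they are CIDD on an older unresolved branch.
    for inst in correct_cd:
        dispatched.append(inst.name)
        if inst.branch_set:
            kept.append(inst)

    # Pre-read entries: re-execute those that are CIDD on the recovering
    # branch, keep those still CIDD on some other branch.
    for entry in pre_read:
        if branch in entry.branch_set:
            dispatched.append(entry.name)
        remaining = entry.branch_set - {branch}
        if remaining:
            kept.append(Entry(entry.name, remaining))
    return kept, dispatched


# RXB before B2 is resolved; x is a wrong-path CD inst. of B2.
rxb = [E("9", "B1"), E("x", "B1"), E("16", "B1"), E("18", "B2"),
       E("20", "B1", "B2")]
correct_cd = [E("11"), E("12", "B1"), E("13"), E("14", "B1")]

new_rxb, dispatched = reconstruct_rxb(rxb, "B2", branch_pos=1,
                                      reconv_pos=2, correct_cd=correct_cd)
print([e.name for e in new_rxb])
print(dispatched)
```

Under these assumptions the rebuilt RXB holds 9, 12, 14, 16, and 20, consistent with the insert decisions listed in the slides. Entry 18 is dispatched but dropped from the RXB because it is not CIDD on B1.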
Simulation methodology
Baseline: checkpoint-based superscalar processor
  Issue width: 4
  Perceptron branch predictor
  Register file: 256 registers
  Branch checkpoints: 16
  Load store queue: 512 entries
  L1 I & L1 D: 64KB 4-way (Hit: 1 cycle)
  L2: 2MB 8-way (Hit: 10 cycles, Miss: 200 cycles)
Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT
SimPoint: 10M inst. warm-up + 100M inst. simulated
CIDD inst. re-renaming models
Seq CIDD (TCI):
  Only CIDD inst. are re-renamed and re-executed
Seq CI: [Akkary et al.] [Chou et al.] [Rotenberg et al.]
  All CI inst. are re-renamed, but only CIDD inst. re-execute
Proxy: [Cher et al.] [Gandhi et al.]
  Uses proxy move instructions to insulate CIDD inst. from source name changes
  Only proxies are re-renamed
  Both proxies and CIDD inst. re-execute by holding issue queue entries
All models have relaxed order through checkpoint-based substrate
Results for 32 & 64 entries issue queue
[Figure: bar chart of % IPC improvement over base (y-axis from -15% to 65%) for Proxy, Seq CI, and TCI on bzip, compress, crafty, gap, gcc, go, gzip, ijpeg, li, mcf, parser, perl, twolf, vortex, and vpr]
Proxy can degrade performance
Seq CI can degrade performance
Proxy average %IPC improvement is 6% (11%)
TCI average %IPC improvement is 16% (16%)
TCI maximum %IPC improvement is 61% (64%)
Varying the issue queue size
Proxy is bandwidth efficient, but resource inefficient
Seq CI is bandwidth inefficient, but resource efficient
TCI is both bandwidth and resource efficient
Varying the RXB size
[Figure: harmonic mean IPC (y-axis from 0.0 to 2.5) versus RXB size (32, 64, 128, 256, 512) for Seq CIDD (TCI), Seq CI, and Base]
In Seq CI, the RXB limits the window size
TCI overcomes problem by only buffering CIDD inst.
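The two metrics used in the results slides, harmonic mean IPC and % IPC improvement over base, can be computed as in the short sketch below. The benchmark names and IPC values are placeholders invented for illustration, not data from the talk, and `pct_ipc_improvement` is a hypothetical helper name.

```python
from statistics import harmonic_mean


def pct_ipc_improvement(base_ipc: float, new_ipc: float) -> float:
    """Percent IPC improvement of a configuration over the baseline."""
    return 100.0 * (new_ipc / base_ipc - 1.0)


# Placeholder per-benchmark IPC values (not measured results).
base = {"bench_a": 1.0, "bench_b": 2.0}
tci = {"bench_a": 1.5, "bench_b": 2.2}

hmean_base = harmonic_mean(list(base.values()))
hmean_tci = harmonic_mean(list(tci.values()))
per_benchmark = {name: pct_ipc_improvement(base[name], tci[name])
                 for name in base}

print(round(hmean_base, 3), round(hmean_tci, 3))
print({name: round(pct, 1) for name, pct in per_benchmark.items()})
```

The harmonic mean is the usual choice for averaging rates such as IPC, since it weights each benchmark by the time it takes rather than by its raw rate.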
Conclusion
Recover program state, not program order
  Transparent branch misprediction recovery using fully decoupled recovery program
Resource efficient
  All instructions execute, drain, and free resources quickly based on conventional speculation
Bandwidth efficient
  TCI only re-sequences CIDD instructions
Questions