ILP: CONTROL FLOW
CS/ECE 6810: Computer Architecture
Mahdi Nazm Bojnordi
Assistant Professor
School of Computing
University of Utah
Overview
• Announcement
  - Homework 2 is due tonight (11:59PM)
• This lecture
  - Performance bottleneck
  - Program flow
  - Branch instructions
  - Branch prediction
Performance Bottleneck
• Key performance limitation
  - Number of instructions fetched per second is limited
• How to increase fetch performance?
  - Wider fetch (multiple pipelines)
  - Deeper fetch (multiple stages)
• How to handle branches?
Impact of Branches
• Example C code
  - No structural hazards
  - What is fetch rate (IPS)?

    do {
        sum = sum + i;
        i = i - 1;
    } while (i != j);

Assembly code:

    Loop: ADD  R1, R1, R2
          ADDI R2, R2, #-1
          BNEQ R2, R0, Loop
          stall

• Five-stage pipeline
  - Cycle time = 10ns
[Figure: pipeline diagram labeled Fetch, Decode, Execute, Memory, Writeback]
Impact of Branches
• Example C code
  - No structural hazards
  - What is fetch rate (IPS)?

    do {
        sum = sum + i;
        i = i - 1;
    } while (i != j);

Assembly code:

    Loop: ADD  R1, R1, R2
          ADDI R2, R2, #-1
          BNEQ R2, R0, Loop
          stall
          stall
          stall

• Ten-stage pipeline
  - Cycle time = 5ns
[Figure: pipeline diagram labeled Fetch, Decode, Execute, Memory, Writeback]
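
The fetch-rate question for the two pipelines can be worked out with a small helper. This is a sketch that assumes the stall lines after BNEQ are the only lost fetch cycles per loop pass; the function name and parameters are illustrative and not from the slides.

```c
/* Useful instructions fetched per second when each loop pass fetches
 * `instrs` instructions and then loses `stall_cycles` cycles to the branch. */
double fetch_rate_ips(int instrs, int stall_cycles, double cycle_time_ns)
{
    /* One loop pass occupies (instrs + stall_cycles) fetch slots. */
    double pass_time_s = (instrs + stall_cycles) * cycle_time_ns * 1e-9;
    return instrs / pass_time_s;
}
```

Under these assumptions, fetch_rate_ips(3, 1, 10.0) gives 7.5e7 IPS for the five-stage pipeline and fetch_rate_ips(3, 3, 5.0) gives 1.0e8 IPS for the ten-stage pipeline. Halving the cycle time yields only about a 1.33x gain because the branch now costs three stall cycles instead of one.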
Program Flow
• A program contains basic blocks
  - Only one entry and one exit point per basic block
[Figure: chain of basic blocks, each ending in a branch or jump]
• Branches
  - Conditional vs. unconditional
    - How to check conditions
    - Jumps, calls, and returns
  - Target address
    - Absolute address
    - Relative to the program counter
Branch Instructions
• Branch penalty due to unknown outcome
  - Direction and target
• How to reduce penalty
  - Can we predict which instruction to fetch next?
[Figure: fetch-stage datapath with labels PC, + 4, NPC, Inst. Memory, Instruction, target, direction, clk]
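
The next-PC choice that the figure labels suggest can be sketched in a few lines. This assumes 4-byte instructions (the "+ 4" in the figure) and 32-bit addresses; the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Select the next PC: the branch target when the direction is taken,
 * otherwise the sequential address PC + 4. */
uint32_t next_pc(uint32_t pc, bool direction_taken, uint32_t target)
{
    return direction_taken ? target : pc + 4;
}
```

The penalty arises because both inputs, direction and target, are known only after the branch has moved down the pipeline, so fetch has to either wait or guess.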
Branch Prediction
• How to predict the outcome of a branch
  - Profiling the entire program
  - Predict based on common cases

Example C/C++ code:

    i = 10000;
    do {
        r = i % 4;
        if (r != 0) {
            sum = sum + i;
        }
        i = i - 1;
    } while (i != 0);

How many branches? (Two: the if test and the loop test.)
Branch Prediction
• How to predict the outcome of a branch
  - Profiling the entire program
  - Predict based on common cases

Assembly code:

          ADDI R1, R0, #10000
    do:   ANDI R2, R1, #3
          BEQ  R2, R0, skp      ; branch-1
          ADD  R3, R3, R1
    skp:  ADDI R1, R1, #-1
          BNEQ R1, R0, do       ; branch-2

                TAKEN   NOT-TAKEN
    branch-1    2500    7500
    branch-2    9999    1
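
The taken and not-taken counts can be reproduced by instrumenting the C loop. This is a minimal profiling sketch; the BranchCount struct and the profile_loop name are illustrative and not part of the slides.

```c
typedef struct {
    long taken;
    long not_taken;
} BranchCount;

/* Run the example loop from `start` down to 0 and count the outcomes of
 * branch-1 (BEQ R2, R0, skp) and branch-2 (BNEQ R1, R0, do). */
void profile_loop(long start, BranchCount *b1, BranchCount *b2)
{
    long i = start;
    long sum = 0;
    b1->taken = b1->not_taken = 0;
    b2->taken = b2->not_taken = 0;
    do {
        long r = i % 4;
        if (r == 0) {
            b1->taken++;        /* BEQ taken: the add is skipped */
        } else {
            b1->not_taken++;    /* BEQ falls through to the add */
            sum = sum + i;
        }
        i = i - 1;
        if (i != 0)
            b2->taken++;        /* BNEQ taken: back to `do` */
        else
            b2->not_taken++;    /* loop exit */
    } while (i != 0);
    (void)sum;                  /* the sum is not needed for the profile */
}
```

With start = 10000, branch-1 is taken only when i is a multiple of 4, and branch-2 falls through only once at loop exit, which matches the table.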
Branch Prediction
• The goal of branch prediction
  - To avoid stall cycles in fetch stage
• Types
  - Static prediction (based on direction or profile)
    - Always not-taken
      - Target = next PC
    - Always taken
      - Target = unknown
  - Dynamic prediction
    - Special hardware using PC
• Which ones are influenced?
  a. Performance
  b. Energy
  c. Power
Branch Prediction/Misprediction
• Prediction accuracy?
  - A: always not-taken = 0.01
  - B: always taken = 0.99

    i = 100;
    do {
        sum = sum + i;
        i = i - 1;
    } while (i != 0);
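
The two accuracies follow from counting how often the loop branch goes each way. This is a sketch under the assumption that the only branch is the do-while test; static_accuracy is an illustrative name.

```c
/* Accuracy of a static predictor on the loop-closing branch of a
 * do-while loop that runs `iterations` times (iterations >= 1).
 * predict_taken: nonzero = always taken, 0 = always not-taken. */
double static_accuracy(int iterations, int predict_taken)
{
    int correct = 0;
    int i = iterations;
    do {
        i = i - 1;
        int taken = (i != 0);               /* branch back unless done */
        if (taken == (predict_taken != 0))
            correct++;
    } while (i != 0);
    return (double)correct / iterations;
}
```

For 100 iterations the branch is taken 99 times and falls through once, so always not-taken is right once in 100 executions and always taken is right 99 times.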
Problem
• Compute IPC of a scalar processor when there are
  - no data/structural hazards, only control hazards,
  - every 5th instruction is a branch, and
  - 90% branch prediction accuracy
• IPC = 1 / (1 + stalls per instruction)
      = 1 / (1 + 0.2 × 0.1 × 1) ≈ 0.98
  (the factor 1 is a one-cycle stall per mispredicted branch)
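
The same formula as a small function, so other branch fractions, accuracies, and penalties can be tried. The function and parameter names are illustrative; the one-cycle penalty used in the slide is passed in as an argument.

```c
/* IPC of a scalar pipeline whose only stalls come from mispredicted
 * branches: IPC = 1 / (1 + stalls per instruction). */
double ipc_with_branches(double branch_fraction, double accuracy,
                         double penalty_cycles)
{
    double stalls_per_instr =
        branch_fraction * (1.0 - accuracy) * penalty_cycles;
    return 1.0 / (1.0 + stalls_per_instr);
}
```

With the slide's numbers, ipc_with_branches(0.2, 0.9, 1.0) evaluates 1 / 1.02, about 0.98. A longer penalty, as in a deeper pipeline, lowers the result for the same accuracy.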
Dynamic Branch Prediction
• Hardware unit capable of learning at runtime
  1. Prediction logic
     - Direction (taken or not-taken)
     - Target address (where to fetch next)
  2. Outcome validation and training
     - Outcome is computed regardless of prediction
  3. Recovery from misprediction
     - Nullify the effect of instructions on the wrong path
Simple Dynamic Predictors
• One-bit branch predictor
  - Keep track of and use the outcome of the last executed branch
• Prediction accuracy

    while (1) {
        for (i = 0; i < 10; i++) {}    // branch-1
        for (j = 0; j < 20; j++) {}    // branch-2
    }

Loop implementation suggested by a student:

    while: ADDI R3, R0, #10
           JMP  chk1
    for1:  …
    chk1:  BNQ  R1, R3, for1
           ADDI R3, R0, #20
           JMP  chk2
    for2:  …
    chk2:  BNQ  R2, R3, for2
           JMP  while
Simple Dynamic Predictors
• One-bit branch predictor
  - Keep track of and use the outcome of the last executed branch
• Prediction accuracy

    while (1) {
        for (i = 0; i < 10; i++) {}    // branch-1
        for (j = 0; j < 20; j++) {}    // branch-2
    }

[Figure: two-state diagram with states N and T; a taken outcome leads to T, a not-taken outcome leads to N]
• A single predictor shared by multiple branches
• Two mispredictions for loops (1 entry and 1 exit)
Bimodal Branch Predictors
• One-bit branch predictor
  - Keep track of and use the outcome of the last executed branch

    while (1) {
        for (i = 0; i < 10; i++) {}    // branch-1
        for (j = 0; j < 20; j++) {}    // branch-2
    }

[Figure: two-state diagram with states N and T; a taken outcome leads to T, a not-taken outcome leads to N]
• Shared predictor
• Two mispredictions per loop
• Accuracy = 26/30 ≈ 0.87
• How to improve?
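
The 26/30 figure can be checked with a short simulation of one shared one-bit predictor. This sketch assumes the counting implied by the slide's denominator of 30: the first loop contributes 10 branch executions (9 taken, then the exit) and the second contributes 20 (19 taken, then the exit). The function name and the initial state N are illustrative choices.

```c
/* Accuracy of a single one-bit predictor shared by branch-1 and branch-2
 * over `outer_iterations` passes of the while(1) body. */
double one_bit_accuracy(int outer_iterations)
{
    const int trips[2] = {10, 20};  /* branch executions per inner loop */
    int state = 0;                  /* 0 = N (predict not-taken), 1 = T */
    int total = 0, correct = 0;

    for (int w = 0; w < outer_iterations; w++) {
        for (int loop = 0; loop < 2; loop++) {
            for (int k = 0; k < trips[loop]; k++) {
                int taken = (k < trips[loop] - 1); /* last one exits */
                if (state == taken)
                    correct++;
                state = taken;      /* remember only the last outcome */
                total++;
            }
        }
    }
    return (double)correct / total;
}
```

Each loop exit leaves the predictor in state N, so the next loop mispredicts its first branch and later its own exit: four mispredictions per 30 branches.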
Bimodal Branch Predictors
• Two-bit branch predictor
  - Increment if taken
  - Decrement if not taken

    while (1) {
        for (i = 0; i < 10; i++) {}    // branch-1
        for (j = 0; j < 20; j++) {}    // branch-2
    }

[Figure: four-state diagram with states 00, 01, 10, 11; taken moves toward 11, not-taken moves toward 00]
• One misprediction on loop exit
• Accuracy = 28/30 ≈ 0.93
• How to improve?
  - 3-bit predictor?
• Problem?
  - A single predictor shared by many branches
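
The 28/30 figure can be checked the same way with a shared two-bit saturating counter. This sketch assumes that states 10 and 11 predict taken (the slide gives only the increment and decrement rule) and that the loops contribute 10 and 20 branch executions per pass. The warm-up parameter and the function name are illustrative.

```c
/* Accuracy of a single two-bit saturating counter shared by branch-1 and
 * branch-2. The first `warmup` passes of the while(1) body train the
 * counter and are not scored; the next `measured` passes are scored. */
double two_bit_accuracy(int warmup, int measured)
{
    const int trips[2] = {10, 20};  /* branch executions per inner loop */
    int counter = 0;                /* 0..3, i.e. states 00..11 */
    int total = 0, correct = 0;

    for (int w = 0; w < warmup + measured; w++) {
        for (int loop = 0; loop < 2; loop++) {
            for (int k = 0; k < trips[loop]; k++) {
                int taken = (k < trips[loop] - 1);  /* last one exits */
                int predicted = (counter >= 2);     /* assumed threshold */
                if (w >= warmup) {
                    total++;
                    if (predicted == taken)
                        correct++;
                }
                if (taken) {
                    if (counter < 3) counter++;     /* saturate at 11 */
                } else {
                    if (counter > 0) counter--;     /* saturate at 00 */
                }
            }
        }
    }
    return (double)correct / total;
}
```

After one warm-up pass the counter sits at 10 or 11, so a single loop exit only drops it to 10 and the next loop's first branch is still predicted taken. That leaves one misprediction per loop, two per 30 branches.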