Data Hazards & Dynamic Instruction Scheduling (I)
Hung-Wei Tseng

Recap: Pipelining

    add x1, x2, x3
    ld  x4, 0(x5)
    sub x6, x7, x8
    sub x9, x10, x11
    sd  x1, 0(x12)
    xor x13, x14, x15
    and x16, x17, x18
    add x19, x20, x21
    sub x22, x23, x24
    ld  x25, 4(x26)
    sd  x27, 0(x28)

[Pipeline diagram: each instruction starts IF one cycle after the one before it and then moves through ID, EX, MEM, and WB; time t runs left to right.]

After this point, we are completing an instruction each cycle!

$\frac{\text{Cycles}}{\text{Instruction}} = 1$

Recap: Three pipeline hazards
• Structural hazards: resource conflicts mean the hardware cannot support simultaneous execution of the instructions in the pipeline
• Control hazards: the PC can be changed by an instruction in the pipeline
• Data hazards: an instruction depends on a result that has not yet been generated or propagated when the instruction needs it

Recap: addressing hazards
• Structural hazards
  • Stall
  • Modify hardware design
• Control hazards
  • Stall
  • Static prediction
  • Dynamic prediction

Recap: 2-bit/Bimodal local predictor
• Local predictor: every branch instruction has its own state
• 2-bit: each state is described using 2 bits
• Change the state based on the actual outcome
• If we guess right: no penalty
• If we guess wrong: flush (clear pipeline registers) the mis-predicted instructions that are currently in the IF and ID stages and reset the PC

| branch PC | target PC | State |
|---|---|---|
| 0x400048 | 0x400032 | 10 |
| 0x400080 | 0x400068 | 11 |
| 0x401080 | 0x401100 | 00 |
| 0x4000F8 | 0x400100 | 01 |

[State diagram: Strong Not Taken 00 (0), Weak Not Taken 01 (1), Weak Taken 10 (2), Strong Taken 11 (3). A taken outcome moves the state one step toward Strong Taken; a not-taken outcome moves it one step toward Strong Not Taken. States 10 and 11 predict taken.]
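
The state machine above can be sketched in a few lines of C. This is a minimal sketch, not the course's reference code: the names `counter2_t`, `predict_taken`, `update`, and `count_mispredictions` are made up for illustration, and the state encoding (0 to 3) follows the numbers in the diagram.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-bit saturating counter: 0 = strong not taken, 1 = weak not taken,
   2 = weak taken, 3 = strong taken (encoding from the state diagram). */
typedef uint8_t counter2_t;

/* Predict taken when the counter is in one of the two "taken" states. */
static bool predict_taken(counter2_t state) {
    return state >= 2;
}

/* Move one step toward strong taken or strong not taken, saturating at the ends. */
static counter2_t update(counter2_t state, bool taken) {
    if (taken)
        return state < 3 ? (counter2_t)(state + 1) : 3;
    return state > 0 ? (counter2_t)(state - 1) : 0;
}

/* Hypothetical helper: replay one branch's outcomes and count wrong guesses. */
static int count_mispredictions(counter2_t state, const bool *outcomes, int n) {
    int wrong = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken(state) != outcomes[i])
            wrong++;
        state = update(state, outcomes[i]);
    }
    return wrong;
}
```

Starting from Strong Not Taken, a branch needs two taken outcomes in a row before the prediction flips, which is why a single odd outcome in a long run costs only one misprediction.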

Recap: Global history (GH) predictor

[Diagram: the PC indexes the Branch Target Buffer, and a MUX selects the next PC between PC + 4 and the predicted target. A 4-bit Global History Register holding 0100 = (NT, T, NT, NT) selects one entry from a table of 2-bit states associated with each history; the selected state gives "Predict Taken".]

| branch PC | target PC |
|---|---|
| 0x400048 | 0x400032 |
| 0x400080 | 0x400068 |
| 0x401080 | 0x401100 |
| 0x4000F8 | 0x400100 |

States associated with history (in slide order): 00 01 10 11 10 11 10 11 10 00 00 00 11 10 01 00

Recap: gshare predictor

[Diagram: same structure as the global history predictor, except that the table of 2-bit states is indexed by PC bits XORed with the Global History Register: 1100 ⊕ 0100 = 1000, where 0100 = (NT, T, NT, NT) is the history. The selected state gives "Predict Not Taken".]

| branch PC | target PC |
|---|---|
| 0x400048 | 0x400032 |
| 0x400080 | 0x400068 |
| 0x401080 | 0x401100 |
| 0x4000F8 | 0x400100 |

States associated with pattern (in slide order): 00 01 10 11 10 11 10 11 10 00 00 00 11 10 01 00
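
The XOR step in the gshare diagram can be sketched as follows. This is only an illustration: the slide does not say which PC bits feed the XOR, so taking the low bits above the 4-byte instruction offset is an assumption, and `gshare_index` and `ghr_update` are hypothetical names.

```c
#include <stdint.h>

#define HISTORY_BITS 4u                      /* width of the history register in the diagram */
#define TABLE_SIZE   (1u << HISTORY_BITS)    /* 16 two-bit states */

/* Assumption: hash the PC bits just above the 4-byte instruction offset;
   the slide does not specify which bits it uses. */
static unsigned gshare_index(uint64_t pc, unsigned ghr) {
    unsigned pc_bits = (unsigned)(pc >> 2) & (TABLE_SIZE - 1u);
    return (pc_bits ^ ghr) & (TABLE_SIZE - 1u);
}

/* Shift the newest outcome (1 = taken) into the history register. */
static unsigned ghr_update(unsigned ghr, int taken) {
    return ((ghr << 1) | (taken ? 1u : 0u)) & (TABLE_SIZE - 1u);
}
```

With PC bits 1100 and history 0100 this yields index 1000, matching the numbers on the slide. Mixing in PC bits lets two branches that see the same global history land on different table entries.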

Recap: tournament predictor

[Diagram: the PC indexes the Branch Target Buffer, whose per-branch State bit is shown next to each entry; a MUX selects the next PC between PC + 4 and the predicted target. Two predictors run side by side: a global predictor indexed by the 4-bit Global History Register (0100) and a Local History Predictor indexed by each branch's own history. Each has a table of 2-bit states associated with history. The result shown is "Predict Taken".]

| branch PC | target PC | State |
|---|---|---|
| 0x400048 | 0x400032 | 1 |
| 0x400080 | 0x400068 | 1 |
| 0x401080 | 0x401100 | 1 |
| 0x4000F8 | 0x400100 | 0 |

| branch PC | local history |
|---|---|
| 0x400048 | 1000 |
| 0x400080 | 0110 |
| 0x401080 | 1010 |
| 0x4000F8 | 0110 |

States associated with history (both tables, in slide order): 00 01 10 11 10 11 10 11 10 00 00 00 11 10 01 00

Four implementations
• Which of the following implementations will perform the best on modern pipeline processors?

A:

    inline int popcount(uint64_t x) {
        int c = 0;
        while (x) {
            c += x & 1;
            x = x >> 1;
        }
        return c;
    }

B:

    inline int popcount(uint64_t x) {
        int c = 0;
        while (x) {
            c += x & 1; x = x >> 1;
            c += x & 1; x = x >> 1;
            c += x & 1; x = x >> 1;
            c += x & 1; x = x >> 1;
        }
        return c;
    }

C:

    inline int popcount(uint64_t x) {
        int c = 0;
        int table[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
        while (x) {
            c += table[(x & 0xF)];
            x = x >> 4;
        }
        return c;
    }

D:

    inline int popcount(uint64_t x) {
        int c = 0;
        int table[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
        for (uint64_t i = 0; i < 16; i++) {
            c += table[(x & 0xF)];
            x = x >> 4;
        }
        return c;
    }

Why is D better than C?
• How many of the following statements explain the main reason why D outperforms C with compiler optimizations?
① D has lower dynamic instruction count than C
② D has significantly lower branch mis-prediction rate than C
③ D has significantly fewer branch instructions than C
④ D can incur fewer data hazards than C
A. 0 B. 1 C. 2 D. 3 E. 4

C is the `while (x)` table-lookup version and D is the fixed 16-iteration `for` loop version from the "Four implementations" slide.

Annotations on the slide (in extraction order):
• Compiler can do loop unrolling: no branches
• Could be
• Maybe eliminated through loop unrolling…
• About the same

All branches are gone with loop unrolling

    inline int popcount(uint64_t x) {
        int c = 0;
        int table[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        c += table[(x & 0xF)]; x = x >> 4;
        return c;
    }

Hardware acceleration
• Because popcount is important, both Intel and AMD added a POPCNT instruction to their processors, with SSE4.2 and SSE4a respectively
• In C/C++, you may use the intrinsic "_mm_popcnt_u64" to get the number of "1"s in an unsigned 64-bit number
  • You need to compile the program with the -m64 -msse4.2 flags to enable these new features

    #include <stdint.h>
    #include <smmintrin.h>

    inline int popcount(uint64_t x) {
        int c = _mm_popcnt_u64(x);
        return c;
    }

Demo revisited
• Why is the performance better when option is not "0"?
① The number of dynamic instructions to execute is a lot smaller
② The number of branch instructions to execute is smaller
③ The number of branch mis-predictions is smaller
④ The number of data accesses is smaller
A. 0 B. 1 C. 2 D. 3 E. 4

    if (option)
        std::sort(data, data + arraySize);

    for (unsigned i = 0; i < 100000; ++i) {
        int threshold = std::rand();
        for (unsigned i = 0; i < arraySize; ++i) {
            if (data[i] >= threshold)   // branch X
                sum++;
        }
    }

| | Prediction accuracy of X before threshold | Prediction accuracy of X after threshold |
|---|---|---|
| Without sorting | 50% | 50% |
| With sorting | 100% | 100% |
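
The effect in the table can be reproduced with a small experiment. The sketch below is an assumption-laden model, not the demo's actual code: it replays branch X against a single 2-bit saturating counter and counts wrong guesses. The helper names (`cmp_int`, `mispredictions`, `fill`), the seed, and the value range 0 to 255 are all made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending ints. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Count mispredictions of `data[i] >= threshold` (branch X) under one
   2-bit saturating counter that starts at strong not taken. */
static int mispredictions(const int *data, int n, int threshold) {
    int state = 0, wrong = 0;
    for (int i = 0; i < n; i++) {
        int taken = data[i] >= threshold;
        if ((state >= 2) != taken)
            wrong++;
        if (taken) { if (state < 3) state++; }
        else       { if (state > 0) state--; }
    }
    return wrong;
}

/* Fill data with a fixed pseudo-random sequence so runs are repeatable. */
static void fill(int *data, int n) {
    uint32_t s = 12345u;                     /* arbitrary seed */
    for (int i = 0; i < n; i++) {
        s = s * 1664525u + 1013904223u;      /* linear congruential step */
        data[i] = (int)(s >> 16) % 256;      /* values in 0..255 */
    }
}
```

Calling `fill` on two arrays, sorting one with `qsort(b, n, sizeof b[0], cmp_int)`, and comparing `mispredictions` for the same threshold shows the pattern: in the sorted array the branch outcome changes direction only once, so the counter is wrong at most twice, while in the unsorted array the outcome is close to a coin flip.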

Data hazards
• An instruction currently in the pipeline cannot receive the "logically" correct value for execution
• Data dependencies
  • The output of an instruction is the input of a later instruction
  • May result in a data hazard if the instruction that produces the result is still in the pipeline when the later instruction needs it

Example: vector scaling

    i = 0;
    do {
        vector[i] += scale;
    } while (++i < size);

          shl  X5, X11, 3
          add  X5, X5, X10
    LOOP: ld   X6, 0(X10)
          add  X7, X6, X12
          sd   X7, 0(X10)
          addi X10, X10, 8
          bne  X10, X5, LOOP

How many dependencies do we have?
• How many pairs of data dependences are there in the following RISC-V instructions?

    ld   X6, 0(X10)
    add  X7, X6, X12
    sd   X7, 0(X10)
    addi X10, X10, 8
    bne  X10, X5, LOOP

A. 1 B. 2 C. 3 D. 4 E. 5
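
One way to check a hand count is to encode each instruction as its destination and source registers and scan for producer/consumer pairs. The sketch below counts only true (read-after-write) dependences inside one iteration; it ignores write-after-read orderings and dependences carried into the next iteration. The `Inst` struct and `count_raw_pairs` are hypothetical names.

```c
#include <stddef.h>

/* One instruction reduced to register numbers; -1 means "no such register". */
typedef struct {
    const char *text;
    int dst;        /* register written, or -1 (sd and bne write none) */
    int src[2];     /* registers read */
} Inst;

/* The loop body from the slide. */
static const Inst loop_body[] = {
    {"ld   X6, 0(X10)",     6, {10, -1}},
    {"add  X7, X6, X12",    7, { 6, 12}},
    {"sd   X7, 0(X10)",    -1, { 7, 10}},
    {"addi X10, X10, 8",   10, {10, -1}},
    {"bne  X10, X5, LOOP", -1, {10,  5}},
};

/* Count (producer, consumer) pairs where the consumer reads a register whose
   nearest earlier writer inside this sequence is the producer. */
static int count_raw_pairs(const Inst *code, size_t n) {
    int pairs = 0;
    for (size_t c = 0; c < n; c++) {
        for (int s = 0; s < 2; s++) {
            int reg = code[c].src[s];
            if (reg < 0)
                continue;
            /* Walk backwards to the nearest earlier writer of reg. */
            for (size_t p = c; p-- > 0; ) {
                if (code[p].dst == reg) { pairs++; break; }
            }
        }
    }
    return pairs;
}
```

Under this definition the scan finds the pairs ld→add (X6), add→sd (X7), and addi→bne (X10).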

Solution 1: Let's try "stall" again
• Whenever the input is not ready when the consumer is decoding, just stall: the consumer stays at ID.

Tips for drawing a pipeline diagram
• Each instruction has to go through all 5 pipeline stages, IF, ID, EXE, MEM, WB, in order (only valid for a single-issue, 5-stage RISC-V pipeline)
• An instruction can enter the next pipeline stage in the next cycle if
  • No other instruction is occupying the next stage
  • This instruction has completed its own work in the current stage
  • The next stage has all its inputs ready and it can retrieve those inputs
• Fetch a new instruction only if
  • We know the next PC to fetch
  • We can predict the next PC
  • Flush an instruction if the branch resolution says it's mis-predicted.
• Review your undergraduate architecture materials: http://cseweb.ucsd.edu/classes/su19_2/cse141-a/
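
These rules can be turned into a tiny timing model for the stall-only pipeline. The sketch below is one possible reading, under stated assumptions: single issue, no forwarding, branches never flush anything, and a register written in WB can be read by an instruction in ID during the same cycle. The `Inst` struct and `cycles_with_stalls` are hypothetical names.

```c
#include <stddef.h>

typedef struct { int dst; int src[2]; } Inst;   /* -1 = unused register slot */

/* Loop body from the slides: ld, add, sd, addi, bne. */
static const Inst loop_body[] = {
    { 6, {10, -1}},   /* ld   X6, 0(X10)    */
    { 7, { 6, 12}},   /* add  X7, X6, X12   */
    {-1, { 7, 10}},   /* sd   X7, 0(X10)    */
    {10, {10, -1}},   /* addi X10, X10, 8   */
    {-1, {10,  5}},   /* bne  X10, X5, LOOP */
};

/* Return the cycle in which the last instruction finishes WB when every
   data hazard is resolved by holding the consumer in ID.
   Assumption: a value written in WB is readable by ID in the same cycle. */
static int cycles_with_stalls(const Inst *code, size_t n) {
    int wb_cycle[32] = {0};   /* WB cycle of the latest writer of each register */
    int prev_id = 1;          /* 1 makes the first instruction leave ID in cycle 2 */
    int last_wb = 0;
    for (size_t i = 0; i < n; i++) {
        int id = prev_id + 1;                  /* earliest: one cycle after the predecessor */
        for (int s = 0; s < 2; s++) {
            int reg = code[i].src[s];
            if (reg >= 0 && wb_cycle[reg] > id)
                id = wb_cycle[reg];            /* stay in ID until the producer's WB */
        }
        prev_id = id;
        last_wb = id + 3;                      /* EX, MEM, WB follow ID */
        if (code[i].dst >= 0)
            wb_cycle[code[i].dst] = last_wb;
    }
    return last_wb;
}
```

Under these assumptions the five-instruction loop body finishes in cycle 15, compared with cycle 9 if nothing ever stalled; a different register-file timing assumption would change the count.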

How many data hazards?
• How many pairs of instructions in the following RISC-V instructions will result in data hazards/stalls in a basic 5-stage RISC-V pipeline?

    ld   X6, 0(X10)
    add  X7, X6, X12
    sd   X7, 0(X10)
    addi X10, X10, 8
    bne  X10, X5, LOOP

A. 1 B. 2 C. 3 D. 4 E. 5

[Pipeline diagram: the five instructions drawn for the basic pipeline without forwarding; stalled instructions repeat ID.]

Solution 2: Data forwarding
• Add logic/wires to forward the desired values to the demanding instructions
• In our five-stage pipeline, forward when the instruction entering the EXE stage consumes a result from a previous instruction that is entering the MEM stage or WB stage
  • A source of the instruction entering the EXE stage is the destination of an instruction entering the MEM/WB stage
  • The previous instruction must be an instruction that updates the register file

[Datapath diagram: the 5-stage pipeline (instruction memory, register file, ALU, data memory, control) with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, plus a Forwarding unit. The Forwarding unit takes ID/EXE.RegisterRs, ID/EXE.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd, and EX/MEM.MemoryRead as inputs and drives ForwardA and ForwardB, which select among three inputs (2, 1, 0) of the muxes in front of the ALU.]
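
The forwarding conditions can be written as a small decision function. This is a sketch of the textbook-style rule, not the exact logic of the datapath in the figure: the mux encoding, the `FwdInputs` struct, and the function names are assumptions, and the EX/MEM.MemoryRead input shown in the diagram is not modeled here.

```c
#include <stdbool.h>

/* Mux select values for ForwardA / ForwardB (the encoding is an assumption). */
enum { FWD_NONE = 0, FWD_FROM_MEM_WB = 1, FWD_FROM_EX_MEM = 2 };

/* Pipeline-register fields the forwarding unit looks at. */
typedef struct {
    int  id_ex_rs, id_ex_rt;      /* sources of the instruction entering EXE */
    int  ex_mem_rd;               /* destination of the instruction entering MEM */
    bool ex_mem_reg_write;        /* does that instruction update the register file? */
    int  mem_wb_rd;               /* destination of the instruction entering WB */
    bool mem_wb_reg_write;
} FwdInputs;

/* Select the newest in-flight value for one source register. */
static int forward_select(int src, const FwdInputs *in) {
    if (in->ex_mem_reg_write && in->ex_mem_rd != 0 && in->ex_mem_rd == src)
        return FWD_FROM_EX_MEM;   /* the younger producer wins */
    if (in->mem_wb_reg_write && in->mem_wb_rd != 0 && in->mem_wb_rd == src)
        return FWD_FROM_MEM_WB;
    return FWD_NONE;              /* use the value read from the register file */
}

static int forward_a(const FwdInputs *in) { return forward_select(in->id_ex_rs, in); }
static int forward_b(const FwdInputs *in) { return forward_select(in->id_ex_rt, in); }
```

Checking EX/MEM before MEM/WB matters when both stages hold a writer of the same register: the instruction in MEM is later in program order, so its value is the one the consumer should see. Register 0 is excluded because it always reads as zero.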

Do we still have to stall?
• How many pairs of instructions in the following RISC-V instructions will result in data hazards/stalls in a basic 5-stage RISC-V pipeline with "full" data forwarding?

    ld   X6, 0(X10)
    add  X7, X6, X12
    sd   X7, 0(X10)
    addi X10, X10, 8
    bne  X10, X5, LOOP

A. 0 B. 1 C. 2 D. 3 E. 4

[Pipeline diagram: the same five instructions drawn for the pipeline with full data forwarding.]

[Datapath diagram: the forwarding datapath from the earlier figure with a Hazard Detection unit added. The Hazard Detection unit reads ID/EX.MemoryRead and drives PCWrite, IF/IDWrite, and a mux that can replace the control signals with 0.]
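
The check this unit performs for a load followed by a dependent instruction can be sketched as below. The signal names PCWrite and IF/IDWrite come from the diagram; the `HazardCtl` struct, the function name, and the register-field parameters are assumptions for illustration.

```c
#include <stdbool.h>

/* Outputs of the hazard detection unit; the first two follow the diagram labels. */
typedef struct {
    bool pc_write;       /* false: hold the PC */
    bool if_id_write;    /* false: hold the IF/ID register */
    bool insert_bubble;  /* true: zero the control signals entering ID/EX */
} HazardCtl;

/* Stall when the instruction in EXE is a load whose destination is a source
   of the instruction currently in ID. Parameter names are assumptions. */
static HazardCtl detect_load_use(bool id_ex_mem_read, int id_ex_rd,
                                 int if_id_rs, int if_id_rt) {
    bool stall = id_ex_mem_read && id_ex_rd != 0 &&
                 (id_ex_rd == if_id_rs || id_ex_rd == if_id_rt);
    HazardCtl ctl = { !stall, !stall, stall };
    return ctl;
}
```

For `ld X6, 0(X10)` followed by `add X7, X6, X12`, the loaded value does not exist until the end of MEM, so forwarding alone cannot deliver it in time and the unit holds the add in ID for one cycle.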

Problems with data forwarding
• What if our pipeline gets deeper? Consider a newly designed pipeline where the memory stage is split into 2 stages (M1 and M2) and the memory access finishes at the 2nd memory stage. By reordering which pair of instructions in the following stream can we eliminate all stalls without affecting the correctness of the code?

① ld X6, 0(X10)
② add X7, X6, X12
③ sd X7, 0(X10)
④ addi X10, X10, 8
⑤ bne X10, X5, LOOP

[Pipeline diagram with stages IF, ID, EX, M1, M2, WB: dependent instructions still repeat IF and ID while they wait.]

We are not making progress

The effect of code optimization
• By reordering which pair of instructions in the following stream can we eliminate all stalls without affecting the correctness of the code?

① ld X6, 0(X10)
② add X7, X6, X12
③ sd X7, 0(X10)
④ addi X10, X10, 8
⑤ bne X10, X5, LOOP

A. (1) & (2) B. (2) & (3) C. (3) & (4) D. (4) & (5) E. None of the pairs can be reordered

If we can predict the future …
• Consider the following dynamic instructions:

① ld X6, 0(X10)
② add X7, X6, X12
③ sd X7, 0(X10)
④ addi X10, X10, 8
⑤ bne X10, X5, LOOP
⑥ ld X6, 0(X10)
⑦ add X7, X6, X12
⑧ sd X7, 0(X10)
⑨ addi X10, X10, 8
⑩ bne X10, X5, LOOP

Which of the following pairs can we reorder without affecting correctness if the branch prediction is perfect?
A. (2) and (4) B. (3) and (5) C. (5) and (6) D. (6) and (9) E. (9) and (10)

Can we use "branch prediction" to predict the future and reorder instructions across the branch?

Dynamic instruction scheduling/Out-of-order (OoO) execution

Tips for drawing a pipeline diagram
• Each instruction has to go through all 5 pipeline stages, IF, ID, EXE, MEM, WB, in order (only valid for a single-issue, 5-stage RISC-V pipeline)
• An instruction can enter the next pipeline stage in the next cycle if
  • No other instruction is occupying the next stage
  • This instruction has completed its own work in the current stage
  • The next stage has all its inputs ready
• Fetch a new instruction only if
  • We know the next PC to fetch
  • We can predict the next PC
  • Flush an instruction if the branch resolution says it's mis-predicted.

What do you need to execute an instruction?
• The instruction is decoded: put the decoded instruction somewhere
• The inputs are ready: all data dependencies are resolved
• The target functional unit is available
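
The three conditions can be read as a readiness check over a pool of decoded instructions. The sketch below is a deliberately simplified illustration, not a description of any real scheduler: it tracks only whether source values are ready, ignores write-after-read and write-after-write ordering, and the names `Decoded`, `can_issue`, `pick_next`, and the two functional units are made up.

```c
#include <stdbool.h>

#define NUM_REGS  32
#define NUM_UNITS 2   /* hypothetical: 0 = ALU, 1 = memory unit */

/* A decoded instruction waiting to execute. */
typedef struct {
    int src[2];   /* source registers, -1 if unused */
    int unit;     /* functional unit the instruction needs */
} Decoded;

/* An entry may start executing once every source value has been produced
   and its functional unit is free. */
static bool can_issue(const Decoded *d, const bool reg_ready[NUM_REGS],
                      const bool unit_busy[NUM_UNITS]) {
    for (int s = 0; s < 2; s++)
        if (d->src[s] >= 0 && !reg_ready[d->src[s]])
            return false;
    return !unit_busy[d->unit];
}

/* Scan the waiting instructions in program order and return the index of the
   first one that can issue, or -1 if none can. */
static int pick_next(const Decoded *window, int n,
                     const bool reg_ready[NUM_REGS],
                     const bool unit_busy[NUM_UNITS]) {
    for (int i = 0; i < n; i++)
        if (can_issue(&window[i], reg_ready, unit_busy))
            return i;
    return -1;
}
```

If the window holds `add X7, X6, X12` followed by `addi X10, X10, 8` and X6 is still being loaded, `pick_next` skips the add and returns the addi, which is the reordering the earlier slides asked about.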

Announcement
• Homework #3 due Wednesday
• iEval submission: attach your "confirmation" screen and you get an extra/bonus homework
• Project due on 12/2, roughly three weeks from now
  • You can only turn in "helper.c"
  • mcfutil.c:refresh_potential() creates helper threads
  • mcfutil.c:refresh_potential() calls the helper_thread_sync() function periodically
  • It's your task to think about what to do in the helper_thread_sync() and helper_thread() functions
  • Please DO READ papers before you ask what to do
  • Formula for grading: min(100, speedup*100)
  • No extension
• Office hour for Hung-Wei this week: MWF 1p-2p; no office hour next week