hardware speculation
-
7/25/2019 Hardware Speculation
1/20
10/8/15
1
ILP: Out-of-Order Execution
Antonia Zhai
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~zhai
With slides from: Profs. Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler
Antonia Zhai, University of Minnesota
Branch on Equal
[Figure: instruction fetch (IF) stage and instruction register (IR)]
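Datapath for Conditional Branch Instructions
[Figure: datapath with PC, instruction memory, register array (regA, regB, regW, datA, datB, datW; instruction fields 25:21 and 20:16), sign extender (Xtnd), ALU (inputs aluA, aluB), and a +4 incrementer producing IncrPC]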
Branch Prediction
Why does prediction work?
The underlying algorithm has regularities: loops are iterated multiple times
The data being operated on has regularities
The instruction sequence has redundancies:
Artifacts of the way humans and compilers think
E.g., error-checking branches are rarely taken
Prediction → compressible information streams?
Prediction allows us to break control dependence constraints
Elements of Branch Prediction
Determine whether it is a branch instruction
Predict whether it will be taken or not
Predict the target address if taken
r1 ← r2 / r3
r2 ← r1 / r3
r3 ← r2 - r3
beq r3, 100
Just Predicting Taken/Not Taken Can Help
The target (PC plus the offset 100) can be computed much earlier than the branch decision, which must wait for the two divides and the subtract that produce r3
Prediction I: A branch will do exactly what it did last time
Branch History Table (BHT)
Each entry is a state machine, indexed by the low-order bits of the instruction address
Encodes information about the prior history of branch instructions
Small chance of two branch instructions aliasing (mapping to the same entry)
Predicts whether or not the branch will be taken
[Figure: branch prediction table of 1-bit entries (0 = not taken, 1 = taken), selected by the branch prediction table index]
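A minimal Python sketch of the 1-bit BHT described above. Assumptions not taken from the slides: a 256-entry table indexed by the low 8 bits of the branch address, every entry starting at 0 (not taken), and the illustrative names `OneBitBHT`, `index`, `predict`, and `update`.

```python
class OneBitBHT:
    """1-bit branch history table: each entry remembers the last outcome."""

    def __init__(self, index_bits: int = 8) -> None:
        self.mask = (1 << index_bits) - 1
        # 0 = predict not taken, 1 = predict taken; all entries start at 0
        self.table = [0] * (1 << index_bits)

    def index(self, pc: int) -> int:
        # Only the low-order bits of the branch address select the entry
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Overwrite the entry with the actual outcome
        self.table[self.index(pc)] = 1 if taken else 0


bht = OneBitBHT()
print(hex(bht.index(0x108)), hex(bht.index(0x208)))  # 0x8 0x8: the two branches alias
```

Because only the low-order bits are used, the table stays small, at the cost of occasional aliasing between unrelated branches.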
Example (1-bit BHT entry at index 0x08, holding 0 or 1)
Outcome | Instruction | Prediction
Taken | 0x108: beq r1, 0x20 | Not taken
Taken | 0x108: beq r1, 0x20 | Taken
Taken | 0x108: beq r1, 0x20 | Taken
Not taken | 0x108: beq r1, 0x20 | Taken
Taken | 0x108: beq r1, 0x20 | Not taken
Not taken | 0x208: beq r2, 0x10 | Taken
Taken | 0x108: beq r1, 0x20 | Not taken
0x108 and 0x208 share the low-order bits 0x08, so both branches use the same entry.
Problem: the predictor changes too quickly
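Replaying the example trace through a 1-bit table reproduces the Prediction column, including the aliasing of 0x208 onto entry 0x08. This sketch assumes the entry is indexed by the low 8 bits of the address and starts at 0; `replay_one_bit` is a hypothetical helper.

```python
def replay_one_bit(trace, index_bits=8):
    """Replay (pc, taken) pairs through a 1-bit BHT and return the predictions."""
    mask = (1 << index_bits) - 1
    table = {}  # missing entries read as 0 (predict not taken)
    predictions = []
    for pc, taken in trace:
        idx = pc & mask
        predictions.append(table.get(idx, 0) == 1)
        table[idx] = 1 if taken else 0  # remember only the last outcome
    return predictions


# (branch address, actual outcome) from the example table
trace = [(0x108, True), (0x108, True), (0x108, True), (0x108, False),
         (0x108, True), (0x208, False), (0x108, True)]
preds = replay_one_bit(trace)
print(["Taken" if p else "Not taken" for p in preds])
```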
Example
for i = 0; i < 100; i++ {
    for j = 0; j < 10; j++ {
        total = a[i][j]
    }
}
What is the misprediction rate with a 1-bit predictor? There are two branches:
1. Backward branch for the inner loop: 2 out of 10 mispredicted per invocation (the first iteration, because the entry still holds not taken from the previous loop exit, and the exit itself); 100 invocations; misprediction rate: 200/1000
2. Backward branch for the outer loop: 2 out of 100 mispredicted in total
Solution:
(200 + 2)/(1000 + 100) = 202/1100 ≈ 18.4%
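The count can be checked by simulation. This sketch assumes each loop branch has its own BHT entry (no aliasing), both entries start at not taken, and each backward branch is taken on every iteration except the last; `nested_loop_mispredicts_1bit` is a hypothetical name.

```python
def nested_loop_mispredicts_1bit(outer=100, inner=10):
    """Count 1-bit-predictor mispredictions for the two loop branches."""
    # One entry per branch (assumes no aliasing); both start at not taken
    last = {"inner": False, "outer": False}
    misses = {"inner": 0, "outer": 0}

    def run(branch, taken):
        if last[branch] != taken:
            misses[branch] += 1
        last[branch] = taken  # 1-bit: remember only the last outcome

    for i in range(outer):
        for j in range(inner):
            # Backward branch at the bottom of the inner loop: taken except on exit
            run("inner", j < inner - 1)
        run("outer", i < outer - 1)
    return misses


m = nested_loop_mispredicts_1bit()
total = m["inner"] + m["outer"]
print(m, total, round(total / 1100, 3))  # {'inner': 200, 'outer': 2} 202 0.184
```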
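Prediction II
Change the prediction after two mispredictions
2-bit saturating counter
[Figure: state diagram of a 2-bit saturating counter with states Yes!, Yes?, No?, No! (the two "Yes" states predict taken, the two "No" states predict not taken); a taken (T) outcome moves one state toward Yes!, a not-taken (NT) outcome one state toward No!. Each BHT entry, e.g. index 0x08, holds one of 00/01/10/11.]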
Example
for i = 0; i < 100; i++ {
    for j = 0; j < 10; j++ {
        total = a[i][j]
    }
}
What is the misprediction rate with a 2-bit saturating counter (starting at 00)? There are two branches:
1. Backward branch for the inner loop: 1 out of 10 mispredicted per invocation (only the loop exit), 100 invocations, plus 2 extra misses while the counter warms up in the first invocation; misprediction rate: 102/1000
2. Backward branch for the outer loop: 3 out of 100 mispredicted in total (2 warm-up misses plus the exit)
Solution:
(102 + 3)/(1000 + 100) = 105/1100 ≈ 9.5%
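The same simulation with 2-bit saturating counters confirms the count. Assumptions: one counter per branch (no aliasing), both counters starting at 00, and "predict taken" when the counter is 2 or 3 (most significant bit set); the function name is illustrative.

```python
def nested_loop_mispredicts_2bit(outer=100, inner=10):
    """Count 2-bit saturating-counter mispredictions for the two loop branches."""
    counter = {"inner": 0, "outer": 0}  # assumption: both counters start at 00
    misses = {"inner": 0, "outer": 0}

    def run(branch, taken):
        predict_taken = counter[branch] >= 2  # MSB set -> predict taken
        if predict_taken != taken:
            misses[branch] += 1
        if taken:
            counter[branch] = min(counter[branch] + 1, 3)
        else:
            counter[branch] = max(counter[branch] - 1, 0)

    for i in range(outer):
        for j in range(inner):
            run("inner", j < inner - 1)
        run("outer", i < outer - 1)
    return misses


m = nested_loop_mispredicts_2bit()
total = m["inner"] + m["outer"]
print(m, total, round(total / 1100, 3))  # {'inner': 102, 'outer': 3} 105 0.095
```

After warm-up, a loop exit only drops the counter from 11 to 10, which still predicts taken on the next invocation.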
Generalization
Using an N-bit saturating counter as a predictor:
If the branch is taken and counter < 2^N - 1: increment the counter
If the branch is not taken and counter > 0: decrement the counter
Prediction: taken if the most significant bit is 1
Not taken if the most significant bit is 0
Finding the proper N:
We want to remember the history, but only the recent history
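A sketch of an N-bit saturating counter following these rules, applied to the inner-loop branch of the earlier example (100 invocations of 10 iterations, counter starting at 0). The class and function names are illustrative. On this loop a 3-bit counter only adds warm-up misses without lowering the steady-state rate of one miss per invocation, one reason a larger N is not automatically better.

```python
class SaturatingCounter:
    """N-bit saturating counter predictor (one BHT entry)."""

    def __init__(self, n_bits: int, initial: int = 0) -> None:
        self.n_bits = n_bits
        self.max = (1 << n_bits) - 1  # 2^N - 1
        self.value = initial

    def predict(self) -> bool:
        # Taken if the most significant bit is 1
        return ((self.value >> (self.n_bits - 1)) & 1) == 1

    def update(self, taken: bool) -> None:
        if taken and self.value < self.max:
            self.value += 1
        elif not taken and self.value > 0:
            self.value -= 1


def inner_loop_misses(n_bits, invocations=100, trip=10):
    c = SaturatingCounter(n_bits)
    misses = 0
    for _ in range(invocations):
        for j in range(trip):
            taken = j < trip - 1  # backward branch falls through only on exit
            if c.predict() != taken:
                misses += 1
            c.update(taken)
    return misses


print([inner_loop_misses(n) for n in (1, 2, 3)])  # [200, 102, 104]
```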
Prediction III
Whether a branch is taken or not depends on other branch instructions
How can we make use of this information?
if (a > 1)     // branch #1
    conquer the world
if (a < -1)    // branch #2
    clean my living room
Two branch instructions:
If branch #1 is taken, branch #2 will never be taken
Correlation Branch Predictor
Every branch has two separate predictors:
One bit predicts the branch if the last branch was taken
One bit predicts the branch if the last branch was not taken
A.K.A. two-level predictor
[Figure: two 1-bit predictors with T/NT transitions, one used when the previous branch was taken and one when it was not taken]
One bit predictor with one bit of correlation
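Example
for(i = 0; i < 10; i++) {
    a = random(-100, 100)   // a is a random number from -100 to 100
    if (a > 1)              // b1
        conquer the world
    if (a < -1)             // b2
        clean my living room
}                           // b3
Branch #2
Input: -10, 7, 5, 10, -2, -55, 4, -89, 33, -3
B2 predictor (correlated with B1) | B1 action | B2 prediction | B2 action
NT / NT | NT | NT | T
T / NT | T | NT | NT
T / NT | T | NT | NT
T / NT | T | NT | NT
T / NT | NT | T | T
T / NT | NT | T | T
T / NT | T | NT | NT
T / NT | NT | T | T
T / NT | T | NT | NT
T / NT | NT | T | T
In the predictor column, the first value is the prediction used when B1 was not taken and the second the prediction used when B1 was taken. Only the first execution of B2 is mispredicted.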
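Replaying the input sequence through a (1,1) predictor for b2 reproduces the B2 prediction column: one misprediction in ten. For comparison, an uncorrelated 1-bit predictor on the same branch mispredicts 7 times. Assumptions: "taken" means the if-condition is true, and the predictor starts at NT / NT; the function names are hypothetical.

```python
def replay_correlating(inputs):
    """(1,1) predictor for b2, selected by the outcome of b1."""
    # state[False]: prediction when b1 was not taken; state[True]: when b1 was taken
    state = {False: False, True: False}  # starts as NT / NT
    predictions, misses = [], 0
    for a in inputs:
        b1_taken = a > 1   # assumption: "taken" means the condition is true
        b2_taken = a < -1
        pred = state[b1_taken]
        predictions.append(pred)
        if pred != b2_taken:
            misses += 1
        state[b1_taken] = b2_taken  # 1-bit entry: remember the last outcome
    return predictions, misses


def replay_plain_one_bit(inputs):
    """The same branch with an uncorrelated 1-bit predictor, for comparison."""
    last, misses = False, 0
    for a in inputs:
        b2_taken = a < -1
        if last != b2_taken:
            misses += 1
        last = b2_taken
    return misses


inputs = [-10, 7, 5, 10, -2, -55, 4, -89, 33, -3]
preds, misses = replay_correlating(inputs)
print(misses, replay_plain_one_bit(inputs))  # 1 7
```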
Definition
(1, 1) predictor
Uses the behavior of the last branch
Selects from 2^1 sets of choices
Each choice is coded with 1 bit
(m, n) predictor
Uses the history of the last m branches
Selects from 2^m sets of choices
Each choice is coded with n bits
Example: How many bits are there in a 1024-entry (2, 2) branch predictor?
1024 × 2^2 × 2 = 8192 bits
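The storage formula as a small helper (`predictor_bits` is a hypothetical name): entries × 2^m × n. With m = 0 it gives the size of a plain n-bit BHT.

```python
def predictor_bits(entries: int, m: int, n: int) -> int:
    """Storage of an (m, n) predictor: each entry holds 2^m counters of n bits."""
    return entries * (2 ** m) * n


print(predictor_bits(1024, 2, 2))  # 8192
print(predictor_bits(1024, 0, 2))  # 2048: a plain 2-bit BHT, i.e. a (0, 2) predictor
```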
A (2,2) Predictor
[Figure: the branch address selects a row of the table; the two-bit global history (00/01/10/11) selects one of the row's four 2-bit counters]
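A sketch of a (2,2) predictor. Assumptions: 1024 rows indexed by the branch address modulo the table size, 2-bit counters starting at 00, and a 2-bit global history register shifted on every branch; the class name is illustrative. On a branch that alternates T, NT, T, ..., the history tells the predictor which way the branch goes next, so it mispredicts only during warm-up, whereas a single 2-bit counter starting at 00 would mispredict every taken instance.

```python
class TwoTwoPredictor:
    """(2,2) predictor: address picks a row, 2-bit global history picks a counter."""

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        # entries x 4 two-bit counters, all starting at 00 (strongly not taken)
        self.counters = [[0] * 4 for _ in range(entries)]
        self.history = 0  # last two branch outcomes, newest in bit 0

    def predict(self, pc: int) -> bool:
        return self.counters[pc % self.entries][self.history] >= 2

    def update(self, pc: int, taken: bool) -> None:
        row = self.counters[pc % self.entries]
        h = self.history
        row[h] = min(row[h] + 1, 3) if taken else max(row[h] - 1, 0)
        # Shift the outcome into the 2-bit global history register
        self.history = ((self.history << 1) | int(taken)) & 0b11


p = TwoTwoPredictor()
misses = 0
for k in range(200):
    taken = k % 2 == 0  # outcomes T, NT, T, NT, ...
    if p.predict(0x40) != taken:
        misses += 1
    p.update(0x40, taken)
print(misses)  # 3, all during warm-up
```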
Combine the Local & Global Predictors
Local predictor: predicts based on the history of just one branch
Global predictor: predicts based on the global history (the recent outcomes of all branches)
Combine them with a selector (a multi-level predictor)
A branch predictor without a branch address?
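One common way to build the selector, sketched in Python: a 2-bit saturating counter that is trained only when the two predictors disagree, moving toward whichever was right. The encoding (values 2-3 mean "use the global prediction") and the class name are assumptions, not taken from the slides.

```python
class Chooser:
    """2-bit saturating selector between a local and a global predictor."""

    def __init__(self) -> None:
        self.value = 0  # 0-1: use the local prediction, 2-3: use the global one

    def use_global(self) -> bool:
        return self.value >= 2

    def choose(self, local_pred: bool, global_pred: bool) -> bool:
        return global_pred if self.use_global() else local_pred

    def update(self, local_pred: bool, global_pred: bool, taken: bool) -> None:
        local_ok = local_pred == taken
        global_ok = global_pred == taken
        # Train only when exactly one predictor was right
        if global_ok and not local_ok:
            self.value = min(self.value + 1, 3)
        elif local_ok and not global_ok:
            self.value = max(self.value - 1, 0)


c = Chooser()
for _ in range(2):
    # The branch is taken; only the global predictor got it right
    c.update(local_pred=False, global_pred=True, taken=True)
print(c.use_global(), c.choose(local_pred=False, global_pred=True))  # True True
```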
Tournament Predictor
Alpha branch predictor:
A 4K-entry predictor-predictor (chooser): a 2-bit saturating counter per entry selects between the two predictors, based on local information of the branch
A local predictor: a 1024-entry table of 10-bit histories, each keeping track of the 10 most recent outcomes; the selected history then selects a 3-bit saturating counter
A 4K-entry global predictor indexed by the history of the last 12 branches; each entry is a standard 2-bit predictor
11.5 mispredictions per 1000 completed instructions for SPECint95
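Adding up the storage of the tournament predictor described above. The sketch assumes the local 3-bit counter table has 2^10 = 1024 entries (one per 10-bit history value) and that the 4K global entries correspond to the 2^12 values of a 12-branch history.

```python
# Storage budget of the tournament predictor, in bits
chooser = 4 * 1024 * 2      # 4K-entry chooser of 2-bit counters
local_hist = 1024 * 10      # 1024 local histories of 10 bits
local_count = 1024 * 3      # 2^10 three-bit counters indexed by a local history
global_pred = 4 * 1024 * 2  # 2^12 two-bit counters indexed by 12 bits of global history

total = chooser + local_hist + local_count + global_pred
print(total, total / 1024)  # 29696 29.0
```

That is roughly 29K bits in total.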
Branch Target Buffer
[Figure: a 16-entry BTB (entries 0x00-0x0f). Branch addresses 0x0000ac24 (0b0000001010110000100100) and 0x0000aca4 (0b0000001010110010100100) are shown with tags 0x2b0 and 0x2b2.]
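How the two addresses in the figure can map to the same BTB entry. This sketch assumes 4-byte instructions (the low 2 address bits are dropped) and the 16 entries shown, so 4 index bits; under these assumptions it reproduces the tags 0x2b0 and 0x2b2. `btb_index_and_tag` is a hypothetical helper.

```python
def btb_index_and_tag(pc: int, index_bits: int = 4):
    """Split a branch address into a BTB index and a tag."""
    word = pc >> 2                            # drop the byte offset of 4-byte instructions
    index = word & ((1 << index_bits) - 1)    # 16 entries: 0x00-0x0f
    tag = word >> index_bits                  # remaining high-order bits
    return index, tag


for pc in (0xac24, 0xaca4):
    idx, tag = btb_index_and_tag(pc)
    print(hex(pc), hex(idx), hex(tag))
# 0xac24 0x9 0x2b0
# 0xaca4 0x9 0x2b2
```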
Example (all branches are taken)
BTB entry (tag/target) | Instruction | Prediction
NULL/NULL | 0xac24: beq r1, 0x20 | No match, no prediction
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xac24: beq r1, 0x20 | Match, 0xac48
2b0/0xac48 | 0xaca4: beq r2, 0x10 | No match, no prediction
2b2/0xacb8 | 0xac24: beq r1, 0x20 | No match, no prediction
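Replaying the example through a direct-mapped BTB reproduces the Prediction column. Assumptions: 4-byte instructions, 16 entries (4 index bits), and a target of PC + 4 + offset, which matches the targets 0xac48 and 0xacb8 in the table. `replay_btb` is an illustrative name.

```python
def replay_btb(trace, index_bits=4):
    """Direct-mapped BTB holding (tag, target); every branch in the trace is taken."""
    btb = {}
    predictions = []
    for pc, offset in trace:
        word = pc >> 2                          # 4-byte instructions
        idx = word & ((1 << index_bits) - 1)
        tag = word >> index_bits
        entry = btb.get(idx)
        # Predict only on a tag match; otherwise no prediction (None)
        predictions.append(entry[1] if entry is not None and entry[0] == tag else None)
        # The branch is taken, so install or overwrite the entry
        btb[idx] = (tag, pc + 4 + offset)
    return predictions


trace = [(0xac24, 0x20)] * 7 + [(0xaca4, 0x10), (0xac24, 0x20)]
print([hex(t) if t is not None else None for t in replay_btb(trace)])
```

The last two lookups miss because 0xac24 and 0xaca4 keep evicting each other from the same entry.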
The Entire Process
Hardware Speculation
When the prediction is wrong, incorrectly executed instructions must be erased
Our current hardware support (Tomasulo's algorithm) does not allow this
Extending the hardware: hardware speculation
Separate the bypassing of results among instructions from the completion of instructions
Add an instruction commit stage to Tomasulo's algorithm
Goal: instructions commit in order
Reorder Buffer (ROB)
Hardware buffer that holds the results of instructions that have finished execution but have not yet committed
Fields: instruction type, destination, value, ready
Tomasulo's Algorithm with ROB
[Figure: instruction queue, registers, reservation stations feeding the FP adders and FP multipliers, address unit, load buffers, memory unit, operation and operand buses, and the common data bus (CDB); the ROB holds register numbers, store addresses, and store values]
Four Execution Stages
Issue: get the instruction from the in-order instruction queue
Issue the instruction if a reservation station and a ROB entry are available, else stall
Read each operand if it is available in the register file or the ROB, else record its tag; update the destination register's tag and the ROB entry
Execute (a.k.a. issue):
Monitor the common data bus (CDB) while waiting for operands
Execute when both operands are ready (resolves RAW dependences)
Write Results:
Write the result to the CDB → all reservation stations and the ROB
For a store, the value and address are sent to the ROB
Commit
Four Execution Stages
Commit (a.k.a. complete, graduate)
Normal commit: the instruction is at the head of the ROB and its result is in the buffer
Update the register
Remove the instruction from the ROB
Store instruction
Update memory
Remove the instruction from the ROB
Branch instruction
Correctly predicted: nothing to do
Incorrectly predicted: flush the ROB
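A minimal sketch of in-order commit from the ROB, using the hypothetical names `ROBEntry` and `commit`; register tags and the CDB are omitted. It models no branch delay slot, so the store is placed before the mispredicted branch; in the walkthrough that follows, the store instead sits in the delay slot.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    op: str                       # "alu", "st" or "br"
    dest: Optional[str] = None    # destination register (alu only)
    addr: Optional[int] = None    # memory address (st only)
    value: Optional[int] = None
    ready: bool = False           # result has been written
    mispredicted: bool = False    # branches only


def commit(rob: deque, regs: dict, mem: dict) -> None:
    """Commit ready instructions, in order, from the head of the ROB."""
    while rob and rob[0].ready:
        e = rob.popleft()
        if e.op == "alu":
            regs[e.dest] = e.value    # update the register
        elif e.op == "st":
            mem[e.addr] = e.value     # memory is updated only at commit
        elif e.op == "br" and e.mispredicted:
            rob.clear()               # flush every younger instruction
            return


regs, mem = {"R0": 3, "R1": 100, "R2": 11}, {}
rob = deque([
    ROBEntry("alu", dest="R2", value=9, ready=True),
    ROBEntry("st", addr=200, value=3, ready=True),
    ROBEntry("br", ready=True, mispredicted=True),
    ROBEntry("alu", dest="R0", value=6, ready=True),  # wrong-path instruction
])
commit(rob, regs, mem)
print(regs, mem, len(rob))  # {'R0': 3, 'R1': 100, 'R2': 9} {200: 3} 0
```

Because registers and memory change only at commit, squashing the wrong-path add leaves R0 at its old value.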
Examples: Hardware Speculation (Time T0)
Program (the branch i2 is incorrectly predicted as not taken; i3 is in the branch delay slot):
i1: R2 ← R0 * R0
i2: bne R2, 0x20
i3: 100(R1) ← R0 (store)
i4: R0 ← R0 + R0
Latencies: multiply 3 cycles; all other operations 1 cycle
i1: issue
Reservation stations (Op1 / Op2 / ROB): Mult: V:3 / V:3 / ROB1; Adder, Adder2, Mult 2: empty; Branch (PC / Op1 / Op2 / ROB): empty
Register file: R0: 3; R1: 100; R2: tag Mult 1, stale value 11
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: ALU / R2 / --- / No
ROB2: empty
ROB3: empty
ROB4: empty
Examples: Hardware Speculation (Time T1)
i1: execute 1; i2: issue
Reservation stations: Mult: V:3 / V:3 / ROB1; Branch: V:PC+4 / T:Mult1 / V:0x20 / ROB2; Adder, Adder2, Mult 2: empty
Register file: R0: 3; R1: 100; R2: tag Mult 1, stale value 11
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: ALU / R2 / --- / No
ROB2: bne / --- / --- / No
ROB3: empty
ROB4: empty
Examples: Hardware Speculation (Time T2)
i1: execute 2; i2: wait (predicted); i3: issue (not shown)
Reservation stations: Mult: V:3 / V:3 / ROB1; Branch: V:PC+4 / T:Mult1 / V:0x20 / ROB2; Adder, Adder2, Mult 2: empty
Register file: R0: 3; R1: 100; R2: tag Mult 1, stale value 11
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: ALU / R2 / --- / No
ROB2: bne / --- / --- / No
ROB3: St / --- / --- / No
ROB4: empty
Examples: Hardware Speculation (Time T3)
i1: execute 3; i2: wait; i3: execute (address unit); i4: issue
Reservation stations: Adder: V:3 / V:3 / ROB4; Mult: V:3 / V:3 / ROB1; Branch: V:PC+4 / T:Mult1 / V:0x20 / ROB2; Adder2, Mult 2: empty
Register file: R0: tag Adder 1, value 3; R1: 100; R2: tag Mult 1, stale value 11
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: ALU / R2 / --- / No
ROB2: bne / --- / --- / No
ROB3: St / --- / --- / No
ROB4: ALU / R0 / --- / No
Examples: Hardware Speculation (Time T4)
i1: write result; i2: wait; i3: write result (stalled); i4: execute
Reservation stations: Adder: V:3 / V:3 / ROB4; Branch: V:PC+4 / V:9 / V:0x20 / ROB2; Adder2, Mult, Mult 2: empty
Register file: R0: tag Adder 1, value 3; R1: 100; R2: tag Mult 1, stale value 11
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: ALU / R2 / 9 / Yes
ROB2: bne / --- / --- / No
ROB3: St / --- / --- / No
ROB4: ALU / R0 / --- / No
Examples: Hardware Speculation (Time T5)
i1: commit; i2: execute; i3: write result; i4: write result (stalled)
Reservation stations: Adder: V:3 / V:3 / ROB4; Branch: V:PC+4 / V:9 / V:0x20 / ROB2; Adder2, Mult, Mult 2: empty
Register file: R0: tag Adder 1, value 3; R1: 100; R2: 9
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: empty
ROB2: bne / --- / --- / No
ROB3: St / 200 / V:3 / Yes
ROB4: ALU / R0 / --- / No
Examples: Hardware Speculation (Time T6)
i2: write result (misprediction detected); i3: wait for commit; i4: write result
Reservation stations: Adder: V:3 / V:3 / ROB4; Branch: V:PC+4 / V:9 / V:0x20 / ROB2; Adder2, Mult, Mult 2: empty
Register file: R0: tag Adder 1, value 3; R1: 100; R2: 9
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: empty
ROB2: bne / --- / --- / Yes
ROB3: St / 200 / V:3 / Yes
ROB4: ALU / R0 / V:6 / Yes
Examples: Hardware Speculation (Time T7)
i2: commit (misprediction); i3: wait for commit (it is in the branch delay slot, so it is kept); i4: squashed
Reservation stations: all empty
Register file: R0: 3; R1: 100; R2: 9
Reorder buffer (Op / Dst / Val. / Ready):
ROB1: empty
ROB2: empty
ROB3: St / 200 / V:3 / Yes
ROB4: empty
Examples: Hardware Speculation (Time T8)
i3: commit: store 3 to memory location 200
Reservation stations: all empty
Register file: R0: 3; R1: 100; R2: 9
Reorder buffer: all entries empty