Pipelining, Chapter 6 (1/24/2016 11:00 pm)

Post on 17-Jan-2018


05/03/23 16:28 1 of 86

Pipelining
Chapter 6

Overview of Pipelining
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Pipelining improves performance by increasing instruction throughput.
The execution time of an individual instruction is not decreased.

Analogy
Doing laundry:
1. Put clothes in washer to wash.
2. Put clothes in dryer to dry.
3. Put clothes on table to fold.
4. Put clothes away.

Analogy
[Figure: Non-pipelined]

Analogy
[Figure: Pipelined]

Example
Assume that the operation times for the major functional units are:
  200 ps for memory access
  200 ps for ALU operation
  100 ps for register access

MIPS Instructions
5 stages for a MIPS instruction:
Fetch → Reg. Read → ALU Op. → Data access → Reg. Write
  lw  $s1, 100($s2)
  sw  $s1, 100($s2)
  add $s1, $s2, $s3
  beq $s1, $s2, 25

Example
Execution time for each instruction class:

Instruction  Fetch  Reg read  ALU op  Data access  Reg write  Total time
lw           200    100       200     200          100        800 ps
sw           200    100       200     200          -          700 ps
add          200    100       200     -            100        600 ps
beq          200    100       200     -            -          500 ps

Example
For the single-cycle design:
Must allow for the slowest instruction, lw.
So the time required for every instruction is 800 ps.

Example
Non-pipelined for three lw instructions:
[Figure: Non-pipelined execution of three lw instructions]
The time between the first and the fourth instructions is 3 x 800 ps = 2400 ps.

Example
For the pipelined multi-cycle design:
Each clock cycle must be long enough to accommodate the slowest operation.
So the time required for every clock cycle is 200 ps.

Example
Pipelined for three lw instructions:
[Figure: Pipelined execution of three lw instructions]
The time between the first and the fourth instructions is 3 x 200 ps = 600 ps.
2400/600 = 4.
A fourfold performance improvement.
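The fourfold figure compares the time between instruction starts. A minimal sketch of the total-time arithmetic follows, using the stage latencies from the example. The function names are made up for illustration, and the pipelined formula assumes an ideal five-stage pipeline with no stalls.

```python
# Stage latencies in ps, taken from the example above.
STAGE_PS = {"fetch": 200, "reg_read": 100, "alu": 200,
            "data_access": 200, "reg_write": 100}

SINGLE_CYCLE_PS = sum(STAGE_PS.values())   # 800 ps: long enough for lw
PIPELINE_CYCLE_PS = max(STAGE_PS.values()) # 200 ps: the slowest stage

def single_cycle_time(n):
    """Total time for n instructions, one long cycle each."""
    return n * SINGLE_CYCLE_PS

def pipelined_time(n):
    """Total time when one instruction enters the pipeline per cycle."""
    return (len(STAGE_PS) + n - 1) * PIPELINE_CYCLE_PS

for n in (3, 1_000_000):
    ratio = single_cycle_time(n) / pipelined_time(n)
    print(n, single_cycle_time(n), pipelined_time(n), round(ratio, 2))
```

For three instructions the totals are 2400 ps and 1400 ps, so the overall speedup is well under four. The ratio approaches four only as the number of instructions grows.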

Pipeline Hazards
Structural hazards
Data hazards
Control hazards

Structural Hazards
There is a structural hazard when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
Analogy: Having a washer/dryer combination.

Example
What happens if we execute four lw instructions one after another?
[Figure: Four lw instructions in the pipeline]
With a single memory, the 1st instruction is accessing data in the same clock cycle in which the 4th instruction is being fetched.

Solution
Have two separate memories:
  One for instructions
  One for data
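The conflict in the four-lw example can be reproduced with a small sketch. It assumes an ideal five-stage pipeline in which every instruction is a lw, starting one per cycle. The function name and the memory labels are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(num_instructions, separate_memories):
    """Return the clock cycles in which two instructions need the same memory."""
    conflicts = []
    last_cycle = num_instructions + len(STAGES) - 1
    for cycle in range(1, last_cycle + 1):
        users = []
        for i in range(num_instructions):      # instruction i starts in cycle i + 1
            stage_index = cycle - 1 - i
            if 0 <= stage_index < len(STAGES):
                stage = STAGES[stage_index]
                if stage == "IF":               # instruction fetch reads memory
                    users.append("instr_mem" if separate_memories else "mem")
                elif stage == "MEM":            # lw reads data memory
                    users.append("data_mem" if separate_memories else "mem")
        if len(users) != len(set(users)):
            conflicts.append(cycle)
    return conflicts

print(memory_conflicts(4, separate_memories=False))  # [4]
print(memory_conflicts(4, separate_memories=True))   # []
```

With one shared memory the only clash is in cycle 4, where the first lw is in MEM and the fourth is in IF. Splitting the memory removes it.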

Data Hazards
Data hazards occur when the pipeline must be stalled because one step must wait for another to complete.
They arise from the dependence of one instruction on an earlier one that is still in the pipeline.
  add $s0, $t0, $t1
  sub $t2, $s0, $t3

Solution 1
Compilers can remove the data hazard by moving non-dependent instructions in between.

Solution 2
Observation: we don't need to wait for the add instruction to complete before trying to resolve the data hazard.
As soon as the ALU creates the sum for the add, we can supply it as an input for the subtract.

Forwarding
Forwarding or bypassing is when extra hardware is added to retrieve the missing item early from the internal resources.

Forwarding
[Figure: Forwarding]
Forwarding paths are valid only if the destination stage is later in time than the source stage.

Forwarding
What happens when we have a sub instruction after a lw instruction?
[Figure: Forwarding]

Pipeline Stall
Even with forwarding, we need to stall one stage for a load-use data hazard.
This is referred to as a pipeline stall.

Example of reordering code
Consider the following code segment in C:
  A = B + E;
  C = B + F;
Assume that all variables are in memory and are addressable as offsets from $t0.

Example of reordering code
The corresponding MIPS code is:
  lw  $t1, 0($t0)     # load B; offset from $t0
  lw  $t2, 4($t0)     # load E
  add $t3, $t1, $t2   # B + E
  sw  $t3, 12($t0)    # store A
  lw  $t4, 8($t0)     # load F
  add $t5, $t1, $t4   # B + F
  sw  $t5, 16($t0)    # store C

Example of reordering code
What are the problems?
Each add reads a register loaded by the lw immediately before it ($t2 for the first add, $t4 for the second), so both adds incur a load-use stall.

Example of reordering code
Code re-ordered with no stalls:
  lw  $t1, 0($t0)     # load B; offset from $t0
  lw  $t2, 4($t0)     # load E
  lw  $t4, 8($t0)     # load F
  add $t3, $t1, $t2   # B + E
  sw  $t3, 12($t0)    # store A
  add $t5, $t1, $t4   # B + F
  sw  $t5, 16($t0)    # store C
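A rough way to check both orderings is to count how often an instruction reads a register loaded by the lw directly before it. The tuple encoding and the function name below are assumptions made for this sketch. It also assumes full forwarding, so only back-to-back load-use pairs stall.

```python
def load_use_stalls(program):
    """Count one-cycle stalls: an instruction reads the register loaded by the lw just before it."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _sources = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1
    return stalls

# (opcode, destination register or None, source registers)
original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]
# Move the third lw up, as in the re-ordered listing.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(load_use_stalls(original), load_use_stalls(reordered))  # 2 0
```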

Control Hazards
A control hazard (also called branch hazard) arises from the need to make a decision based on the results of one instruction while others are executing.
The proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is not the one that is needed.
Caused by the branch instruction.

Pipelined Datapath
[Figure: Pipelined datapath]

Pipelined DP for lw
[Figures: one per step]
  Instruction fetch
  Instruction decode
  Instruction execute
  Memory access
  Write back
  To properly handle write back

Pipelined Control
The pipeline registers (IF/ID, ID/EX, EX/MEM, and MEM/WB) are written at each clock cycle, so there are no separate write signals for them.
To specify control for the pipeline, we need only set the control values during each pipeline stage.
Each control line is associated with a component active in only a single pipeline stage.

Pipelined Control
Divide the control lines into five groups:
1. Instruction fetch: same operation in every clock cycle, therefore always asserted.
2. Instruction decode: same as 1.
3. Execution/address calculation: the signals to be set are RegDst, ALUOp (two lines) and ALUSrc.
4. Memory access: the signals to be set are Branch, MemRead and MemWrite. PCSrc is asserted only when Branch is set and the ALU's Zero output is true.
5. Write back: the signals to be set are MemtoReg and RegWrite.

Pipelined Control
[Figure: The 9 control signals]

Pipelined Control
Implementing pipelined control means setting the nine control lines to these values in each stage for each instruction.

Pipelined Control
4 of the 9 control lines are used in the EX stage.
5 are passed on to the EX/MEM register.

Pipelined Control
3 of the 9 lines are used in the MEM stage.
2 are passed on to the MEM/WB register.

Pipelined Control
2 of the 9 control lines are used in the WB stage.
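The 4/3/2 split can be sketched as control values being stripped off stage by stage. The transcript does not include the signal-value figure, so the lw values below come from the standard textbook control table and should be read as an assumption. The helper names are hypothetical.

```python
# The nine control lines grouped by the stage that uses them
# (ALUOp counts as two lines).
EX_LINES = ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc"]
MEM_LINES = ["Branch", "MemRead", "MemWrite"]
WB_LINES = ["RegWrite", "MemtoReg"]

def control_for_lw():
    """Control values for lw (assumed from the usual textbook table)."""
    return {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1,
            "Branch": 0, "MemRead": 1, "MemWrite": 0,
            "RegWrite": 1, "MemtoReg": 1}

def advance(control, used_here):
    """Drop the lines consumed in this stage; the rest go to the next pipeline register."""
    return {name: value for name, value in control.items() if name not in used_here}

id_ex = control_for_lw()              # all 9 lines are written into ID/EX
ex_mem = advance(id_ex, EX_LINES)     # 5 lines passed on to EX/MEM
mem_wb = advance(ex_mem, MEM_LINES)   # 2 lines passed on to MEM/WB
print(len(id_ex), len(ex_mem), len(mem_wb))  # 9 5 2
```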

Pipelined Control
[Figure: Pipelined control]

Data Hazards
Pipelined dependences for 5 instructions
[Figure: Pipelined dependences]

Forwarding
[Figure: Forwarding]

Datapath with Forwarding Unit
[Figure: Datapath with forwarding unit]
Ignores forwarding of a store value to a store instruction.

Forwarding Unit
The forwarding unit controls the ALU multiplexors to replace the value from a general-purpose register with the value from the proper pipeline register.
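A minimal sketch of the selection logic for one ALU operand follows. The conditions are the commonly taught ones (the producing instruction writes a register, the register is not $zero, and the numbers match). The function name, parameter names, and mux encodings are assumptions, since the transcript does not show the datapath figure.

```python
def forward_select(id_ex_src, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Return the ALU operand mux select for one source register.

    0b00: value read from the register file
    0b10: forward the ALU result held in EX/MEM
    0b01: forward the value held in MEM/WB
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10   # the most recent result takes priority
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return 0b01
    return 0b00

# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3: $s0 is register 16.
print(forward_select(id_ex_src=16, ex_mem_reg_write=True, ex_mem_rd=16,
                     mem_wb_reg_write=False, mem_wb_rd=0))  # 2
```

Checking EX/MEM before MEM/WB matters when both hold a value for the same register: the younger result is the one the program expects.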

Data Hazards and Stalls
One case where forwarding cannot solve the problem is when an instruction tries to read a register following a load instruction that writes the same register.
E.g. a lw followed by a sub

Data Hazards and Stalls
[Figure: A lw followed by a dependent and instruction]
Since the dependence between the lw and the following and instruction goes back in time, this hazard cannot be solved by forwarding.

Inserting a Stall
[Figure: Inserting a stall]
The and instruction is turned into a nop.
All instructions beginning with the and instruction are delayed one cycle.

Hazard Detection Unit
[Figure: Hazard detection unit]
The hazard detection unit controls the writing of the PC and IF/ID registers plus the multiplexor that chooses between the real control values and all 0s.
The hazard detection unit stalls and deasserts the control fields if the load-use hazard test is true.
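The load-use test and the three things the unit controls can be sketched as below. The test shown is the usual textbook condition (a load in EX whose target register matches a source of the instruction in ID). The signal names in the dictionary and the register numbers in the example are illustrative assumptions.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register that a lw in EX will load."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

def hazard_controls(stall):
    """Signals driven by the hazard detection unit (names are illustrative)."""
    return {
        "pc_write": not stall,       # hold the PC so the same instruction is refetched
        "if_id_write": not stall,    # hold the instruction waiting in IF/ID
        "zero_controls": stall,      # mux selects all 0s, which inserts a nop
    }

# A lw that loads register 2 is in EX; the instruction in ID reads registers 2 and 5.
stall = load_use_hazard(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
print(stall, hazard_controls(stall))
```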

Control Hazard
Pipeline hazards involving branches.
The branch instruction decides whether to branch in the MEM stage (clock cycle 4 in the figure).
In the meantime, three following instructions will have begun execution.
[Figure: Control hazard]

Solutions for Control Hazards
1. Assume branch not taken
Continue execution down the sequential instruction stream.
If the branch is taken, the instructions that are in the pipeline must be discarded.
Execution continues at the branch target.
If branches are untaken half the time, and if it costs little to discard the instructions, then this optimization halves the cost of control hazards.
Discarding instructions means flushing the instructions in the IF, ID, and EX stages of the pipeline.
Change the original control values to 0s, and let them percolate through the pipeline.

Solutions for Control Hazards
2. Reducing the delay of branches
Reduce the cost of the taken branch.
Move the branch execution earlier in the pipeline so that fewer instructions need to be flushed.
Requires two actions to occur earlier:
  i. Computing the branch target address.
  ii. Evaluating the branch decision.

i. Computing the branch target address.
Easy.
Already have the PC and the immediate field in the IF/ID pipeline register.
Just move the branch adder from the EX stage to the ID stage.
The address calculation will be performed for all instructions, but only used when needed.

Branch adder location
Move from EX to ID stage
[Figure: Branch adder location]

ii. Evaluating the branch decision.
Harder.
Need to compare the two registers read during the ID stage.
During ID, we must
  Decode the instruction.
  Decide whether a bypass to the equality unit is needed. Source can come from EX/MEM or MEM/WB pipeline registers.
  Complete the comparison.
  Set the PC to the branch address if necessary.

The values in a branch comparison are needed during ID but may be produced later in time. This can cause a data hazard, and a stall might be needed.
Ex. If an ALU instruction immediately preceding a branch produces one of the operands for the comparison in the branch, a stall will be required. Why?
Because, without a stall, the EX stage for the ALU instruction overlaps the ID cycle of the branch, so the result is not yet available when the comparison needs it.
Ex. If a load instruction immediately preceding a branch produces one of the operands for the comparison in the branch, two stalls will be required.
Because the result from the load appears at the end of the MEM cycle but is needed at the beginning of the ID cycle of the branch.

Moving the branch execution to the ID stage is an improvement since it reduces the penalty of a branch to only one instruction if the branch is taken, namely, the one currently being fetched.
To flush that instruction, zero the instruction field of the IF/ID pipeline register.
Clearing the register transforms the fetched instruction into a nop.
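To get a feel for what moving the branch decision buys, here is a back-of-the-envelope sketch under predict-not-taken. The workload numbers (20% branches, half of them taken) are made up for illustration, and the model ignores the extra data-hazard stalls a branch resolved in ID can incur.

```python
def average_cpi(branch_fraction, taken_fraction, flushed_on_taken):
    """CPI under predict-not-taken: base CPI of 1 plus flushed slots per taken branch."""
    return 1.0 + branch_fraction * taken_fraction * flushed_on_taken

# Assumed workload: 20% of instructions are branches, half of them taken.
cpi_mem = average_cpi(0.20, 0.5, flushed_on_taken=3)  # branch decided in MEM
cpi_id = average_cpi(0.20, 0.5, flushed_on_taken=1)   # branch decided in ID
print(round(cpi_mem, 2), round(cpi_id, 2))  # 1.3 1.1
```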

Solutions for Control Hazards
3. Dynamic branch prediction
Assuming a branch is not taken is one simple form of branch prediction.
With deeper pipelines and multiple issue, branch penalty increases in terms of instructions lost.
A simple static branch prediction wastes too much performance.
It is possible to try to predict branch behavior dynamically (i.e. during program execution).

Dynamic Branch Prediction
Implementation:
A branch prediction buffer or branch history table is used.
This is a small memory indexed by the lower portion of the address of the branch instruction.
The memory contains a bit that says whether the branch was recently taken or not.

Dynamic Branch Prediction
Look up the address of the instruction to see if a branch was taken the last time this instruction was executed.
If so, then begin fetching new instructions from the same place as the last time.

Dynamic Branch Prediction
The bit may have been put there by another branch instruction that has the same low-order address bits.
If the hint is wrong then
  The incorrectly predicted instructions are deleted.
  The prediction bit is inverted and stored back.
  The proper sequence is fetched and executed.

Dynamic Branch Prediction
Problem:
If the branch is almost always taken, we will likely predict incorrectly twice, rather than once, when it is not taken.
Example:
Consider a loop branch that branches nine times in a row, then is not taken once on the tenth time. What is the prediction accuracy assuming the prediction bit for this branch remains in the prediction buffer?

Dynamic Branch Prediction
Answer:
The steady-state prediction behavior will mispredict on the first and last loop iterations.
Mispredicting the last iteration is inevitable since the prediction bit will say taken, the branch having been taken the previous nine times.
Mispredicting on the first iteration happens because the bit is flipped on prior execution of the last iteration of the loop.

Dynamic Branch Prediction
The prediction accuracy for this branch that is taken 90% of the time is only 80% (8 out of 10).
Ideally, the accuracy of the predictor should match the taken branch frequency for these highly regular branches.
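The 80% figure can be checked with a short simulation of a single prediction bit. The function name is hypothetical. The initial prediction of not taken models the steady state, where the bit was flipped by the final iteration of the previous run of the loop.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Fraction of correct predictions when the bit always records the last outcome."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        correct += (prediction == taken)
        prediction = taken          # a wrong guess inverts the bit; a right one keeps it
    return correct / len(outcomes)

loop = [True] * 9 + [False]         # taken nine times, then not taken once
print(one_bit_accuracy(loop))       # 0.8
```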

Dynamic Branch Prediction
A 2-bit prediction scheme.
A prediction must be wrong twice before it is changed.

2-bit prediction scheme
[Figure: 2-bit prediction scheme]

Solutions for Control Hazards
4. Scheduling the branch delay slot

Partial MIPS Instructions

Instruction  OP (6)  rs (5)  rt (5)  rd (5)  shamt (5)  funct (6)
LW           35      rs      rt      offset
SW           43      rs      rt      offset
BEQ          4       rs      rt      offset
ADD          0       rs      rt      rd      0          32
SUB          0       rs      rt      rd      0          34
AND          0       rs      rt      rd      0          36
OR           0       rs      rt      rd      0          37
SLT          0       rs      rt      rd      0          42
ADDI         8       rs      rt      imm
OUT          63      rs

* All numbers are in decimal.
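As an illustration of how the field widths in the table pack into a 32-bit word, here is a minimal encoder sketch. The function names are hypothetical. The I-format layout (op, rs, rt, 16-bit immediate) is the standard MIPS one and is assumed here, as are the register numbers 17 to 19 for $s1 to $s3.

```python
def encode_r(rs, rt, rd, funct, shamt=0, op=0):
    """Pack an R-format instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack an I-format instruction: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

S1, S2, S3 = 17, 18, 19   # register numbers of $s1, $s2, $s3

print(hex(encode_r(rs=S2, rt=S3, rd=S1, funct=32)))  # add $s1, $s2, $s3 -> 0x2538820
print(hex(encode_i(op=35, rs=S2, rt=S1, imm=100)))   # lw $s1, 100($s2) -> 0x8e510064
```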
