

COSC 4201

Instruction Level Parallelism: Hardware-Based Speculation

Prof. Mokhtar Aboelaze, York University

Based on slides by Prof. L. Bhuyan (UCR) and Prof. M. Shaaban (RIT)


Hardware-Based Speculation

°A wide-issue processor may have to execute a branch every cycle.

°We can overcome that by speculating on the branch outcome and continuing to execute as if our guess were correct (not just issuing instructions, but executing them).

°We need a mechanism to handle situations where our speculation is not correct

°Hardware-Based Speculation: dynamic branch prediction + speculation (and undo) + dynamic scheduling

°Used in MIPS, PowerPC, Pentium III/4, AMD K5/K6, and Alpha 21264


Hardware-Based Speculation

°We must separate passing results between instructions (bypassing) from completing an instruction (performing updates that cannot be undone).

°Bypassed values can be used speculatively (we are not yet sure they are the right results to pass).

°Once an instruction is no longer speculative, it is said to be committed; only then do we allow it to update memory and registers.

Hardware-Based Speculation

°Instructions can execute out of order, but they commit in order.

°A new stage, commit, is added to the process.

°A set of buffers, the reorder buffer (ROB), is used to pass results between instructions.

°The ROB provides extra buffers, similar to the reservation stations in Tomasulo’s algorithm.

°The ROB holds the result of an instruction between the end of its operation and its commit.
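The difference between finishing out of order and committing in order can be illustrated with a small sketch. It is not a model of any real pipeline: the function name `commit_order` and the use of a Python deque for the ROB are assumptions made for illustration only.

```python
from collections import deque


def commit_order(program_order, completion_order):
    """Return the order in which instructions commit.

    program_order: instruction names in issue (program) order.
    completion_order: the same names, in the order their results become ready.
    """
    rob = deque(program_order)   # head of the deque = oldest instruction
    ready = set()
    committed = []
    for name in completion_order:
        ready.add(name)                    # result is written into the ROB
        while rob and rob[0] in ready:     # only the head of the ROB may commit
            committed.append(rob.popleft())
    return committed


program = ["L.D", "MUL.D", "SUB.D", "DIV.D"]
finished = ["L.D", "SUB.D", "MUL.D", "DIV.D"]   # SUB.D finishes before MUL.D
print(commit_order(program, finished))
# prints ['L.D', 'MUL.D', 'SUB.D', 'DIV.D']
```

Even though SUB.D finishes first, its result waits in the ROB until MUL.D has committed.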


Hardware-Based Speculation

°In Tomasulo’s algorithm, once an instruction writes its result, the result is copied to the register file.

°In hardware-based speculation, the result is written to the register file only when the instruction commits.

°Each ROB entry contains 4 fields:
• Instruction type (branch, store, or register operation, including loads)
• Destination field (memory address or register number)
• Value field
• Ready field
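As a rough illustration, the four fields of a ROB entry could be written as a small record. The class and field names below (`ROBEntry`, `InstrType`, `write_result`) are hypothetical and chosen only for this sketch; they do not come from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InstrType(Enum):
    BRANCH = "branch"
    STORE = "store"
    REG_OP = "register operation (ALU op or load)"


@dataclass
class ROBEntry:
    instr_type: InstrType                    # instruction type field
    dest: Optional[Union[str, int]] = None   # register name, or memory address for a store
    value: Optional[float] = None            # result, held here until commit
    ready: bool = False                      # True once execution has finished

    def write_result(self, value: float) -> None:
        """Record the result; the register file and memory are not touched here."""
        self.value = value
        self.ready = True


entry = ROBEntry(InstrType.REG_OP, dest="F0")
entry.write_result(3.5)   # entry is now ready, but F0 itself is unchanged until commit
```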


Hardware-Based Speculation

°The ROB replaces the store buffers.

°Data is written to memory only when the store commits.

°Still need the reservation stations to hold data and operations waiting for the functional unit to be free.

°We tag the results using the ROB entry number (instead of the reservation station number).


Hardware-Based Speculation —Steps

°Issue (AKA dispatch): Get the next instruction from the instruction queue. Issue it if there is a free reservation station and a free ROB slot; otherwise stall. Send the operands to the reservation station if they are available in the ROB or the register file.

°Execute (AKA issue): If one or more operands are not yet available, monitor the CDB until they are broadcast; this check enforces RAW hazards. Once all operands are available, execute the operation.
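The issue step can be sketched in a few lines. This is a simplified illustration under stated assumptions: the names `read_operand` and `try_issue`, the `state` dictionary layout, and the (value, tag) operand pairs are all hypothetical and are not taken from the slides.

```python
def read_operand(reg, reg_file, reg_status, rob):
    """Return (value, tag) for a source register at issue time.

    reg_status maps a register to the ROB entry number that will produce it
    (absent when the register file copy is current).
    """
    tag = reg_status.get(reg)
    if tag is None:
        return reg_file[reg], None        # current value is in the register file
    if rob[tag]["ready"]:
        return rob[tag]["value"], None    # value is already waiting in the ROB
    return None, tag                      # not produced yet: wait for this tag on the CDB


def try_issue(op, dest, srcs, state):
    """Issue one instruction, or return None to signal a stall."""
    if not state["free_rs"] or not state["free_rob"]:
        return None                       # no free reservation station or ROB slot
    rs_id = state["free_rs"].pop(0)
    rob_id = state["free_rob"].pop(0)
    # Read the sources before renaming dest, in case dest is also a source.
    operands = [read_operand(s, state["reg_file"], state["reg_status"], state["rob"])
                for s in srcs]
    state["rob"][rob_id] = {"dest": dest, "ready": False, "value": None}
    state["reg_status"][dest] = rob_id    # later readers of dest wait on this ROB tag
    return {"rs": rs_id, "rob": rob_id, "op": op, "operands": operands}


state = {
    "free_rs": ["Mult1"],
    "free_rob": [3],
    "reg_file": {"F2": 0.0, "F4": 2.0},
    "reg_status": {"F2": 2},              # F2 is being produced by ROB entry 2
    "rob": {2: {"dest": "F2", "ready": False, "value": None}},
}
issued = try_issue("MUL.D", "F0", ["F2", "F4"], state)
stalled = try_issue("DIV.D", "F10", ["F0", "F4"], state)   # no resources left
```

In this example MUL.D receives F4 immediately but must wait for tag 2 (the pending load of F2), which is the RAW check described in the Execute step. The second call stalls because the only reservation station and ROB slot are taken.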


Hardware-Based Speculation —Steps

°Write Results: When the result is available, broadcast it on the CDB with its ROB tag, so that it reaches the ROB and any reservation stations waiting for it.

°Commit (AKA completion or graduation): When an instruction reaches the head of the ROB and its result is ready, the result is written to the register file (or to memory, for a store) and the instruction is committed. If a mispredicted branch reaches the head of the ROB, the ROB is flushed and execution restarts at the correct address.
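The commit rule can be illustrated with a minimal sketch. The function name `commit_head`, the dictionary keys, and the three return strings are assumptions made for this example; a real design would also redirect the fetch unit after a flush, which is omitted here.

```python
from collections import deque


def commit_head(rob, reg_file, memory):
    """Try to commit the instruction at the head of the ROB.

    Returns 'wait', 'committed', or 'flushed'. Each entry is a dict with keys
    'type' ('op', 'store', or 'branch'), 'dest', 'value', 'ready' and, for
    branches only, 'mispredicted'.
    """
    if not rob or not rob[0]["ready"]:
        return "wait"                            # head has not written its result yet
    head = rob.popleft()
    if head["type"] == "branch":
        if head["mispredicted"]:
            rob.clear()                          # discard all younger, speculative work
            return "flushed"
        return "committed"
    if head["type"] == "store":
        memory[head["dest"]] = head["value"]     # memory changes only at commit
    else:
        reg_file[head["dest"]] = head["value"]   # register file changes only at commit
    return "committed"


rob = deque([
    {"type": "branch", "dest": None, "value": None, "ready": True, "mispredicted": True},
    {"type": "op", "dest": "F4", "value": 9.0, "ready": True},   # speculative result
])
regs = {"F4": 1.0}
mem = {}
result = commit_head(rob, regs, mem)
```

Because the speculative result for F4 lived only in the ROB, flushing it leaves the register file exactly as it was before the mispredicted branch.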


Example

L.D F6,34(R2)

L.D F2,45(R3)

MUL.D F0,F2,F4

SUB.D F8,F6,F2

DIV.D F10,F0,F6

ADD.D F6,F8,F2

Show the reservation stations and the ROB at the point when MUL.D is ready to commit.


Example

Loop: L.D F0,0(R1)

MUL.D F4,F0,F2

S.D F4,0(R1)

DADDI R1,R1,#-8

BNE R1,R2,Loop


Dynamic Scheduling — Conclusion

°More sensitive to branch prediction accuracy, since a misprediction flushes the ROB and execution starts over from the correct path.

°If a speculated instruction raises an exception, the exception is recorded in the ROB. It is handled when the instruction reaches commit; if the instruction is flushed instead, the exception is discarded.

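°WAW and WAR hazards through memory cannot occur, because memory is updated in order at commit. RAW hazards through memory are handled by two restrictions:
• A load cannot initiate the second step of its execution (the memory access) if an active ROB entry holds a store with the same destination address.
• The effective address of a load is computed in program order with respect to all earlier stores.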
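The load restriction can be sketched as a simple check against earlier stores in the ROB. The function name `load_may_access_memory` and the entry layout are hypothetical. Treating an earlier store whose address is still unknown as a conflict is an assumption of this sketch, used here as a conservative stand-in for computing effective addresses in program order.

```python
def load_may_access_memory(load_addr, load_index, rob):
    """Return True if the load at rob[load_index] may start its memory access.

    rob is a list in program order. Each entry is a dict with 'type' and, for
    stores, 'addr' (None while the effective address is not yet known).
    """
    for entry in rob[:load_index]:        # only earlier instructions matter
        if entry["type"] != "store":
            continue
        if entry["addr"] is None:         # address unknown: wait (assumption of this sketch)
            return False
        if entry["addr"] == load_addr:    # uncommitted store to the same address
            return False
    return True


rob = [
    {"type": "store", "addr": 100},
    {"type": "op"},
    {"type": "load"},
]
blocked = load_may_access_memory(100, 2, rob)   # same address as the earlier store
allowed = load_may_access_memory(108, 2, rob)   # different address
```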


Multiple Issue With Speculation

Loop: LD R2,0(R1)

DADDI R2,R2,#1

SD R2,0(R1)

DADDI R1,R1,#4

BNE R2,R3, Loop


Answer: Without Speculation


Answer: 2-way Superscalar Tomasulo With Speculation

Branches Still Single Issue