instruction level parallelism2

Upload: shahida18

Post on 04-Apr-2018


  • 7/29/2019 Instruction Level Parallelism2


    Instruction Level Parallelism


    Instruction Level Parallelism (ILP)

    Overlap the execution of instructions to improve performance

    Two approaches to exploit ILP:

    1. dynamic and hardware intensive (desktop and server markets)

    2. static and software intensive (embedded market)


    Instruction Level Parallelism (ILP)

    Pipeline CPI = Ideal pipeline CPI + structural stalls + data hazard stalls + control stalls

    Reducing any of the terms on the right-hand side reduces the overall CPI and thus increases instructions per cycle (IPC)
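As a rough illustration of the CPI equation, the sketch below adds the stall terms and converts the result to IPC. The stall figures are hypothetical values chosen only for the arithmetic; they do not come from the slides.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI: ideal CPI plus the average stall cycles per instruction."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Hypothetical average stall cycles per instruction (assumed, not measured).
baseline = pipeline_cpi(1.0, 0.25, 0.5, 0.25)
# Same pipeline if all data hazard stalls could be removed.
improved = pipeline_cpi(1.0, 0.25, 0.0, 0.25)

print("baseline CPI", baseline, "IPC", ipc(baseline))
print("improved CPI", improved, "IPC", ipc(improved))
```

Removing one stall term lowers the CPI and raises the IPC, which is the point the slide makes.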


    Instruction Level Parallelism (ILP)

    The amount of parallelism available within a basic block is very small.

    Therefore, we must exploit ILP across multiple basic blocks.


    Instruction Level Parallelism (ILP)

    Loop-level parallelism: exploit parallelism

    across iterations of a loop

    Ex:

    for (i=1;i


    Instruction Level Parallelism (ILP)

    Loop-level parallelism is converted into instruction-level parallelism either statically by the compiler or dynamically by the hardware

    Alternatively, use vector instructions that operate on a sequence of data items

    Ex: we need 4 vector instructions to execute this code: 2 for loading x and y from memory, 1 for adding the two vectors and 1 for storing back the result vector
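The contrast between the scalar loop and the four vector instructions can be sketched as follows. The loop body is truncated in the transcript, so this sketch assumes an element-wise update of the form x[i] = x[i] + y[i]; the function names and sample data are hypothetical.

```python
def scalar_loop(x, y):
    """One load, load, add, store group per iteration (assumed loop body)."""
    result = list(x)
    for i in range(len(x)):
        result[i] = x[i] + y[i]
    return result


def vector_style(x, y):
    """Four whole-vector steps, mirroring the four vector instructions."""
    vx = list(x)                              # 1. vector load of x
    vy = list(y)                              # 2. vector load of y
    vsum = [a + b for a, b in zip(vx, vy)]    # 3. vector add
    return vsum                               # 4. vector store of the result


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(scalar_loop(x, y))
print(vector_style(x, y))
```

Because no iteration reads a value written by another iteration, both versions compute the same result.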


    Data dependences and Hazards

    Finding dependences is critical in determining

    how much parallelism exists in a program

    Which instructions can be executed in parallel?

    Is an instruction dependent on another instruction?


    Data dependences and Hazards

    Three different types of dependences:

    1. data dependences (also called true data dependences)

    2. name dependences

    3. control dependences


    Data dependences

    An instruction j is data dependent on instruction i if either of the following holds:

    1. instruction i produces a result that may be used by instruction j, or

    2. instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i

    (this dependence chain can be as long as the entire program)


    Data dependences

    Ex: LOOP: L.D    F0, 0(R1)
              ADD.D  F4, F0, F2
              S.D    F4, 0(R1)
              DADDIU R1, R1, #-8
              BNE    R1, R2, LOOP

    If two instructions are data dependent, they cannot be executed simultaneously or be completely overlapped

    The dependence implies that there would be a chain of one or more data hazards between the two instructions


    Data dependences

    The effect of the original data dependence must be preserved

    The presence of the dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline.

    Ex: there is a data dependence between DADDIU and BNE; this dependence causes a stall because we moved the branch test for the MIPS pipeline to the ID stage. Had the branch test stayed in EX, this dependence would not cause a stall


    Data dependences

    The importance of the data dependences is that a dependence

    (1) indicates the possibility of a hazard,

    (2) determines the order in which results must be calculated, and

    (3) sets an upper bound on how much parallelism can possibly be exploited


    Data dependences

    A dependence can be overcome in two different ways:

    1. maintaining the dependence but avoiding a hazard, and

    2. eliminating a dependence by transforming the code


    Data dependences

    The primary method used to avoid a hazard is scheduling the code without altering the dependences (we will see a hardware scheme for scheduling code dynamically as it is executed)

    Dependences that flow through registers are easier to detect than dependences that flow through memory locations (register names are fixed in the instruction, whereas

    100(R4) & 20(R6) may be identical, and

    20(R4) & 20(R4) may be different if R4 changes between the two instructions)
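The two memory cases above can be checked with a small effective-address calculation. This is only a sketch: the register contents are hypothetical values picked to make the two cases visible.

```python
def effective_address(offset, base_register, registers):
    """Effective address of offset(base): offset plus the base register's value."""
    return offset + registers[base_register]


# Hypothetical register contents.
regs = {"R4": 0, "R6": 80}

# Different names, same location: 100(R4) and 20(R6) both resolve to 100.
addr_100_r4 = effective_address(100, "R4", regs)
addr_20_r6 = effective_address(20, "R6", regs)
print(addr_100_r4 == addr_20_r6)

# Same name, different locations: R4 is updated between the two accesses.
first_20_r4 = effective_address(20, "R4", regs)
regs["R4"] += 8
second_20_r4 = effective_address(20, "R4", regs)
print(first_20_r4 == second_20_r4)
```

The addresses are only known once the base registers are read, which is why memory dependences are harder to detect than register dependences.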


    Name dependences

    Occurs when two instructions use the same

    register or memory location, called a name,

    but there is no flow of data between the

    instructions associated with that name.


    Name dependences

    There are two types of name dependences between an instruction i that precedes instruction j in program order:

    1. An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value


    Name dependences

    2. An output dependence occurs when instruction i and instruction j write the same register or memory location. The ordering between the instructions must be preserved to ensure that the value finally written corresponds to instruction j


    Name dependences

    Instructions involved in a name dependence can execute simultaneously or be reordered if the name (register number or memory location) used in the instructions is changed so that the instructions do not conflict (register renaming)


    Dependences and data hazards

    Data hazards are of three types, depending on the order of read and write accesses in the instructions.

    Consider two instructions i and j, with i occurring before j in program order

    1. RAW (read after write): true data dependence. Ex: a LOAD followed by an ALU instruction that directly uses the LOAD result


    Dependences and data hazards

    2. WAW (write after write): output dependence

    3. WAR (write after read): antidependence

    RAR (read after read) is not a hazard
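The three hazard types can be told apart mechanically from the registers each instruction reads and writes. The sketch below is a simplification that looks only at register names for a single pair of instructions; it ignores memory locations and any instructions in between. The tuple encoding and the function name are assumptions made for illustration.

```python
def classify_hazards(first, second):
    """Return the hazard types between two instructions.

    Each instruction is (dest, sources), with `first` preceding `second`
    in program order. dest may be None for instructions that write no register.
    """
    dest_i, sources_i = first
    dest_j, sources_j = second
    hazards = set()
    if dest_i is not None and dest_i in sources_j:
        hazards.add("RAW")   # j reads what i writes: true data dependence
    if dest_j is not None and dest_j in sources_i:
        hazards.add("WAR")   # j writes what i reads: antidependence
    if dest_i is not None and dest_i == dest_j:
        hazards.add("WAW")   # both write the same name: output dependence
    return hazards


load = ("F0", ["R1"])           # L.D   F0, 0(R1)
add = ("F4", ["F0", "F2"])      # ADD.D F4, F0, F2
print(classify_hazards(load, add))
```

Two instructions that only read the same register produce an empty set, matching the remark that RAR is not a hazard.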


    Control dependences

    Determines the ordering of an instruction i

    with respect to a branch instruction

    Ex: if p1 {
            s1;
        }
        if p2 {
            s2;
        }

    s1 is control dependent on p1, and s2 is control dependent on p2 but not on p1


    Control dependences

    Two constraints imposed by control

    dependencies:

    1. An instruction that is control dependent on a

    branch cannot be moved before the branch

    so that its execution is no longer controlled

    by the branch

    2. An instruction that is not control dependent

    on a branch cannot be moved after the

    branch so that its execution is controlled by

    the branch


    Control dependences

    Control dependencies are preserved by two

    properties in a simple pipeline:

    1. instructions execute in program order: this

    ensures that the instruction that occurs

    before the branch is executed before the

    branch

    2. the detection of a control or branch hazard

    ensures that an instruction that is control

    dependent on a branch is not executed until

    the branch direction is known.


    Control dependences

    However we may be willing to violate the

    control dependencies without affecting the

    correctness of the program

    Two properties that are critical to program

    correctness and that must be preserved using

    data and control dependencies are

    exception behavior and the data flow.


    Control dependences

    Preserving the exception behavior: reordering of

    instruction execution must not change how

    exceptions are raised in the program (or must not

    cause any new exceptions in the program)

    Ex: DADDU R2, R3, R4
        BEQZ  R2, L1
        LW    R1, 0(R2)

    (This example shows how maintaining the data and control dependences prevents instruction reordering from causing new exceptions)

    Speculation is a hardware technique that allows us to overcome this exception problem


    Control dependences

    Preserving the data flow : data flow is the actual

    flow of data values among instructions that produce

    results and those that consume them

    Branches make the data flow dynamic

    Ex: DADDU R1, R2, R3
        BEQZ  R4, L
        DSUBU R1, R5, R6
    L:  OR    R7, R1, R8

    Data dependence alone is insufficient for correct execution: the value of R1 used by OR depends on whether the branch is taken. When the instructions execute, the data flow must be preserved, and it is preserved by preserving the control flow.


    Overcoming Data Hazards using dynamic scheduling

    Hardware rearranges the instruction execution to reduce stalls while maintaining data flow and exception behavior

    The advantage of dynamic scheduling is gained at the cost of a significant increase in hardware complexity


    Dynamic scheduling

    Limitation of a simple pipeline: instructions are issued in program order, and if an instruction is stalled in the pipeline, no later instruction can proceed

    Ex: DIV.D F0, F2, F4
        ADD.D F10, F0, F8
        SUB.D F12, F8, F14

    (SUB.D cannot execute because the dependence of ADD.D on DIV.D causes the pipeline to stall, even though SUB.D is not data dependent on anything in the pipeline)


    Dynamic scheduling

    In the 5-stage pipeline, both structural and data hazards are checked in the ID stage

    In order to begin executing SUB.D, we must separate the ID stage into two parts:

    1. checking for structural hazards

    2. waiting for the absence of a data hazard

    We still use in-order instruction issue, but we want an instruction to begin execution as soon as its data operands are available (out-of-order execution and hence out-of-order completion)


    Dynamic scheduling

    Out-of-order execution introduces the possibility of WAR and WAW hazards

    Ex: DIV.D F0, F2, F4
        ADD.D F6, F0, F8
        SUB.D F8, F10, F14
        MUL.D F6, F10, F8

    (WAR hazard between ADD.D and SUB.D on F8 if SUB.D executes before ADD.D. WAW hazard between ADD.D and MUL.D on F6)


    Dynamic scheduling: Tomasulo's approach

    The scheme

    - tracks when operands for instructions are available, to minimize RAW hazards, and

    - uses register renaming, to minimize WAR and WAW hazards

    In Tomasulo's approach, register renaming is provided by the reservation stations


    Dynamic scheduling: Tomasulo's approach

    How register renaming eliminates WAR and WAW hazards:

    Ex: code before renaming:

        DIV.D F0, F2, F4
        ADD.D F6, F0, F8
        S.D   F6, 0(R1)
        SUB.D F8, F10, F14
        MUL.D F6, F10, F8

    Code after renaming (S and T are temporary registers):

        DIV.D F0, F2, F4
        ADD.D S, F0, F8
        S.D   S, 0(R1)
        SUB.D T, F10, F14
        MUL.D F6, F10, T
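The renaming idea can be sketched in a few lines. Note the difference from the slide: the slide introduces only the two temporaries S and T, while this sketch gives every destination a fresh name, which is a simpler rule with the same effect on WAR and WAW hazards. The names P0, P1, ... and the tuple encoding are hypothetical.

```python
def rename(instructions):
    """Give every destination a fresh name; sources read the latest name.

    Each instruction is (op, dest, sources); dest is None for a store.
    """
    latest = {}       # architectural register -> most recent fresh name
    renamed = []
    fresh = 0
    for op, dest, sources in instructions:
        # Sources must be looked up before the destination is renamed.
        new_sources = [latest.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            new_dest = f"P{fresh}"
            fresh += 1
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


code = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D", None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
for row in rename(code):
    print(row)
```

After renaming, SUB.D no longer writes the F8 that ADD.D reads, and ADD.D and MUL.D no longer write the same name, so only the true (RAW) dependences remain.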


    Tomasulo-based MIPS processor


    Tomasulo-based MIPS processor

    Each reservation station has 7 fields

    Op: operation to perform on the source operands

    Vj, Vk: values of the source operands (loads have the offset value in the Vk field)

    Qj, Qk: reservation stations producing the corresponding source operands (a value of 0 indicates that the source operand is already available in Vj or Vk, or is unnecessary)

    A: used to hold information for the memory address calculation for a load or store

    Busy: indicates that the reservation station or functional unit is busy


    Tomasulo-based MIPS processor

    The register file has one field

    Qi: the number/name of the reservation station that contains the operation whose result should be stored into this register

    The load and store buffers each have a field

    A: holds the result of the effective address calculation


    Three Stages of Tomasulo Algorithm

    1. Issue: get instruction from FP Op Queue (maintains the correct data flow)
       If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

    2. Execute: operate on operands (EX)
       When both operands ready then execute; if not ready, watch Common Data Bus for result

    3. Write result: finish execution (WB)
       Write on Common Data Bus to all awaiting units; mark reservation station available
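The issue step and the Common Data Bus snooping described above can be sketched with a small data structure. This is a simplified illustration rather than a full implementation: the A field is omitted, None stands in for the value 0 that marks an operand as available, and the class, function and variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of the first operand, once known
    vk: Optional[float] = None   # value of the second operand, once known
    qj: Optional[str] = None     # station that will produce the first operand
    qk: Optional[str] = None     # station that will produce the second operand

    def ready(self):
        """Execution may begin once neither operand is still pending."""
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag, value):
        """Watch the Common Data Bus and grab `value` if we wait on `tag`."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


def issue(station, op, src_j, src_k, dest, regs, reg_status):
    """Issue: copy available operands, record producers for the rest, rename dest."""
    station.busy, station.op = True, op
    for src, v_attr, q_attr in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        producer = reg_status.get(src)
        if producer is None:
            setattr(station, v_attr, regs[src])    # value is in the register file
        else:
            setattr(station, q_attr, producer)     # wait for that station's result
    reg_status[dest] = station.name                # Qi of dest now names this station


regs = {"F2": 0.0, "F4": 2.5}        # hypothetical register contents
reg_status = {"F2": "Load2"}         # F2 is still being produced by Load2
mult1 = ReservationStation("Mult1")
issue(mult1, "MULTD", "F2", "F4", "F0", regs, reg_status)
print(mult1.ready(), mult1.qj, mult1.vk)

mult1.capture("Load2", 4.0)          # Load2 broadcasts its result on the CDB
print(mult1.ready(), mult1.vj)
```

Recording the producing station instead of the register name is what renames the register: later writers of F2 cannot disturb the operand Mult1 is waiting for.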


    Dynamic Scheduling : Example

    Show what the information tables look like for the following code sequence when only the first load has completed and written its result:

    1. L.D F6, 34(R2)

    2. L.D F2, 45(R3)

    3. MUL.D F0, F2, F4

    4. SUB.D F8, F2, F6

    5. DIV.D F10, F0, F6

    6. ADD.D F6, F8, F2


    Tomasulo Example

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    L.D     F6    34+   R2
    L.D     F2    45+   R3
    MULT.D  F0    F2    F4
    SUB.D   F8    F6    F2
    DIV.D   F10   F0    F6
    ADD.D   F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  No
          Mult2  No

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    0      FU


    Tomasulo Example Cycle 1

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1
    LD      F2    45+   R3
    MULTD   F0    F2    F4
    SUBD    F8    F6    F2
    DIVD    F10   F0    F6
    ADDD    F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  Yes   34+R2
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  No
          Mult2  No

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    1      FU                       Load1


    Tomasulo Example Cycle 2

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1
    LD      F2    45+   R3    2
    MULTD   F0    F2    F4
    SUBD    F8    F6    F2
    DIVD    F10   F0    F6
    ADDD    F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  Yes   34+R2
    Load2  Yes   45+R3
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  No
          Mult2  No

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    2      FU         Load2         Load1


    Tomasulo Example Cycle 3

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3
    LD      F2    45+   R3    2
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2
    DIVD    F10   F0    F6
    ADDD    F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  Yes   34+R2
    Load2  Yes   45+R3
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  Yes   MULTD         R(F4)  Load2
          Mult2  No

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    3      FU  Mult1  Load2         Load1


    Tomasulo Example Cycle 4

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4
    DIVD    F10   F0    F6
    ADDD    F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  Yes   45+R3
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   Yes   SUBD   M(A1)                Load2
          Add2   No
          Add3   No
          Mult1  Yes   MULTD         R(F4)  Load2
          Mult2  No

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    4      FU  Mult1  Load2         M(A1)  Add1

    Load2 completing; what is waiting for Load2?


    Tomasulo Example Cycle 5

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
    2     Add1   Yes   SUBD   M(A1)  M(A2)
          Add2   No
          Add3   No
    10    Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    5      FU  Mult1  M(A2)         M(A1)  Add1   Mult2


    Tomasulo Example Cycle 6

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
    1     Add1   Yes   SUBD   M(A1)  M(A2)
          Add2   Yes   ADDD          M(A2)  Add1
          Add3   No
    9     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    6      FU  Mult1  M(A2)         Add2   Add1   Mult2


    Tomasulo Example Cycle 7

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
    0     Add1   Yes   SUBD   M(A1)  M(A2)
          Add2   Yes   ADDD          M(A2)  Add1
          Add3   No
    8     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    7      FU  Mult1  M(A2)         Add2   Add1   Mult2

    Add1 completing; what is waiting for it?


    Tomasulo Example Cycle 8

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
    2     Add2   Yes   ADDD   (M-M)  M(A2)
          Add3   No
    7     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    8      FU  Mult1  M(A2)         Add2   (M-M)  Mult2


    Tomasulo Example Cycle 9

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
    1     Add2   Yes   ADDD   (M-M)  M(A2)
          Add3   No
    6     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    9      FU  Mult1  M(A2)         Add2   (M-M)  Mult2


    Tomasulo Example Cycle 10

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
    0     Add2   Yes   ADDD   (M-M)  M(A2)
          Add3   No
    5     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    10     FU  Mult1  M(A2)         Add2   (M-M)  Mult2

    Add2 completing; what is waiting for it?


    Tomasulo Example Cycle 11

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
    4     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    11     FU  Mult1  M(A2)         (M-M+  (M-M)  Mult2


    Tomasulo Example Cycle 12

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
    3     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    12     FU  Mult1  M(A2)         (M-M+  (M-M)  Mult2


    Tomasulo Example Cycle 13

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
    2     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    13     FU  Mult1  M(A2)         (M-M+  (M-M)  Mult2


    Tomasulo Example Cycle 14

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
    1     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    14     FU  Mult1  M(A2)         (M-M+  (M-M)  Mult2


    Tomasulo Example Cycle 15

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3      15
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
    0     Mult1  Yes   MULTD  M(A2)  R(F4)
          Mult2  Yes   DIVD          M(A1)  Mult1

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    15     FU  Mult1  M(A2)         (M-M+  (M-M)  Mult2


    Tomasulo Example Cycle 16

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3      15         16
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  No
    40    Mult2  Yes   DIVD   M*F4   M(A1)

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    16     FU  M*F4   M(A2)         (M-M+  (M-M)  Mult2


    Faster than light computation

    (skip a couple of cycles)



    Tomasulo Example Cycle 55

    Instruction status:
    Instruction   j     k     Issue  Exec Comp  Write Result
    LD      F6    34+   R2    1      3          4
    LD      F2    45+   R3    2      4          5
    MULTD   F0    F2    F4    3      15         16
    SUBD    F8    F6    F2    4      7          8
    DIVD    F10   F0    F6    5
    ADDD    F6    F8    F2    6      10         11

    Load buffers:
    Name   Busy  Address
    Load1  No
    Load2  No
    Load3  No

    Reservation stations:
                              S1     S2     RS     RS
    Time  Name   Busy  Op     Vj     Vk     Qj     Qk
          Add1   No
          Add2   No
          Add3   No
          Mult1  No
    1     Mult2  Yes   DIVD   M*F4   M(A1)

    Register result status:
    Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
    55     FU  M*F4   M(A2)         (M-M+  (M-M)  Mult2
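The Issue, Exec Comp and Write Result cycles in the instruction status tables can be cross-checked with a simplified dataflow model. This is a sketch under stated assumptions, not a full Tomasulo simulator: it assumes one instruction issues per cycle, execution starts the cycle after both issue and the last operand broadcast, and results are written one cycle after execution completes. It ignores structural hazards such as busy reservation stations, functional-unit contention and Common Data Bus conflicts. The latencies (2 cycles for loads and adds, 10 for multiply, 40 for divide) are read off the Time column of the tables; the names are hypothetical.

```python
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, source registers), in program order as listed in the tables.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]


def schedule(program, latency):
    """Return (issue, exec_complete, write_result) cycles for each instruction."""
    last_write = {}   # register -> cycle in which its latest producer writes its result
    rows = []
    for index, (op, dest, sources) in enumerate(program):
        issue = index + 1                                   # in-order, one per cycle
        # Sources are resolved at issue time, so a later writer of the same
        # register (ADDD writing F6) cannot affect an earlier reader (DIVD).
        operands_ready = max([last_write.get(s, 0) for s in sources], default=0)
        start = max(issue, operands_ready) + 1
        complete = start + latency[op] - 1
        write = complete + 1
        last_write[dest] = write
        rows.append((issue, complete, write))
    return rows


for (op, dest, _), row in zip(PROGRAM, schedule(PROGRAM, LATENCY)):
    print(op, dest, row)
```

Under these assumptions the model matches every filled-in entry of the tables and predicts that DIVD completes execution in cycle 56, consistent with one cycle remaining at cycle 55.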