p6: dynamic execution - clemson university

41
Dynamic Execution Explanation Step 1 The key architectural feature of the P6 processor is Dynamic Execution -- a unique combination of speculative execution, multiple branch prediction, and data-flow analysis. G-Number P6: Dynamic Execution P6: Dynamic Execution A Unique A Unique Combination Of: Combination Of: Speculative Speculative Execution Execution Multiple Branch Multiple Branch Prediction Prediction Data-flow Analysis Data-flow Analysis ©1995, Intel Corporation ©1995, Intel Corporation External Bus MOB IEU MIU AGU FEU ROB RRF BTB BIU IFU I D MIS RAT R S L2 DCU

Upload: others

Post on 12-Mar-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Dynamic Execution Explanation

Step 1

The key architectural feature of the P6 processor is Dynamic Execution -- a uniquecombination of speculative execution, multiple branch prediction, and data-flow analysis.

G-Number

P6: Dynamic ExecutionP6: Dynamic ExecutionA UniqueA UniqueCombination Of:Combination Of:

llSpeculative Speculative ExecutionExecution

llMultiple BranchMultiple BranchPredictionPrediction

llData-flow AnalysisData-flow Analysis

©1995, Intel Corporation©1995, Intel Corporation

External Bus

MOB

IEU

MIU

AGU

FEU

ROB RRF

BTB

BIU

IFU

ID MIS

RAT

RS

L2

DCU

Dynamic Execution Explanation

Step 2

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 0Instructions = 0

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

PentiumPentium®®

ProcessorProcessor

timingtiming

To show the effects of Dynamic Execution, we show a BEFORE and AFTER scenario.

The Pentium® processor is an excellent example of a high performance, superscalar(level 2) CPU, so I chose it as a basis for comparison (the BEFORE).

The diagram lists a set of instructions on the verticle axis -- note they start at #1, there is asubroutine call at #6 to #42, this returns to #7. What the instructions actually are is notimportant to this explanation.

The horizontal axis is time, the vertical dotted lines are CPU clocks.

There is a tally counter in the bottom right corner that keps track of the number of clockswe have used and the number of instructions completed. (Completed is important,starting an instruction is interesting but this needs to complete to be counted.)

Dynamic Execution Explanation

Step 3

The first clock:

The Pentium® processor started and completed TWO instructions in this clock.Remember the Pentium processor has two execution pipes, U and V, and both of theseinstructions happened to be single cycle.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 1Instructions = 2

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

dualdual

executionexecution

unitsunits

Dynamic Execution Explanation

Step 4

On Clock 2, the Pentium® processor starts the next 2 instructions

Note that #3 was a load of data and that data was not in the cache -- that is, we have acache miss. (The instruction is colored black via a gradient shading to show this.) We willhave to go and get this data from external memory (and it’s going to take 4 clocks).

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 2Instructions = 2

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

cache miss cache miss

instructioninstruction

#3#3

Dynamic Execution Explanation

Step 5

On Clock 3, instruction 4 completed but we are still waiting for the cache miss on #3.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 3Instructions = 3

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 6

On Clock 4 we’re still waiting.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 4Instructions = 3

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 7

Data is returned from memory and #3 completes.

I classify the “black” area as wasted time since no forward progress was made with theprogram -- the CPU was STALLED waiting for main memory, or a Level 2 cache, torespond. If the data had been resident in the L1 cache, then this instruction would havecompleted in 1 clock.

There are, of course, technical reasons why the L1 cache cannot be made MUCH biggerto reduce this miss rate. Increasing the L1 size also has a dramatic effect on die sizewhich, in turn, impacts the cost of the component.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 5Instructions = 4

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

�wasted��wasted�

clocks dueclocks due

to missto miss

Dynamic Execution Explanation

Step 8

In this clock, the Pentium® processor started, and completed, two instructions.

#6 is a branch that the Pentium processor predicted would transfer control to instruction42. The processor’s fetch unit is double buffered so instructions at target location (#42)have already been fetched.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 6Instructions = 6

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

2 instructions2 instructions

include a include a

BranchBranch

Dynamic Execution Explanation

Step 9

The correct preduction and the prefetching allows the Pentium® processor to start twoinstructions at #42 and #43 in Clock 7.

There was no penalty for the transfer of control which the processor correctly predicted.

We’re at 7 clocks and have completed 8 instructions -- that’s better than 1 insruction perclock (IPC).

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 7Instructions = 8

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

Branch atBranch at

#6 predicted#6 predicted

No penaltyNo penalty

Dynamic Execution Explanation

Step 10

Inside the subroutine on Clock 8 the processor starts two instructions.

#44 is a single cycle instruction.

#45 is an L1 cache miss, the processor stalls again while the core waits and while the businterface unit reads the data from main memory or the L2 cache.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 8Instructions = 9

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

AnotherAnother

cache misscache miss

at #45at #45

Dynamic Execution Explanation

Step 11

Waiting again.

Notice that the speed of the memory subsystem is having a LARGE impact on the PP’sinstruction execution rate.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 9Instructions = 9

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 12

Waiting again.

The IPC has now dropped below 1.

The processor’s core is stalled. It cannot do any useful work as it is waiting for data.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 10Instructions = 9

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 13

#45 completes, we can move on

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 11Instructions = 10

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

DoneDone

Dynamic Execution Explanation

Step 14

Two instructions are started again.

#46 is a single cycle operation.

#47 is a RETURN and the Pentium® processor correctly predicts the return address.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 12Instructions = 12

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

2 instructions2 instructions

include ainclude a

BranchBranch

Dynamic Execution Explanation

Step 15

We return to the main program and #7 is started with no delay.

#7 is not pairable with #8, so the Pentium® processor only starts #7.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 13Instructions = 12

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

BranchBranch

predictedpredicted

No PenaltyNo Penalty

ThisThis

instructioninstruction

not pairablenot pairable

Dynamic Execution Explanation

Step 16

#7 is a two cycle instruction which now completes.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 14Instructions = 13

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

ThisThis

instructioninstruction

takes 2 clockstakes 2 clocks

ThisThis

instructioninstruction

not pairablenot pairable

Dynamic Execution Explanation

Step 17

The processor starts #8 and #9 in this clock.

#8 is single cycle.

#9 is another cache miss, so we have to wait again.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 15Instructions = 14

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

AnotherAnother

cache miss!cache miss!

Dynamic Execution Explanation

Step 18

Waiting for data once more.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 16Instructions = 14

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 19

And waiting.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 17Instructions = 14

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

.. wait .... wait ..

Dynamic Execution Explanation

Step 20

Done. Now let’s move forward.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 18Instructions = 15

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

DoneDone

Dynamic Execution Explanation

Step 21

The Pentium® processor started and completed #10 & #11 to complete this sequence of17 instructions in 19 clocks.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 19Instructions = 17

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

2 instructions2 instructions

Dynamic Execution Explanation

Step 22

Pentium® processor summary:

Once the processor had the data, it did great work (shown in green) typically executing 2instructions per clock (superscalar, using U & V pipes).

But a lot of time was spent waiting for data (shown in black) and this reduced our averageIPC to less than 1.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionBeforeBefore

Clock = 19Instructions = 17

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

17 instructions17 instructions

inin19 clocks19 clocks

useful workuseful work

�wasted� cycles�wasted� cycles

Dynamic Execution Explanation

Step 23

Now let’s look at how the P6 would execute this same set of instructions.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic Execution

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

How wouldHow would

the P6the P6

execute this?execute this?

Dynamic Execution Explanation

Step 24

The P6 looks at all prefetched instructions and determines if there are data dependancies-- for example, #3 requires the result of #1 so #3 cannot execute until #1 has completed.Similarly #5 cannot run until #4 has completed.

Why worry about this now? Why wasn’t this a concern with the Pentium® processor?The Pentium processor executes instructions strictly in the original program order (thereare some pairing issues with U & V pipes) so instruction dependancies were not a concernwith Pentium processor. In contrast, we will see that the P6 is going to start manyinstructions and instruction dependancies will be a key concern.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 0Instructions = 0

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

DependentDependent

instructionsinstructions

identified atidentified at

run timerun time

Dynamic Execution Explanation

Step 25

On the 1st clock, the P6 starts 3 instructions -- it is superscalar level 3.

#1 and #2 are single cycle instructions and they start AND complete in this cycle.

#3 cannot start since it is dependant upon the result of #1, so the P6 also starts #4.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 1Instructions = 2

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

started,started,

2 completed2 completed

Dynamic Execution Explanation

Step 26

On the second clock the P6 starts 3 more instructions -- one of them is #3 which missesthe cache again (as in did in the Pentium® processor scenario). We must wait for thememory access again.

Instruction 4 completes but the P6 cannot use the result yet since it must retireinstructions in the same order that the program was originally written. So the P6 storesthe result in temporary storage. We have speculatively executed #4.

#5 did not start since it is dependent upon the result of #4.

#6 started and completed -- this was a branch to the subroutine at #42. #42 was alsostarted and completed.

Instead of waiting for the memory access of #3, the P6 went off and did useful work bystarting instructions after #3 and stores their results in temporary on-chip registers.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 2Instructions = 2

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

That cacheThat cache

miss again!miss again!

but 3 startedbut 3 started

SpeculatingSpeculating

past thepast the

BranchBranch

Dynamic Execution Explanation

Step 27

It’s Clock 3 and we’re still waiting on that first cache miss.

We start 3 more instructions in this clock too: #5, #43 and #45.

Notice that #45 is also a cache miss -- the P6 L2 cache is fully pipelined and can supportfour concurrent accesses. This means that we do not have to wait until the cache miss at#3 is serviced -- we can service the cache miss at #45 in parallel.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 3Instructions = 2

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

SecondSecond

cache misscache miss

startedstarted

Dynamic Execution Explanation

Step 28

It’s Clock 4 and we’re still waiting on that first cache miss. We start 3 more instructions inthis clock, too: #44, #47 and #7.

Notice that #47 is another branch (return from the subroutine). The front end of the P6has predicted that we’ll return to #7 and, indeed, we start #7 in this clock.

While the “back-end” of the P6 is waiting for #3 to retire, the “front end” has made thesubroutine call and has come back! This is typical of how the P6 speculatively executeswell ahead of the current program counter.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 4Instructions = 2

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

SpeculatingSpeculating

past thepast the

2nd Branch2nd Branch

Dynamic Execution Explanation

Step 29

It’s Clock 5 and that cache miss at #3 is finally satisfied, so the P6 retires #3 and #4 and#5. The P6 can retire 3 instructions/clock.

The P6 also starts 3 more instructions, #9, #10 and one > 11. Note that #9 is anothercache miss and that, once again, we are starting the memory access early so the data willbe ready when we need it later.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 5Instructions = 5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

completedcompleted

and 3 startedand 3 started

Dynamic Execution Explanation

Step 30

We’re now at clock 6 and the P6 is “on-a-roll” -- it’s starting 3 instructions per clock andcompleting three instructions per clock. These instructions are unrelated to each other.The P6 “front end” is finding three new instructions per clock and the “back end” is retiringthree of the speculatively executed instructions per clock.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 6Instructions = 8

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

completedcompleted

and 3 startedand 3 started

Dynamic Execution Explanation

Step 31

Clock 7 and, once again, we completed three instructions and retired three instructions inthe same clock.

The core is observing all instruction data dependancies and is executing instructions toget maximum work done.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 7Instructions = 11

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

completedcompleted

and 3 startedand 3 started

Dynamic Execution Explanation

Step 32

Another typical clock cycle with 3 instructions completed and 3 instructions started.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 8Instructions = 14

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

completedcompleted

and 3 startedand 3 started

Dynamic Execution Explanation

Step 33

Another 3 instructions completed and 3 started. Note that #11 started and completed inthis clock.

We’ve completed the 17 instructions (and have 10 others speculatively executed ready toretire in upcoming clocks) in only 9 clocks.

This is more than two times the instructions per clock relative to the Pentium® processorfor this example.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

Clock = 9Instructions = 17

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

3 instructions3 instructions

completedcompleted

and 3 startedand 3 started

Dynamic Execution Explanation

Step 34

Summary of how the 17 instructions were executed in the P6:

Note that the order that the instructions were started in is mainly a function of the datadependancies within the instruction stream. (This also depends upon availability ofinternal resources.) Instructions are always retired in original program order, maintainingfull compatibility.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

17 instructions17 instructions

completedcompleted

in 9 clocksin 9 clocks

Dynamic Execution Explanation

Step 35

Recap of the 3 main features of Dynamic Execution:

-Multiple Branch Predition

-Data Flow Analysis

-Speculative Execution

Rather than waiting the P6 speculatively executes instructions after the one that’s stalled.That is, rather than doing nothing, the P6 goes and starts some other work since we’llneed it anyway. The P6 ensures that data dependancies are observed (i.e. cannotarbitrarily execute any instructions), ensuring that it is doing the right work.

The P6 must also be able to easily unravel this speculative work done in case an interrupt,mis-predicted branch, page-fault or debug trap occurs.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

Instead of waitingInstead of waiting

at #3, P6at #3, P6

speculatively executed

speculatively executed

other instructions

other instructions

Dynamic Execution Explanation

Step 36

The red boxes (green on the previous slide) indicate those that are speculativelyexecuted.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

Instead of waitingInstead of waiting

at #3, P6at #3, P6

speculatively executed

speculatively executed

other instructions

other instructions

Dynamic Execution Explanation

Step 37

The “front end” of the P6 must aggressively get work (i.e. a large array of instructions) forthe core to choose from.

On average, there is a branch every 5 to 7 instructions, so the P6 must be able toaccurately preduct multiple branches.

The P6 contains a large Branch Target Buffer which uses both a static algorithm and ahistory based algorithm to predict branches correctly over 90% of the time.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

Branch PredictionBranch Prediction

operates over operates over

MULTIPLEMULTIPLE

BranchesBranches

Dynamic Execution Explanation

Step 38

The pink arrows and instructions above highlight the P6’s branch prediction over multiplebranches.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

Branch PredictionBranch Prediction

operates over operates over

MULTIPLEMULTIPLE

BranchesBranches

Dynamic Execution Explanation

Step 39

Again, the small red arrows indicate the P6’s analysis of data dependencies and data flowto determine the most efficient order in which to execute instructions.

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

P6 Analyzed theP6 Analyzed the

Data Dependencies Data Dependencies &&

Data Flow Data Flow to to

Optimize ExecutionOptimize Execution

Dynamic Execution Explanation

Step 40

The benefit of this is that the P6 can make forward progress at multiple points within aprogram flow (compared with the Pentium® processor that only has a single point offorward progress).

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

3

5

44

8

11

12

4

46

910

4243

7

47

45

6

P6 Analyzed theP6 Analyzed the

Data Dependencies Data Dependencies &&

Data Flow Data Flow to to

Optimize ExecutionOptimize Execution

Dynamic Execution Explanation

Step 41

In sum, Dynamic Execution is the unique combination of:

Improved Branch Prediction (get lots of work to do)

Speculative Execution (ability to execute in any order)

and

Data Flow Analysis (choose the best order)

G-Number

Effects of Dynamic ExecutionEffects of Dynamic ExecutionAfterAfter

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA3

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

5

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA

44

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

8

AAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAA

AAAAAAAAAAAA

11

12

4

AAAAAAAAAAAA

AAAAAAAAAAAA

AAAAAAAAA46

910

4243

7

47

45

6

The InnovativeThe Innovative

Combination of these

Combination of these

ArchitecturalArchitectural

Enhancements is calledEnhancements is called

Dynamic ExecutionDynamic Execution