
Post on 29-Sep-2020


Page 1: Computer architecture M - unibo.it Architectures... · It detects the instruction boundaries within a 16 byte block (half cache line). In the IFU2 any conditional BRANCH address is


P6 Architecture

Computer architecture M


PIPELINE

Compensation queues are inserted between the three main sections. The machine instructions are rotated in order to align them to the decoders. Superpipelined processor (the number of stages is greater than necessary, in order to increase the clock frequency).

[Figure: P6 pipeline. Stages: IFU1, IFU2, IFU3, DEC1, DEC2, RAT (renaming), ROB, DIS, EX, RET1, RET2. Labels: bus interface management (in order); execution mechanism (out-of-order); results handling (in order); 8 clocks; variable number of clocks; dispatcher (issues the RISC-type µ-ops).]

RAT = Register Allocation Table

ROB = ReOrder Buffer


Behaviour

• Instruction extraction from the prefetch queue (a small set of instructions already extracted from the cache)

• Instruction decoding and alignment (in order)

• Translation of the machine instructions into RISC µ-operations (µ-ops) of fixed length, 118 bits (in order)

• Insertion of the µ-operations into the ROB (in order)

• Out-of-order execution of the µ-operations, to optimize the use of the functional modules

• In-order transfer of the results to the machine registers (commitment)


Pipeline stages

IFU1: (Instruction Fetch Stage 1) loads the 2x16 = 32-byte buffer (a cache line) directly from the L1 cache. While one buffer transfers data to IFU2 the other is loaded from L1

IFU2: (Instruction Fetch Stage 2) detects the instruction boundaries (CISC) for IFU3. If a branch is detected it is forwarded to the BTB

IFU3: (Instruction Fetch Stage 3) sends the instructions to the appropriate decoders (see later)

DEC1: (Decoder Stage 1) transforms the machine instructions into µ-operations (118 bits wide). Up to three IA32 instructions per clock can be processed. For very complex machine instructions a sequencer is used. The µ-operations consist of two sources and one destination plus the op-code (RISC)

DEC2: (Decoder Stage 2) transfers the µ-operations to the decoded instruction queue. Sometimes, for very complex instructions (for instance string instructions), many clocks are required to complete the operation, since the µ-instruction queue accepts up to 3 elements per clock. Micro Instruction Sequencer. It includes a second BTB (static, see later)

RAT: (Register Allocation Table) provides 40 additional registers which can be globally allocated


Pipeline stages

ROB: loads three µ-operations per clock into its buffer. If all the data required by a µ-operation are already available (produced by preceding ROB µ-operations or already present in the machine registers) and there is a free slot in the RS queue (Reservation Station of the required functional unit), the µ-operation is inserted. Here the RS differs from Tomasulo's: it holds only ready µ-operations, that is, those whose required operands are already available

DIS: (DISpatch Stage) if a µ-operation was not inserted into the RS in the previous clocks because of a lack of the necessary data or slots, it is inserted as soon as the required conditions are met

EX: (EXecution Stage) executes the µ-operation. The number of clocks necessary depends on the µ-operation. Several µ-operations are executed in a single clock period. Functional modules

RET1: (RETirement Stage 1) when a µ-operation has been executed and all the preceding conditional branches have been resolved, attaches a ready-for-retirement tag to the µ-operation

RET2: (RETirement Stage 2) transfers the results to the architectural destination registers of the machine when all the preceding machine-level instructions have already been committed. Up to 3 µ-ops per clock are retired


IFU1-IFU2 stages

IFU1

It transfers a 32-byte line from the L1 cache to the prefetch queue.

IFU2

It detects the instruction boundaries within a 16-byte block (half a cache line). In IFU2 the address of any conditional BRANCH is forwarded to the BTB (physical addresses!). Up to 4 addresses can be analyzed in parallel by the BTB. Initially the BTB is obviously empty, and it is updated for each decision taken.

If the branch is predicted as taken, the following instructions already loaded in the prefetch buffer are removed and the buffer is loaded again with the instructions at the destination. If the branch is predicted as not taken, nothing changes.

When the branch is executed in the Jump Execution Unit nothing happens if the branch was correctly predicted; otherwise all the following ROB µ-ops are cancelled together with their results. The same occurs to all other instructions already in the pipeline. The prefetch buffer is emptied and loaded again with the correct instruction sequence.


Branch

A further buffer exists in the P6 (the Return Stack Buffer, RSB) which stores the return addresses of the speculated subroutines. When a call is speculated (executed before being at the top of the instruction queue) it is not yet certain whether it must really be executed, since a previous branch could change the instruction flow. In that case the stack would have been «corrupted». The contents of the RSB are transferred to the real stack as soon as the call is actually executed. The RSB consists of 8 entries.
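The slide gives only the purpose and the size of the RSB. As a rough illustration, the following Python sketch models an 8-entry return stack; the class and method names, the last-in first-out pop on return and the choice to drop the oldest entry on overflow are assumptions made for the example, not details taken from the source.

```python
class ReturnStackBuffer:
    """Hypothetical model of a small return-address stack (8 entries)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.stack = []

    def push(self, return_address):
        # Assumption: when the buffer is full the oldest entry is lost.
        if len(self.stack) == self.entries:
            self.stack.pop(0)
        self.stack.append(return_address)

    def pop(self):
        # Assumption: a return consumes the most recent speculated call.
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer()
for address in range(10):          # ten nested speculated calls
    rsb.push(address)
popped = [rsb.pop() for _ in range(9)]
```

With ten nested calls only the eight most recent return addresses survive, so the ninth pop finds the buffer empty.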

The BTB is a 4-way set-associative cache with 512 entries (for each index, 4 physical branch addresses are handled).

The prediction algorithm is two-level: for each BTB entry there is a 4-bit register which stores the behaviour of the last occurrences of the branch at that address (BHT).
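To make the two-level idea concrete, here is a minimal Python sketch of a single BTB entry. The 4-bit history register comes from the slide; the table of sixteen 2-bit saturating counters selected by that history, the initial counter value and the names `TwoLevelEntry`, `predict`, `update` and `run` are assumptions for illustration, since the source does not describe the second level.

```python
class TwoLevelEntry:
    """One BTB entry: a 4-bit history selects one of 16 two-bit counters."""

    HISTORY_BITS = 4

    def __init__(self):
        self.history = 0                                  # last 4 outcomes
        self.counters = [1] * (1 << self.HISTORY_BITS)    # assumed start: weakly not taken

    def predict(self):
        return self.counters[self.history] >= 2           # True means "taken"

    def update(self, taken):
        counter = self.counters[self.history]
        if taken:
            self.counters[self.history] = min(3, counter + 1)
        else:
            self.counters[self.history] = max(0, counter - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run(outcomes):
    """Return, for each real outcome, whether it was predicted correctly."""
    entry = TwoLevelEntry()
    correct = []
    for taken in outcomes:
        correct.append(entry.predict() == taken)
        entry.update(taken)
    return correct


# A loop branch: taken three times, then not taken, repeated ten times.
results = run([True, True, True, False] * 10)
```

In this sketch the loop branch is mispredicted only during warm-up; once each 4-bit history has been seen once, every following outcome is predicted correctly.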


Pipeline

[Figure: pipeline stages IFU1, IFU2, IFU3, DEC1, DEC2, RAT, ROB, DIS, EX, RET1, RET2, with the labels: prefetch buffer; instruction length detection; Branch Target Buffer; alignment for the decoding; decoder; decoder queue (up to 6 µ-ops/clock); in order, out of order, in order.]

Functionally this pipeline is triple.

In the ROB the µ-ops are stored in order, executed out of order (OOO) and retired in order.

Compensation queues are needed because the stages run at different speeds.


IFU3 stage

Instruction types

• Simple (converted into a single µ-operation): register to register, memory read, etc.

• Complex-2 (converted into 2 µ-operations): memory write, read/modify, register-memory (sometimes requiring 3 µ-operations)

• Complex-3: MMX

• Complex-4: read/modify/write (e.g. add [BP], bx)

IFU3

• It prepares the instructions for the three decoders of stage DEC1

• Using the «markers» inserted into the 16-byte block by IFU2, IFU3 rotates, if needed, the three IA instructions so as to align them for the next stage

• If the three instructions are «simple» no rotation is needed and they are forwarded to the three decoders with no intervention

• If among the three instructions there is one «complex» and two «simple», a rotation takes place so as to align the «complex» one to decoder 0

• If there are two (or more) «complex» instructions, the instruction sequence generated by the compiler is not optimal and the operations take place in sequence
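The three IFU3 cases can be sketched in Python as follows. This only mirrors the rule as stated in the bullets; the function name `align_for_decoders`, the `(name, kind)` tuples and the use of `None` to signal "handle in sequence" are hypothetical choices for the example.

```python
def align_for_decoders(group):
    """Rotate a group of up to three (name, kind) instructions so that a
    single complex instruction lands on decoder 0 (index 0).

    Returns the rotated list, or None when two or more complex instructions
    are present and the group has to be processed in sequence.
    """
    complex_positions = [i for i, (_, kind) in enumerate(group) if kind == "complex"]
    if not complex_positions:
        return list(group)                     # all simple: no rotation
    if len(complex_positions) == 1:
        k = complex_positions[0]
        return group[k:] + group[:k]           # rotate the complex one to decoder 0
    return None                                # sequential handling


all_simple = align_for_decoders([("A", "simple"), ("B", "simple"), ("C", "simple")])
one_complex = align_for_decoders([("A", "simple"), ("B", "complex"), ("C", "simple")])
two_complex = align_for_decoders([("A", "complex"), ("B", "complex"), ("C", "simple")])
```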


Decoding

[Figure: decoding data path. Fetch and aligning (IFU1-IFU2-IFU3) delivers 16 bytes to DEC1, which contains decoder 0 (complex), decoder 1 (simple), decoder 2 (simple) and the Micro Instruction Sequencer (MIS). DEC1 delivers (4+1+1 = 6) x 118 bits to DEC2, the decoded µ-operations queue (up to 6 µ-ops). From the queue 3 x 118 bits go to the RAT and 3 x 118 bits to the ROB (in the Pentium II 40 slots, loaded with 3 µ-ops max per clock). The ROB feeds a 20 µ-ops queue for the Reservation Stations (RS 1 to RS 5); from the RS the µ-ops go to the FU.]


DEC1 and DEC2 Stages

DEC1

• Decoder 0 (complex): decodes IA instructions into 1-4 µ-ops

• Decoder 1 (simple): decodes IA instructions into 1 µ-op

• Decoder 2 (simple): decodes IA instructions into 1 µ-op

• Decoder 0 is able to convert in a single clock a complex instruction not longer than 7 bytes, generating at most 4 µ-operations

• Decoders 1 and 2 are able to convert in a single clock a «simple» instruction not longer than 7 bytes, generating at most 1 µ-operation

• Up to 6 µ-operations per clock can be generated

• In all other cases the MIS is used. The Micro Instruction Sequencer is a ROM which stores the µ-operations associated with each complex IA instruction which cannot be decoded in a single clock period

• The generated sequences (max 6 µ-ops per clock) are fed directly into stage DEC2

DEC2

• The µ-ops are queued in the same order as they were produced. The queue has 6 slots

• If the decoded instruction is a JMP the instruction queue is immediately emptied and reloaded

• The static BTB (see next slide) is activated if among the µ-operations of the preceding clock there is a branch µ-op not handled by the dynamic BTB (not detected as a branch; it must be noted that here the instructions are already of RISC type: two sources and one destination)
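The DEC1 rules above (7-byte limit, 4 µ-ops for decoder 0, 1 µ-op for decoders 1 and 2, MIS otherwise) can be summarised in a short Python sketch. The function name `decode_path` and the returned labels are hypothetical; only the numeric limits come from the slide.

```python
MAX_SINGLE_CLOCK_BYTES = 7        # longest instruction decodable in one clock
MAX_UOPS_PER_CLOCK = 4 + 1 + 1    # decoder 0 + decoder 1 + decoder 2


def decode_path(length_bytes, n_uops):
    """Say which DEC1 resource can handle an IA instruction in one clock."""
    if length_bytes <= MAX_SINGLE_CLOCK_BYTES and n_uops == 1:
        return "any decoder"      # simple instruction
    if length_bytes <= MAX_SINGLE_CLOCK_BYTES and n_uops <= 4:
        return "decoder 0"        # complex instruction, at most 4 µ-ops
    return "MIS"                  # everything else goes to the sequencer


paths = [decode_path(2, 1), decode_path(3, 4), decode_path(3, 5), decode_path(8, 1)]
```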


Static BTB

The P6 uses a static BTB in stage DEC2 (the stage which decodes the opcode of the µ-ops). It handles the branches not present in the dynamic BTB. It is “static” because it uses fixed rules which do not depend on the previous instruction history.

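[Figure: static prediction decision tree]

• IP relative? no: taken

• IP relative? yes. Conditional? no: taken

• IP relative? yes. Conditional? yes. Back? yes: taken

• IP relative? yes. Conditional? yes. Back? no: not taken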

The static prediction includes the destination address evaluation too
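The decision tree can be written as a small Python function. This is a sketch under the usual reading of the tree (backward conditional branches predicted taken, forward ones not taken); the function and parameter names are hypothetical, and "backward" is modelled here simply as a target address lower than the branch address.

```python
def static_predict(ip_relative, conditional, branch_address, target_address):
    """Return True when the static rules predict the branch as taken."""
    if not ip_relative:
        return True                          # not IP relative: taken
    if not conditional:
        return True                          # unconditional IP-relative: taken
    return target_address < branch_address   # backward: taken, forward: not taken


loop_back = static_predict(True, True, 0x02000057, 0x02000000)
skip_ahead = static_predict(True, True, 0x02000010, 0x02000020)
plain_jump = static_predict(True, False, 0x02000010, 0x02000020)
indirect = static_predict(False, True, 0x02000010, 0x02000020)
```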


RAT stage

[Figure: the RAT (Register Allocation Table, register renaming) maps the architectural registers EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP onto the 40 entries numbered 0 to 39.]
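A minimal Python sketch of the renaming idea follows. The eight register names and the 40 entries come from the slide; the class name, the round-robin allocation without any free-entry check and the convention that `None` means "value still in the architectural register" are simplifying assumptions.

```python
ARCH_REGS = ["EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "ESP", "EBP"]


class RegisterAllocationTable:
    """Maps each architectural register to the entry holding its newest value."""

    def __init__(self, entries=40):
        self.entries = entries
        # None means "the latest value is in the architectural register itself".
        self.mapping = {reg: None for reg in ARCH_REGS}
        self.next_entry = 0

    def rename(self, dest, sources):
        # Sources read the current mapping before the destination is remapped.
        renamed = [self.mapping[s] if self.mapping[s] is not None else s
                   for s in sources]
        entry = self.next_entry
        self.next_entry = (self.next_entry + 1) % self.entries  # assumed policy
        self.mapping[dest] = entry
        return entry, renamed


rat = RegisterAllocationTable()
first = rat.rename("EAX", ["EBX"])    # EAX <- f(EBX)
second = rat.rename("ECX", ["EAX"])   # reads the first EAX
third = rat.rename("EAX", ["EAX"])    # new EAX, computed from the old one
fourth = rat.rename("EDX", ["EAX"])   # reads the newest EAX
```

The two writes to EAX receive different entries, so a later reader is linked to the right producer and the second write does not have to wait for the first one's readers.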


ROB stage

• The µ-ops, with the registers renamed in the RAT stage, are stored in order, three per clock, in the ReOrder Buffer, which has 40 slots (many more in modern processors, which however derive from the P6 architecture)

• The Reservation Station (the unit which handles the availability of the functional units) extracts up to 5 µ-ops per clock from the ROB (there are 5 ports, i.e. busses toward the RS), storing them in a buffer with 20 slots (subdivided per FU), from which they are extracted to be forwarded to the execution units

• After execution the µ-ops are stored back into the ROB together with their results. In the ROB there are two pointers: one for the «oldest» µ-op not yet retired and one for the first free slot (if any) where new µ-ops can be stored

• The µ-ops are always “committed” in order, three at a time. This entails that no µ-op is committed until every preceding branch has been resolved

• The ROB can be viewed as a “40-instruction window”

NB: Very often the «ports» are common to several functional units. The ports are the busses which link, for instance, the ROB with the FUs, and they always require a lot of space in the IC
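The two-pointer organisation of the ROB can be sketched as a circular buffer in Python. The 40 slots and the width of three come from the slides; the class and method names are hypothetical, and the retirement rule used here (retire up to three µ-ops per clock, stopping at the first one that is not ready) is one possible reading of the in-order commitment, assumed for the example.

```python
class ReorderBuffer:
    """Circular buffer with a head (oldest µ-op) and a tail (first free slot)."""

    def __init__(self, size=40):
        self.size = size
        self.slots = [None] * size
        self.head = 0      # oldest µ-op not yet retired
        self.tail = 0      # first free slot
        self.count = 0

    def allocate(self, uop):
        if self.count == self.size:
            return None    # ROB full: the front end must stall
        index = self.tail
        self.slots[index] = {"uop": uop, "ready": False}
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return index

    def mark_ready(self, index):
        self.slots[index]["ready"] = True

    def retire(self, width=3):
        """Retire in order, at most `width` µ-ops, stopping at the first not ready."""
        retired = []
        while (len(retired) < width and self.count > 0
               and self.slots[self.head]["ready"]):
            retired.append(self.slots[self.head]["uop"])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired

    def flush_after(self, index):
        """Cancel every µ-op younger than slot `index` (mispredicted branch)."""
        cancelled = 0
        position = (index + 1) % self.size
        while position != self.tail:
            self.slots[position] = None
            position = (position + 1) % self.size
            cancelled += 1
        self.tail = (index + 1) % self.size
        self.count -= cancelled
        return cancelled


rob = ReorderBuffer()
for i in range(5):
    rob.allocate(f"u{i}")
rob.mark_ready(1)
rob.mark_ready(2)
blocked = rob.retire()        # u0 is not ready yet, so nothing retires
rob.mark_ready(0)
group = rob.retire()          # now the three oldest µ-ops retire together
cancelled = rob.flush_after(3)
```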


EX Stage (5 functional units only)

[Figure: the Reorder Buffer (ROB) feeds the Reservation Station (RS, 20 slots), which sends up to 5 µ-ops through PORT 0 to PORT 4 to the functional units. PORT 0 and PORT 1: FP units, integer unit 1 and jump execution unit (some units share the same port). PORT 2, PORT 3, PORT 4: load unit, store address unit, store data unit. Typical instructions: FMUL ST0, FDIV ST1; INC EAX; JMP xxxx; Mov EAX, Mem; Mov Mem, EAX.]


Instructions and µ-ops execution

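[Figure: instruction flow. IFU1, IFU2, IFU3 (3 CK): prefetch buffer made of two 16-byte halves. DEC1, DEC2 (2 CK): decoders 0, 1, 2 and the MIS feed the ID queue (slots 0 to 5). RAT, ROB (2 CK): ROB slots 0 to 39 (actually the size depends on the processor), each with the fields status, memory address, µ-op op-code and renamed registers (RAT). The memory address is that of the first byte of the corresponding IA instruction.]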


µ-ops in the ROB

• µ-op states in the ROB:

SD: scheduled for execution. The µ-op has been inserted in the RS queue but not yet sent to the FU

DP: dispatchable. It is in “pole position” in the EU queue

EX: in execution. It is being executed

WB: write back. About to be written back into the ROB after execution. Unblocks other µ-ops stalled waiting for its result

RR: ready for retirement. The µ-op can be retired

RT: retired. The µ-op is being retired

• Memory address: the memory address of the first byte of the IA32 instruction corresponding to the µ-op(s). The address field of the following µ-ops is empty (an IA32 instruction can correspond to many µ-ops). An address, therefore, signals a new IA32 instruction

• µ-op type: branch or not branch

• Allocation register: one of the 40 allocation registers

• Note that in case of an exception a flag is inserted into the µ-op: the exception is handled only when the µ-op is retired. All the preceding µ-ops are retired (precise interrupt)
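The rule "an address signals a new IA32 instruction" can be illustrated with a short Python sketch that groups ROB entries by instruction. The record layout and the names `Status`, `RobEntry` and `group_by_instruction` are hypothetical; the six sample entries use the slot numbers, states and addresses that the deck's ROB description gives for slots 13 to 18.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SD = "scheduled"
    DP = "dispatchable"
    EX = "executing"
    WB = "write back"
    RR = "ready for retirement"
    RT = "retired"


@dataclass
class RobEntry:
    slot: int
    status: Status
    address: Optional[int]      # None for the 2nd, 3rd... µ-op of an instruction
    is_branch: bool = False


def group_by_instruction(entries: List[RobEntry]) -> List[List[RobEntry]]:
    """Start a new group whenever an entry carries a memory address."""
    groups: List[List[RobEntry]] = []
    for entry in entries:
        if entry.address is not None or not groups:
            groups.append([])
        groups[-1].append(entry)
    return groups


entries = [
    RobEntry(13, Status.RT, 0x02000000),
    RobEntry(14, Status.RT, 0x02000001),
    RobEntry(15, Status.RT, None),
    RobEntry(16, Status.RR, None),
    RobEntry(17, Status.RR, 0x02000003),
    RobEntry(18, Status.RR, 0x02000004),
]
groups = group_by_instruction(entries)
sizes = [len(g) for g in groups]
```

Here the six µ-ops belong to four IA instructions, the second of which produced three µ-ops (slots 14, 15 and 16).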


RESET

[Figure: pipeline at RESET. Blocks: prefetch streaming buffer (32 bytes), decoders, MIS, ID queue, RAT/ROB (slots 0 to 39, with the fields status, memory address, µ-op op-code, renamed registers (RAT)). The buffer holds the initial jump: 8 valid code bytes (v) followed by invalid bytes (i).]

N.B. The dynamic BTB is obviously unable to predict the branch.


RESET – IFUi stages

[Figure: prefetch streaming buffer (IFU1), which stores 32 bytes (a cache line); address labels FFFFFFF:0 and FFFFFFFF:F. The jump occupies the first 8 bytes (v), up to the first instruction boundary; the bytes marked i are not significant.]

• The first instruction is always a backward jump (instruction present in IFU1)

• In IFU2 the first instruction boundary is detected (8 bytes). The remaining 24 bytes contain other, non-significant instructions

• In IFU3 the first instruction is aligned to decoder 0

NB: Each clock a 32-byte line is read by IFU1. In case of a «pipeline traffic jam» caused by the decoders, the pipeline stalls


RESET

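[Figure: pipeline at RESET with the JMP instruction highlighted. Blocks: prefetch streaming buffer, decoders, MIS, ID queue, RAT/ROB (slots 0 to 39; fields: status, memory address, µ-op op-code, renamed registers (RAT)).]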


RESET – DECi stages

• The detected instructions are decoded by DEC1

• DEC1 transforms the JMP into a jump µ-op (in the P6 all jumps are transformed into taken branches)

• The instructions in the stages from IFU1 to DEC2 are flushed. This causes a stall in the pipeline, which must reload the instructions from the jump address. The µ-op is stored in the queue of the decoded instructions


RESET

[Figure: pipeline at RESET with the branch µ-op highlighted. Blocks: prefetch streaming buffer, decoders, MIS, ID queue, RAT/ROB (slots 0 to 39; fields: status, memory address, µ-op op-code, renamed registers (RAT)).]


RESET – RAT stage

• The µ-op is extracted from the queue of the decoded instructions (which still has the initial order) and inserted in the RAT stage for possible register assignment (not used for branches)


RESET

[Figure: pipeline at RESET with the branch µ-op at the RAT/ROB. Blocks: prefetch streaming buffer, decoders, MIS, ID queue, RAT/ROB (slots 0 to 39; fields: status, memory address, µ-op op-code, renamed registers (RAT)).]


RESET – ROB and RS stage

• The µ-op is then sent to the first free ROB slot (normally three of them are transferred in order to the ROB if there are free slots)

• From the ROB the µ-op is then sent to the RS queue (4x5 slots) as soon as a slot for its FU is available. This operation can be done in parallel with the previous one if there are slots available. This is the case for the first instruction at RESET


RESET

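[Figure: pipeline at RESET with the branch µ-op stored in the ROB. Blocks: prefetch streaming buffer, decoders, MIS, ID queue, RAT/ROB (slots 0 to 39). ROB slot 0 holds: memory address FFFFFFFF0, µ-op op-code «branch µ-op», renamed registers (RAT) «none».]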


RESET – execution and retirement

• After execution the µ-op is tagged as «executed» in the ROB. If a µ-op produces a result (typically a register value) for another (stalled) µ-op waiting for it, the status of the waiting µ-op becomes “ready” in the ROB and it is inserted in the RS as soon as a slot is free

• After a branch execution the RS informs the BTB in order to update the prediction

• Three µ-ops are retired per clock, in order among themselves too


Instructions execution

[Figure: snapshot of the ROB, slots 0 to 39. Columns: status, memory address, µ-operation, renamed register. The «ROB start» pointer is at slot 13. All the µ-ops shown are non-branch µ-ops except one branch µ-op.]

Memory addresses, in the order in which they appear in the figure: 020000042, 020000044, 020000045, 020000051, 020000055, 020000057, 020000000, 020000000, 020000001, 020000003, 020000004, 02000000A, 02000000C, 02000000F, 020000010, 020000014, 020000016, 02000001B, 020000021, 020000025, 020000026, 02000002C, 02000002F, 020000034, 020000037

Statuses shown for the first slots: EX, RR, EX, SD, DP, RR

Statuses of slots 13 to 39, in order: RT RT RT RR RR RR EX WB RR RR EX DP SD RR RR RR RR RR RR WB EX RR RR RR RR SD RR


ROB – description (1)

13. This is the oldest µ-op in the ROB. It corresponds to the IA instruction whose first byte is at address 02000000 and will be retired together with the µ-ops of slots 14 and 15

14. This µ-op (together with those in slots 15 and 16) corresponds to an IA instruction 2 bytes long starting at address 02000001. Note that the 3 µ-ops related to the same IA instruction are NOT retired in the same clock. The address of the first byte of the following IA instruction is 02000003 (slot 17)

15. See previous description (µ-op now retired)

16. See previous description (µ-op ready for retirement)

17. This µ-op corresponds to a one-byte IA instruction at address 02000003. It is ready for retirement and will be retired with the µ-ops in slots 16 and 18

18. This µ-op is the only one generated by the 6-byte IA instruction at addresses 02000004-02000009. It is RR

19. This µ-op corresponds to the two-byte IA instruction starting at address 0200000A. It is now being executed, which can last more than one clock. At the end its status will change from EX to WB. It will be retired when:
• its execution is completed
• the result is written in slot 19
• all the previous µ-ops in slots 13-18 have already been retired


ROB – description (2)

20. This µ-op is the only one generated by the instruction at addresses 0200000C-0200000E. Its execution is complete and the result is being written in slot 20 (status WB). The µ-op will then be RR, but it will not be retired until the µ-ops in slots 19 and 21 are RR

21. This µ-op (similar to that of slot 22) corresponds to a single-byte IA instruction at address 0200000F. It is RR but must wait for the µ-ops in slots 19 and 20

22. This µ-op too (similar to that of slot 21) corresponds to the same single-byte IA instruction at address 0200000F. It will be retired together with the µ-ops in slots 23 and 24

23. This µ-op derives from the IA instruction at addresses 02000010-02000013. It is still being executed (EX). After execution its status will be WB; afterwards it will become RR and be retired together with the µ-ops in slots 22 and 24 (when they are RR)

24. This µ-op (like those of slots 25 and 26) corresponds to the two-byte IA instruction starting at address 02000014. It is waiting for execution at the top of the RS queue (DP status). It will be retired together with the µ-ops in slots 22 and 23

25. This µ-op derives from the same instruction as the µ-op in slot 24, but its status is SD, that is, it is already in the RS queue but not at the top

26. It derives again from the same IA instruction as slot 24; it is RR and will be retired together with the µ-ops in slots 25 and 27

…………………………………………………………….
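The grouping used in these descriptions (slots 19-21, 22-24, 25-27 retire as triples, each waiting for all its members and for the older triples) can be checked with a few lines of Python. The function name is hypothetical, and the status list is the one the deck shows for slots 16 to 27 once slots 13 to 15 have been retired; treating retirement strictly as fixed groups of three is the reading suggested by the descriptions, assumed here for the sketch.

```python
def groups_ready_to_retire(first_slot, statuses, width=3):
    """Return the consecutive groups of `width` slots, oldest first,
    whose µ-ops are all RR; stop at the first group that is not ready."""
    ready = []
    for offset in range(0, len(statuses) - width + 1, width):
        group = statuses[offset:offset + width]
        if any(status != "RR" for status in group):
            break                              # in-order: younger groups must wait
        start = first_slot + offset
        ready.append(list(range(start, start + width)))
    return ready


# Statuses of slots 16..27 after slots 13, 14 and 15 have been retired.
statuses = ["RR", "RR", "RR", "EX", "WB", "RR", "RR", "EX", "DP", "SD", "RR", "RR"]
ready_now = groups_ready_to_retire(16, statuses)
```

Only slots 16, 17 and 18 can retire next: the triple 19-21 is held back by the µ-op still executing in slot 19, and everything younger waits behind it.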


ROB – description (3)

1. This µ-op derives from the one-byte IA instruction at address 02000044. It is RR and will be retired together with the µ-ops in slots 0 and 2 as soon as:
• the µ-ops of slots 0 and 2 have completed their execution and their results are in slots 0 and 2
• all the µ-ops in slots 13-39 have already been retired
The µ-op in slot 2 derives from the IA instruction at hexadecimal addresses 02000045-02000050 (12 bytes).

……………………………………………………………………

6. This µ-op corresponds to the IA instruction at addresses 02000055-02000056

7. This µ-op is an already executed branch (RR status) corresponding to the IA instruction at address 02000057. It will be retired together with the µ-ops of slots 6 and 8. The branch was predicted as taken and the prediction was found to be correct during execution, then ..

8. .. the µ-op of this slot derives from the IA instruction at address 02000000 (branch destination address)

N.B. If the prediction had been found to be incorrect, the µ-op of slot 8 and all the following µ-ops would have been cancelled

……………………………………………………………………….


After retiring 13, 14, 15

[Figure: snapshot of the ROB, slots 0 to 39, after the µ-ops of slots 13, 14 and 15 have been retired. Columns: status, memory address, µ-operation, renamed register. All the µ-ops shown are non-branch µ-ops except one branch µ-op.]

Memory addresses, in the order in which they appear in the figure: 020000042, 020000044, 020000045, 020000051, 020000055, 020000057, 020000000, 020000003, 020000004, 02000000A, 02000000C, 02000000F, 020000010, 020000014, 020000016, 02000001B, 020000021, 020000025, 020000026, 02000002C, 02000002F, 020000034, 020000037

Statuses shown for the first slots: EX, RR, EX, SD, DP, RR

Statuses of slots 16 to 39, in order: RR RR RR EX WB RR RR EX DP SD RR RR RR RR RR RR WB EX RR RR RR RR SD RR