June 12, 2022 204521 Digital System Architecture Instruction Set Architecture Pradondet Nilagupta Spring 2001 (original notes from Prof. Mike Schulte )


Page 1: Instruction Set Architecture

April 21, 2023 204521 Digital System Architecture

Instruction Set Architecture

Pradondet Nilagupta

Spring 2001

(original notes from Prof. Mike Schulte )

Page 2: Instruction Set Architecture

Overview ISA (1/2)

Concentrate on ISA

Introduce a wide variety of design alternatives for instruction set architecture
– Focus on four topics

• Classification of instruction set alternatives
– Give some qualitative assessment of the advantages and disadvantages of various approaches

• Present and analyze some instruction set measurements that are largely independent of a specific instruction set

Page 3: Instruction Set Architecture

Overview ISA (2/2)

• Address the issue of languages and compilers and their bearing on ISA

• Show how these ideas are reflected in the DLX instruction set, which is typical of recent instruction set architectures

Examine a wide variety of architectural measurements
– Measurements depend on the programs measured and on the compiler used in making these measurements

Page 4: Instruction Set Architecture

Hot Topics in Computer Architecture

1950s and 1960s:
– Computer Arithmetic

1970s and 1980s:
– Instruction Set Design
– ISA Appropriate for Compilers

1990s:
– Design of CPU
– Design of memory system
– Design of I/O system
– Multiprocessors
– Instruction Set Extensions

Page 5: Instruction Set Architecture

Instruction Set Architecture

Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.

The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.

Page 6: Instruction Set Architecture

Instruction Set Architecture

The instruction set architecture serves as the interface between software and hardware

[Figure: layered diagram with software and hardware separated by the instruction set]

Page 7: Instruction Set Architecture

Interface Design

A good interface:
– Lasts through many implementations (portability, compatibility)
– Is used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels

Page 8: Instruction Set Architecture

What Are the Components of an ISA? (1/2)

Sometimes known as the Programmer’s Model of the machine

Storage cells
– General and special purpose registers in the CPU
– Many general purpose cells of same size in memory
– Storage associated with I/O devices

The machine instruction set
– The instruction set is the entire repertoire of machine operations
– Makes use of storage cells, formats, and results of the fetch/execute cycle
– i.e., register transfers

Page 9: Instruction Set Architecture

What Are the Components of an ISA? (2/2)

The instruction format
– Size and meaning of fields within the instruction

The nature of the fetch-execute cycle
– Things that are done before the operation code is known

Page 10: Instruction Set Architecture

Programmer’s Models of Various Machines

[Figure: programmer's models of four machines, summarized below]

M6800 (introduced 1975)
– Fewer than 100 instructions
– 6 special purpose registers: A and B (8 bits); IX, SP, and PC (16 bits); Status
– 2^16 bytes of main memory capacity (addresses 0 to 2^16 – 1)

I8086 (introduced 1979)
– More than 120 instructions
– Data registers AX, BX, CX, DX (16 bits, each split into two bytes)
– Address and count registers SP, BP, SI, DI
– Memory segment registers CS, DS, SS, ES
– IP and Status
– 2^20 bytes of main memory capacity (addresses 0 to 2^20 – 1)

VAX11 (introduced 1981)
– More than 300 instructions
– 12 general purpose registers R0 to R11, plus AP, FP, SP, PC, and PSW (32 bits)
– 2^32 bytes of main memory capacity (addresses 0 to 2^32 – 1)

PPC601 (introduced 1993)
– More than 250 instructions
– 32 32-bit general purpose registers
– 32 64-bit floating point registers
– More than 50 32-bit special purpose registers
– 2^52 bytes of main memory capacity (addresses 0 to 2^52 – 1)

Page 11: Instruction Set Architecture

What Must an Instruction Specify?(1/2)

Which operation to perform– add r0, r1, r3

– Ans: Op code: add, load, branch, etc.

Where to find the operand or operands– add r0, r1, r3

– In CPU registers, memory cells, I/O locations, or part of instruction

Place to store result– add r0, r1, r3

– Again CPU register or memory cell

Page 12: Instruction Set Architecture

What Must an Instruction Specify? (2/2)

Location of next instruction
add r0, r1, r3
br endloop
– Almost always memory cell pointed to by program counter (PC)

Instruction Format (encoding)
– How is it decoded?

Sometimes there is no operand, or no result, or no next instruction. Can you think of examples?

Page 13: Instruction Set Architecture

Instructions Can Be Divided into 3 Classes (1/2)

Data movement instructions
– Move data from a memory location or register to another memory location or register without changing its form
– Load: source is memory and destination is register
– Store: source is register and destination is memory

Arithmetic and logic (ALU) instructions
– Change the form of one or more operands to produce a result stored in another location
– Add, Sub, Shift, etc.

Page 14: Instruction Set Architecture

Instructions Can Be Divided into 3 Classes (2/2)

Branch instructions (control flow instructions)
– Alter the normal flow of control from executing the next instruction in sequence
– Br Loc, Brz Loc2: unconditional or conditional branches

Page 15: Instruction Set Architecture

Examples of Data Movement Instructions

Instruction       Meaning                                                     Machine
MOV A, B          Move 16 bits from memory location A to location B           VAX11
LDA A, Addr       Load accumulator A with the byte at memory location Addr    M6800
lwz R3, A         Move 32-bit data from memory location A to register R3      PPC601
li $3, 455        Load the 32-bit integer 455 into register $3                MIPS R3000
mov R4, dout      Move 16-bit data from R4 to output port dout                DEC PDP11
IN AL, KBD        Load a byte from in port KBD to accumulator                 Intel Pentium
LEA.L (A0), A2    Load the address pointed to by A0 into A2                   M68000

Page 16: Instruction Set Architecture

Examples of ALU Instructions

Instruction        Meaning                                                                         Machine
MULF A, B, C       Multiply the 32-bit floating point values at mem loc’ns A and B, store at C     VAX11
nabs r3, r1        Store the negative absolute value of r1 in r3                                   PPC601
ori $2, $1, 255    Store logical OR of reg $1 with 255 into reg $2                                 MIPS R3000
DEC R2             Decrement the 16-bit value stored in reg R2                                     DEC PDP11
SHL AX, 4          Shift the 16-bit value in reg AX left by 4 bit pos’ns                           Intel 8086

• Notice again the complete dissimilarity of both syntax and semantics.

Page 17: Instruction Set Architecture

Examples of Branch Instructions

Instruction       Meaning                                                                                             Machine
BLBS A, Tgt       Branch to address Tgt if the least significant bit of mem loc’n A is set (i.e. = 1)                 VAX11
bun r2            Branch to location in R2 if result of previous floating point computation was Not a Number (NaN)    PPC601
beq $2, $1, 32    Branch to location (PC + 4 + 32) if contents of $1 and $2 are equal                                 MIPS R3000
SOB R4, Loop      Decrement R4 and branch to Loop if R4 ≠ 0                                                           DEC PDP11
JCXZ Addr         Jump to Addr if contents of register CX = 0                                                         Intel 8086

Page 18: Instruction Set Architecture

ISA Metrics

Orthogonality
– No special registers, few special cases, all operand modes available with any data type or instruction type

Completeness
– Support for a wide range of operations and target applications

Regularity
– No overloading for the meanings of instruction fields

Streamlined
– Resource needs easily determined

Ease of compilation (programming?), Ease of implementation, Scalability

Page 19: Instruction Set Architecture

Instruction Set Design Issues (1/2)

Instruction set design issues include:– Where are operands stored?

• registers, memory, stack, accumulator

– How many explicit operands are there? • 0, 1, 2, or 3

– How is the operand location specified?• register, immediate, indirect, . . .

– What type & size of operands are supported?• byte, int, float, double, string, vector. . .

Page 20: Instruction Set Architecture

Instruction Set Design Issues (2/2)

– What operations are supported? • add, sub, mul, move, compare . . .

– How to encode them into instruction format?

• Instructions should be multiples of Bytes.

Page 21: Instruction Set Architecture

Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)

Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation
– High-level Language Based (B5000 1963)
– Concept of a Family (IBM 360 1964)

General Purpose Register Machines
– Complex Instruction Sets (Vax, Intel 8086 1977-80)
– Load/Store Architecture (CDC 6600, Cray 1 1963-76)

RISC (Mips, Sparc, 88000, IBM RS6000, . . . 1987+)

Page 22: Instruction Set Architecture

Evolution of Instruction Sets

Major advances in computer architecture are typically associated with landmark instruction set designs– Ex: Stack VS. GPR (System 360)

Design decisions must take into account:– technology– machine organization– programming languages– compiler technology– operating systems

The design decisions in turn influence these.

Page 23: Instruction Set Architecture

Classifying ISAs

Accumulator (before 1960):
1 address    add A             acc ← acc + mem[A]

Stack (1960s to 1970s):
0 address    add               tos ← tos + next

Memory-Memory (1970s to 1980s):
2 address    add A, B          mem[A] ← mem[A] + mem[B]
3 address    add A, B, C       mem[A] ← mem[B] + mem[C]

Register-Memory (1970s to present):
2 address    add R1, A         R1 ← R1 + mem[A]
             load R1, A        R1 ← mem[A]

Register-Register (Load/Store) (1960s to present):
3 address    add R1, R2, R3    R1 ← R2 + R3
             load R1, R2       R1 ← mem[R2]
             store R1, R2      mem[R1] ← R2

Page 24: Instruction Set Architecture

Comparison of ISA Classes

Code Sequence for C = A+B

Stack     Accumulator    Register (register-mem)    Register (load/store)
Push A    Load A         Load R1, A                 Load R1, A
Push B    Add B          Add R1, B                  Load R2, B
Add       Store C        Store C, R1                Add R3, R1, R2
Pop C                                               Store C, R3

Memory efficiency? Instruction access? Data access?

Page 26: Instruction Set Architecture

Ex. Expression Evaluation for 3-, 2-, 1-, and 0-Address Machines

Number of instructions & number of addresses both vary

Discuss as examples: size of code in each case

Evaluate a = (b+c)*d - e

3-address      2-address    1-address    Stack
add a, b, c    load a, b    load b       push b
mpy a, a, d    add a, c     add c        push c
sub a, a, e    mpy a, d     mpy d        add
               sub a, e     sub e        push d
                            store a      mpy
                                         push e
                                         sub
                                         pop a
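The "size of code in each case" discussion can be made concrete with a small calculation. The sketch below assumes 8-bit opcodes and 24-bit memory addresses, which are the field widths drawn on the instruction-format slides of this deck; the function name `code_size_bytes` is hypothetical.

```c
#include <stddef.h>

/* Assumed field widths: 8-bit opcode (1 byte), 24-bit address (3 bytes). */
enum { OPCODE_BYTES = 1, ADDR_BYTES = 3 };

/* Total size of a code sequence in which every instruction carries one
   opcode and the sequence names `n_addresses` memory addresses in total. */
size_t code_size_bytes(size_t n_instructions, size_t n_addresses)
{
    return n_instructions * OPCODE_BYTES + n_addresses * ADDR_BYTES;
}
```

Under these assumptions, a = (b+c)*d - e takes 30 bytes on the 3-address machine (3 instructions, 9 addresses), 28 bytes on the 2-address machine (4 and 8), 20 bytes on the 1-address machine (5 and 5), and 23 bytes on the stack machine (8 and 5).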

Page 27: Instruction Set Architecture

Stack Architectures

Instruction set:
add, sub, mult, div, . . .
push A, pop A

Example: A*B - (A+C*B)

Instruction    Stack after execution (top of stack on the right)
push A         A
push B         A, B
mul            A*B
push A         A*B, A
push C         A*B, A, C
push B         A*B, A, C, B
mul            A*B, A, B*C
add            A*B, A+B*C
sub            result
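The evaluation above can be traced with a tiny interpreter. This is a minimal sketch rather than any real machine: the opcode names, the `Instr` type, and the fixed 16-entry stack are all assumptions made for illustration, and binary operations are taken to compute SOS op TOS.

```c
#include <stddef.h>

/* Hypothetical opcodes for a tiny 0-address machine. */
typedef enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL } Opcode;

typedef struct {
    Opcode op;
    long   value;   /* operand for OP_PUSH, ignored otherwise */
} Instr;

/* Runs `n` instructions and returns the value left on top of the stack.
   Binary operations pop TOS and SOS and push (SOS op TOS).
   Assumes a well-formed, non-empty program at most 16 entries deep. */
long run_stack_program(const Instr *prog, size_t n)
{
    long stack[16];
    size_t sp = 0;                       /* number of entries in use */

    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == OP_PUSH) {
            stack[sp++] = prog[i].value;
            continue;
        }
        long tos = stack[--sp];
        long sos = stack[--sp];
        switch (prog[i].op) {
        case OP_ADD: stack[sp++] = sos + tos; break;
        case OP_SUB: stack[sp++] = sos - tos; break;
        case OP_MUL: stack[sp++] = sos * tos; break;
        default: break;                  /* OP_PUSH is handled above */
        }
    }
    return stack[sp - 1];
}
```

With A = 2, B = 3, C = 4, the nine-instruction sequence from the slide leaves 2*3 - (2 + 4*3) = -8 on the stack. No operand addresses appear in the arithmetic instructions, which is the source of the good code density claimed for stack machines.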

Page 28: Instruction Set Architecture

The 0-Address, or Stack, Machine and Instruction Format

[Figure: 0-address machine. CPU: program counter (where to find next instruction) and stack (TOS, SOS, etc.). Memory: Op1 at Op1Addr, Nexti at NextiAddr.]

Instruction formats

push Op1 (TOS ← Op1)
Format: push | Op1Addr
Bits:   8    | 24

add (TOS ← TOS + SOS)
Format: add
Bits:   8
– add: which operation
– Where to find operands, and where to put result: on the stack

Page 29: Instruction Set Architecture

Stacks: Pros and Cons

Pros
– Good code density (implicit top of stack)
– Low hardware requirements
– Easy to write a simpler compiler for stack architectures

Cons
– Stack becomes the bottleneck
– Little ability for parallelism or pipelining
– Data is not always at the top of stack when needed, so additional instructions like TOP and SWAP are needed
– Difficult to write an optimizing compiler for stack architectures

Page 30: Instruction Set Architecture

Accumulator Architectures

Instruction set:
add A, sub A, mult A, div A, . . .
load A, store A

Example: A*B - (A+C*B)

Instruction    Accumulator after execution
load B         B
mul C          B*C
add A          A+B*C
store D        A+B*C
load A         A
mul B          A*B
sub D          result

Page 31: Instruction Set Architecture

1-Address Machine and Instruction Format

Special CPU register, the accumulator, supplies 1 operand and stores result

One memory address used for other operand

Need instructions to load and store operands:
LDA OpAddr
STA OpAddr

[Figure: 1-address machine. CPU: program counter (where to find next instruction) and accumulator. Memory: Op1 at Op1Addr, Nexti at NextiAddr.]

Instruction format

add Op1 (Acc ← Acc + Op1)
Format: add | Op1Addr
Bits:   8   | 24
– add: which operation
– Op1Addr: where to find operand1
– Accumulator (implicit): where to find operand2, and where to put result

Page 32: Instruction Set Architecture

Accumulators: Pros and Cons

Pros– Very low hardware requirements

– Easy to design and understand

Cons– Accumulator becomes the bottleneck

– Little ability for parallelism or pipelining

– High memory traffic

Page 33: Instruction Set Architecture

Memory-Memory Architectures

Instruction set:
(3 operands)    add A, B, C    sub A, B, C    mul A, B, C
(2 operands)    add A, B       sub A, B       mul A, B

Example: A*B - (A+C*B)

3 operands     2 operands
mul D, A, B    mov D, A
mul E, C, B    mul D, B
add E, A, E    mov E, C
sub E, D, E    mul E, B
               add E, A
               sub D, E     /* D ← A*B - (A+C*B) */

Page 34: Instruction Set Architecture

The 2-Address Machine and Instruction Format

Result overwrites Operand 2

Needs only 2 addresses in instruction but less choice in placing data

[Figure: 2-address machine. CPU: program counter (where to find next instruction). Memory: Op1 at Op1Addr, Op2/Res at Op2Addr, Nexti at NextiAddr.]

Instruction format

add Op2, Op1 (Op2 ← Op2 + Op1)
Format: add | Op2Addr | Op1Addr
Bits:   8   | 24      | 24
– add: which operation
– Op2Addr: where to put result
– Op2Addr and Op1Addr: where to find operands

Page 35: Instruction Set Architecture

Memory - Memory:Pros and Cons

Pros
– Requires fewer instructions (especially if 3 operands)
– Easy to write compilers for (especially if 3 operands)

Cons
– Very high memory traffic (especially if 3 operands)
– Variable number of clocks per instruction
– With two operands, more data movements are required

Page 36: Instruction Set Architecture

Register-Memory Architectures

Instruction set:
add R1, A     sub R1, A     mul R1, B
load R1, A    store R1, A

Example: A*B - (A+C*B)

load R1, C
mul R1, B      /* C*B */
add R1, A      /* A + C*B */
store R1, D
load R2, A
mul R2, B      /* A*B */
sub R2, D      /* A*B - (A + C*B) */

Page 37: Instruction Set Architecture

Register-Memory: Pros and Cons

Pros
– Some data can be accessed without loading first
– Instruction format easy to encode
– Good code density

Cons
– Operands are not equivalent (poor orthogonality)
– Variable number of clocks per instruction
– May limit number of registers

Page 38: Instruction Set Architecture

Load-Store Architectures

Instruction set:
add R1, R2, R3    sub R1, R2, R3    mul R1, R2, R3
load R1, R4       store R1, R4

Example: A*B - (A+C*B)

load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5     /* C*B */
add R8, R7, R4     /* A + C*B */
mul R9, R4, R5     /* A*B */
sub R10, R9, R8    /* A*B - (A+C*B) */

Page 39: Instruction Set Architecture

The 3-Address Machine and Instruction format

Address of next instruction kept in processor state register, the PC (except for explicit branches/jumps)

Rest of addresses in instruction
– Discuss: savings in instruction word size

[Figure: 3-address machine. CPU: program counter (where to find next instruction). Memory: Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, Nexti at NextiAddr.]

Instruction format

add Res, Op1, Op2 (Res ← Op2 + Op1)
Format: add | ResAddr | Op1Addr | Op2Addr
Bits:   8   | 24      | 24      | 24
– add: which operation
– ResAddr: where to put result
– Op1Addr and Op2Addr: where to find operands

Page 40: Instruction Set Architecture

Load-Store: Pros and Cons

Pros– Simple, fixed length instruction encoding

– Instructions take similar number of cycles

– Relatively easy to pipeline

Cons– Higher instruction count

– Not all instructions need three operands

– Dependent on good compiler

Page 41: Instruction Set Architecture

Registers:Advantages and Disadvantages

Advantages– Faster than cache (no addressing mode or tags)

– Deterministic (no misses)

– Can replicate (multiple read ports)

– Short identifier (typically 3 to 8 bits)

– Reduce memory traffic

Disadvantages
– Need to save and restore on procedure calls and context switch

– Can’t take the address of a register (for pointers)

– Fixed size (can’t store strings or structures efficiently)

– Compiler must manage

Page 42: Instruction Set Architecture

General Register Machine and Instruction Formats

[Figure: general register machine. CPU: program counter and registers (R2, R4, R6, R8 shown). Memory: Op1 at Op1Addr, Nexti.]

Instruction formats

load R8, Op1 (R8 ← Op1)
Format: load | R8 | Op1Addr

add R2, R4, R6 (R2 ← R4 + R6)
Format: add | R2 | R4 | R6

Page 43: Instruction Set Architecture

General Register Machine and Instruction Formats

It is the most common choice in today’s general-purpose computers

Which register is specified by a small “address” (3 to 6 bits for 8 to 64 registers)

Load and store have one long & one short address: 1½ addresses

Arithmetic instruction has 3 “half” addresses
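The "3 to 6 bits for 8 to 64 registers" figure is just the ceiling of log2 of the register count. A minimal sketch of that arithmetic follows; the function name `reg_field_bits` is hypothetical.

```c
/* Smallest number of bits that can name `n_registers` distinct registers
   (ceiling of log2). Assumes 1 <= n_registers <= 2^31. */
unsigned reg_field_bits(unsigned n_registers)
{
    unsigned bits = 0;
    while ((1u << bits) < n_registers)
        bits++;
    return bits;
}
```

For a machine with 32 registers each register field needs 5 bits, so the three register fields of an arithmetic instruction take 15 bits in total, compared with 72 bits for three 24-bit memory addresses.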

Page 44: Instruction Set Architecture

Real Machines Are Not So Simple

Most real machines have a mixture of 3-, 2-, 1-, 0-, and 1½-address instructions

A distinction can be made on whether arithmetic instructions use data from memory

If ALU instructions only use registers for operands and result, machine type is load-store– Only load and store instructions reference memory

Other machines have a mix of register-memory and memory-memory instructions

Page 45: Instruction Set Architecture

Byte Ordering

Idea
– Bytes in long word numbered 0 to 3
– Which is most (least) significant?
– Can cause problems when exchanging binary data between machines

Big Endian: Byte 0 is most, 3 is least
– IBM 360/370, Motorola 68K, Sparc

Little Endian: Byte 0 is least, 3 is most
– Intel x86, VAX

Alpha
– Chip can be configured to operate either way
– DEC workstations are little endian
– Cray T3E Alphas are big endian

Page 46: Instruction Set Architecture

Byte Ordering Example (1/2)

union {
    unsigned char c[8];
    unsigned short s[4];
    unsigned int i[2];
    unsigned long l[1];
} dw;

[Figure: the bytes of dw in address order c[0] to c[7]; s[0] to s[3] each overlay two bytes, i[0] and i[1] each overlay four bytes, and l[0] overlays the bytes of a long.]

Page 47: Instruction Set Architecture

Byte Ordering Example (2/2)

int j;
for (j = 0; j < 8; j++)
    dw.c[j] = 0xf0 + j;

printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
       dw.c[0], dw.c[1], dw.c[2], dw.c[3], dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
       dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 == [0x%x,0x%x]\n", dw.i[0], dw.i[1]);
printf("Long 0 == [0x%lx]\n", dw.l[0]);

Page 48: Instruction Set Architecture

Byte Ordering on Alpha

Little Endian

[Figure: dw on Alpha. Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7; within each of s[0] to s[3], i[0], i[1], and l[0], the LSB is at the lowest address and the MSB at the highest.]

Output on Alpha: [printed output not preserved in this transcript]

Page 49: Instruction Set Architecture

Byte Ordering on x86

Little Endian

[Figure: dw on x86. Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7; within each of s[0] to s[3], i[0], i[1], and l[0], the LSB is at the lowest address and the MSB at the highest.]

Output on Pentium: [printed output not preserved in this transcript]

Page 50: Instruction Set Architecture

Byte Ordering on Sun

Big Endian

[Figure: dw on Sun. Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7; within each of s[0] to s[3], i[0], i[1], and l[0], the MSB is at the lowest address and the LSB at the highest.]

Output on Sun:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]

Page 51: Instruction Set Architecture

Big Endian Addressing

With Big Endian addressing, the byte with binary address

x . . . x00

is in the most significant position (big end) of a 32-bit word (IBM, Motorola, Sun, HP).

MSB          LSB
 0   1   2   3
 4   5   6   7

Page 52: Instruction Set Architecture

Little Endian Addressing

With Little Endian addressing, the byte with binary address

x . . . x00

is in the least significant position (little end) of a 32-bit word (DEC, Intel).

MSB          LSB
 3   2   1   0
 7   6   5   4
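The two conventions can be compared directly by assembling the same four bytes both ways. The helpers below are an illustrative sketch in portable C; the names `load_be32`, `load_le32`, and `host_is_little_endian` are hypothetical and do not come from the original notes.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes, treating p[0] as most significant
   (big endian). */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Same bytes, treating p[0] as least significant (little endian). */
uint32_t load_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns 1 when the host stores the least significant byte at the
   lowest address, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

For the bytes f0 f1 f2 f3 used in the byte ordering example, `load_be32` yields 0xf0f1f2f3, matching the first int printed on the big-endian Sun, while `load_le32` yields 0xf3f2f1f0. Because both helpers use shifts rather than pointer casts, they give the same answer on any host, which is one common way to exchange binary data between machines.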

Page 53: Instruction Set Architecture

Operand Alignment

An access to an operand of size s bytes at byte address A is said to be aligned if A mod s = 0

[Figure: byte addresses 40 to 44 with the four-byte operand D0 D1 D2 D3 shown at two different placements]
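The alignment rule translates directly into a one-line check. This is a minimal sketch; the function name `is_aligned` is hypothetical.

```c
#include <stdint.h>

/* An access of `size` bytes at byte address `addr` is aligned
   when addr mod size == 0. Assumes size > 0. */
int is_aligned(uintptr_t addr, uintptr_t size)
{
    return addr % size == 0;
}
```

For example, a 4-byte access at address 40 is aligned, while the same access at address 41 is not and would straddle two aligned words.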

Page 54: Instruction Set Architecture

Unrestricted Alignment

If the architecture does not restrict memory accesses to be aligned then– Software is simple

– Hardware must detect misalignment and make 2 memory accesses

– Expensive detection logic is required

– All references can be made slower

Sometimes unrestricted alignment is required for backwards compatibility

Page 55: Instruction Set Architecture

Restricted Alignment

If the architecture restricts memory accesses to be aligned then
– Software must guarantee alignment

– Hardware detects misaligned accesses and traps

– No extra time is spent when data is aligned

Since we want to make the common case fast, having restricted alignment is often a better choice, unless compatibility is an issue.

Page 56: Instruction Set Architecture

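Addressing Modes (1/3)

Immediate
Add R4, #3
Regs[R4] ← Regs[R4] + 3
[Figure: the operand 3 is held in the instruction itself]

Register
Add R4, R3
Regs[R4] ← Regs[R4] + Regs[R3]
[Figure: R3 selects the operand in the registers]

Register Indirect
Add R4, (R1)
Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
[Figure: R1 selects a register whose content points to the operand in memory]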

Page 57: Instruction Set Architecture

Addressing Modes(2/3)

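Direct
Add R4, (1001)
Regs[R4] ← Regs[R4] + Mem[1001]
[Figure: the address 1001 in the instruction points to the operand in memory]

Memory Indirect
Add R4, @(R3)
Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]
[Figure: R3 points to a memory location that holds the address of the operand]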

Page 58: Instruction Set Architecture

Addressing Modes(3/3)

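Displacement
Add R4, 100(R1)
Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
[Figure: the content of R1 plus the constant 100 gives the memory address of the operand]

Scaled
Add R1, 100(R2)[R3]
Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
[Figure: the content of R2, plus the constant 100, plus the content of R3 multiplied by d, gives the memory address of the operand]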
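The memory addressing modes differ only in how the effective address is computed. The sketch below models the register file as an array of 32-bit values; the function names and that model are assumptions made for illustration, not part of the original notes.

```c
#include <stdint.h>

/* Register indirect: the address is the content of one register. */
uint32_t ea_register_indirect(const uint32_t regs[], unsigned r)
{
    return regs[r];
}

/* Displacement: constant offset plus the content of a base register. */
uint32_t ea_displacement(const uint32_t regs[], unsigned base, uint32_t disp)
{
    return disp + regs[base];
}

/* Scaled: displacement plus base plus index register times d,
   where d is the scale factor from the slide. */
uint32_t ea_scaled(const uint32_t regs[], unsigned base, unsigned index,
                   uint32_t disp, uint32_t d)
{
    return disp + regs[base] + regs[index] * d;
}
```

With R1 = 1000, `Add R4, 100(R1)` reads its operand from address 1100. With R2 = 2000, R3 = 3, and d = 4, `Add R1, 100(R2)[R3]` reads from address 100 + 2000 + 3*4 = 2112.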

Page 59: Instruction Set Architecture

Addressing Mode Usage

3 Programs from SPEC89 on VAX– Others : 0.

Frequency of addressing mode

Mode                 TeX    spice    gcc
Memory indirect      1%     6%       1%
Scaled               0%     16%      6%
Register deferred    24%    3%       11%
Immediate            43%    17%      39%
Displacement         32%    55%      40%

Page 60: Instruction Set Architecture

Displacement Address Size

Average of 5 programs from SPECint92 and SPECfp92.

– X-axis is log2 of displacement.

– 1% of addresses > 16 bits.

[Figure: histogram of displacement sizes. X-axis: number of bits (0 to 14); y-axis: frequency (0% to 30%).]

Page 61: Instruction Set Architecture

Immediate Addressing Mode (1/2)

10 Programs from SPECInt92 and SPECfp92

Percentage of operations using immediate

Operation    Integer    FP
Loads        10%        45%
Compares     87%        77%
ALU          58%        78%
All Inst.    35%        10%

Page 62: Instruction Set Architecture

Immediate Addressing Mode (2/2)

50% to 60% fit within 8 bits

75% to 80% fit within 16 bits

[Figure: histogram of immediate sizes. X-axis: number of bits (0 to 32); y-axis: frequency (0% to 60%).]

Page 63: Instruction Set Architecture

Addressing Mode Summary

Important data addressing modes– Displacement– Immediate– Register Indirect

Displacement size should be 12 to 16 bits.

Immediate size should be 8 to 16 bits.
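Whether a particular displacement or immediate fits a given field width is a simple range check. The sketch below assumes the field holds a signed two's-complement value, which the slides do not state explicitly; the function name `fits_signed` is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when `value` is representable as a two's-complement
   field of `bits` bits, 0 otherwise. Assumes 1 <= bits <= 32. */
int fits_signed(int32_t value, unsigned bits)
{
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi =  ((int64_t)1 << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}
```

Under that assumption an 8-bit field covers -128 to 127 and a 16-bit field covers -32768 to 32767, so a constant such as 455 needs the 16-bit form.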

Page 64: Instruction Set Architecture

Instruction Operations

Arithmetic and Logical:– add, subtract, and , or, etc.

Data transfer:– Load, Store, etc.

Control– Jump, branch, call, return, trap, etc.

Synchronization:– Test & Set.

String:– string move, compare, search.

Page 65: Instruction Set Architecture

Top-9 x86 Instructions

Simple instructions dominate instruction frequency.

1 Load 22%
2 Conditional branch 20%
3 Compare 16%
4 Store 12%
5 Add 8%
6 And 6%
7 Sub 5%
8 Move register-register 4%
9 Call 1%

Page 66: Instruction Set Architecture

Methods of Testing Condition

Condition code: Status bits are set by ALU operations.– Add r1, r2, r3 and bz label– Extra status bits

Condition register:– cmp r1, r2, r3 and bgt r1, label– Simple, but use up a register

Compare and branch– bgt r1, r2, label– One instruction– Too much work per instruction

Page 67: Instruction Set Architecture

Conditional Branch Distance

Short displacement fields often sufficient for branch

[Figure: histogram of conditional branch distances. X-axis: bits of branch displacement (0 to 14); y-axis: frequency (0% to 40%).]

Page 68: Instruction Set Architecture

Conditional Branch Addressing

PC-relative, since most branch targets are close to the current PC address
– At least 8 bits.

Compare Equal/Not Equal most important for integer programs.

Frequency of comparison types

Comparison    Integer    FP
LT/GE         7%         40%
GT/LE         7%         23%
EQ/NEQ        87%        37%

Page 69: Instruction Set Architecture

Data Types and Usage

Byte, half word (16 bits), word (32 bits), double word (64 bits).

Arithmetic:

– Decimal: 4 bits per digit.

– Integers: 2’s complement

– Floating-point: IEEE standard (single, double, extended precision).

Frequency of operand sizes

Size           Integer    FP
Byte           7%         0%
Half Word      19%        0%
Word           74%        31%
Double Word    0%         69%

Page 70: Instruction Set Architecture

Instruction Format

Fixed
– Operation, address specifier 1, address specifier 2, address specifier 3.
– MIPS, SPARC, Power PC.

Variable
– Operation & # of operands, address specifier 1, …, address specifier n.
– VAX

Hybrid
– Intel x86
– Operation, address specifier, address field.
– Operation, address specifier 1, address specifier 2, address field.
– Operation, address field, address specifier 1, address specifier 2.

Summary:
– If code size is most important, use variable format.
– If performance is most important, use fixed format.

Page 71: Instruction Set Architecture

Types of Addressing Modes (VAX)

Memory

1. Register direct Ri

2. Immediate (literal) #n

3. Displacement M[Ri + #n]

4. Register indirect M[Ri]

5. Indexed M[Ri + Rj]

6. Direct (absolute) M[#n]

7. Memory Indirect M[M[Ri] ]

8. Autoincrement M[Ri++]

9. Autodecrement M[Ri - -]

10. Scaled M[Ri + Rj*d + #n]

Page 72: Instruction Set Architecture

Frequency of Immediate Addressing on DLX

Not all instructions can take advantage of immediate addressing.

Operation    SPECint92    SPECfp92
Loads        10%          45%
Compares     87%          77%
ALU ops      58%          78%
Overall      35%          10%

Page 73: Instruction Set Architecture

Types of Operations

Arithmetic and Logic: AND, ADD

Data Transfer: MOVE, LOAD, STORE

Control: BRANCH, JUMP, CALL

System: OS CALL, VM

Floating Point: ADDF, MULF, DIVF

Decimal: ADDD, CONVERT

String: MOVE, COMPARE

Graphics: (DE)COMPRESS

Page 74: Instruction Set Architecture

80x86 Instruction Frequency

Rank    Instruction      Frequency
1       load             22%
2       branch           20%
3       compare          16%
4       store            12%
5       add              8%
6       and              6%
7       sub              5%
8       register move    4%
9       call             1%
10      return           1%

Total                    96%

Page 75: Instruction Set Architecture

Relative Frequency of Control Instructions

Design hardware to handle branches quickly, since these occur most frequently

Operation      SPECint92    SPECfp92
Call/Return    13%          11%
Jumps          6%           4%
Branches       81%          87%

Page 76: Instruction Set Architecture

Frequency of Operand Sizes on 32-bit Load-Store Machine

For floating-point want good performance for 64 bit operands.

For integer operations want good performance for 32 bit operands.

Size       SPECint92    SPECfp92
64 bits    0%           69%
32 bits    74%          31%
16 bits    19%          0%
8 bits     7%           0%

Page 77: Instruction Set Architecture

Encoding an Instruction set

Encoding must balance several competing forces:
– The desire to have as many registers and addressing modes as possible
– The impact of the size of the register and addressing mode fields on the average instruction size and hence on the average program size
– The desire to have instructions encode into lengths that will be easy to handle in the implementation

Page 78: Instruction Set Architecture

Three Choices for Encoding the Instruction Set

Variable
– Instruction length varies based on opcode and address specifiers
– For example, VAX instructions vary between 1 and 53 bytes
– Good code density, but difficult to decode

Fixed
– Only a single size for all instructions
– For example, DLX, MIPS, Power PC, Sparc all have 32 bit instructions
– Not as good code density, but easier to decode

Hybrid
– Have multiple format lengths specified by the opcode
– For example, IBM 360/370 and Intel 80x86
– Compromise between code density and ease of decode

Page 79: Instruction Set Architecture

Compilers and ISA

Compiler Goals
– All correct programs compile correctly
– Most compiled programs execute quickly
– Most programs compile quickly
– Achieve small code size
– Provide debugging support

Multiple Source Compilers
– Same compiler can compile different languages

Multiple Target Compilers
– Same compiler can generate code for different machines

Page 80: Instruction Set Architecture

Compiler Phases

Compilers use phases to manage complexity (fig. 2.18)

– Front end
• Convert language to intermediate form

– High level optimizer
• Procedure inlining and loop transformations

– Global optimizer
• Global and local optimization, plus register allocation

– Code generator (and assembler)
• Dependency elimination, instruction selection, pipeline scheduling

Page 81: Instruction Set Architecture

The Impact of Compiler Technology on the Architect’s Decisions

How are variables allocated and addressed?

How many registers are needed to allocate variables appropriately?

Page 82: Instruction Set Architecture

Allocation of Variables

Stack – used to allocate local variables– grown and shrunk on procedure calls and returns– register allocation works best for stack-allocated objects

Global data area– used to allocate global variables and constants– many of these objects are arrays or large data structures– impossible to allocate to registers if they are aliased

Heap– used to allocate dynamic objects– heap objects are accessed with pointers– never allocated to registers

Page 83: Instruction Set Architecture

Designing ISA to Improve Compilation

Provide enough general purpose registers to ease register allocation (more than 16).

Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal.

Provide primitive constructs rather than trying to map to a high-level language.

Simplify trade-offs among alternatives.

Allow compilers to help make the common case fast.

Page 84: Instruction Set Architecture

Summary: ISA

Use general purpose registers with a load-store architecture.

Support these addressing modes: displacement, immediate, register indirect.

Support these simple instructions: load, store, add, subtract, move register, shift, compare equal, compare not equal, branch, jump, call, return.

Support these data sizes: 8-, 16-, 32-bit integer, IEEE FP standard.

Provide at least 16 general purpose registers plus separate FP registers and aim for a minimal instruction set.