
20/09/2012

© 2012 David Black-Schaffer

Instruction Set Architectures 2

Introduction to Computer Architecture
David Black-Schaffer

Contents

• Translation to machine code
  – Instruction formats
  – Large constants

• Procedure calls
  – Register conventions
  – Stack memory

• Other ISAs

Material that is not in this lecture

Readings from the book
  – Sign extension for two’s complement numbers (2.4)
  – Logical operations (2.6)
  – Assembler, linker, and loader (2.12)

You will need 2.4 and 2.6 for this lecture. (2.12 will be on the exam.)

The book has excellent descriptions of these topics. Please read the book before watching this lecture.

Translation to machine code
Encodings and formats

Instruction format (machine language)

• Machine language
  – Computers do not understand “add R8, R17, R18”
  – Instructions are translated to machine language (1s and 0s)

• Example: add R8, R17, R18
  00000010 00110010 01000000 00100000

• MIPS instructions have logical fields:

  opcode   rs (src1)   rt (src2)   rd (dest)   shamt   funct
  000000   10001       10010       01000       00000   100000

Instruction fields

  opcode   rs (src1)   rt (src2)   rd (dest)   shamt    funct
  6 bits   5 bits      5 bits      5 bits      5 bits   6 bits

• opcode: Operation (e.g., “add”, “lw”)
• rs: First source register
• rt: Second source register
• rd: Destination register
• shamt: Shift amount
• funct: Function selector (add = 32, sub = 34)

Question: Why are there 5 bits for rs, rt, and rd?
Answer: 2^5 = 32. Need 5 bits to select from 32 registers.

Remember from 2’s complement: subtraction is basically addition, so it makes sense for add and sub to share an opcode and differ only in funct.
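
The field layout above can be checked with a few lines of Python. This is a minimal sketch, not an assembler: the helper name encode_r is made up for illustration, and the field values are the ones from the add R8, R17, R18 example.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add R8, R17, R18: opcode 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32
word = encode_r(opcode=0, rs=17, rt=18, rd=8, shamt=0, funct=32)

print(f"{word:032b}")   # the 32 bits of the machine instruction
print(hex(word))
```

The printed bit pattern should match the 00000010 00110010 01000000 00100000 shown for the example instruction.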

Constants (immediate)

• Small constants (immediates) are used all over code (~50%)

  if (a==b) c=1; else c=2;

• How can we support this in the processor?
  – Put the “typical constants” in memory and load them (slow)
  – Create hard-wired registers (like R0) for them (how many? How many bits are needed to choose from all those registers?)

• MIPS does something in between:
  – Some instructions can have constants inside the instruction (store the constant data in the instruction, not the register file)
  – The control logic then sends the constants to the ALU
  – addi R29, R29, 4 ← the value 4 is inside the instruction

• But there’s a problem:
  – Instructions have only 32 bits. Need some for opcode and registers.
  – How do we trade off space for constants and instructions?

MIPS instruction formats

• Different formats for different kinds of instructions

• MIPS has 3 instruction formats:
  – R: operation, 3 registers, no immediate
  – I: operation, 2 registers, short immediate
  – J: jump, 0 registers, long immediate

• Formats use the instruction bits differently
  – Trade off immediate space and registers

Question: How large an immediate can you have for addi?
Answer: I-format: 5+5+6 bits = 16 bits. 2’s complement: -32,768 to +32,767

  Name         Fields                                                 Comments
  Field size   6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits    All MIPS instructions are 32 bits
  R-format     op     | rs     | rt     | rd     | shamt  | funct     Arithmetic instruction format
  I-format     op     | rs     | rt     | address/immediate          Transfer (load/store), branch, immediate format
  J-format     op     | target address                                Jump instruction format
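
As a sketch of how the I-format trades register fields for immediate space, the snippet below packs addi R29, R29, 4. The helper name encode_i is hypothetical; the opcode value 8 for addi is taken from the MIPS reference data rather than from these slides.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I-format word: 6-bit opcode, two 5-bit registers, 16-bit immediate."""
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate does not fit in 16 bits")
    # Mask to 16 bits so negative immediates are stored in two's complement.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

ADDI = 8  # addi opcode (assumption: value from the MIPS reference data)

word = encode_i(ADDI, rs=29, rt=29, imm=4)   # addi R29, R29, 4
print(hex(word))
```

An immediate outside -32,768 to +32,767 is rejected, which is exactly the limit the question on this slide asks about.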

Loading immediate values (constants)

[Figure: datapath with Program Counter, Memory (holding addi R6, R0, 100), Instruction Register, Control, Register File (R0 to R8), and ALU.]

• Control tells the ALU to take one operand from the Register File and the other from the Instruction.
• The 16-bit immediate is sign extended to 32 bits.
• Example: addi R6, R0, 100 adds the immediate 100 to R0 (constant 0) and writes the result to R6.
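
Sign extension of the 16-bit immediate can be sketched as follows. The function name sign_extend16 is made up for this example; it models the 32-bit result as a non-negative Python integer.

```python
def sign_extend16(imm16):
    """Widen a 16-bit two's complement field to 32 bits by copying the sign bit."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # top bit set: the value is negative
        return imm16 | 0xFFFF0000    # fill the upper 16 bits with 1s
    return imm16                     # positive: upper 16 bits stay 0

print(hex(sign_extend16(100)))       # the immediate from addi R6, R0, 100
print(hex(sign_extend16(0xFFFC)))    # 16-bit encoding of -4
```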


Large constants and branches

Loading larger values

• The immediate field is limited to 16 bits (-32,768 to +32,767)
  – How do we load larger values?

• Use two instructions to combine two 16-bit immediates
  – Load Upper Immediate (lui): loads the upper 16 bits and puts zeros in the lower bits
  – Or Immediate (ori): ORs in the lower 16 bits

• Example: 10101010 10101010 11110000 11110000

  lui R2, 10101010 10101010         ; R2: 10101010 10101010 00000000 00000000
  ori R2, R2, 11110000 11110000     ; R2: 10101010 10101010 11110000 11110000

Question: Is the immediate sign-extended for ori?
Answer: No. If it were, we would end up with all 1s in the top bits. (See the MIPS reference data in the book.)
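
The lui/ori pair can be modeled in a few lines. This is a sketch under simple assumptions: registers are plain Python integers, and ori_sign_extended is a deliberately wrong, hypothetical variant included only to show what the answer above warns about.

```python
MASK32 = 0xFFFFFFFF

def lui(imm16):
    """Load upper immediate: immediate in the top 16 bits, zeros below."""
    return (imm16 << 16) & MASK32

def ori(reg, imm16):
    """Or immediate: the 16-bit immediate is zero-extended before the OR."""
    return reg | (imm16 & 0xFFFF)

def ori_sign_extended(reg, imm16):
    """Hypothetical (incorrect) ori that sign-extends its immediate."""
    ext = (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16
    return reg | ext

r2 = lui(0b1010101010101010)
r2 = ori(r2, 0b1111000011110000)
print(f"{r2:032b}")                  # upper half from lui, lower half from ori

wrong = ori_sign_extended(lui(0b1010101010101010), 0b1111000011110000)
print(f"{wrong:032b}")               # top 16 bits are overwritten with 1s
```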

Addresses in branches and jumps

• Branch and jump instructions
  – bne/beq: I-format, 16-bit immediate
  – j: J-format, 26-bit immediate

• But addresses are 32 bits! How do we handle this?
  – Treat bne/beq as relative offsets (add to current PC)
  – Treat j as an absolute value (replace 26 bits of the PC)

Question: How far can you jump with bne/beq?
Answer: -32,768 to +32,767 instructions from the next instruction (PC+4)

[Figure: for bne/beq, next 32-bit PC = current 32-bit PC + 4 + (sign-extended 16-bit immediate with 00 appended). For j, next 32-bit PC = upper bits of the current PC, followed by the 26-bit immediate, followed by 00.]

Jump addresses example: loops

for (j=0; j<10; j++) { b = b + j; }

R5 = j; R6 = b;

  Addr  Instruction         Comment
   0    addi R5, R0, 0      ; j ← 0 + 0
   4    addi R1, R0, 10     ; R1 ← 0 + 10
   8    beq  R5, R1, 24     ; if (j == 10) goto 24
  12    add  R6, R6, R5     ; b ← b + j
  16    addi R5, R5, 1      ; j ← j + 1
  20    j    8              ; goto 8
  24    ...                 ; done with loop

beq (at address 8, immediate 3): PC = PC + 4 + (3 << 2) = 8 + 4 + 12 = 24
j (at address 20, immediate 2): PC = PC(31:28) : (2 << 2) = 8, where “:” means concatenation
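
The two target calculations for the loop example can be sketched as below. The function names are made up for illustration, and jump_target takes the upper four bits from PC+4 as the MIPS reference data defines it; for this example that gives the same result as using the current PC.

```python
def branch_target(pc, imm16):
    """beq/bne: next PC = PC + 4 + (sign-extended immediate << 2)."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16   # sign extend 16 bits
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF

def jump_target(pc, imm26):
    """j: upper 4 bits of PC+4, then the 26-bit immediate, then 00."""
    return ((pc + 4) & 0xF0000000) | ((imm26 & 0x3FFFFFF) << 2)

print(branch_target(8, 3))    # beq at address 8 with immediate 3
print(jump_target(20, 2))     # j at address 20 with immediate 2
```

A negative immediate gives a backward branch: branch_target(24, 0xFFFB) uses an offset of -5 instructions.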

Why do relative branches work?

Most branches don’t go very far!

[Figure: histogram of the bits of branch displacement needed (0 to 15) against the percentage of branches (0% to 40%), for Int. Avg. and FP Avg.]

Question: What is “Int” and “FP”?
Answer: Integer and Floating Point programs.

What kind of branches are common?

Frequency of comparison types in branches

  Comparison   Int Avg.   FP Avg.
  EQ/NE        86%        37%
  GT/LE        7%         23%
  LT/GE        7%         40%

Optimize for the EQ/NE case because it is most common. (E.g., MIPS)

Summary: machine code and immediates

• Instructions have different encodings to store different types of data (3 registers vs. immediate)
• MIPS has 3 formats (R, I, J) for different uses
• Encodings limit how much data we can have
• These are tradeoffs in design:
  – Optimize for the common case (short immediates)
  – Support the general case (long immediates)


Procedure calls

• Procedures (functions/subroutines) are needed for structured programming

  main() {
    for (j=0; j<10; j++)
      if (a[j] == 0)
        a[j] = update(a[j], j);
  }

  The update() procedure is likely not in your code. You don’t control its implementation!

• The difficulty is that a procedure call needs to:
  – Put data where the procedure can access it
  – Start executing
  – Do work/use registers
  – Return to the caller
  – Get the results back to the caller

But it needs to do this without messing up the caller’s registers!

More specifically:

We need to:
1. Put the parameters in a place where the procedure (callee) can get them
2. Transfer control to the callee
3. Acquire the registers needed for the procedure
4. Execute the code
5. Place the results in a place where the calling program (caller) can access them
6. Return control to where we were before we called the procedure
…without messing up the caller’s registers!

In the main() example:
  Caller: main()
  Callee: update()
  Parameters: a[j], j
  Results: (stored in) a[j]

Caller context

Example procedure: f(g,h,i,j) = (g+h) - (i+j)

  add R1, R4, R5    ; g=R4, h=R5
  add R2, R6, R7    ; i=R6, j=R7
  sub R3, R1, R2

If the caller (e.g., main()) uses R1, R2, or R3, they have to be saved because the callee overwrites them when it executes.

Problems:
• The callee does not know which registers the caller is using! (It could have multiple different callers.)
• The caller does not know which registers the callee will use! (It could call multiple sub-procedures.)

MIPS has a convention on who saves which registers
• Divided between the callee and caller
• Following this convention allows any caller to call any callee
• Callee and caller both know what they need to save

Saving registers: MIPS conventions

• MIPS convention
  – Agreed upon “contract” or “protocol” that everyone follows
  – Specifies the correct (and expected) usage and some naming conventions
  – Established as part of the architecture
  – Used by all compilers, programs, and libraries
  – Assures compatibility

• Callee saves the following registers if it uses them:
  – $s0-$s7 (s = saved)
  – $sp, $fp, $ra

• Caller must save anything else it uses

Question: What are registers $s0-$s7 and $sp, $fp, $ra?
Answer: Just standard names for R16-R23 and R29-R31.

MIPS register names and conventions

  Register   Name       Use
  R0         $0         Constant 0
  R1         $at        Reserved temporary
  R2-R3      $v0-$v1    Return values (outputs to caller)
  R4-R7      $a0-$a3    Procedure arguments (input to callee)
  R8-R15     $t0-$t7    Caller-save temporaries: may be overwritten by called procedures
  R16-R23    $s0-$s7    Callee-save temporaries: may not be overwritten by called procedures
  R24-R25    $t8-$t9    Caller-save temporaries
  R26-R27    $k0-$k1    Reserved for operating system
  R28        $gp        Global pointer
  R29        $sp        Stack pointer (callee save)
  R30        $fp        Frame pointer (callee save)
  R31        $ra        Return address (callee save)

How to do a procedure call

• Transfer control to the callee:

  jal ProcedureAddress    ; jump-and-link to the procedure

  – The return address (PC+4) is stored in $ra

• Return control to the caller:

  jr $ra                  ; jump register: jump to the address in $ra

  – This is why you need to store the return address!

• Register convention for procedure calling:
  – $a0-$a3: Argument registers (4) for passing parameters
  – $v0-$v1: Value registers (2) for returning results
  – $ra: Return address for where to go when done


Procedure call examples and the stack

Example: which registers to save?

Caller:

  addi $a0, $t0, 2        ; set up the arguments
  add  $a1, $s0, $zero
  add  $a2, $s1, $t0
  addi $a3, $t0, 3
  addi $sp, $sp, -4       ; adjust the stack to make room for one item
  sw   $t0, 0($sp)        ; save $t0 in case the callee uses it
  jal  leaf_example       ; call the leaf_example procedure
  lw   $t0, 0($sp)        ; restore $t0 from the stack
  addi $sp, $sp, 4        ; adjust the stack to delete one item
  add  $t2, $v0, $zero    ; move the result into $t2

Callee:

leaf_example:             ; calculates f=(g+h)-(i+j)
                          ; g, h, i, and j are in $a0, $a1, $a2, $a3
  addi $sp, $sp, -4       ; adjust the stack to make room for one item
  sw   $s0, 0($sp)        ; save $s0 for the caller
  add  $t0, $a0, $a1      ; g = $a0, h = $a1
  add  $t1, $a2, $a3      ; i = $a2, j = $a3
  sub  $s0, $t0, $t1
  add  $v0, $s0, $zero    ; return f in the result register $v0
  lw   $s0, 0($sp)        ; restore $s0 for the caller
  addi $sp, $sp, 4        ; adjust the stack to delete one item
  jr   $ra                ; jump back to the calling routine

Questions:
1. What resources does the caller use? $t0, $s0, $s1
2. What resources does the callee use? $t0, $t1, $s0
3. What does the caller need to save? $t0
4. What does the callee need to save? $s0

Caller needs to save anything it still needs after the call that is not in the $s registers.
Callee needs to save any $s registers that it uses.

Saving registers (on the stack)

• The stack is a part of memory for storing temporary data.
• The Stack Pointer (kept in $sp) points to the end of the stack in memory. In MIPS the stack grows down.
• Procedures move the stack pointer when they store data on the stack.
• Each procedure returns the stack to the state it was in before it was called.
• Gives procedures a secure place to store data that does not fit in registers (e.g., saved registers!).
• Each procedure manages its own stack space so they don’t interfere.
• Works great as long as you return the stack to the way it was before.

[Figure: three views of the stack. Before the procedure call: caller data, with $sp at its end. During the procedure call: caller data followed by callee-saved $s0, $s2, $s4, and $ra, with $sp moving down as each one is stored. After the procedure call: caller data only, with $sp back where it started.]
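
The stack discipline described above can be sketched with a dictionary standing in for memory. Everything here is an assumption made for illustration: the starting $sp value, the register contents, and the helper names push and pop (MIPS has no such instructions; each is an addi plus a sw or lw).

```python
memory = {}                                        # address -> word (toy memory)
regs = {"$sp": 0x7FFFFFFC, "$s0": 111, "$ra": 0x00400020}   # made-up values

def push(reg):
    """addi $sp, $sp, -4 then sw reg, 0($sp): the stack grows down."""
    regs["$sp"] -= 4
    memory[regs["$sp"]] = regs[reg]

def pop(reg):
    """lw reg, 0($sp) then addi $sp, $sp, 4: give the space back."""
    regs[reg] = memory[regs["$sp"]]
    regs["$sp"] += 4

sp_before = regs["$sp"]

push("$s0")            # callee saves what it is about to overwrite
push("$ra")
regs["$s0"] = 999      # callee does its work and clobbers $s0
pop("$ra")             # restore in the reverse order of saving
pop("$s0")

print(regs["$s0"], regs["$sp"] == sp_before)
```

After the restores, $s0 holds its original value and $sp is back where it started, which is the condition the last bullet asks for.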

Example: saving to the stack

In the caller code from the previous example:

• Saving $t0 before the call:
  – addi $sp, $sp, -4: move the stack pointer down by 4 bytes (1 word)
  – sw $t0, 0($sp): then store the register to that location

• Restoring $t0 after the call:
  – lw $t0, 0($sp): read the register from the stack
  – addi $sp, $sp, 4: then move the stack pointer up by 4 bytes (1 word), back to where it was before

Walking through the same example:

Caller
• Caller uses $t0, $s0, $s1.
• $t0 is not callee-saved, so the caller has to save it. What about $s0, $s1? $s0 and $s1 are callee-saved, so the caller does not need to save them.
• After the call the caller restores $t0.
• Result is in $v0. Why did we not save $t2? $t2 is overwritten after the call, so we don’t care if it was used by the callee.

Callee
• Callee finds its arguments in $a0-$a3.
• Callee uses $s0, so it must save it. What about $s1? Callee never writes to $s1, so it does not need to save it.
• Results are placed in $v0.
• Callee must restore $s0.

Nested calls

• Some machines provide stacks as part of the architecture (VAX, JVM)
• Others implement them in software

  main() {
    B()            jal B
      {
        C()        jal C
          {
          }        jr
      }            jr
    ...
  }

The stack grows and shrinks as procedure calls add (push) data when they are called and remove (pop) data when they return.

When C returns, the stack is returned to the way B had it before C was called.

Summary: procedure calls

• Procedures need to:
  – Not corrupt caller resources
  – Return to the right place when done

• To accomplish this we:
  – Have conventions for who saves registers
  – Save them on the stack
  – Use jal and jr $ra to enter and exit procedures

• As long as everyone follows the convention we get interoperability


Other ISAs

• We’ve looked at MIPS in detail, but there are a lot of other ISAs:
  – x86 (Intel/AMD)
  – ARM (ARM)
  – JVM (Java)
  – PPC (IBM, Motorola)
  – SPARC (Oracle, Fujitsu)
  – PTX (Nvidia)
  – etc.

• Let’s take a look at a few issues:
  – Machine types
  – ISA classes
  – Addressing modes
  – Instruction width
  – CISC vs. RISC

Basic machine types

• Memory-to-memory machines
  – Instructions can directly manipulate memory
    • Mem[0] = Mem[1] + Mem[2]
  – Problems:
    • Need to store temporary values in memory (e.g., f=(g+h)-(i+j))
    • Memory is slow
    • Memory is big → need lots of bits for addresses

• Architectural registers
  – Hold temporary variables
  – Far faster than memory → faster programs
  – Fewer addresses in code → smaller programs

• But it’s never that simple…
  – x86 has a few registers and supports memory operations
  – ARM has many addressing modes that complicate register operations
  – When you run out of registers you have to “spill” data to memory

Basic ISA classes

• Accumulator (1 register)
  – 1 address:    add A           acc ← acc + Mem[A]

• General purpose register file (load/store)
  – 3 addresses:  add Ra Rb Rc    Ra ← Rb + Rc
                  load Ra Rb      Ra ← Mem[Rb]

• General purpose register file (register-memory)
  – 2 addresses:  add Ra B        Ra ← Ra + Mem[B]

• Stack (not a register file but an operand stack)
  – 0 addresses:  add             tos ← tos + next
                  (tos = top of stack)

• Comparison:
  – Bytes per instruction? Number of instructions? Cycles per instruction?

Comparing number of instructions

Code for C = A + B

  Stack     Accumulator   Register (register-memory)   Register (load-store)
  Push A    Load A        Load R1, A                   Load R1, A
  Push B    Add B         Add R1, B                    Load R2, B
  Add       Store C       Store C, R1                  Add R3, R2, R1
  Pop C                                                Store C, R3

• Stack (JVM): many very small instructions (no registers)
• Accumulator (PDP-8, 8008, 8051): good unless you need temporaries…
• Register-memory (x86): small code, but hard to build in hardware
• Load-store (MIPS, PPC, ARM, SPARC): lots of simple instructions
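
To make the comparison concrete, here is a toy interpreter for the stack and accumulator columns. It is only a sketch: the tuple-based instruction encoding and the function names are invented for this example and do not correspond to any real machine.

```python
def run_stack(program, mem):
    """0-address machine: operands live on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem

def run_accumulator(program, mem):
    """1-address machine: every operation goes through a single accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return mem

stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
acc_prog = [("load", "A"), ("add", "B"), ("store", "C")]

print(run_stack(stack_prog, {"A": 2, "B": 3})["C"], len(stack_prog))
print(run_accumulator(acc_prog, {"A": 2, "B": 3})["C"], len(acc_prog))
```

Both programs compute the same C; the difference is in how many instructions are needed and how many address bits each instruction has to carry.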

Addressing modes (not all are in MIPS)

  Addressing mode      Example               Meaning
  Register             add R4, R3            R4 ← R4 + R3
  Immediate            add R4, 23            R4 ← R4 + 23
  Displacement         add R4, 100(R1)       R4 ← R4 + Mem[100+R1]
  Register indirect    add R4, (R1)          R4 ← R4 + Mem[R1]
  Indexed/Base         add R3, (R1+R2)       R3 ← R3 + Mem[R1+R2]
  Direct or absolute   add R1, (1001)        R1 ← R1 + Mem[1001]
  Memory indirect      add R1, @(R3)         R1 ← R1 + Mem[Mem[R3]]
  Auto-increment       add R1, (R2)+         R1 ← R1 + Mem[R2]; R2 ← R2 + d
  Auto-decrement       add R1, -(R2)         R2 ← R2 - d; R1 ← R1 + Mem[R2]
  Scaled               add R1, 100(R2)[R3]   R1 ← R1 + Mem[100+R2+R3*d]

(d is the size of the element being accessed.)

MIPS provides the register, immediate, and displacement modes.

Question: Why auto-increment/decrement? Why scaled?
Answer: Helpful for walking through arrays.
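
The memory-address part of several modes in the table can be sketched as plain arithmetic. The register and memory contents below are made-up values, and the ea_* function names are hypothetical.

```python
regs = {"R1": 1000, "R2": 8, "R3": 2}   # made-up register contents
mem = {1000: 5000}                       # made-up memory word, used by memory indirect

def ea_displacement(disp, base):
    return disp + regs[base]                         # Mem[disp + base]

def ea_register_indirect(base):
    return regs[base]                                # Mem[base]

def ea_indexed(base, index):
    return regs[base] + regs[index]                  # Mem[base + index]

def ea_memory_indirect(base):
    return mem[regs[base]]                           # the address is itself read from memory

def ea_scaled(disp, base, index, d):
    return disp + regs[base] + regs[index] * d       # Mem[disp + base + index*d]

print(ea_displacement(100, "R1"))
print(ea_scaled(100, "R2", "R3", d=4))

# Walking a 4-byte-element array that starts at the address in R1:
print([regs["R1"] + i * 4 for i in range(3)])
```

The last line shows why scaled and auto-increment modes help with arrays: consecutive elements are d bytes apart, so the hardware can do the multiply or the pointer bump as part of the memory access.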

Instruction widths (number of bits)

• Variable width
  – Different widths for different instructions
  – x86: 2-6 bytes for add, 2-4 bytes for load
  – Better for generating compact code
  – Hard for hardware to know where instructions start/stop

• Fixed width
  – Same width for every instruction
  – MIPS: 4 bytes for add, 4 bytes for load
  – Larger code size
  – Easy for hardware to decode

• Multiple widths
  – ARM and MIPS support both 32-bit and 16-bit instructions
  – 16-bit instructions are limited, but can reduce code size

General purpose register machines dominate

• Nearly all machines use general purpose registers

• Advantages
  – Faster than memory (way faster than memory)
  – Can hold temporary variables (easier to break up complex operations)
  – Easier for compilers to use (regular structure and uniform use)
  – Improved code density (fewer bits to select a register than a memory address)

But we just talked about how x86 was a register-memory architecture… what’s going on?


The truth about ISAs

The ISA lies, but you can trust it

• The ISA presents a simple view of the processor
  – Atomic: instructions execute one at a time
  – Sequential: instructions execute in order
  – Flat memory: can access any location easily

[Figure: AMD’s 8-core Bulldozer processor, annotated with what the hardware actually does:]
• Convert x86 into MIPS-like instructions
• Schedule instructions to run out of order
• Execute multiple instructions at the same time
• Make memory look faster than it really is (most of the time)
• Turn the power off to save energy

CISC vs. RISC

• “Simple” computations are not always simple
  – They often require a sequence of more primitive instructions
  – E.g., Mem[R1] ← Mem[R2] + R3

• Architectures that provide complex instructions are Complex Instruction Set Computing = CISC
  – PRO: assembly programs are easier to write, denser code
  – CON: hardware gets really, really complicated by rarely-used instructions. Compilers are hard to write.

• Architectures that provide only primitive instructions are Reduced Instruction Set Computing = RISC
  – CON: compilers generate lots of instructions for even simple code
  – PRO: hardware and compiler are easier to design and optimize

Everything is RISC inside today to make the hardware simpler

Summary: ISAs

• Architecture = what’s visible to the program about the machine
  – Not everything in the implementation is “visible”
  – The implementation may not follow the architecture
  – The invisible stuff is the “microarchitecture” and it’s very messy, but very fun (huge engineering challenges; lots of money)

• A big piece of the ISA is the assembly language structure
  – Primitive instructions (appear to) execute sequentially and atomically
  – Formats, computations, addressing modes, etc.

• CISC: lots of complicated instructions
• RISC: a few basic instructions
• Most recent ISAs are RISC; x86 is still CISC (although x86 processors do RISC tricks on the inside)
