20/09/2012
© 2012 David Black-Schaffer
1
Instruction Set Architectures 2
Introduction to Computer Architecture, David Black-Schaffer
2
Contents
• Translation to machine code
  – Instruction formats
  – Large constants
• Procedure calls
  – Register conventions
  – Stack memory
• Other ISAs
3
Material that is not in this lecture
Readings from the book
  – Sign extension for two's complement numbers (2.4)
  – Logical operations (2.6)
  – Assembler, linker, and loader (2.12)
You will need 2.4 and 2.6 for this lecture. (2.12 will be on the exam.)
The book has excellent descriptions of these topics. Please read the book before watching this lecture.
4
Translation to machine code: encodings and formats
5
Instruction format (machine language)
• Machine language
  – Computers do not understand "add R8, R17, R18"
  – Instructions are translated to machine language (1s and 0s)
• Example: add R8, R17, R18
    00000010 00110010 01000000 00100000
• MIPS instructions have logical fields:

    opcode   rs (src1)   rt (src2)   rd (dest)   shamt   funct
    000000   10001       10010       01000       00000   100000
6
Instruction fields

    opcode   rs (src1)   rt (src2)   rd (dest)   shamt    funct
    000000   10001       10010       01000       00000    100000
    6 bits   5 bits      5 bits      5 bits      5 bits   6 bits

• opcode  Operation (e.g., "add", "lw")
• rs      First source register
• rt      Second source register
• rd      Destination register
• shamt   Shift amount
• funct   Function selector (add = 32, sub = 34)

Remember from 2's complement: subtraction is basically addition, so it makes sense for add and sub to share an opcode and differ only in funct.

Question: Why are there 5 bits for rs, rt, and rd?
Answer: 2^5 = 32. We need 5 bits to select from 32 registers.
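The field layout above can be checked with a few shifts and masks. This is a minimal sketch, not an assembler: the helper names encode_r_format and decode_r_format are made up for illustration, and only the field widths and the add example come from the slide.

```python
def encode_r_format(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into one 32-bit word."""
    assert 0 <= opcode < 64 and 0 <= funct < 64      # 6-bit fields
    assert all(0 <= f < 32 for f in (rs, rt, rd, shamt))  # 5-bit fields
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_format(word):
    """Split a 32-bit word back into its R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add R8, R17, R18: opcode 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32
word = encode_r_format(0, 17, 18, 8, 0, 32)
print(f"{word:032b}")  # same bit pattern as the example on the slide
```

Note that the destination register comes first in the assembly text but sits in the fourth field of the encoding.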
7
Constants (immediate)
• Small constants (immediates) are used all over code (~50%)
    if (a==b) c=1; else c=2;
• How can we support this in the processor?
  – Put the "typical constants" in memory and load them (slow)
  – Create hard-wired registers (like R0) for them (how many? And how many bits would be needed to choose from all those registers?)
• MIPS does something in between:
  – Some instructions can have constants inside the instruction
  – The control logic then sends the constants to the ALU
  – addi R29, R29, 4   (the value 4 is inside the instruction)
  – Store the constant data in the instruction, not the register file.
• But there's a problem:
  – Instructions have only 32 bits. We need some for the opcode and registers.
  – How do we trade off space for constants and instructions?

    opcode   rs (src1)   rt (src2)   rd (dest)   shamt   funct
    000000   10001       10010       01000       00000   100000
8
MIPS instruction formats

• Different formats for different kinds of instructions
• MIPS has 3 instruction formats:
  – R: operation, 3 registers, no immediate
  – I: operation, 2 registers, short immediate
  – J: jump, 0 registers, long immediate
• Formats use the instruction bits differently
  – Trade off immediate space and registers

Name        Fields                                                    Comments
Field size  6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits       All MIPS instructions are 32 bits
R-format    op     | rs     | rt     | rd     | shamt  | funct        Arithmetic instruction format
I-format    op     | rs     | rt     | address/immediate (16 bits)   Transfer (load/store), branch, immediate format
J-format    op     | target address (26 bits)                         Jump instruction format

Question: How large an immediate can you have for addi?
Answer: I-format: 5+5+6 bits = 16 bits. In 2's complement that is -32,768 to +32,767.
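The I-format packs the same first three fields as the R-format and uses the remaining 16 bits for the immediate. A minimal sketch, assuming the standard MIPS opcode 8 for addi; the function name encode_i_format is hypothetical.

```python
ADDI_OPCODE = 8  # standard MIPS opcode for addi


def encode_i_format(opcode, rs, rt, imm):
    """Pack an I-format instruction: 6-bit opcode, two 5-bit registers, 16-bit immediate."""
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate does not fit in 16 signed bits")
    # Masking with 0xFFFF stores negative values in 16-bit two's complement.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# addi R29, R29, 4: rs (source) and rt (destination) are both register 29
word = encode_i_format(ADDI_OPCODE, 29, 29, 4)
print(hex(word))
```

An immediate of 32,768 or more is rejected here, which is exactly the limit that the "large constants" slides deal with.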
9
Loading immediate values (constants)
[Figure: datapath with Control, Instruction Register, Program Counter, Register File (R0-R8), ALU, and Memory (Memory Address, Data Register), executing addi R6, R0, 100.]
Control tells the ALU to take one operand from the Register File and the other from the Instruction.
The 16-bit immediate is sign-extended to 32 bits.
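Sign extension of the 16-bit immediate can be sketched as follows. This is an illustrative model only: registers are a plain Python list, R0 is assumed to hold zero, and the helper names are made up.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a two's complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def addi(regs, rt, rs, imm16):
    """regs[rt] = regs[rs] + sign-extended immediate, wrapped to 32 bits."""
    regs[rt] = (regs[rs] + sign_extend_16(imm16)) & 0xFFFFFFFF


regs = [0] * 32          # R0 stays 0 in this sketch because nothing writes it
addi(regs, 6, 0, 100)    # addi R6, R0, 100
print(regs[6])
```

A bit pattern with the top bit set, such as 0xFFFF, comes out as a negative number, which is what lets addi subtract small constants.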
10
11
Large constants and branches
12
Loading larger values
• The immediate field is limited to 16 bits (-32,768 to +32,767)
  – How do we load larger values?
• Use two instructions to combine two 16-bit immediates
  – Load Upper Immediate (lui): loads the upper 16 bits and puts zeros in the lower bits
  – Or Immediate (ori): fills in the lower 16 bits
• Example: load 10101010 10101010 11110000 11110000 into R2
    lui R2, 10101010 10101010          R2: 10101010 10101010 00000000 00000000
    ori R2, R2, 11110000 11110000      R2: 10101010 10101010 11110000 11110000

Question: Is the immediate sign-extended for ori?
Answer: No. If it were, we would end up with all 1s in the top bits. (See the MIPS reference data in the book.)
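The lui/ori pair and the answer to the sign-extension question can be checked directly. A small sketch with made-up helper names; ori_sign_extended is a deliberately wrong variant, included only to show what would happen if ori did sign-extend.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16):
    """Load upper immediate: immediate goes in the top 16 bits, zeros below."""
    return (imm16 & 0xFFFF) << 16


def ori(value, imm16):
    """Or immediate: the 16-bit immediate is zero-extended before the OR."""
    return (value | (imm16 & 0xFFFF)) & MASK32


def ori_sign_extended(value, imm16):
    """Hypothetical (incorrect) ori that sign-extends its immediate."""
    extended = (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16
    return (value | extended) & MASK32


r2 = lui(0b1010101010101010)
r2 = ori(r2, 0b1111000011110000)
print(f"{r2:032b}")

broken = ori_sign_extended(lui(0b1010101010101010), 0b1111000011110000)
print(f"{broken:032b}")  # upper half is all 1s: the lui result is lost
```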
13
Addresses in branches and jumps
• Branch and jump instructions
  – bne/beq  I-format  16-bit immediate
  – j        J-format  26-bit immediate
• But addresses are 32 bits! How do we handle this?
  – Treat bne/beq as relative offsets (add to the current PC)
  – Treat j as an absolute value (replace 26 bits of the PC)

[Figure: for bne/beq, Next PC = Current PC + 4 + (16-bit immediate, sign-extended, with 00 appended). For j, Next PC = the top 4 bits of the current PC, then the 26-bit immediate, then 00.]

Question: How far can you jump with bne/beq?
Answer: -32,768 to +32,767 instructions, counted from the instruction after the branch (PC + 4).
14
Jump addresses example: loops

for (j=0; j<10; j++) { b = b + j; }

R5 = j; R6 = b;

Addr  Instruction         Comment
 0    addi R5, R0, 0      ; j ← 0 + 0
 4    addi R1, R0, 10     ; R1 ← 0 + 10
 8    beq  R5, R1, 24     ; if (j == 10) goto 24
12    add  R6, R6, R5     ; b ← b + j
16    addi R5, R5, 1      ; j ← j + 1
20    j    8              ; goto 8
24    ...                 ; done with loop

beq (at address 8, immediate 3): PC = PC + 4 + (3 << 2). 3 << 2 = 12, so 8 + 4 + 12 = 24.
j (at address 20, immediate 2): PC = PC[31:28] followed by (2 << 2). 2 << 2 = 8, so the target is 8.
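The target calculations for the loop example can be written out as a short sketch. The function names are hypothetical; the formulas follow the slide, with the upper four bits for j taken from PC + 4 as in standard MIPS.

```python
def sign_extend_16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """beq/bne: relative to the instruction after the branch (PC + 4)."""
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF


def jump_target(pc, imm26):
    """j: keep the top 4 bits of PC + 4, replace the rest with imm26 << 2."""
    return ((pc + 4) & 0xF0000000) | ((imm26 & 0x3FFFFFF) << 2)


def branch_immediate(pc, target):
    """Immediate an assembler would store for a branch at pc going to target."""
    offset = target - (pc + 4)
    assert offset % 4 == 0, "instructions are word aligned"
    imm = offset >> 2
    if not -32768 <= imm <= 32767:
        raise ValueError("branch target out of range")
    return imm & 0xFFFF


print(branch_target(8, 3))      # beq at address 8 with immediate 3
print(jump_target(20, 2))       # j at address 20 with immediate 2
print(branch_immediate(8, 24))  # immediate needed to branch from 8 to 24
```

Because the immediate counts instructions rather than bytes, the two low address bits (always 00) never need to be stored.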
15
Why do relative branches work?
Most branches don't go very far!
[Figure: histogram of bits of branch displacement (0 to 15) against the percentage of branches (0% to 40%), shown for Int. Avg. and FP Avg.]
Question: What is "Int" and "FP"?
Answer: Integer and Floating Point programs.
16
What kind of branches are common?
[Figure: bar chart, frequency of comparison types in branches (0% to 100%), with categories EQ/NE, GT/LE, and LT/GE and two series, Int Avg. and FP Avg. Values shown: 37%, 23%, 40% and 86%, 7%, 7%.]
Optimize for this case because it is most common. (E.g., MIPS)
17
Summary: machine code and immediates
• Instructions have different encodings to store different types of data (3 registers vs. immediate)
• MIPS has 3 formats, for different uses
• Encodings limit how much data we can have
• These are tradeoffs in design:
  – Optimize for the common case (short immediates)
  – Support the general case (long immediates)
18
19
Procedure calls
20
Procedure calls
• Procedures (functions/subroutines) are needed for structured programming
• The difficulty is that a procedure call needs to:
  – Put data where the procedure can access it
  – Start executing
  – Do work/use registers
  – Return to the caller
  – Get the results back to the caller
  But it needs to do this without messing up the caller's registers!

main() {
  for (j=0; j<10; j++)
    if (a[j] == 0)
      a[j] = update(a[j], j);
}

The update() procedure is likely not in your code. You don't control its implementation!
21
More specifically:

We need to:
1. Put the parameters in a place where the procedure (callee) can get them
2. Transfer control to the callee
3. Acquire the registers needed for the procedure
4. Execute the code
5. Place the results in a place where the calling program (caller) can access them
6. Return control to where we were before we called the procedure
…without messing up the caller's registers!

main() {
  for (j=0; j<10; j++)
    if (a[j] == 0)
      a[j] = update(a[j], j);
}

Caller: main()
Callee: update()
Parameters: a[j], j
Results: (stored in) a[j]
22
Caller context
Example procedure: f(g,h,i,j) = (g+h) - (i+j)
    add R1, R4, R5   ; g=R4, h=R5
    add R2, R6, R7   ; i=R6, j=R7
    sub R3, R1, R2
If the caller (e.g., main()) uses R1, R2, or R3, they have to be saved because the callee overwrites them when it executes.
Problems:
• The callee does not know which registers the caller is using! (It could have multiple different callers.)
• The caller does not know which registers the callee will use! (It could call multiple sub-procedures.)
MIPS has a convention on who saves which registers
• Divided between the callee and caller
• Following this convention allows any caller to call any callee
• Callee and caller both know what they need to save
23
Saving registers: MIPS conventions
• MIPS convention
  – Agreed upon "contract" or "protocol" that everyone follows
  – Specifies the correct (and expected) usage and some naming conventions
  – Established as part of the architecture
  – Used by all compilers, programs, and libraries
  – Assures compatibility
• Callee saves the following registers if it uses them:
  – $s0-$s7 (s=saved)
  – $sp, $fp, $ra
• Caller must save anything else it uses

Question: What are registers $s0-$s7 and $sp, $fp, $ra?
Answer: Just standard names for R16-R23 and R29-R31.
24
MIPS register names and conventions

Register   Name       Use
R0         $0         Constant 0
R1         $at        Reserved temporary
R2-R3      $v0-$v1    Return values (outputs to caller)
R4-R7      $a0-$a3    Procedure arguments (inputs to callee)
R8-R15     $t0-$t7    Caller-save temporaries: may be overwritten by called procedures
R16-R23    $s0-$s7    Callee-save temporaries: may not be overwritten by called procedures
R24-R25    $t8-$t9    Caller-save temporaries
R26-R27    $k0-$k1    Reserved for operating system
R28        $gp        Global pointer
R29        $sp        Stack pointer (callee save)
R30        $fp        Frame pointer (callee save)
R31        $ra        Return address (callee save)
25
How to do a procedure call
• Transfer control to the callee:
    jal ProcedureAddress   ; jump-and-link to the procedure
  – The return address (PC+4) is stored in $ra
• Return control to the caller:
    jr $ra                 ; jump register: jump to the address held in $ra
  – This is why you need to store the return address!
• Register convention for procedure calling:
  – $a0-$a3: Argument registers (4) for passing parameters
  – $v0-$v1: Value registers (2) for returning results
  – $ra: Return address for where to go when done
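The effect of jal and jr on the PC and $ra can be modelled in a few lines. This is a toy sketch: the Machine class and the addresses 0x100 and 0x400 are invented for illustration, and branch delay slots are ignored, matching the PC+4 description on the slide.

```python
class Machine:
    """Toy model holding only the program counter and the $ra register."""

    def __init__(self, pc=0):
        self.pc = pc
        self.ra = 0

    def jal(self, target):
        # Link: remember the instruction after the jal, then jump.
        self.ra = self.pc + 4
        self.pc = target

    def jr_ra(self):
        # Return: continue at the address saved by jal.
        self.pc = self.ra


m = Machine(pc=0x100)   # caller is executing a jal located at 0x100
m.jal(0x400)            # callee starts at the hypothetical address 0x400
pc_in_callee, ra_in_callee = m.pc, m.ra
m.jr_ra()               # callee returns
print(hex(pc_in_callee), hex(ra_in_callee), hex(m.pc))
```

If the callee itself executed another jal before returning, $ra would be overwritten, which is why non-leaf procedures save $ra on the stack.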
26
27
Procedure call examples and the stack
28
Example: which registers to save?

Caller:
    addi $a0, $t0, 2       ; set up the arguments
    add  $a1, $s0, $zero
    add  $a2, $s1, $t0
    addi $a3, $t0, 3
    addi $sp, $sp, -4      ; adjust the stack to make room for one item
    sw   $t0, 0($sp)       ; save $t0 in case the callee uses it
    jal  leaf_example      ; call the leaf_example procedure
    lw   $t0, 0($sp)       ; restore $t0 from the stack
    addi $sp, $sp, 4       ; adjust the stack to delete one item
    add  $t2, $v0, $zero   ; move the result into $t2

Callee:
leaf_example:              ; calculates f=(g+h)-(i+j)
                           ; g, h, i, and j are in $a0, $a1, $a2, $a3
    addi $sp, $sp, -4      ; adjust the stack to make room for one item
    sw   $s0, 0($sp)       ; save $s0 for the caller
    add  $t0, $a0, $a1     ; g = $a0, h = $a1
    add  $t1, $a2, $a3     ; i = $a2, j = $a3
    sub  $s0, $t0, $t1
    add  $v0, $s0, $zero   ; return f in the result register $v0
    lw   $s0, 0($sp)       ; restore $s0 for the caller
    addi $sp, $sp, 4       ; adjust the stack to delete one item
    jr   $ra               ; jump back to the calling routine

Questions:
1. What resources does the caller use? $t0, $s0, $s1
2. What resources does the callee use? $t0, $t1, $s0
3. What does the caller need to save? $t0
4. What does the callee need to save? $s0

Caller needs to save anything it still needs after the call that is not in the $s registers.
Callee needs to save anything it uses that is in the $s registers.
29
Saving registers (on the stack)
• The stack is a part of memory for storing temporary data.
• The Stack Pointer (kept in $sp) points to the end of the stack in memory. In MIPS the stack grows down.
• Procedures move the stack pointer when they store data on the stack.
• Each procedure returns the stack to the state it was in before it was called.
• Gives procedures a secure place to store data that does not fit in registers (e.g., saved registers!).
• Each procedure manages its own stack space so they don't interfere.
• Works great as long as you return the stack to the way it was before.

[Figure: three views of the stack.
  Stack before procedure call: caller data, with $sp at the last caller item.
  Stack during procedure call: caller data followed by callee save $s0, callee save $s2, callee save $s4, callee save $ra, with $sp moving down as each item is stored.
  Stack after procedure call: caller data, with $sp back where it started.]
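The push and pop discipline described above can be sketched with a downward-growing stack pointer. The Stack class, the starting address 0x7FFFFFFC, and the dictionary standing in for memory are all assumptions made for this illustration.

```python
class Stack:
    """Word-sized stack that grows toward lower addresses, as in MIPS."""

    def __init__(self, top=0x7FFFFFFC):
        self.sp = top      # stack pointer ($sp)
        self.mem = {}      # address -> 32-bit word

    def push(self, value):
        self.sp -= 4               # addi $sp, $sp, -4
        self.mem[self.sp] = value  # sw   reg, 0($sp)

    def pop(self):
        value = self.mem[self.sp]  # lw   reg, 0($sp)
        self.sp += 4               # addi $sp, $sp, 4
        return value


stack = Stack()
start = stack.sp
stack.push(11)   # e.g., a saved $s0
stack.push(22)   # e.g., a saved $ra
lowest = stack.sp
restored = [stack.pop(), stack.pop()]
print(hex(start), hex(lowest), restored, hex(stack.sp))
```

Items come back in the reverse of the order they were stored, and the stack pointer ends exactly where it began.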
30
Example: saving to the stack
(The caller code, the leaf_example callee code, and the four questions are the same as in the previous example.)
Saving: move the stack pointer down by 4 bytes (1 word), then store the register to that location.
Restoring: read the register from the stack, then move the stack pointer up by 4 bytes (1 word), back to where it was before.
31
Example: saving to the stack
Caller:
• Caller uses $t0, $s0, $s1.
• $t0 is not saved by the callee, so the caller has to save it. What about $s0, $s1? They are callee-saved, so the caller does not need to save them.
• After the call the caller restores $t0.
• Result is in $v0. Why did we not save $t2? $t2 is overwritten with the result, so we don't care if it was used by the callee.
Callee:
• Callee finds its arguments in $a0-$a3.
• Callee uses $s0, so it must save it. What about $s1? The callee never writes to $s1, so it does not need to save it.
• Results are placed in $v0.
• Callee must restore $s0.
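The save/restore reasoning in this example can be replayed in a small simulation. This is only a sketch: registers are a Python dictionary, the stack is a Python list, values are unbounded integers rather than 32-bit words, and the starting register values are arbitrary.

```python
def leaf_example(regs, stack):
    """Callee: computes (g+h)-(i+j) from $a0-$a3, preserving $s0."""
    stack.append(regs["s0"])              # save $s0 for the caller
    regs["t0"] = regs["a0"] + regs["a1"]  # clobbers $t0 (caller-saved)
    regs["t1"] = regs["a2"] + regs["a3"]
    regs["s0"] = regs["t0"] - regs["t1"]
    regs["v0"] = regs["s0"]               # return value
    regs["s0"] = stack.pop()              # restore $s0


def caller(regs, stack):
    regs["a0"] = regs["t0"] + 2           # set up the arguments
    regs["a1"] = regs["s0"]
    regs["a2"] = regs["s1"] + regs["t0"]
    regs["a3"] = regs["t0"] + 3
    stack.append(regs["t0"])              # caller saves $t0
    leaf_example(regs, stack)
    regs["t0"] = stack.pop()              # caller restores $t0
    regs["t2"] = regs["v0"]               # move the result into $t2


def run_example():
    names = ["a0", "a1", "a2", "a3", "t0", "t1", "t2", "s0", "s1", "v0"]
    regs = {name: 0 for name in names}
    regs.update({"t0": 5, "s0": 7, "s1": 9})  # arbitrary starting values
    stack = []
    caller(regs, stack)
    return regs, stack


regs, stack = run_example()
print(regs["t0"], regs["s0"], regs["s1"], regs["t2"], stack)
```

With these inputs the arguments are 7, 7, 14, and 8, so the result is (7+7)-(14+8) = -8, while $t0, $s0, and $s1 all hold their original values after the call and the stack is empty again.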
32
Nested calls
• Some machines provide stacks as part of the architecture (VAX, JVM)
• Others implement them in software

main() {
  B()        jal B
  {
    C()      jal C
    { }      jr
  }          jr
  ...
}

The stack grows and shrinks as procedure calls add (push) data when they are called and remove (pop) data when they return.
After C returns, the stack is back to the way B had it before C was called.
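The growing and shrinking of the stack across nested calls can be traced with a short sketch. The procedure names follow the slide; what each procedure pushes (B's return address, one saved register in C) is an assumption chosen for illustration.

```python
def simulate_nested_calls():
    """Record the stack depth at each point while main calls B and B calls C."""
    stack = []
    log = []

    def C():
        stack.append("C: saved $s0")   # C saves a register it wants to use
        log.append(("in C", len(stack)))
        stack.pop()                    # C restores before returning

    def B():
        stack.append("B: saved $ra")   # jal C overwrites $ra, so B saves it first
        log.append(("in B before C", len(stack)))
        C()
        log.append(("in B after C", len(stack)))
        stack.pop()                    # B restores $ra before jr $ra

    log.append(("main before B", len(stack)))
    B()
    log.append(("main after B", len(stack)))
    return log


for label, depth in simulate_nested_calls():
    print(f"{label}: depth {depth}")
```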
33
Summary: procedure calls
• Procedures need to:
  – Not corrupt caller resources
  – Return to the right place when done
• To accomplish this we:
  – Have conventions for who saves registers
  – Save them on the stack
  – Use jal and jr $ra to enter and exit procedures
• As long as everyone follows the convention we get interoperability
34
35
Other ISAs
36
Other ISAs
• We've looked at MIPS in detail, but there are a lot of other ISAs:
  – x86 (Intel/AMD)
  – ARM (ARM)
  – JVM (Java)
  – PPC (IBM, Motorola)
  – SPARC (Oracle, Fujitsu)
  – PTX (Nvidia)
  – etc.
• Let's take a look at a few issues:
  – Machine types
  – ISA classes
  – Addressing modes
  – Instruction width
  – CISC vs. RISC
37
Basic machine types
• Memory-to-memory machines
  – Instructions can directly manipulate memory
    • Mem[0] = Mem[1] + Mem[2]
  – Problems:
    • Need to store temporary values in memory (e.g., f=(g+h)-(i+j))
    • Memory is slow
    • Memory is big → need lots of bits for addresses
• Architectural registers
  – Hold temporary variables
  – Far faster than memory → faster programs
  – Fewer addresses in code → smaller programs
• But it's never that simple…
  – x86 has a few registers and supports memory operations
  – ARM has many addressing modes that complicate register operations
  – When you run out of registers you have to "spill" data to memory
38
Basic ISA classes
• Accumulator (1 register)
  – 1 address:    add A             acc ← acc + Mem[A]
• General purpose register file (load/store)
  – 3 addresses:  add Ra, Rb, Rc    Ra ← Rb + Rc
                  load Ra, Rb       Ra ← Mem[Rb]
• General purpose register file (register-memory)
  – 2 addresses:  add Ra, B         Ra ← Ra + Mem[B]
• Stack (not a register file but an operand stack)
  – 0 addresses:  add               tos ← tos + next   (tos = top of stack)
• Comparison:
  – Bytes per instruction? Number of instructions? Cycles per instruction?
39
Comparing number of instructions

Code for C = A + B

Stack      Accumulator   Register (register-memory)   Register (load-store)
Push A     Load A        Load R1, A                    Load R1, A
Push B     Add B         Add R1, B                     Load R2, B
Add        Store C       Store C, R1                   Add R3, R2, R1
Pop C                                                  Store C, R3

Examples:
• Stack: JVM. Many very small instructions (no registers).
• Accumulator: PDP-8, 8008, 8051. Good unless you need temporaries…
• Register-memory: x86. Small code, but hard to build in hardware.
• Load-store: MIPS, PPC, ARM, SPARC. Lots of simple instructions.
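Two of the four columns can be executed with toy interpreters to confirm that both programs compute C = A + B. The interpreters, their instruction tuples, and the memory dictionary are invented for this sketch and do not model any real machine.

```python
def run_stack_machine(program, mem):
    """0-address machine: operands live on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem


def run_accumulator_machine(program, mem):
    """1-address machine: one implicit register, the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem


stack_program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
acc_program = [("load", "A"), ("add", "B"), ("store", "C")]

stack_result = run_stack_machine(stack_program, {"A": 3, "B": 4, "C": 0})
acc_result = run_accumulator_machine(acc_program, {"A": 3, "B": 4, "C": 0})
print(stack_result["C"], acc_result["C"], len(stack_program), len(acc_program))
```

The stack version needs four instructions and the accumulator version three, matching the table.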
40
Addressing modes (not all are in MIPS)

Addressing mode      Example                Meaning
Register             add R4, R3             R4 ← R4 + R3                        (MIPS)
Immediate            add R4, 23             R4 ← R4 + 23                        (MIPS)
Displacement         add R4, 100(R1)        R4 ← R4 + Mem[100+R1]               (MIPS)
Register indirect    add R4, (R1)           R4 ← R4 + Mem[R1]
Indexed/Base         add R3, (R1+R2)        R3 ← R3 + Mem[R1+R2]
Direct or absolute   add R1, (1001)         R1 ← R1 + Mem[1001]
Memory indirect      add R1, @(R3)          R1 ← R1 + Mem[Mem[R3]]
Auto-increment       add R1, (R2)+          R1 ← R1 + Mem[R2]; R2 ← R2 + d
Auto-decrement       add R1, -(R2)          R2 ← R2 - d; R1 ← R1 + Mem[R2]
Scaled               add R1, 100(R2)[R3]    R1 ← R1 + Mem[100+R2+R3*d]

Question: Why auto-increment/decrement? Why scaled?
Answer: Helpful for walking through arrays.
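A few of the address calculations in the table can be written out to show why auto-increment and scaled modes suit array walks. The function names, register names, and memory contents are made up for this sketch, with d = 4 as the element size.

```python
def displacement(regs, base, disp):
    return regs[base] + disp


def indexed(regs, base, index):
    return regs[base] + regs[index]


def memory_indirect(regs, mem, base):
    return mem[regs[base]]


def scaled(regs, base, index, disp, d):
    return disp + regs[base] + regs[index] * d


def auto_increment(regs, base, d):
    """Return the current address, then step the base register by d."""
    addr = regs[base]
    regs[base] += d
    return addr


# Sum a three-element word array at address 1000 using auto-increment.
mem = {1000: 10, 1004: 20, 1008: 30}
regs = {"R1": 0, "R2": 1000, "R3": 2}
for _ in range(3):
    regs["R1"] += mem[auto_increment(regs, "R2", 4)]
print(regs["R1"], regs["R2"])

# Scaled mode reaches element R3 of an array at 100 + R2 in one step.
regs["R2"] = 1000
print(scaled(regs, "R2", "R3", 100, 4))
```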
41
Instruction widths (number of bits)
• Variable width
  – Different widths for different instructions
  – x86: 2-6 bytes for add, 2-4 bytes for load
  – Better for generating compact code
  – Hard for hardware to know where instructions start/stop
• Fixed width
  – Same width for every instruction
  – MIPS: 4 bytes for add, 4 bytes for load
  – Larger code size
  – Easy for hardware to decode
• Multiple widths
  – ARM and MIPS support both 32-bit and 16-bit instructions
  – 16-bit instructions are limited, but can reduce code size
42
General purpose register machines dominate
• Nearly all hardware machines today use general purpose registers
• Advantages
  – Faster than memory (way faster than memory)
  – Can hold temporary variables (easier to break up complex operations)
  – Easier for compilers to use (regular structure and uniform use)
  – Improved code density (fewer bits to select a register than a memory address)

But we just talked about how x86 is a register-memory architecture… what's going on?
43
The truth about ISAs
44
The ISA lies, but you can trust it
• The ISA presents a simple view of the processor
  – Atomic: instructions execute one at a time
  – Sequential: instructions execute in order
  – Flat memory: can access any location easily

[Figure: AMD's 8-core Bulldozer processor, annotated with what the hardware actually does:]
  – Convert x86 into MIPS-like instructions
  – Schedule instructions to run out of order
  – Execute multiple instructions at the same time
  – Make memory look faster than it really is (most of the time)
  – Turn the power off to save energy
45
CISC vs. RISC
• "Simple" computations are not always simple
  – They often require a sequence of more primitive instructions
  – E.g., Mem[R1] ← Mem[R2] + R3
• Architectures that provide complex instructions are Complex Instruction Set Computing = CISC
  – PRO: assembly programs are easier to write, denser code
  – CON: hardware gets really, really complicated by rarely-used instructions. Compilers are hard to write.
• Architectures that provide only primitive instructions are Reduced Instruction Set Computing = RISC
  – CON: compilers generate lots of instructions for even simple code
  – PRO: hardware and compiler are easier to design and optimize

Everything is RISC inside today to make the hardware simpler
46
Summary: ISAs
• Architecture = what's visible to the program about the machine
  – Not everything in the implementation is "visible"
  – The implementation may not follow the architecture
  – The invisible stuff is the "microarchitecture" and it's very messy, but very fun (huge engineering challenges; lots of money)
• A big piece of the ISA is the assembly language structure
  – Primitive instructions (appear to) execute sequentially and atomically
  – Formats, computations, addressing modes, etc.
• CISC: lots of complicated instructions
• RISC: a few basic instructions
• All recent machines are RISC, but x86 is still CISC (although they do RISC tricks on the inside)
47