machine/assembler language control flow & compiling function calls
DESCRIPTION
COMP 40: Machine Structure and Assembly Language Programming – Spring 2014. Machine/Assembler Language Control Flow & Compiling Function Calls. Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah. Goals for today – learn:. - PowerPoint PPT PresentationTRANSCRIPT
Machine/Assembler LanguageControl Flow & Compiling Function Calls
Noah MendelsohnTufts UniversityEmail: [email protected]: http://www.cs.tufts.edu/~noah
COMP 40: Machine Structure and
Assembly Language Programming – Spring 2014
© 2010 Noah Mendelsohn
Goals for today – learn:
Review last week’s intro to machine & assembler lang
More examples
Function calls
Control flow
2
© 2010 Noah Mendelsohn
Review & A Few Updates
© 2010 Noah Mendelsohn
General Purpose Registers
4
%ah $al%ax%eax%rax
063
© 2010 Noah Mendelsohn
General Purpose Registers
5
%ah $al%ax%eax%rax
063
mov $123,%rax
© 2010 Noah Mendelsohn
General Purpose Registers
6
%ah $al%ax%eax%rax
063
mov $123,%ax
© 2010 Noah Mendelsohn
General Purpose Registers
7
$al%ah%ax%eax%rax
063
mov $123,%eax
© 2010 Noah Mendelsohn
X86-64 / AMD 64 / IA 64 General Purpose Registers
8
031
%eax
%ecx
%edx
%ebx
%esi
%edi
%esp
%ebp
%ah $al%ax
%ch $cl%cx
%dh $dl%dx
%bh $bl%bx
%si
%di
%sp
%bp
%rax
%rcx
%rdx
%rbx
%rsi
%rdi
%rsp
%rbp
63
%r8
%r9
%r10
%r11
%r12
%r13
%r14
%r15
063
© 2010 Noah Mendelsohn
Classes of AMD 64 registers new since last lecture
General purpose registers– 16 registers, 64 bits each– Used to compute integer and pointer values– Used for integer call/return values to functions
XMM registers– 16 Registers, 128 bits each– Used to compute float/double values, and for parallel integer computation– Used to pass double/float call/return values
X87 Floating Point registers– 8 registers, 80 bits each– Used to compute, pass/return long double
9
© 2010 Noah Mendelsohn10
Machine code (typical)
Simple instructions – each does small unit of work
Stored in memory
Bitpacked into compact binary form
Directly executed by transistor/hardware logic*
* We’ll show later that some machines execute user-provided machine code directly, some convert it to an even lower level machine code and execute that.
© 2010 Noah Mendelsohn11
Here’s the machine code for our function
int times16(int i){
return i * 16;}
Remember:
This is what’s really in memory and what the machine executes!
89 f8c1 e0 04c3
© 2010 Noah Mendelsohn12
Here’s the machine code for our function
int times16(int i){
return i * 16;}
But what does it mean??
Does it really implement the times16 function?
89 f8c1 e0 04c3
© 2010 Noah Mendelsohn13
Consider a simple function in C
int times16(int i){
return i * 16;}
0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
© 2010 Noah Mendelsohn14
Consider a simple function in C
int times16(int i){
return i * 16;}
0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Load i into result register %eax
© 2010 Noah Mendelsohn15
Consider a simple function in C
int times16(int i){
return i * 16;}
0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Shifting left by 4 is quick way to multiply
by 16.
© 2010 Noah Mendelsohn16
Consider a simple function in C
int times16(int i){
return i * 16;}
0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Return to caller, which will look for result in
%eax
REMEMBER: you can see the assembler code for any C program by running gcc with the –S flag. Do it!!
© 2010 Noah Mendelsohn17
INTERPRETERSoftware or hardware that does what the instructions say
COMPILERSoftware that converts a program to another language
ASSEMBLERLike a compiler, but the input assembler language is (mostly)1-to-1 with machine instructions
© 2010 Noah Mendelsohn
Very simplified view of computer
18
Cach
e
Memory
© 2010 Noah Mendelsohn
Very simplified view of computer
19
Cach
e
Memory
89 f8c1 e0 04c3
© 2010 Noah Mendelsohn
Instructions fetched and decoded
20
Cach
e
Memory
89 f8c1 e0 04c3
© 2010 Noah Mendelsohn
Instructions fetched and decoded
21
Cach
e
Memory
89 f8c1 e0 04c3
ALU
Arithmetic and Logic Unit executes instructions like add
and shiftupdating registers.
© 2010 Noah Mendelsohn22
The MIPS CPU Architecture
© 2010 Noah Mendelsohn23
Remember
We will teach some highlights of machine/assembler code here in class but you
must take significant time to learn and practice on your own!
Suggestions for teaching yourself:
• Read Bryant and O’Hallaron carefully• Remember that 64 bit section was added later
• Look for online reference material• Some linked from HW5
• Write examples, compile with –S, figure out resulting .s file!
© 2010 Noah Mendelsohn
Moving Data and Calculating Values
© 2010 Noah Mendelsohn25
Examples
%rax // contents of rax is data(%rax) // data pointed to by rax0x10(%rax) // get *(16 + rax)$0x4089a0(, %rax, 8) // Global array index // of 8-byte things(%ebx, %ecx, 2) // Add base and scaled index4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
© 2010 Noah Mendelsohn26
Examples
%rax // contents of rax is data(%rax) // data pointed to by rax0x10(%rax) // get *(16 + rax)$0x4089a0(, %rax, 8) // Global array index // of 8-byte things(%ebx, %ecx, 2) // Add base and scaled index4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1))leal (%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
© 2010 Noah Mendelsohn27
Examples
%rax // contents of rax is data(%rax) // data pointed to by rax0x10(%rax) // get *(16 + rax)$0x4089a0(, %rax, 8) // Global array index // of 8-byte things(%ebx, %ecx, 2) // Add base and scaled index4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1))leal (%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
movl
Moves the data at the address:
similar to:
char *ebxp; char edx = ebxp[ecx]
© 2010 Noah Mendelsohn28
Examples
%rax // contents of rax is data(%rax) // data pointed to by rax0x10(%rax) // get *(16 + rax)$0x4089a0(, %rax, 8) // Global array index // of 8-byte things(%ebx, %ecx, 2) // Add base and scaled index4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1))leal (%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
leal
Moves the address itself:
similar to:
char *ebxp; char *edxp = &(ebxp[ecx]);
© 2010 Noah Mendelsohn29
Examples
%rax // contents of rax is data(%rax) // data pointed to by rax0x10(%rax) // get *(16 + rax)$0x4089a0(, %rax, 8) // Global array index // of 8-byte things(%ebx, %ecx, 2) // Add base and scaled index4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1))leal (%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))movl (%ebx, %ecx, 4), %edx // edx <- *(ebx + (ecx * 4))leal (%ebx, %ecx, 4), %edx // edx <- (ebx + (ecx * 4))
scale factors support indexing larger types
similar to:
int *ebxp; int edxp = ebxp[ecx];
© 2010 Noah Mendelsohn
Control Flow
© 2010 Noah Mendelsohn31
Simple jumps
.L4:
…code here…
j .L4 // jump back to L4
© 2010 Noah Mendelsohn32
Conditional jumps
.L4:movq (%rdi,%rdx), %rcxleaq (%rax,%rcx), %rsitestq %rcx, %rcxcmovg %rsi, %raxaddq $8, %rdxcmpq %r8, %rdxjne .L4 // conditional: jump iff %r8 != %rdf
This technique is the key to compiling if statements, for loops, while loops, etc.
…code here…
© 2010 Noah Mendelsohn
CallingFunctions
© 2010 Noah Mendelsohn
Why have a standard “linkage” for calling functions?
Functions are compiled separately and linked together
We need to standardize enough that function calls will succeed!
Note: optimizing compilers may “cheat” when caller and callee are in the same source file– More on this later
34
© 2010 Noah Mendelsohn
An interesting example
35
int fact(int n){ if (n == 0) return 1; else return n * fact(n - 1);}
See course notes on “Calls and Returns”
© 2010 Noah Mendelsohn
The process memory illusion
Process thinks it's running in a private space
Separated into segments, from address 0
Stack: memory for executing subroutines
Heap: memory for malloc/new
Global static variables
Text segment: where program lives
36
Now we’ll learn how your program uses these!
© 2010 Noah Mendelsohn
The process memory illusion
Process thinks it's running in a private space
Separated into segments, from address 0
Stack: memory for executing subroutines
Heap: memory for malloc/new
Global static variables
Text segment: where program lives
37
Stack
Text(code)
Static initialized
Static uninitialized
Heap(malloc’d)
argv, environ
Loaded with your program
0
© 2010 Noah Mendelsohn
The process memory illusion
Process thinks it's running in a private space
Separated into segments, from address 0
Stack: memory for executing subroutines
Heap: memory for malloc/new
Global static variables
Text segment: where program lives
38
Stack
Text(code)
Static initialized
Static uninitialized
Heap(malloc’d)
argv, environ
Loaded with your program
0
We’re about to study in depth how function calls use the
stack.
© 2010 Noah Mendelsohn
Function calls on Linux/AMD 64
Caller “pushes” return address on stack
Where practical, arguments passed in registers
Exceptions:– Structs, etc.
– Too many
– What can’t be passed in registers is at known offsets from stack pointer!
Return values– In register, typically %rax for integers and pointers
– Exception: structures
Each function gets a stack frame– Leaf functions that make no calls may skip setting one up
39
© 2010 Noah Mendelsohn
The stack – general case
40
Before call
????%rsp
Arguments
Return address
After callq
%rsp
Arguments
%rsp
If callee needs frame
????Return address
Args to next call?
Callee vars
sub $0x{framesize},%rsp
Arguments
fram
esize
© 2010 Noah Mendelsohn
Arguments/return values in registers
41
Arguments and return values passed in registers when types are suitable and when there aren’t too many
Return values usually in %rax, %eax, etc.
Callee may change these and some other registers!
MMX and FP 87 registers used for floating point
Read the specifications for full details!
Operand Size
Argument Number
1 2 3 4 5 6
64 %rdi %rsi %rdx %rcx %r8 %r9
32 %edi %esi %edx %ecx %r8d %r9d
16 %di %si %dx %cx %r8w %r9w
8 %dil %sil %dl %cl %r8b %r9b
© 2010 Noah Mendelsohn
Factorial Revisited
42
int fact(int n){ if (n == 0) return 1; else return n * fact(n - 1);}
fact:.LFB2: pushq %rbx.LCFI0: movq %rdi, %rbx movl $1, %eax testq %rdi, %rdi je .L4 leaq -1(%rdi), %rdi call fact imulq %rbx, %rax.L4: popq %rbx ret
© 2010 Noah Mendelsohn
Function calls on Linux/AMD 64 (cont.)
Much of what you’ve seen can be skipped for “leaf” functions in the call tree
Inlining:– If small procedure is in same source file (or included header): just merge the
code
– Unless function is static (I.e. private to source file, the compiler still needs to create the normal version, in case anyone outside calls it!
Optimizing compilers “cheat”– Don’t build full stack
– Leave return address for called function (last thing A calls B; last think B does is call C B leaves return address on stack and branches to C instead of calling it…when C does normal return, it goes straigt back to A!)
– Many other wild optimizations, always done so other functions can’t tell anything unusual is happening!
43
© 2010 Noah Mendelsohn
Optimized version
44
int fact(int n){ if (n == 0) return 1; else return n * fact(n - 1);}
fact:.LFB2: pushq %rbx.LCFI0: movq %rdi, %rbx movl $1, %eax testq %rdi, %rdi je .L4 leaq -1(%rdi), %rdi call fact imulq %rbx, %rax.L4: popq %rbx ret
Lightly optimized = O1(what we saw before)
fact: .LFB2: testq %rdi, %rdi movl $1, %eax je .L6 .p2align 4,,7 .L5: imulq %rdi, %rax subq $1, %rdi jne .L5 .L6: rep ; ret
Heavily optimized = O2
What happened to the recursion?!?!?
This version doesn’t need to create a stack frame
either!
© 2010 Noah Mendelsohn
Getting the details on function call “linkages”
Bryant and O’Halloran has excellent introduction– Watch for differences between 32 and 64 bit
The official specification:– System V Application Binary Interface: AMD64 Architecture Processor
Supplement
– Find it at: http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
– See especially sections 3.1 and 3.2
45
© 2010 Noah Mendelsohn46
Summary
C code compiled to assembler
Data moved to registers for manipulation
Conditional and jump instructions for control flow
Stack used for function calls
Compilers play all sorts of tricks when compiling code