Machine/Assembler Language
Control Flow & Compiling Function Calls

Noah Mendelsohn
Tufts University
Email: [email protected]
Web: http://www.cs.tufts.edu/~noah

COMP 40: Machine Structure and Assembly Language Programming – Spring 2014


© 2010 Noah Mendelsohn

Goals for today – learn:

Review last week’s intro to machine & assembler lang

More examples

Function calls

Control flow


Review & A Few Updates

General Purpose Registers

[Figure: the 64-bit register %rax spans bits 63 to 0. Its low 32 bits are %eax, its low 16 bits are %ax, and %ax is made up of %ah (high byte) and %al (low byte).]

mov $123,%rax    // destination is the full 64-bit register
mov $123,%ax     // destination is the low 16 bits
mov $123,%eax    // destination is the low 32 bits

X86-64 / AMD 64 / Intel 64 General Purpose Registers

64 bits    32 bits    16 bits    8 bits
%rax       %eax       %ax        %ah, %al
%rcx       %ecx       %cx        %ch, %cl
%rdx       %edx       %dx        %dh, %dl
%rbx       %ebx       %bx        %bh, %bl
%rsi       %esi       %si
%rdi       %edi       %di
%rsp       %esp       %sp
%rbp       %ebp       %bp

%r8 through %r15: eight more registers, 64 bits each (bits 63 to 0)

Classes of AMD 64 registers (new since last lecture)

General purpose registers
– 16 registers, 64 bits each
– Used to compute integer and pointer values
– Used for integer call/return values to functions

XMM registers
– 16 registers, 128 bits each
– Used to compute float/double values, and for parallel integer computation
– Used to pass double/float call/return values

X87 floating point registers
– 8 registers, 80 bits each
– Used to compute, pass/return long double

Machine code (typical)

Simple instructions – each does small unit of work

Stored in memory

Bitpacked into compact binary form

Directly executed by transistor/hardware logic*

* We’ll show later that some machines execute user-provided machine code directly, some convert it to an even lower level machine code and execute that.

Here’s the machine code for our function

int times16(int i)
{
    return i * 16;
}

89 f8
c1 e0 04
c3

Remember: this is what’s really in memory and what the machine executes!

But what does it mean? Does it really implement the times16 function?

Consider a simple function in C

int times16(int i)
{
    return i * 16;
}

0: 89 f8       mov %edi,%eax    // Load i into result register %eax
2: c1 e0 04    shl $0x4,%eax    // Shifting left by 4 is a quick way to multiply by 16
5: c3          retq             // Return to caller, which will look for result in %eax

REMEMBER: you can see the assembler code for any C program by running gcc with the -S flag. Do it!!

INTERPRETER
Software or hardware that does what the instructions say

COMPILER
Software that converts a program to another language

ASSEMBLER
Like a compiler, but the input assembler language is (mostly) 1-to-1 with machine instructions

Very simplified view of computer

[Figure: a cache in front of memory; memory holds the machine code bytes 89 f8 c1 e0 04 c3.]

Instructions fetched and decoded

[Figure: the same cache and memory, now with an ALU.]

ALU: Arithmetic and Logic Unit executes instructions like add and shift, updating registers.

The MIPS CPU Architecture

Remember

We will teach some highlights of machine/assembler code here in class, but you must take significant time to learn and practice on your own!

Suggestions for teaching yourself:

• Read Bryant and O’Hallaron carefully
  – Remember that the 64-bit section was added later
• Look for online reference material
  – Some linked from HW5
• Write examples, compile with -S, and figure out the resulting .s file!


Moving Data and Calculating Values

Examples

%rax                    // contents of rax is data
(%rax)                  // data pointed to by rax
0x10(%rax)              // get *(16 + rax)
0x4089a0(, %rax, 8)     // Global array index of 8-byte things
(%ebx, %ecx, 2)         // Add base and scaled index
4(%ebx, %ecx, 2)        // Add base and scaled index plus offset 4

movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
movl (%ebx, %ecx, 4), %edx    // edx <- *(ebx + (ecx * 4))
leal (%ebx, %ecx, 4), %edx    // edx <- (ebx + (ecx * 4))

movl moves the data at the address; similar to:

    char *ebxp; char edx = ebxp[ecx];

leal moves the address itself; similar to:

    char *ebxp; char *edxp = &(ebxp[ecx]);

Scale factors support indexing larger types; similar to:

    int *ebxp; int edx = ebxp[ecx];


Control Flow

Simple jumps

.L4:
    …code here…
    jmp .L4    // jump back to .L4

Conditional jumps

.L4:
    movq  (%rdi,%rdx), %rcx
    leaq  (%rax,%rcx), %rsi
    testq %rcx, %rcx
    cmovg %rsi, %rax
    addq  $8, %rdx
    cmpq  %r8, %rdx
    jne   .L4    // conditional: jump iff %r8 != %rdx
    …code here…

This technique is the key to compiling if statements, for loops, while loops, etc.

Calling Functions

Why have a standard “linkage” for calling functions?

Functions are compiled separately and linked together

We need to standardize enough that function calls will succeed!

Note: optimizing compilers may “cheat” when caller and callee are in the same source file
– More on this later

An interesting example

int fact(int n)
{
    if (n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

See course notes on “Calls and Returns”

The process memory illusion

Process thinks it's running in a private space

Separated into segments, from address 0

Stack: memory for executing subroutines

Heap: memory for malloc/new

Global static variables

Text segment: where program lives

[Figure: the process address space starting at address 0, with regions labeled Text (code), Static initialized, Static uninitialized, Heap (malloc’d), Stack, and argv, environ, plus the annotation “Loaded with your program”.]

Now we’ll learn how your program uses these! We’re about to study in depth how function calls use the stack.

Function calls on Linux/AMD 64

Caller “pushes” return address on stack

Where practical, arguments passed in registers

Exceptions:
– Structs, etc.
– Too many arguments to fit in registers
– What can’t be passed in registers is at known offsets from stack pointer!

Return values
– In register, typically %rax for integers and pointers
– Exception: structures

Each function gets a stack frame
– Leaf functions that make no calls may skip setting one up

The stack – general case

[Figure: three snapshots of the stack.
Before call: %rsp points at the arguments.
After callq: the return address sits next to the arguments, and %rsp points at the return address.
If callee needs frame: sub $0x{framesize},%rsp moves %rsp by framesize, making room beyond the return address for callee vars and args to the next call.]

Arguments/return values in registers

Arguments and return values passed in registers when types are suitable and when there aren’t too many

Return values usually in %rax, %eax, etc.

Callee may change these and some other registers!

XMM and x87 FP registers used for floating point

Read the specifications for full details!

Operand size    Argument number
(bits)          1       2       3       4       5       6
64              %rdi    %rsi    %rdx    %rcx    %r8     %r9
32              %edi    %esi    %edx    %ecx    %r8d    %r9d
16              %di     %si     %dx     %cx     %r8w    %r9w
8               %dil    %sil    %dl     %cl     %r8b    %r9b

Factorial Revisited

int fact(int n)
{
    if (n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

fact:
.LFB2:
    pushq   %rbx
.LCFI0:
    movq    %rdi, %rbx
    movl    $1, %eax
    testq   %rdi, %rdi
    je      .L4
    leaq    -1(%rdi), %rdi
    call    fact
    imulq   %rbx, %rax
.L4:
    popq    %rbx
    ret

Function calls on Linux/AMD 64 (cont.)

Much of what you’ve seen can be skipped for “leaf” functions in the call tree

Inlining:
– If a small procedure is in the same source file (or an included header): just merge the code
– Unless the function is static (i.e., private to the source file), the compiler still needs to create the normal version, in case anyone outside calls it!

Optimizing compilers “cheat”
– Don’t build full stack
– Leave return address for called function (A calls B; the last thing B does is call C, so B leaves the return address on the stack and branches to C instead of calling it… when C does a normal return, it goes straight back to A!)
– Many other wild optimizations, always done so other functions can’t tell anything unusual is happening!

Optimized version

int fact(int n)
{
    if (n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

Lightly optimized = -O1 (what we saw before)

fact:
.LFB2:
    pushq   %rbx
.LCFI0:
    movq    %rdi, %rbx
    movl    $1, %eax
    testq   %rdi, %rdi
    je      .L4
    leaq    -1(%rdi), %rdi
    call    fact
    imulq   %rbx, %rax
.L4:
    popq    %rbx
    ret

Heavily optimized = -O2

fact:
.LFB2:
    testq   %rdi, %rdi
    movl    $1, %eax
    je      .L6
    .p2align 4,,7
.L5:
    imulq   %rdi, %rax
    subq    $1, %rdi
    jne     .L5
.L6:
    rep ; ret

What happened to the recursion?!?!?

This version doesn’t need to create a stack frame either!

Getting the details on function call “linkages”

Bryant and O’Hallaron has an excellent introduction
– Watch for differences between 32 and 64 bit

The official specification:
– System V Application Binary Interface: AMD64 Architecture Processor Supplement
– Find it at: http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
– See especially sections 3.1 and 3.2

Summary

C code compiled to assembler

Data moved to registers for manipulation

Conditional and jump instructions for control flow

Stack used for function calls

Compilers play all sorts of tricks when compiling code