Modular Programming, Stack Frames, and High-Level Language Interfacing

Read Chapters 8 and 12 of the textbook

Modular Programming

Large projects need to be broken into small modules with clean interfaces. The way a module is programmed should depend only on the interfaces provided by other modules, not on their implementation.

One possibility is to place groups of related procedures in different files and then include them with the include directive. The include directive instructs the assembler to insert the file (at assembly time) at the place of the directive.

We must then ensure that the code is placed in the .code segment and the data in the .data segment.

Modular Programming (cont.)

Hence, in each file, we should always put .code before the code and .data before the data. Ex:

File my_prog.asm

    .386
    .model flat
    include cs266.inc
    .data
    msg1 db "In main",10,0
    .code
    main:
        putstr msg1
        call procA
        call procB
        ret
    include procA.asm
    include procB.asm
    end

File procA.asm

    .code
    procA proc
        putstr msg2
        ret
    procA endp
    .data
    msg2 db "In procA",10,0

File procB.asm

    .code
    procB proc
        putstr msg3
        ret
    procB endp
    .data
    msg3 db "In procB",10,0

Modular Programming (cont.)

Hence, by doing

    bcc32 my_prog.asm

the assembler creates a single object file, my_prog.obj, which contains all the included code and data.

The scope of each name used (in any included file) is the object module in which it is assembled. Here it is my_prog.obj. Hence the assembler will report an error if two different included files define the same name.

Hence this method of included files should be avoided for large projects.

Instead, we should assemble each file separately to obtain a separate object module for each file and, thus, a private namespace for each file. Make sure, however, to use the .386, .MODEL FLAT, and END directives in each file.

Separately Assembled Modules

However, any module that is meant to be used by others needs to provide at least one name for them to use.

Use the PUBLIC directive to let other modules use names defined in the module where PUBLIC appears. Ex:

    public procA, varC, labelB

Note that the usage is the same for any kind of name (procedures, variables, ...).

Use the EXTRN directive to declare names that are defined in other modules. Here we need to provide a qualifier:

    PROC for procedure names
    BYTE, WORD, DWORD, ... for variable names

Example:

    extrn procA:proc, varA:dword, varB:word

Place the extrn and public directives just after .model flat.

Example

File my_prog.asm

    .386
    .model flat
    extrn procA:proc, procB:proc
    include cs266.inc
    .data
    msg1 db "In main",10,0
    .code
    main:
        putstr msg1
        call procA
        call procB
        ret
    end

File procA.asm

    .386
    .model flat
    public procA
    include iomacros.inc
    .code
    procA:
        putstr msg1
        ret
    .data
    msg1 db "In procA",10,0
    end

File procB.asm

    .386
    .model flat
    public procB
    include iomacros.inc
    .code
    procB:
        putstr msg1
        ret
    .data
    msg1 db "In procB",10,0
    end

Example (cont.)

To assemble each file separately and link them, do:

    bcc32 -c procA.asm
    bcc32 -c procB.asm
    bcc32 my_prog.asm procA.obj procB.obj

The -c is the "compile only" option: it only produces an object file (no executable file is produced).

The last command produces my_prog.obj and links all the .obj files to produce my_prog.exe.

All .data segments are concatenated into a single .data segment and all .code segments are concatenated into a single .code segment.

Each .asm file now provides a separate namespace since each file has been assembled separately. Note that all three files use the same name msg1. These refer to different memory locations since the assembler and linker produce a different memory address for each variable msg1.

The Program's Entry Point

An executable program must have exactly one entry point (the address of the first instruction to execute).

This entry point must be called "_main" and made public when using bcc32 to assemble and link. This is why I have included the following directives in cs266.inc (near the top of the file):

    public _main
    main equ <_main>

The second directive makes "main" equivalent to "_main" so that "main" can be used instead to label the entry point.

But since a program must have only one entry point, these two directives must be present in only one .asm file: the one containing the entry point.

If the macros in cs266.inc are needed in other modules, then include another file instead, call it iomacros.inc, which is identical to cs266.inc but does not contain these two directives (see the previous example again).

Using Global Variables

A variable made public in one object module is accessible to every other object module linked into the same .exe file, as long as the other object modules declare this variable with extrn.

Such a variable, which is said to be global, can be used by procedures to pass a value across different modules.

This mechanism increases the complexity of the interfaces (since every module must be aware of all the global variables). Hence the number of global variables should be kept minimal.

Global Variable Example

File procA.asm

    .386
    .model flat
    public procA
    extrn varA:dword
    include iomacros.inc
    .code
    procA:
        putint varA
        ret
    end

File mp.asm

    .386
    .model flat
    public varA
    extrn procA:proc
    include cs2661.inc
    .data
    varA dd ?
    .code
    main:
        mov varA,333
        call procA
        ret
    end

To assemble and link, you can do:

    bcc32 mp.asm procA.asm

Parameter Passing

We currently have two ways to pass parameters to a procedure:

    By using registers
    By using global variables

However, these mechanisms are not suited if we want:

    To use a variable number of parameters [limited # of registers]
    To permit a procedure to call itself, i.e. to use recursion [global variables are static]

In these circumstances we can pass parameters via the stack. This is the parameter-passing mechanism used by high-level languages.

Stack Parameters

Suppose that we have a procedure, called IMUL2, whose task is to multiply two signed numbers pushed onto the stack and return the result in EAX. Let us use IMUL2 like this:

    push varA      ;push a dword variable
    push varB      ;another dword variable
    call IMUL2     ;result in eax, stack unchanged
    add esp,8      ;restore ESP

We have assumed that IMUL2 did not change the stack: ESP just after returning from IMUL2 points to the same place as it did just before calling IMUL2.

But, since 8 bytes of parameters were pushed on the stack, we need to increase ESP by 8 after returning from IMUL2. Otherwise, ESP would be decreased by 8 at each use of IMUL2 and, consequently, the stack could overflow if the first three statements were inside a loop.

We say that the stack has been restored by the caller. This is the method used by C/C++ compilers.

Stack Parameters (cont.)

Given that IMUL2 is called that way, we can write it like this:

    IMUL2:
        push ebp
        mov ebp,esp
        mov eax,[ebp+12]
        imul eax,[ebp+8]
        pop ebp
        ret

We use EBP to access the stack parameters (not ESP). Compilers use this method. But, more simply, we could have used ESP instead...

These are called stack frames (or activation records).

Stack after mov ebp,esp:

    varA         [ebp+12]
    varB         [ebp+8]
    ret addr.    [ebp+4]
    ebp orig.    <- ebp, esp

Stack after ret:

    varA
    varB         <- esp

Stack Parameters (cont.)

The other method is to give the called procedure the responsibility of restoring the stack. This is the method used by Pascal compilers.

The caller would simply do:

    push varA
    push varB
    call IMUL2     ;do not increm. ESP

But the procedure would now use ret n to return. This performs a RET instruction and then increments ESP further by n.

The called procedure would now be:

    IMUL2:
        push ebp
        mov ebp,esp
        mov eax,[ebp+12]
        imul eax,[ebp+8]
        pop ebp
        ret 8

We use ret 8 since 8 bytes of parameters have been pushed onto the stack.
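To make the difference between the two conventions concrete, here is a minimal sketch in C that models the stack as an array and ESP as a byte offset into it. All names (`push`, `imul2`, `call_caller_cleans`, `call_callee_cleans`, `stack_pointer`) are hypothetical and exist only for this illustration; the return address and saved EBP are represented by dummy zero values, so this is a toy model of the bookkeeping, not of real machine behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: 'mem' is the stack area and 'esp' is a
   byte offset into it. Hypothetical illustration code only. */
enum { STACK_BYTES = 64 };
static int32_t mem[STACK_BYTES / 4];
static uint32_t esp = STACK_BYTES;            /* empty stack */

static void push(int32_t v) { esp -= 4; mem[esp / 4] = v; }
static void pop(void) { esp += 4; }
static int32_t load(uint32_t addr) { return mem[addr / 4]; }

/* Body shared by both conventions. On entry [esp] holds the (dummy)
   return address; 'n' is the operand of "ret n" (0 for a plain "ret"). */
static int32_t imul2(uint32_t n)
{
    push(0);                        /* push ebp (dummy value)           */
    uint32_t ebp = esp;             /* mov ebp,esp                      */
    int32_t eax = load(ebp + 12);   /* mov eax,[ebp+12]                 */
    eax *= load(ebp + 8);           /* imul eax,[ebp+8]                 */
    pop();                          /* pop ebp                          */
    pop();                          /* ret: drop the return address     */
    esp += n;                       /* ret n: also drop n param bytes   */
    return eax;
}

/* C style: the caller removes the parameters after the call. */
int32_t call_caller_cleans(int32_t a, int32_t b)
{
    push(a);
    push(b);
    push(0);                        /* call: pushes a return address */
    int32_t result = imul2(0);
    esp += 8;                       /* add esp,8 */
    return result;
}

/* Pascal style: the procedure removes them itself with "ret 8". */
int32_t call_callee_cleans(int32_t a, int32_t b)
{
    push(a);
    push(b);
    push(0);
    return imul2(8);
}

/* Current value of the modeled ESP; it should be STACK_BYTES between calls. */
uint32_t stack_pointer(void)
{
    assert(esp <= STACK_BYTES);
    return esp;
}
```

In both cases the modeled ESP ends where it started; the only difference is which side performs the 8-byte adjustment.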

Passing a Variable Number of Parameters

To pass a variable number of arguments on the stack, just push the number of arguments as the last parameter. By reading this parameter, the procedure knows how many arguments were passed.

The caller:

    push 35
    push -63
    push 23
    push 3             ;# of args
    call AddSome
    add esp,16

The called procedure:

    AddSome proc
        push ebp
        push ecx
        mov ebp,esp
        mov ecx,[ebp+12]   ;arg count
        xor eax,eax        ;hold sum
        add ebp,16         ;last arg
    L1: add eax,[ebp]
        add ebp,4          ;point to next
        loop L1
        pop ecx
        pop ebp
        ret
    AddSome endp
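For comparison, the same idea can be sketched in C with the standard `<stdarg.h>` facilities. The function name `add_some` is hypothetical. The count is the first C parameter, which a compiler using the C calling convention pushes last, so the layout corresponds to the AddSome caller above.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum 'count' int arguments that follow the count itself. */
int add_some(int count, ...)
{
    va_list args;
    int sum = 0;

    assert(count >= 0);             /* a negative count makes no sense */
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        sum += va_arg(args, int);   /* fetch the next stacked argument */
    }
    va_end(args);
    return sum;
}
```

A call such as `add_some(3, 23, -63, 35)` plays the role of the four push instructions followed by `call AddSome`. As in the ASM version, nothing checks that the count matches the number of arguments actually passed.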

Recursion

A recursive procedure is one that calls itself.

Recursive procedures can easily be implemented in ASM when parameter passing is done via the stack.

Ex: a C implementation of factorial:

    int factorial(int n)
    {
        if (n<=1) { return 1; }
        else { return n*factorial(n-1); }
    }

An ASM caller needs to push the argument onto the stack:

    push 8
    call factorial     ;result in EAX = 40320
    add esp,4          ;restore the stack

A Recursive Procedure in ASM

    factorial:
        mov eax,[esp+4]    ;get n
        cmp eax,1          ;n<=1?
        ja L1              ;no, continue
        mov eax,1          ;yes, return 1
        jmp exit
    L1: dec eax
        push eax           ;factorial n-1
        call factorial     ;result in eax
        add esp,4          ;restore stack
        mov ebx,[esp+4]    ;get n
        mul ebx            ;edx:eax = eax*ebx
    exit: ret              ;eax = result

Stack usage on factorial 3:

    3
    ret. addr. in main
    2
    ret. addr. in fact.
    1
    ret. addr. in fact.
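The stack picture above can be checked with a small C sketch that counts how many frames are live at the deepest point of the recursion. The function `factorial_traced` and its extra parameters `depth` and `deepest` are hypothetical additions for tracing only; the arithmetic is unsigned, to mirror the unsigned `ja` and `mul` used in the ASM version.

```c
#include <assert.h>

/* Recursive factorial that also records the deepest nesting level reached.
   Pass depth = 1 for the outermost call. */
unsigned factorial_traced(unsigned n, int depth, int *deepest)
{
    assert(deepest != 0);
    if (depth > *deepest) {
        *deepest = depth;           /* one more frame is on the stack */
    }
    if (n <= 1) {
        return 1;                   /* base case: no further call */
    }
    /* n stays in this frame while the call for n-1 is running */
    return n * factorial_traced(n - 1, depth + 1, deepest);
}
```

For n = 3 the deepest level is 3, matching the three argument/return-address pairs in the diagram.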

Exercises

Ex1: Rewrite the factorial procedure when stack cleaning is done by the called procedure (i.e., in the Pascal way).

Ex2: Write a procedure whose task is to fill the first k bytes of a byte array with the value 0. All parameters must be passed on the stack and stack cleaning must be done by the caller. Give an example of how this procedure would be called.

Ex3: Rewrite the AddSome procedure when stack cleaning is done by the called procedure (i.e., in the Pascal way).

Why Interfacing with High Level Languages?

Well-written ASM programs can run faster than high-level language (HLL) programs because ASM code is closer to machine code.

But it takes too long to develop large-scale applications in assembly language. Instead:

    we first write the application in a HLL
    then, to optimize speed, we rewrite in ASM the parts of the code that are executed most often
    we do not need to write much ASM code since, typically, the CPU spends most of its time in less than 10% of the application's code

Two Methods for Mixing ASM and HLL Codes

ASM code in a separate ASM module:

    1) assemble the .asm file into a .obj module
    2) compile HLL files into .obj modules
    3) link together all .obj modules to obtain the .exe file

This is the most powerful method (and it preserves modularity).

Inline ASM code (embedded within HLL code):

    The easiest method (no linking issues involved). We just use a construct like asm{...} to include asm instructions directly in the HLL code.
    But this usually forces the compiler to generate suboptimal code outside the ASM region.

We present here only the first method, for the C language, using the C/C++ compiler from Borland: bcc32.exe

Writing Separate ASM Modules

Such an ASM module can contain:

    Variables and procedures that will be used by other HLL modules and/or ASM modules
    ASM instructions that use variables and procedures defined in other HLL modules and/or ASM modules

Hence, the ASM programmer must know:

    The memory model used by the HLL compiler (this is the flat memory model for bcc32)
    How external names are generated by the HLL compiler
    The calling convention (of procedures) used by the HLL compiler

Generation of ASM code by the C compiler*

This is the only way to discover what the C compiler is really doing.

To generate hello.asm from hello.c, just do:

    bcc32 -S hello.c

Immediate observations:

    The generated file uses the following 32-bit segments:
        _TEXT : for code
        _DATA : for data (and the stack)
        _BSS  : for uninitialized data
    You can remove all references to the ?debug macro and all labels that are not used.

I have cleaned up the file hello.asm and removed unused directives to obtain the simpler (but equivalent) file helloClean.asm

The Naming Convention of C Compilers

These compilers insert a "_" in front of all names used in C source files:

    "main" has been changed to "_main"
    "printf" has been changed to "_printf"
    ....

Hence, all public names in an ASM file that are to be used from a C source file should start with a "_". Ex: if a C file contains a call to the myProc() function, then this procedure should be named _myProc in the ASM file.

Names recognized by C compilers are case sensitive. Fortunately, case sensitivity for user-defined names is preserved when using the bcc32 command to assemble.

Further Observations on helloClean.asm

bcc32 assumes, by default, that the entry point is _main and requires _main to be public. This is why I have included these 2 directives in cs266.inc:

    public _main
    main equ <_main>

The value returned by _main is in EAX.

The argument of _printf (the address of a null-terminated string) is pushed on the stack, and stack cleanup is done by the caller.

How to write an ASM procedure that is called by a C program?*

To discover how to do this, let us first write a C program comp.c that uses a C function.

Then use bcc32 to convert this C program into the ASM program comp.asm.

We see that two bytes are allocated to a short int and 4 bytes to an int. In fact we have the following correspondence between C/C++ types and ASM types for bcc32:

    C Data Type            Storage Bytes   ASM Type
    char, unsigned char    1               byte
    short int              2               word
    int, long              4               dword
    pointer                4               dword
    float                  4               dword
    double                 8               qword
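The table is specific to bcc32 on a 32-bit target, so it is worth checking against whatever compiler is actually in use. The following sketch (with hypothetical function names `matches_table_sizes` and `print_type_sizes`) uses `sizeof` for that purpose. It deliberately leaves `long` and pointers out of the comparison, on the assumption that these are the entries most likely to differ on 64-bit targets.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 when the current compiler agrees with the table for the
   types whose sizes rarely vary; long and pointers are not compared. */
int matches_table_sizes(void)
{
    return sizeof(char) == 1 && sizeof(short) == 2 && sizeof(int) == 4
        && sizeof(float) == 4 && sizeof(double) == 8;
}

/* Prints one row per type so that any differences are visible. */
void print_type_sizes(void)
{
    assert(sizeof(char) == 1);      /* guaranteed by the C standard */
    printf("char     %zu\n", sizeof(char));
    printf("short    %zu\n", sizeof(short));
    printf("int      %zu\n", sizeof(int));
    printf("long     %zu\n", sizeof(long));
    printf("pointer  %zu\n", sizeof(void *));
    printf("float    %zu\n", sizeof(float));
    printf("double   %zu\n", sizeof(double));
}
```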

How to write an ASM procedure that is called by a C program? (cont.)

To call f1(), bcc32 has generated the following instructions:

    ; z = f1(a, b, c, d);
    ; EAX = a, EDX = b, ECX = c, ESI = d
    push esi
    push ecx
    push edx
    push eax
    call _f1
    add esp,16

We see that arguments are pushed onto the stack starting from the last one and that the stack is cleaned by the caller. This is known as the C calling convention.

Also: arguments passed to a function are pushed onto the stack as dwords (even when the corresponding types are only 2 bytes).

Notice that _f1 does not preserve the content of ECX and EDX (but it preserves EBX and EBP).

How to write an ASM procedure that is called by a C program? (cont.)

When a C function returns an integer value of at most 4 bytes, it is returned in EAX. But when a C function returns a float or a double, it is returned in ST(0) [see lectures on floating-point arithmetic].

bcc32 assumes that the content of EBX, EBP, ESI, and EDI will be preserved by a procedure: make sure to preserve them (you do not need to preserve the other registers).

Therefore, we can write the f1() function in ASM and place it in the fff1.asm file. Notice that I have optimized f1() by removing one MOV and two MOVSX instructions and by using ESP to access the stack arguments.

The C caller is now fcomp.c and the fcomp.exe file is obtained by doing:

    bcc32 fcomp.c fff1.asm

Exercises

Ex4: In C/C++, a function argument can be a pointer to a function. This enables us to construct functions that take another function as an argument. An example would be a function whose task is to find the maximum value of (another, arbitrarily chosen) function in some interval. How can you do this in ASM? Generate ASM code from a C compiler to find out.

Ex5: What is the difference between a C++ pointer and a C++ reference? What is the difference between passing a pointer and passing a reference to a function? Generate ASM code from a C compiler to find out.

Memory Allocation

The mem.c program allocates storage for variables and arrays in various ways.

Inspection of mem.asm reveals how it is done:

    Variables are first allocated to registers and then to the stack.
    "Normal" arrays are allocated on the stack: EBP is used to access the array elements.
    Dynamic allocation with calloc() returns an offset address in EAX: array elements are stored starting at that address. The allocated memory block is located in the heap.
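Since mem.c itself is not reproduced here, the following is only a hypothetical stand-in showing the two kinds of array allocation side by side: an automatic array, which a compiler would typically place in the stack frame, and a block obtained from `calloc()`, which lives in the heap. The function name `sum_both_ways` and the array size 25 are assumptions chosen for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum { ARRAY_LEN = 25 };

/* Fill the first n elements of a stack array and of a heap array with
   1..n and return the sum over both. Returns -1 if calloc() fails. */
long sum_both_ways(int n)
{
    int on_stack[ARRAY_LEN];        /* "normal" array: in the stack frame */
    int *on_heap;
    long total = 0;

    assert(n >= 0 && n <= ARRAY_LEN);

    on_heap = calloc(ARRAY_LEN, sizeof *on_heap);   /* zero-filled heap block */
    if (on_heap == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        on_stack[i] = i + 1;        /* addressed relative to the frame    */
        on_heap[i] = i + 1;         /* addressed through the pointer      */
        total += on_stack[i] + on_heap[i];
    }
    free(on_heap);                  /* heap blocks must be released by hand */
    return total;
}
```

Compiling such a file with the -S option described earlier is one way to see the frame-relative and pointer-relative addressing in the generated ASM.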

Memory Allocation: mem.c*

Stack Setup

    Address   Content      Address   Content
    ebp                    ebp-72    z[16]
    ebp-4     d            ebp-76    z[15]
    ebp-8     e            ebp-80    z[14]
    ebp-12    f            ebp-84    z[13]
    ebp-16    g            ebp-88    z[12]
    ebp-20    h            ebp-92    z[11]
    ebp-24    i            ebp-96    z[10]
    ebp-28    j            ebp-100   z[9]
    ebp-32    k            ebp-104   z[8]
    ebp-36    l            ebp-108   z[7]
    ebp-40    z[24]        ebp-112   z[6]
    ebp-44    z[23]        ebp-116   z[5]
    ebp-48    z[22]        ebp-120   z[4]
    ebp-52    z[21]        ebp-124   z[3]
    ebp-56    z[20]        ebp-128   z[2]
    ebp-60    z[19]        ebp-132   z[1]
    ebp-64    z[18]        ebp-136   z[0]
    ebp-68    z[17]

Register Usage

    EBX = a
    ESI = b
    EDI = c
    ECX = m
    EAX = count
    EDX = y

Memory Allocation (cont.)

A static variable (in C) is defined with the keyword "static". Its value is preserved through successive invocations of the function inside which it is defined.

An automatic variable (in C) is defined without the keyword "static". Its value is not preserved through successive invocations of the function inside which it is defined.

Ex: static.asm is obtained from static.c

    Automatic variables are allocated on the stack. The stack frame thus contains all the "environment" of a procedure.
    Static variables are permanently allocated in the data segment and given a name.
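The difference in lifetime can be observed directly from C. This is a small sketch with hypothetical function names (`count_static`, `count_automatic`, `demo`); it is not the course's static.c.

```c
#include <assert.h>

/* Counts its own invocations: 'calls' is allocated once, outside the
   stack frame, and keeps its value between calls. */
int count_static(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* Always returns 1: 'calls' is created afresh in each stack frame. */
int count_automatic(void)
{
    int calls = 0;
    calls++;
    return calls;
}

/* Checks both behaviours without assuming how often the functions
   were called before. */
void demo(void)
{
    int first = count_static();
    int second = count_static();

    assert(second == first + 1);        /* value survived the return */
    assert(count_automatic() == 1);
    assert(count_automatic() == 1);     /* no memory of the last call */
}
```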

32

Win32 Assembly

bcc32.exe automatically links with the import32.lib library. This enables us to call most of the Win32 API procedures directly.

msgbox.asm is a minimal Win32 assembly program that calls the Windows MessageBoxA procedure to display a message box.

Note that:

    Stack cleaning is done by the Win32 procedure
    Win32 procedure names do not start with "_"

Practical Win32 apps are much more complex than this one...
