Modular Programming, Stack Frames, and High-Level Language Interfacing
Read Chapters 8 and 12 of the textbook.

Modular Programming
Large projects need to be broken into small modules with clean interfaces.
  The way a module is programmed should depend only on the interfaces provided by other modules, not on their implementation.
One possibility is to place groups of related procedures into different files and then include them with the include directive.
  The include directive instructs the assembler to insert the file (at assembly time) at the place of the directive.
  We must then ensure that the code is placed in the .code segment and the data in the .data segment.

Modular Programming (cont.)
Hence, in each file, we should always put .code before the code and .data before the data. Ex:
File my_prog.asm:
    .386
    .model flat
    include cs266.inc
    .data
    msg1 db "In main",10,0
    .code
    main:
        putstr msg1
        call procA
        call procB
        ret
    include procA.asm
    include procB.asm
    end
File procA.asm:
    .code
    procA proc
        putstr msg2
        ret
    procA endp
    .data
    msg2 db "In procA",10,0
File procB.asm:
    .code
    procB proc
        putstr msg3
        ret
    procB endp
    .data
    msg3 db "In procB",10,0

Modular Programming (cont.)
Hence, by doing
    bcc32 my_prog.asm
the assembler will create a single object file, my_prog.obj, which will contain all the included code and data.
The scope of each name used (in any included file) is the object module in which it is assembled. Here it is my_prog.obj.
  Hence an error will be detected by the assembler if two different included files use the same name.
  Hence this method of included files should be avoided for large projects.
Instead, we should assemble each file separately to obtain a separate object module for each file and, thus, have a private namespace for each file.
  Make sure, however, to use the .386, .MODEL FLAT, and END directives in each file.

Separately Assembled Modules
However, any module that is to be used by others needs to provide at least one name for them to use.
Use the directive PUBLIC to enable other modules to use names defined in the module where PUBLIC appears. Ex:
    public procA,varC,labelB
  Note that the usage is the same for any kind of name (procedures, variables...).
Use the directive EXTRN to declare names that are defined in other modules.
  But now we need to provide the qualifiers:
    PROC for procedure names
    BYTE, WORD, DWORD... for variable names
  Example:
    extrn procA:proc, varA:dword, varB:word
Place the directives extrn and public just after .model flat

Example
File my_prog.asm:
    .386
    .model flat
    extrn procA:proc, procB:proc
    include cs266.inc
    .data
    msg1 db "In main",10,0
    .code
    main:
        putstr msg1
        call procA
        call procB
        ret
    end
File procA.asm:
    .386
    .model flat
    public procA
    include iomacros.inc
    .code
    procA:
        putstr msg1
        ret
    .data
    msg1 db "In procA",10,0
    end
File procB.asm:
    .386
    .model flat
    public procB
    include iomacros.inc
    .code
    procB:
        putstr msg1
        ret
    .data
    msg1 db "In procB",10,0
    end

Example (cont.)
To assemble each file separately and link them, do:
    bcc32 -c procA.asm
    bcc32 -c procB.asm
    bcc32 my_prog.asm procA.obj procB.obj
The -c is the “compile only” option: it produces only an object file (no executable file is produced).
The last command will produce my_prog.obj and link all the .obj files to produce my_prog.exe
All .data segments will be concatenated into a single .data segment and all .code segments will be concatenated into a single .code segment.
Each .asm file now provides a separate namespace since each file has been assembled separately.
  Note that all three files use the same name msg1.
  These refer to different memory locations since the assembler and linker will produce a different memory address for each variable msg1.

The Program’s Entry Point
An executable program must have exactly one entry point (the address of the first instruction to execute).
This entry point must be called “_main” and made public when using bcc32 to assemble and link.
  This is why I have included the following directives in cs266.inc (near the top of the file):
    public _main
    main equ <_main>
  The second directive makes “main” equivalent to “_main” so that “main” can be used instead to label the entry point.
But since a program must have only one entry point, these two directives must be present in only one .asm file: the one containing the entry point.
If the macros in cs266.inc are needed in other modules, then include another file instead, call it iomacros.inc, which is identical to cs266.inc but does not contain these two directives (see the previous example again).

Using Global Variables
A variable made public in one object module is accessible to every other object module that is linked into the same .exe file.
  This holds as long as the other object modules declare this variable as external (with EXTRN).
Such a variable, which is said to be global, can be used by procedures to pass a value across different modules.
This mechanism increases the complexity of the interfaces (since every module must be aware of all the global variables).
  Hence the number of global variables should be kept minimal.

Global Variable Example
File procA.asm:
    .386
    .model flat
    public procA
    extrn varA:dword
    include iomacros.inc
    .code
    procA:
        putint varA
        ret
    end
File mp.asm:
    .386
    .model flat
    public varA
    extrn procA:proc
    include cs2661.inc
    .data
    varA dd ?
    .code
    main:
        mov varA,333
        call procA
        ret
    end
To assemble and link, you can do:
    bcc32 mp.asm procA.asm

Parameter Passing
We currently have two ways to pass parameters to a procedure:
  by using registers
  by using global variables
However, these mechanisms are not suited if we want:
  to use a variable number of parameters [limited number of registers]
  to permit a procedure to call itself, i.e. to use recursion [global variables are static]
In these circumstances we can pass parameters via the stack.
  This is the mechanism of parameter passing used by high-level languages.

Stack Parameters
Suppose that we have a procedure, called IMUL2, whose task is to multiply two signed numbers, pushed onto the stack, and return the result in EAX. Let us use IMUL2 like this:
    push varA       ;push a dword variable
    push varB       ;another dword variable
    call IMUL2      ;result in eax, stack unchanged
    add esp,8       ;restore ESP
We have assumed that IMUL2 did not change the stack: ESP just after returning from IMUL2 points to the same place as it did just before calling IMUL2.
But, since 8 bytes of parameters were pushed on the stack, we need to increase ESP by 8 after returning from IMUL2.
  Otherwise, ESP would be decreased by 8 at each use of IMUL2 and, consequently, the stack could overflow if the first three statements were inside a loop.
We say that the stack has been restored by the caller.
  This is the method used by C/C++ compilers.

Stack Parameters (cont.)
Given that IMUL2 is called that way, we can write it like this:
    IMUL2:
        push ebp
        mov ebp,esp
        mov eax,[ebp+12]
        imul eax,[ebp+8]
        pop ebp
        ret
We use EBP to access the stack parameters (not ESP).
  Compilers use this method. But, more simply, we could have used ESP instead...
The stack areas shown here are called stack frames (or activation records).
Stack after mov ebp,esp:
    varA
    varB
    ret addr.
    ebp orig.    <- ebp, esp
Stack after ret:
    varA
    varB         <- esp

Stack Parameters (cont.)
The other method is to give the called procedure the responsibility of restoring the stack.
  This is the method used by Pascal compilers.
The caller would simply do:
    push varA
    push varB
    call IMUL2      ;do not increment ESP
But the procedure would now use ret n to return.
  This performs a RET instruction and then increments ESP further by n.
The called procedure would now be:
    IMUL2:
        push ebp
        mov ebp,esp
        mov eax,[ebp+12]
        imul eax,[ebp+8]
        pop ebp
        ret 8
  We use ret 8 since 8 bytes of parameters have been pushed onto the stack.

Passing a Variable Number of Parameters
To pass a variable number of arguments on the stack, just push the number of arguments as the last parameter.
  By reading this parameter, the procedure knows how many arguments were passed.
The caller:
    push 35
    push -63
    push 23
    push 3          ;# of args
    call AddSome
    add esp,16
The called procedure:
    AddSome proc
        push ebp
        push ecx
        mov ebp,esp
        mov ecx,[ebp+12]    ;arg count
        xor eax,eax         ;hold sum
        add ebp,16          ;last arg
    L1: add eax,[ebp]
        add ebp,4           ;point to next
        loop L1
        pop ecx
        pop ebp
        ret
    AddSome endp

Recursion
A recursive procedure is one that calls itself.
Recursive procedures can easily be implemented in ASM when parameter passing is done via the stack.
Ex: a C implementation of factorial:
    int factorial(int n)
    {
        if (n<=1) { return 1; }
        else { return n*factorial(n-1); }
    }
An ASM caller needs to push the argument onto the stack:
    push 8
    call factorial  ;result in EAX = 40320
    add esp,4       ;restore the stack

A Recursive Procedure in ASM
    factorial:
        mov eax,[esp+4]     ;get n
        cmp eax,1           ;n<=1?
        ja L1               ;no, continue
        mov eax,1           ;yes, return 1
        jmp exit
    L1: dec eax
        push eax            ;factorial n-1
        call factorial      ;result in eax
        add esp,4           ;restore stack
        mov ebx,[esp+4]     ;get n
        mul ebx             ;edx:eax = eax*ebx
    exit:
        ret                 ;eax = result
Stack usage on Factorial 3:
    3
    ret. addr. in main
    2
    ret. addr. in fact.
    1
    ret. addr. in fact.

Exercises
Ex1: Rewrite the factorial procedure so that stack cleaning is done by the called procedure (i.e. in the Pascal way).
Ex2: Write a procedure whose task is to fill the first k bytes of a byte array with the value 0. All parameters must be passed on the stack and stack cleaning must be done by the caller. Give an example of how this procedure would be called.
Ex3: Rewrite the AddSome procedure so that stack cleaning is done by the called procedure (i.e. in the Pascal way).

Why Interfacing with High Level Languages?
Good ASM programs can give faster machine code than high-level language (HLL) programs because ASM code is closer to machine code.
But it takes too long to develop large-scale applications in assembly language. Instead:
  we first write the application in an HLL
  then, to optimize speed, we rewrite in ASM the parts of the code that are executed most often
We do not need to write much ASM code since, typically, the CPU spends most of its time in less than 10% of the application’s code.

Two Methods for Mixing ASM and HLL Code
ASM code in a separate ASM module:
  1) assemble the .asm file into a .obj module
  2) compile the HLL files into .obj modules
  3) link together all .obj modules to obtain the .exe file
  This is the most powerful method (and it preserves modularity).
Inline ASM code (embedded within HLL code):
  This is the easiest method (no linking issues involved). We just use a construct like asm{...} to include ASM instructions directly in the HLL code.
  But this usually forces the compiler to generate suboptimal code outside the ASM region.
We present here only the first method, for the C language, using the C/C++ compiler from Borland: bcc32.exe

Writing Separate ASM modules
Such an ASM module can contain:
  variables and procedures that will be used by other HLL modules and/or ASM modules
  ASM instructions that use variables and procedures defined in other HLL modules and/or ASM modules
Hence, the ASM programmer must know:
  the memory model used by the HLL compiler (this is the flat memory model for bcc32)
  how external names are generated by the HLL compiler
  the calling convention (of procedures) used by the HLL compiler

Generation of ASM code by the C compiler*
This is the only way to discover what the C compiler is really doing.
To generate hello.asm from hello.c, just do:
    bcc32 -S hello.c
Immediate observations:
  The generated code uses the following 32-bit segments:
    _TEXT : for code
    _DATA : for data (and the stack)
    _BSS  : for uninitialized data
  You can remove all references to the ?debug macro and all labels that are not used.
I have cleaned up the file hello.asm and removed unused directives to obtain the simpler (but equivalent) file helloClean.asm

The Naming Convention of C Compilers
These compilers insert a “_” in front of all names used in C source files:
  “main” has been changed to “_main”
  “printf” has been changed to “_printf”
  ....
Hence, all public names in an ASM file that are to be used from a C source file should start with a “_”.
  Ex: if a C file contains a call to the myProc() function, then this procedure should be named _myProc in the ASM file.
Names recognized by C compilers are case sensitive.
  Fortunately, the case of user-defined names is preserved when using the bcc32 command to assemble.

Further Observations on helloClean.asm
bcc32 assumes, by default, that the entry point is _main and requires _main to be public.
  This is why I have included these 2 directives in cs266.inc:
    public _main
    main equ <_main>
The value returned by _main is in EAX.
The argument of _printf (the address of a null-terminated string) is pushed on the stack and stack cleanup is done by the caller.

How to write an ASM procedure that is called by a C program?*
To discover how to do this, let us first write a C program comp.c that uses a C function.
Then use bcc32 to convert this C program into the ASM program comp.asm
We see that two bytes are allocated to a short int and 4 bytes to an int. In fact, we have the following correspondence between C/C++ types and ASM types for bcc32:
    C Data Type            Storage Bytes    ASM Type
    char, unsigned char    1                byte
    short int              2                word
    int, long              4                dword
    pointer                4                dword
    float                  4                dword
    double                 8                qword

How to write an ASM procedure that is called by a C program? (cont.)
To call f1(), bcc32 has generated the following instructions:
    ; z = f1(a, b, c, d);
    ; EAX = a, EDX = b, ECX = c, ESI = d
    push esi
    push ecx
    push edx
    push eax
    call _f1
    add esp,16
We see that arguments are pushed onto the stack starting from the last one and that the stack is cleaned by the caller. This is known as the C calling convention.
Also: arguments passed to a function are pushed onto the stack as dwords (even when the corresponding types are only 2 bytes).
Notice that _f1 does not preserve the content of ECX and EDX (but it preserves EBX and EBP).

How to write an ASM procedure that is called by a C program? (cont.)
When a C function returns an integer value of 4 bytes or less, it is returned in EAX.
  But when a C function returns a float or a double, it is returned in ST(0) [see lectures on floating-point arithmetic].
bcc32 assumes that the content of EBX, EBP, ESI, and EDI will be preserved by a procedure: make sure to preserve them (you do not need to preserve the other registers).
Therefore, we can write the f1() function in ASM and place it in the fff1.asm file.
  Notice that I have optimized f1() by removing one MOV and two MOVSX instructions and by using ESP to access the stack arguments.
The C caller is now fcomp.c and the fcomp.exe file is obtained by doing:
    bcc32 fcomp.c fff1.asm

Exercises
Ex4: In C/C++, a function argument can be a pointer to a function. This enables us to construct functions that take another function as an argument. An example would be a function whose task is to find the maximum value of (another, arbitrarily chosen) function in some interval. How can you do this in ASM? Generate ASM code from a C compiler to find out.
Ex5: What is the difference between a C++ pointer and a C++ reference? What is the difference between passing a pointer and passing a reference to a function? Generate ASM code from a C compiler to find out.

Memory Allocation
The mem.c program allocates storage for variables and arrays in various ways.
Inspection of mem.asm reveals how it is done:
  Variables are first allocated to registers and then to the stack.
  “Normal” arrays are allocated on the stack: EBP is used to access the array elements.
  Dynamic allocation with calloc() returns an offset address in EAX: array elements are stored starting at that address. The allocated memory block is located in the heap.

Memory Allocation: mem.c*
Stack Setup
    Address   Content    Address   Content
    ebp                  ebp-72    z[16]
    ebp-4     d          ebp-76    z[15]
    ebp-8     e          ebp-80    z[14]
    ebp-12    f          ebp-84    z[13]
    ebp-16    g          ebp-88    z[12]
    ebp-20    h          ebp-92    z[11]
    ebp-24    i          ebp-96    z[10]
    ebp-28    j          ebp-100   z[9]
    ebp-32    k          ebp-104   z[8]
    ebp-36    l          ebp-108   z[7]
    ebp-40    z[24]      ebp-112   z[6]
    ebp-44    z[23]      ebp-116   z[5]
    ebp-48    z[22]      ebp-120   z[4]
    ebp-52    z[21]      ebp-124   z[3]
    ebp-56    z[20]      ebp-128   z[2]
    ebp-60    z[19]      ebp-132   z[1]
    ebp-64    z[18]      ebp-136   z[0]
    ebp-68    z[17]
Register Usage
    EBX = a
    ESI = b
    EDI = c
    ECX = m
    EAX = count
    EDX = y

Memory Allocation (cont.)
A static variable (in C) is defined with the keyword “static”.
  Its value is preserved through successive invocations of the function inside which it is defined.
An automatic variable (in C) is defined without the keyword “static”.
  Its value is not preserved through successive invocations of the function inside which it is defined.
Ex: static.asm is obtained from static.c
  Automatic variables are allocated on the stack. The stack frame thus contains all the “environment” of a procedure.
  Static variables are permanently allocated in the data segment and given a name.

Win32 Assembly
bcc32.exe automatically links with the import32.lib library.
  This enables us to call most of the Win32 API procedures directly.
msgbox.asm is a minimal Win32 assembly program that calls the Windows MessageBoxA procedure to display a message box.
Note that:
  Stack cleaning is done by the Win32 procedure.
  Win32 procedure names do not start with “_”.
Practical Win32 apps are much more complex than this one...