the compiler, assembler, linker, loader and process address space tutorial - hacking the process of...

5
|< The main() and command line arguments | Main | C Memory Allocation and De-allocation Functions >| Site Index | Download | COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html 1 of 17 10/6/2011 9:38 PM

Upload: haseeb-ahmad

Post on 28-Jul-2015

582 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Compiler, Assembler, Linker, Loader and Process Address Space Tutorial - Hacking the Process of Building Programs Using C Language_ Notes and Illustrations

|< The main() and command line arguments | Main | C Memory Allocation andDe-allocation Functions >| Site Index | Download |

COMPILER, ASSEMBLER, LINKER ANDLOADER:

A BRIEF STORY

The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html

1 of 17 10/6/2011 9:38 PM

Page 2: The Compiler, Assembler, Linker, Loader and Process Address Space Tutorial - Hacking the Process of Building Programs Using C Language_ Notes and Illustrations

My Training Period: xx hours Note:This Module presents quite a detail story of a process (running program). However, it is an excerpt from morebuffer overflow Tutorial. It tries to investigate how the C/C++ source codes preprocessed, compiled, linked running program. It is based on the GCC (GNU Compiler Collection). When you use thEnvironment) compilers such as Microsoft Visual C++, Borland C++ Builder etc. the prtransparent. The commands and examples of the gcc, gdb, g++, gas and friends are digas 1 and Linux gnu gcc, g++, gdb and gas 2. Have a nice day! The C compiler ability:

Able to understand and appreciate the processes involved in preprocessing, compilingprograms.

W.1 COMPILERS, ASSEMBLERS and LINKERS

Normally the C’s program building process involves four stages and utilizes differencompiler, assembler, and linker.At the end there should be a single executable file. Below are the stages that happsystem/compiler and graphically illustrated in Figure w.1.

Preprocessing is the first pass of any C compilation. It processes include-files, conditional cominstructions and macros.

1.

Compilation is the second pass. It takes the output of the preprocessor, and the source code, aassembler source code.

2.

Assembly is the third stage of compilation. It takes the assembly source code and produces alisting with offsets. The assembler output is stored in an object file.

3.

Linking is the final stage of compilation. It takes one or more object files or libraries athem to produce a single (usually executable) file. In doing so, it resolves referenassigns final addresses to procedures/functions and variables, and revises code and addresses (a process called relocation).

4.

Bear in mind that if you use the IDE type compilers, these processes quite transpareNow we are going to examine more details about the process that happen before and afinput file, the file name suffix (file extension) determines what kind of compilation is done and the examin Table w.1.In UNIX/Linux, the executable or binary file doesn’t have extension whereas in Windohave .exe, .com and .dll.

File extension Descriptionfile_name.c C source code which must be preprocessed.file_name.i C source code which should not be preprocessed.file_name.ii C++ source code which should not be preprocessed.file_name.h C header file (not to be compiled or linked).file_name.ccfile_name.cpfile_name.cxxfile_name.cppfile_name.c++file_name.C

C++ source code which must be preprocessed. For file_name.cxxliterally character x and file_name.C, is capital c.

file_name.s Assembler code.file_name.S Assembler code which must be preprocessed.

file_name.o Object file by default, the object file name for a source file is made by replacinextension .c, .i, .s etc with .o

Table w.1

The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html

2 of 17 10/6/2011 9:38 PM

Page 3: The Compiler, Assembler, Linker, Loader and Process Address Space Tutorial - Hacking the Process of Building Programs Using C Language_ Notes and Illustrations

The following Figure shows the steps involved in the process of building the C progrthe loading of the executable image into the memory for program running.

Figure w.1: Compile, link and execute stages for running program (a process)

W.2 OBJECT FILES and EXECUTABLE

After the source code has been assembled, it will produce an Object files (e.g. .oexecutable files.An object and executable come in several formats such as ELF (Executable and Linking Format) and Object-File Format). For example, ELF is used on Linux systems, while COFF is used Other object file formats are listed in the following Table.

Object FileFormat Description

a.out

The a.out format is the original file format for Unix. It consists of three sections: for program code, initialized data, and uninitialized data, respectively. This formhave any reserved place for debugging information. The only debugging format for encoded as a set of normal symbols with distinctive attributes.

COFFThe COFF (Common Object File Format) format was introduced with System V Release 3 (files may have multiple sections, each prefixed by a header. The number of sections

The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html

3 of 17 10/6/2011 9:38 PM

Page 4: The Compiler, Assembler, Linker, Loader and Process Address Space Tutorial - Hacking the Process of Building Programs Using C Language_ Notes and Illustrations

specification includes support for debugging but the debugging information was limitextension for this format.

ECOFF A variant of COFF. ECOFF is an Extended COFF originally introduced for Mips and Alp

XCOFFThe IBM RS/6000 running AIX uses an object file format called XCOFF (esymbols, and line numbers are used, but debugging symbols are dbx-style stabs whose strinthe .debug section (rather than the string table). The default name for an XCOFF executable

PE Windows 9x and NT use the PE (Portable Executable) format for their executables. PE is basadditional headers. The extension normally .exe.

ELFThe ELF (Executable and Linking Format) format came with System V Release 4 (SVR4) UCOFF in being organized into a number of sections, but it removes many of COFF's limmost modern Unix systems, including GNU/Linux, Solaris and Irix. Also used on many e

SOM/ESOM SOM (System Object Module) and ESOM (Extended SOM) is HP's object file and debug forconfused with IBM's SOM, which is a cross-language Application Binary Interface - AB

Table w.2

When we examine the content of these object files there are areas called sections. Sections canhold executable code, data, dynamic linking information, debugging data, symbol tables,relocation information, comments, string tables, and notes.Some sections are loaded into the process image and some provide information needed in thebuilding of a process image while still others are used only in linking object files.There are several sections that are common to all executable formats (may be named differently,depending on the compiler/linker) as listed below:

Section Description

.text This section contains the executable instruction codes and is shared among every process running the same bsection usually has READ and EXECUTE permissions only. This section is the one most

.bss

BSS stands for ‘Block Started by Symbol’. It holds un-initialized global and static variables. Since the BSS variables that don't have any values yet, it doesn't actually need to store the image of these variables. The sizwill require at runtime is recorded in the object file, but the BSS (unlike the data section) doesn't take up any acin the object file.

.data Contains the initialized global and static variables and their values. It is usually the largest part of the exeusually has READ/WRITE permissions.

.rdata Also known as .rodata (read-only data) section. This contains constants and string literals.

.reloc Stores the information required for relocating the image while loading.

Symbol tableA symbol is basically a name and an address. Symbol table holds information needed to locate and relocate a symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 both designates entry in the table and serves as the undefined symbol index. The symbol table contains an array of symbol ent

Relocationrecords

Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a proa function, the associated call instruction must transfer control to the proper destination address at execution.Re-locatable files must have relocation entries’ which are necessary because they contain information that descto modify their section contents, thus allowing executable and shared object files to hold the right information foprocess's program image. Simply said relocation records are information used by the linker to adjust section co

Table w.3: Segments in executable file

The following is an example of the object file content dumping using readelf program. Otherutility can be used is objdump. These utilities presented in Linux gcc, g++, gdb and gas 1 andLinux gcc, g++, gdb and gas 2.For Windows dumpbin utility (coming with Visual C++ compiler) or more powerful one is a freePEBrowse program that can be used for the same purpose.

/* testprog1.c */#include <stdio.h>static void display(int i, int *ptr);

The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html

4 of 17 10/6/2011 9:38 PM

Page 5: The Compiler, Assembler, Linker, Loader and Process Address Space Tutorial - Hacking the Process of Building Programs Using C Language_ Notes and Illustrations

int main(void){ int x = 5; int *xptr = &x; printf("In main() program:\n"); printf("x value is %d and is stored at address %p.\n", x, &x); printf("xptr pointer points to address %p which holds a value of %d.\n", xptr, *xptr); display(x, xptr); return 0;} void display(int y, int *yptr){ char var[7] = "ABCDEF"; printf("In display() function:\n"); printf("y value is %d and is stored at address %p.\n", y, &y); printf("yptr pointer points to address %p which holds a value of %d.\n", yptr, *yptr);} [bodo@bakawali test]$ gcc -c testprog1.c[bodo@bakawali test]$ readelf -a testprog1.oELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: REL (Relocatable file) Machine: Intel 80386 Version: 0x1 Entry point address: 0x0 Start of program headers: 0 (bytes into file) Start of section headers: 672 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 0 (bytes) Number of program headers: 0 Size of section headers: 40 (bytes) Number of section headers: 11 Section header string table index: 8 Section Headers: [Nr] Name Type Addr Off Size ES Flg LkInf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .text PROGBITS 00000000 000034 0000de 00 AX 0 0 4 [ 2] .rel.text REL 00000000 00052c 000068 08 9 1 4 [ 3] .data PROGBIT 00000000 000114 000000 00 WA 0 0 4 [ 4] .bss NOBIT 00000000 000114 000000 00 WA 0 0 4 [ 5] .rodata PROGBITS 00000000 000114 00010a 00 A 0 0 4 [ 6] .note.GNU-stack PROGBITS 00000000 00021e 000000 00

The compiler, assembler, linker, loader and process address space tutorial... http://www.tenouk.com/ModuleW.html

5 of 17 10/6/2011 9:38 PM