1 Machine-Level Representation of Programs I

Upload: cynthia-mcdaniel

Post on 01-Jan-2016


Page 1: 1 Machine-Level Representation of Programs I. 2 Outline Compiler drivers History of the Intel IA-32 architecture Assembly code and object code Memory

1

Machine-Level Representation of Programs I


2

Outline

• Compiler drivers
• History of the Intel IA-32 architecture
• Assembly code and object code
• Memory and registers
• Addressing modes
• Data formats

• Suggested reading

– Chap 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1


3

The Hello Program

• It begins life as a high-level C program
– Can be read and understood by human beings
• The individual C statements must be translated by compiler drivers
– So that the hello program can run on a computer system
– Compiler: 编译器


4

The Hello Program

• The C program is translated into
– A sequence of low-level machine-language instructions
• These instructions are then packaged in a form called an object program
• Object programs are stored as binary disk files
– Also referred to as executable object files


5

The Context of a Compiler (gcc)

hello.c: source program (text)
→ Preprocessor (cpp) →
hello.i: modified source program (text)
→ Compiler (cc1) →
hello.s: assembly program (text)
→ Assembler (as) →
hello.o: relocatable object program (binary)
→ Linker (ld) →
hello: executable object program (binary)

Figure 1.3 P5

Compiler: 编译器  Assembler: 汇编器  Linker: 链接器


6

Characteristics of high-level programming languages

• Abstraction
– Productive
– Reliable
• Type checking
• Usually as efficient as hand-written assembly code, thanks to optimizing compilers
• Can be compiled and executed on a number of different machines, whereas assembly code is highly machine specific

Productive: 多产的  Reliable: 可靠的


7

Characteristics of assembly languages

• The programmer manages memory directly
• Low-level instructions carry out the computation
• Highly machine specific


8

Why should we understand assembly code?

• Understand the optimization capabilities of the compiler
• Analyze the underlying inefficiencies in the code
• High-level languages sometimes hide details of a program's run-time behavior that we need to understand


9

From writing assembly code to understanding assembly code

• A different set of skills
– Knowing the transformations compilers typically make
– Relating source code to assembly code
• Reverse engineering
– Trying to understand the process by which a system was created
• By studying the system and
• By working backward

Backward: 回溯


10

A Historical Perspective

• Long evolutionary development
– Started from rather primitive 16-bit processors
– Added more features
• To take advantage of technology improvements
• To satisfy the demands for higher performance and for supporting more advanced operating systems
– Laden with obsolete features kept for backward compatibility

* laden with: 承载

* compatibility: 兼容性

* obsolete: 陈旧的


11

X86 family

• 8086 (1978, 29K transistors)
– The heart of the IBM PC & DOS
– 1M bytes addressable, 640K for users
• 80286 (1982, 134K)
– More (now obsolete) addressing modes
– Basis of the IBM PC-AT & Windows


12

X86 family

• i386 (1985, 275K)
– 32-bit architecture, flat addressing model
– Capable of supporting a Unix operating system
• i486 (1989, 1.9M)
– Integrated the floating-point unit onto the processor chip


13

X86 family

• Pentium (1993, 3.1M)
• Pentium Pro (1995, 6.5M)
– P6 microarchitecture
– Conditional move instructions
• Pentium/MMX (1997, 4.5M)
– New class of instructions for manipulating vectors of integers


14

X86 family

• Pentium II (1997, 7M)
– Implemented the MMX instructions within the P6 microarchitecture
• Pentium III (1999, 8.2M)
– New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)


15

X86 family

• Pentium 4 (2001, 42M)
– NetBurst microarchitecture
– 144 new SSE2 instructions


16

X86 family

• Advanced Micro Devices (AMD)
– Now a close competitor of Intel
– Developing its own extension to 64 bits


17

X86 family

• Transmeta
– In January 2000, introduced the Crusoe™ processor
– Radically different approach to implementation
• Translates x86 code into "Very Long Instruction Word" (VLIW) code
• High degree of parallelism
– Aimed at the low-power market, such as laptop computers


18

Hardware Organization Figure 1.4 P7

• CPU: Central Processing Unit
• ALU: Arithmetic/Logic Unit
• PC: Program Counter
• USB: Universal Serial Bus


19

Virtual spaces

• A linear array of bytes
– Each with its own unique address (array index), starting at zero

[Figure: memory drawn as a column of addresses 0x0, 0x1, 0x2, …, 0xfffffffe, 0xffffffff, each paired with one byte of contents]


20

Data layout

• Object model in C – Different data types can be declared


21

Data layout

• Object model in assembly
– A large, byte-addressable array
– No distinction even between signed and unsigned integers
– Code, user data, OS data
– Run-time stack for managing procedure calls and returns
– Blocks of memory allocated by the user
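To make the "no distinction between signed and unsigned" point concrete, here is a minimal C sketch that copies the same four bytes into a signed and into an unsigned 32-bit variable. The helper names are hypothetical; the point is that the bits are identical and only the type that consumes them changes the value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret the same 4 bytes as signed or unsigned.
 * memcpy copies the raw bits without any numeric conversion. */
int32_t bytes_as_signed(const uint8_t bytes[4]) {
    int32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

uint32_t bytes_as_unsigned(const uint8_t bytes[4]) {
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

For example, the bytes ff ff ff ff read as -1 through bytes_as_signed and as 4294967295 (0xffffffff) through bytes_as_unsigned.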


22

Figure 1.13 P17


23

Operations in C constructs

• Arithmetic expression evaluation

• Loops

• Procedure calls and returns

• Translated into sequences of instructions


24

Operations in Assembly Instructions

• Each instruction performs only a very elementary operation
• Instructions are normally executed one by one, in sequence
• Perform arithmetic on data stored in registers
• Transfer data between memory and a register
• Conditionally branch to a new instruction address


25

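Assembly Programmer’s View Figure 3.2 P136

[Figure: the CPU exchanges addresses, data, and instructions with memory; memory holds the stack, DLLs, text, data, and heap regions]

• Registers: %eax, %edx, %ecx, %ebx, %esi, %edi, %esp, %ebp
• Byte registers: %al/%ah, %dl/%dh, %cl/%ch, %bl/%bh
• Program counter: %eip
• Condition codes: %eflags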
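A small C sketch of how a byte register relates to its 32-bit register, assuming the IA32 layout in which %al is bits 7..0 and %ah is bits 15..8 of %eax (together they form %ax, the low 16 bits, which the figure does not show). The function names are hypothetical; they model bit positions with shifts and masks, not any real register API.

```c
#include <stdint.h>

/* %al: bits 7..0 of %eax */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xffu); }

/* %ah: bits 15..8 of %eax */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xffu); }

/* Writing %al (for example with movb) replaces only the low byte of %eax;
 * the upper 24 bits keep their old value. */
uint32_t set_al(uint32_t eax, uint8_t al) {
    return (eax & 0xffffff00u) | al;
}
```

With %eax = 0x12345678, %al is 0x78 and %ah is 0x56; writing 0xAB to %al leaves %eax = 0x123456AB.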


26

Programmer-Visible States P129

• Program Counter(%eip)

– Address of the next instruction

• Register File

– Heavily used program data

– Integer and floating-point


27

Programmer-Visible States

• Condition code registers
– Hold status information about the most recently executed arithmetic or logical instruction
– Used to implement conditional changes in the control flow
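As an illustration, the sketch below computes the four most commonly used IA32 condition codes (CF, ZF, SF, OF) for a 32-bit addition t = a + b. The flag names and definitions come from the IA32 architecture rather than from these slides, and flags_after_add is a hypothetical software model, not a description of the hardware circuit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the condition codes after addl: t = a + b. */
typedef struct {
    bool cf; /* carry flag: unsigned overflow (carry out of bit 31) */
    bool zf; /* zero flag: result is zero */
    bool sf; /* sign flag: bit 31 of the result is set */
    bool of; /* overflow flag: two's-complement (signed) overflow */
} flags_t;

flags_t flags_after_add(uint32_t a, uint32_t b) {
    uint32_t t = a + b;          /* unsigned arithmetic wraps modulo 2^32 */
    flags_t f;
    f.cf = t < a;                /* the sum wrapped around iff t < a */
    f.zf = t == 0;
    f.sf = (t >> 31) & 1u;
    /* signed overflow: t's sign differs from the signs of both a and b */
    f.of = (((a ^ t) & (b ^ t)) >> 31) & 1u;
    return f;
}
```

The same bits give different flag patterns depending on interpretation: 0xffffffff + 1 sets CF and ZF but not OF (in signed terms it is -1 + 1 = 0), while 0x7fffffff + 1 sets SF and OF but not CF.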


28

Code Examples P130

C code:

    int sum(int x, int y)
    {
        int t = x + y;
        return t;
    }

Assembly file code.s, obtained with the command gcc -O2 -S code.c:

    _sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret


29

Code Examples P131

55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 89 ec 5d c3

Relocatable object file code.o, obtained with the command gcc -O2 -c code.c


30

Code Examples

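Disassembly output (P132, 反汇编输出), obtained with the command objdump -d code.o:

    0x80483b4 <sum>:
    0x80483b4  55                 push %ebp
    0x80483b5  89 e5              mov  %esp,%ebp
    0x80483b7  8b 45 0c           mov  0xc(%ebp),%eax
    0x80483ba  03 45 08           add  0x8(%ebp),%eax
    0x80483bd  01 05 00 00 00 00  add  %eax,0x0
    0x80483c3  89 ec              mov  %ebp,%esp
    0x80483c5  5d                 pop  %ebp
    0x80483c6  c3                 ret
    0x80483c7  90                 nop

Note: the 6-byte instruction at 0x80483bd (add %eax,0x0) adds %eax to the word at absolute address 0x0; it has no counterpart in the C code or the code.s assembly shown on slide 28.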
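The addresses in the disassembly follow from the instruction lengths: each instruction begins right after the previous one. A minimal C sketch, with the lengths read off the byte listing for sum (the array and function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Instruction lengths in bytes, in listing order. */
static const size_t sum_insn_lengths[] = {
    1, /* 55                 push %ebp           */
    2, /* 89 e5              mov  %esp,%ebp      */
    3, /* 8b 45 0c           mov  0xc(%ebp),%eax */
    3, /* 03 45 08           add  0x8(%ebp),%eax */
    6, /* 01 05 00 00 00 00  add  %eax,0x0       */
    2, /* 89 ec              mov  %ebp,%esp      */
    1, /* 5d                 pop  %ebp           */
    1, /* c3                 ret                 */
};

#define SUM_NINSNS (sizeof sum_insn_lengths / sizeof sum_insn_lengths[0])

/* Address of instruction number k (0-based) when the code starts at start:
 * start plus the lengths of all earlier instructions. */
uint32_t insn_address(uint32_t start, size_t k) {
    uint32_t addr = start;
    for (size_t i = 0; i < k && i < SUM_NINSNS; i++)
        addr += (uint32_t)sum_insn_lengths[i];
    return addr;
}
```

For example, the first add (instruction 3) starts at 0x80483b4 + 1 + 2 + 3 = 0x80483ba.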


31

C Code

• Add two signed integers

• int t = x+y;


32

Assembly Code

• Operands:
– x: Memory M[%ebp+8]
– y: Register %eax (loaded from M[%ebp+12] by the preceding movl)
– t: Register %eax
• Instruction
– addl 8(%ebp),%eax
– Adds two 4-byte integers
– Similar to the expression y += x
• Return function value in %eax


33

Object Code

• 3-byte instruction
• Stored at address 0x80483ba
• 0x80483ba: 03 45 08
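How are the three bytes 03 45 08 organized? In IA32 encoding (background not covered on these slides), 0x03 is the opcode for adding a 32-bit register/memory operand into a register, 0x45 is a ModR/M byte, and 0x08 is a one-byte displacement. A ModR/M byte packs three bit fields; the hypothetical helpers below split them out and name the 32-bit registers by their encoding numbers.

```c
#include <stdint.h>

/* Fields of an x86 ModR/M byte: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0. */
typedef struct {
    unsigned mod, reg, rm;
} modrm_t;

modrm_t decode_modrm(uint8_t byte) {
    modrm_t m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}

/* 32-bit register numbering used in the reg and rm fields. */
static const char *const reg32_names[8] = {
    "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
};

const char *reg32_name(unsigned n) { return reg32_names[n & 0x7u]; }
```

For 0x45, mod = 1 (an 8-bit displacement follows), reg = 0 (%eax, the destination) and rm = 5 (%ebp, the base), which together with the displacement 0x08 spell addl 8(%ebp),%eax. Likewise 0xe5 in 89 e5 gives reg = 4 (%esp) and rm = 5 (%ebp), matching mov %esp,%ebp.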


34

Operands P137

• In high-level languages, operands are
– Either constants (常数)
– Or variables (变量)
• Example
– A = A + 4, where A is a variable and 4 is a constant


35

Operands

• Counterparts in assembly languages
– Immediate (constant)
– Register (variable)
– Memory (variable)
• Example
    movl 8(%ebp), %eax    8(%ebp): memory, %eax: register
    addl $4, %eax         $4: immediate


36

Simple Addressing Mode

• Immediate
– Represents a constant
– The format is $Imm (e.g., $4, $0xffffffff)
• Registers
– The fastest storage units in computer systems
– Typically 32 bits long
– Register mode Ea: the operand value is the value stored in register Ea, denoted R[Ea]


38

Memory References

• The memory array is denoted M
• If addr is a memory address, M[addr] is the content of memory starting at addr
– addr is used as an array index
• How many bytes does M[addr] contain?
– It depends on the context (the instruction that accesses it)
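The answer "it depends on the context" can be modeled in C: treat M as a byte array and let the caller say how many bytes to read. This sketch assumes IA32's little-endian byte order (the least significant byte sits at the lowest address); read_mem is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte little-endian value (n = 1, 2, or 4) starting at addr. */
uint32_t read_mem(const uint8_t *M, uint32_t addr, size_t n) {
    uint32_t value = 0;
    for (size_t i = 0; i < n; i++)
        value |= (uint32_t)M[addr + i] << (8 * i); /* byte i is bits 8i..8i+7 */
    return value;
}
```

With bytes 78 56 34 12 at addresses 0..3, M[0] is 0x78 as a byte, 0x5678 as a word, and 0x12345678 as a double word.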


39

Memory Addressing Mode

• An expression for a memory address (or an array index)
• Most general form: Imm(Eb, Ei, s)
– Eb: base register, Ei: index register
– s: scale factor, one of 1, 2, 4, 8
• The address represented by this form is Imm + R[Eb] + R[Ei] * s
• The operand value is M[Imm + R[Eb] + R[Ei] * s]
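The address arithmetic fits in a one-line C function. A minimal sketch, assuming 32-bit unsigned arithmetic (so a negative displacement such as -4 is passed as its two's-complement bit pattern) and 0 for a missing base or index register; effective_address is a hypothetical name.

```c
#include <stdint.h>

/* Address of the memory operand Imm(Eb, Ei, s): Imm + R[Eb] + R[Ei] * s,
 * computed modulo 2^32. rb and ri are the register contents, not numbers.
 * The hardware only encodes s = 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t imm, uint32_t rb, uint32_t ri, uint32_t s) {
    return imm + rb + ri * s;
}
```

For instance, if %edx holds the start of an int array and %ecx holds i, the operand (%edx,%ecx,4) addresses element i, because each int occupies 4 bytes.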


40

Type        Form              Operand value                 Name
Immediate   $Imm              Imm                           Immediate
Register    Ea                R[Ea]                         Register
Memory      Imm               M[Imm]                        Absolute
Memory      (Ea)              M[R[Ea]]                      Indirect
Memory      Imm(Eb)           M[Imm + R[Eb]]                Base + displacement
Memory      (Eb, Ei)          M[R[Eb] + R[Ei]]              Indexed
Memory      Imm(Eb, Ei)       M[Imm + R[Eb] + R[Ei]]        Indexed
Memory      (, Ei, s)         M[R[Ei] * s]                  Scaled indexed
Memory      (Eb, Ei, s)       M[R[Eb] + R[Ei] * s]          Scaled indexed
Memory      Imm(Eb, Ei, s)    M[Imm + R[Eb] + R[Ei] * s]    Scaled indexed

Addressing Mode Figure 3.3 P137


41

• Practice problem 3.1 P138

Address   Value
0x100     0xFF
0x104     0xAB
0x108     0x13
0x10C     0x11

Register  Value
%eax      0x100
%ecx      0x1
%edx      0x3

Operand           Value   Comment
%eax              0x100   Register
$0x108            0x108   Immediate
(%eax)            0xFF    Address 0x100
0x108             0x13    Absolute address
260(%ecx,%edx)    0x13    Address 0x108
(%eax,%edx,4)     0x11    Address 0x10C
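To check the answers mechanically, the sketch below encodes the machine state from the problem and evaluates each operand. It is a toy model with hypothetical names: it knows only the three registers and four memory words given, and a read from any other address simply returns 0.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { IMM, REG, MEM } operand_kind;

typedef struct {
    operand_kind kind;
    uint32_t imm;   /* immediate value, or displacement for MEM */
    int base;       /* register number (REG/MEM), or -1 if absent */
    int index;      /* index register number, or -1 if absent */
    uint32_t scale; /* 1, 2, 4, or 8; used only with an index */
} operand_t;

/* Only the registers used in the exercise. */
enum { EAX, ECX, EDX, NREGS };
static const uint32_t R[NREGS] = { 0x100, 0x1, 0x3 };

static const struct {
    uint32_t addr, value;
} M[] = {
    { 0x100, 0xFF }, { 0x104, 0xAB }, { 0x108, 0x13 }, { 0x10C, 0x11 },
};

static uint32_t load(uint32_t addr) {
    for (size_t i = 0; i < sizeof M / sizeof M[0]; i++)
        if (M[i].addr == addr)
            return M[i].value;
    return 0; /* address not in the exercise's memory state */
}

uint32_t operand_value(operand_t op) {
    switch (op.kind) {
    case IMM:
        return op.imm;
    case REG:
        return R[op.base];
    default: { /* MEM: M[Imm + R[Eb] + R[Ei] * s] */
        uint32_t addr = op.imm;
        if (op.base >= 0)
            addr += R[op.base];
        if (op.index >= 0)
            addr += R[op.index] * op.scale;
        return load(addr);
    }
    }
}
```

For example, 260(%ecx,%edx) is written (operand_t){ MEM, 260, ECX, EDX, 1 }; its address is 260 + 1 + 3 = 0x108, so the value is 0x13.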


42

Data Formats Figure 3.1 P135

C declaration    Intel data type      GAS suffix   Size (bytes)
char             Byte                 b            1
short            Word                 w            2
int              Double word          l            4
unsigned         Double word          l            4
long int         Double word          l            4
unsigned long    Double word          l            4
char *           Double word          l            4
float            Single precision     s            4
double           Double precision     l            8
long double      Extended precision   t            10/12
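Sizes depend on the compiler and target, so it is worth asking the compiler directly. A hedged C sketch (c_size_of is a hypothetical helper) that records sizeof for each declaration in the table. Built for IA32 (for example with gcc -m32) the results should match the table; on x86-64 Linux, long and pointers are 8 bytes and long double occupies 16, so the results differ there.

```c
#include <stddef.h>
#include <string.h>

struct type_size {
    const char *decl;
    size_t size;
};

/* sizeof as reported by the compiler in use, not the IA32 table. */
static const struct type_size c_type_sizes[] = {
    { "char",          sizeof(char) },
    { "short",         sizeof(short) },
    { "int",           sizeof(int) },
    { "unsigned",      sizeof(unsigned) },
    { "long int",      sizeof(long) },
    { "unsigned long", sizeof(unsigned long) },
    { "char *",        sizeof(char *) },
    { "float",         sizeof(float) },
    { "double",        sizeof(double) },
    { "long double",   sizeof(long double) },
};

/* Returns sizeof for a declaration listed above, or 0 if it is not listed. */
size_t c_size_of(const char *decl) {
    for (size_t i = 0; i < sizeof c_type_sizes / sizeof c_type_sizes[0]; i++)
        if (strcmp(c_type_sizes[i].decl, decl) == 0)
            return c_type_sizes[i].size;
    return 0;
}
```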


43

Data Formats

• Move data instructions
– mov (general)
– movb (move byte)
– movw (move word)
– movl (move double word)
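A sketch of what the size suffix changes when the destination is memory: movb, movw, and movl write 1, 2, or 4 bytes, and bytes beyond that width are left untouched. store_mem is a hypothetical helper that assumes IA32's little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of value (n = 1, 2, or 4) at addr, least
 * significant byte first. Bytes at addr + n and beyond are not modified. */
void store_mem(uint8_t *M, uint32_t addr, uint32_t value, size_t n) {
    for (size_t i = 0; i < n; i++)
        M[addr + i] = (uint8_t)(value >> (8 * i));
}
```

Storing 0x11223344 with the width of movb changes only the byte at addr (to 0x44); with the width of movl the four bytes become 44 33 22 11.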