x86 programming memory accessing modes, characters, and strings computer architecture


Upload: deborah-poole

Post on 18-Jan-2016


Page 1: X86 Programming Memory Accessing Modes, Characters, and Strings Computer Architecture

x86 Programming
Memory Accessing Modes, Characters, and Strings

Computer Architecture

Page 2

Multi-byte storage

• Multi-byte data types include:
  – word/short (2 bytes)
  – int (4 bytes)
  – long or quad (8 bytes)
• Conceptual representation
  – Most significant byte (MSB) is the leftmost byte
  – Least significant byte (LSB) is the rightmost byte
  – Example:
    • Number: 0xaabb
    • MSB: 0xaa
    • LSB: 0xbb
• In-memory representation (applicable only to multi-byte storage)
  – Big Endian
    • MSB is stored at the lowest memory address
  – Little Endian
    • MSB is stored at the highest memory address

Page 3

Big vs. Little Endian

• Consider the integer: 0x11aa22bb

• Big Endian Storage

Memory Address   0x1000   0x1001   0x1002   0x1003
Value            0x11     0xaa     0x22     0xbb

• Little Endian Storage (x86 architecture)

Memory Address   0x1000   0x1001   0x1002   0x1003
Value            0xbb     0x22     0xaa     0x11
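For the Java programmer, the two byte orders can be reproduced with java.nio.ByteBuffer. The following is only a sketch: the class name EndianDemo and its helper methods are made up for illustration, and the array index stands in for increasing memory addresses (index 0 plays the role of address 0x1000).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianDemo {
    /** Returns the 4 bytes of value in the order they would occupy increasing addresses. */
    static byte[] layout(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /** Formats bytes as space-separated hexadecimal values, lowest address first. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int number = 0x11aa22bb; // the integer used on the slide
        System.out.println("Big Endian:    " + hex(layout(number, ByteOrder.BIG_ENDIAN)));
        System.out.println("Little Endian: " + hex(layout(number, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

The little-endian line lists 0xbb first, matching the x86 layout in the table above.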

Page 4

Characters

• Characters are represented as unsigned 8-bit (byte) numbers
  – In memory as well as in instructions.
  – The number is interpreted and displayed as a character for Input-Output (I/O) purposes only!
  – The mapping from byte values to characters (as displayed on screen) is based on the American Standard Code for Information Interchange (ASCII)
    • ASCII is understood by I/O devices all over the world
      – Like: monitors, keyboards, etc.

Page 5

Standard ASCII Codes

• Here is a short table illustrating standard ASCII codes that are frequently used:

Range of ASCII Codes (decimal)   Range of Characters
48 to 57                         '0' to '9'
65 to 90                         'A' to 'Z'
97 to 122                        'a' to 'z'

Page 6

Characters in assembly

• Example assembly code with 5 characters
  – Note that the characters are stored at consecutive memory addresses! The assembler places them in the order in which they are declared.

/* Assembly program involving characters */
.text
    /* Instructions */
.data
char1: .byte 72     /* ASCII code for 'H' */
char2: .byte 101    /* ASCII code for 'e' */
char3: .byte 108    /* ASCII code for 'l' */
char4: .byte 108    /* ASCII code for 'l' */
char5: .byte 111    /* ASCII code for 'o' */

Page 7

For the Java programmer…

• Assembler permits direct representation of characters
  – It converts characters to ASCII codes

/* Assembly program involving characters */
.text
    /* Instructions */
.data
char1: .byte 'H'    /* Assembler converts the */
char2: .byte 'e'    /* characters to ASCII */
char3: .byte 'l'
char4: .byte 'l'
char5: .byte 'o'
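The same character-to-code conversion can be observed from Java. This is a small sketch, not part of the original slides; the class name AsciiDemo and its two helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class AsciiDemo {
    /** Converts text to its ASCII codes, one byte per character. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    /** Converts ASCII codes back to the characters they represent. */
    static String decode(byte[] codes) {
        return new String(codes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The five codes declared with .byte on the earlier slide.
        byte[] codes = {72, 101, 108, 108, 111};
        System.out.println(decode(codes));

        // Going the other way: each character is just a small number.
        for (byte code : encode("Hello")) {
            System.out.println((char) code + " = " + code);
        }
    }
}
```

Writing .byte 72 or .byte 'H' therefore produces the same byte in memory; only the source notation differs.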

Page 8

Memory organization

• Bytes declared consecutively in the assembly source are stored at consecutive memory locations
  – Assume that the assembler places char1 ('H') at address 0x20; then the other characters have the following memory addresses:

Addresses   0x20   0x21   0x22   0x23   0x24
Value       H      e      l      l      o

Page 9

Working with characters

• Every symbol (label), including the ones that name characters, has two values associated with it
  – The address in memory
    • Accessed by prefixing the symbol with a $ (dollar) sign
    • The memory address is always 32 bits (4 bytes) wide on 32-bit x86 processors
      – It is 64 bits wide on 64-bit x86 processors.
  – The value contained in the memory location
    • Accessed without any prefix on the symbol.
    • The number of bytes read depends on the size of the access, which should match the declared type of the symbol
      – 1 byte for a byte (movb), 4 bytes for an int (movl), etc. The assembler takes the size from the instruction, not from the declaration.
• This is exactly how we have been doing it so far.

Page 10

Cross Check

• Given the following memory layout and symbol table, what are the values of:
  – $letter: 0x20
  – Yellow: 'e'
  – $k: 0x22
  – e: 'o'

Symbol table:

Symbol   Address
letter   0x20
Yellow   0x21
k        0x22
e        0x24

Memory layout:

Address   0x20   0x21   0x22   0x23   0x24
Value     H      e      l      l      o

Addresses of symbols (expressions with a $ sign) are obtained from the symbol table, while values of symbols (expressions without a $ sign) are obtained from the memory layout.
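The two kinds of lookup can be mimicked in Java with a map for the symbol table and a byte array for memory. This is purely a model for checking the answers: the class SymbolLookupDemo, the constant BASE, and the method names are invented for this sketch, and a real assembler does not work on Java collections.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SymbolLookupDemo {
    /** Address of the first modeled memory cell (0x20 on the slide). */
    static final int BASE = 0x20;

    /** Memory contents for addresses 0x20 through 0x24. */
    static final byte[] MEMORY = {'H', 'e', 'l', 'l', 'o'};

    /** Symbol table: symbol name to address. */
    static final Map<String, Integer> SYMBOLS = new LinkedHashMap<>();

    static {
        SYMBOLS.put("letter", 0x20);
        SYMBOLS.put("Yellow", 0x21);
        SYMBOLS.put("k", 0x22);
        SYMBOLS.put("e", 0x24);
    }

    /** Models $symbol: the answer comes from the symbol table alone. */
    static int addressOf(String symbol) {
        return SYMBOLS.get(symbol);
    }

    /** Models a plain symbol: look up the address, then read memory there. */
    static char valueOf(String symbol) {
        return (char) MEMORY[addressOf(symbol) - BASE];
    }

    public static void main(String[] args) {
        System.out.printf("$letter = 0x%x%n", addressOf("letter"));
        System.out.printf("Yellow  = '%c'%n", valueOf("Yellow"));
        System.out.printf("$k      = 0x%x%n", addressOf("k"));
        System.out.printf("e       = '%c'%n", valueOf("e"));
    }
}
```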

Page 11

Example assembly

/* Example use of characters */
.text
    movb char1, %al      /* al = ASCII('H') */
    addb $1, %al         /* al = ASCII('I') */
    movb %al, char1      /* char1 = 'I' */

    movl $char1, %ebx    /* ebx = addressOf(char1) */

.data
char1: .byte 'H'

Page 12

What's the use of addresses?

• Why bother loading addresses into registers?
  – x86 permits indirect memory access and manipulation using addresses stored in registers!
  – x86 processors support a variety of mechanisms for generating the final memory address used to retrieve data
    • These mechanisms are collectively called memory addressing modes

Page 13

Addressing Modes

• x86 supports the following addressing modes

1. Register mode

2. Immediate mode

3. Direct mode

4. Register direct mode

5. Base displacement mode

6. Base-index scaled mode

Page 14

Register mode

• Instructions involving only registers
  – This is the simplest and fastest mechanism
  – Data is read from and written to registers only.
  – In this mode, the processor does not access RAM.

.text
    movb %al, %ah      /* ah = al */
    addl %eax, %ebx    /* ebx += eax */
    mull %ebx          /* edx:eax = eax * ebx (unsigned) */
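One detail worth spelling out is that mull produces a 64-bit unsigned product: the low 32 bits land in eax and the high 32 bits in edx. The Java sketch below models that split; the class MullDemo and the method mull are hypothetical names, and Java's int and long stand in for the registers.

```java
final class MullDemo {
    /**
     * Models "mull operand": unsigned 32-bit multiply of eax by operand.
     * Returns a two-element array {edx, eax} holding the high and low halves.
     */
    static int[] mull(int eax, int operand) {
        // Mask to treat both 32-bit values as unsigned before multiplying.
        long product = (eax & 0xffffffffL) * (operand & 0xffffffffL);
        int low = (int) product;            // low 32 bits, stored in eax
        int high = (int) (product >>> 32);  // high 32 bits, stored in edx
        return new int[] {high, low};
    }

    public static void main(String[] args) {
        int[] small = mull(6, 7);
        System.out.printf("6 * 7:             edx = 0x%08x, eax = 0x%08x%n", small[0], small[1]);

        int[] large = mull(0x10000, 0x10000);
        System.out.printf("0x10000 * 0x10000: edx = 0x%08x, eax = 0x%08x%n", large[0], large[1]);
    }
}
```

For small operands edx simply becomes 0, which is why the result is often read from eax alone.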

Page 15

Immediate mode

• Instructions involving registers & constants
  – This mode is used to load constant values into registers
  – The constant value to be loaded is encoded as a part of the instruction.
  – Consequently, there is no real memory access

.text
    movb $5, %ah       /* ah = 5 */
    addl $-35, %ebx    /* ebx += -35 */

Page 16

Direct Mode

• Standard mode used with symbols
  – Address to load/store data is part of the instruction
    • Involves 1 memory access using the address
    • Number of bytes loaded depends on the operand size of the instruction (which should match the declared type)
    • Symbols are used to represent addresses
  – The other operand has to be a register or a constant; these instructions cannot take two memory operands.

.text
    movb char1, %ah    /* ah = 'H' */
    addl %eax, i1      /* i1 += eax */
.data
char1: .byte 'H'
i1:    .int 100

Page 17

Register direct mode

• Addresses for memory references are obtained from a register (this mode is more commonly known as register indirect mode).
  – The address needs to be loaded into a register first.
• Addresses can be manipulated like regular numbers!

.text
    /* eax = addressOf(char1) */
    movl $char1, %eax
    movb (%eax), %bl    /* bl = 'H' */
    inc %eax            /* eax++ */
    movb %bl, (%eax)    /* char2 = char1 */
.data
char1: .byte 'H'
char2: .byte 'e'

Page 18

Register direct mode (Contd.)

• Register direct mode is used very frequently!
  – It is analogous to accessing objects using references in Java
  – Note that the other operand in register direct mode has to be a register or a constant
  – Pay attention to the following syntax
    • $symbol: To obtain the address of symbol
      – The address is always 32 bits wide on 32-bit x86 processors!
    • (%register): Data stored at the memory address contained in register.
      – The number of bytes read from the given memory location depends on the instruction.

Page 19

Base Displacement Mode

• Constant offset from a given address stored in a register
  – Used to access parameters to a method
    • We will see the use for this mode in the near future.

.text
    /* eax = addressOf(char1) */
    movl $char1, %eax
    movb 1(%eax), %bl     /* bl = char2 */
    inc %eax
    movb %bl, -1(%eax)    /* char1 = char2 */
.data
char1: .byte 'H'
char2: .byte 'e'

The displacement value is a constant. The base value is contained in a register!

Page 20

Base-Index scaled Mode

• Most complex form of memory referencing. It involves:
  – A displacement constant
  – A base register
  – An index register
  – A scale factor (must be 1, 2, 4, or 8)
• Final address for accessing memory is computed as:
  address = base_register + (index_register * scale_factor) + displacement_constant
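The address formula can be checked by hand with a few lines of Java. This is a sketch for arithmetic only: the class EffectiveAddressDemo and the method effectiveAddress are hypothetical, the address 0x20 for char1 is an assumption carried over from the earlier memory-organization slide, and the check on the scale reflects the x86 encoding, which only allows the factors 1, 2, 4, and 8.

```java
final class EffectiveAddressDemo {
    /** Computes base + (index * scale) + displacement, as the processor would. */
    static int effectiveAddress(int base, int index, int scale, int displacement) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4, or 8");
        }
        return base + index * scale + displacement;
    }

    public static void main(String[] args) {
        int char1 = 0x20; // assumed address of char1

        // 1(%eax, %ebx, 4) with eax = char1 and ebx = 0
        System.out.printf("0x%x%n", effectiveAddress(char1, 0, 4, 1));

        // (%eax, %ebx, 4) with ebx = 3: the fourth 4-byte element of an array
        System.out.printf("0x%x%n", effectiveAddress(char1, 3, 4, 0));

        // -2(%eax, %ebx, 1) with eax = char1 + 1 and ebx = 1
        System.out.printf("0x%x%n", effectiveAddress(char1 + 1, 1, 1, -2));
    }
}
```

The scale factors correspond to the sizes of the common data types (1, 2, 4, and 8 bytes), which is what makes this mode convenient for indexing arrays.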

Page 21

Base-Index scaled Mode

• Examples of this complex mode are shown below. The data byte is kept in %cl because %bl is part of the index register %ebx.

.text
    /* eax = addressOf(char1) */
    movl $char1, %eax
    movl $0, %ebx
    movb 1(%eax, %ebx, 4), %cl     /* cl = char2 */
    inc %eax                       /* eax = addressOf(char2) */
    movl $1, %ebx
    movb %cl, -2(%eax, %ebx, 1)    /* char1 = char2 */
.data
char1: .byte 'H'
char2: .byte 'e'

First access:  Address = %eax + (%ebx * 4) + 1 = %eax + (0 * 4) + 1 = %eax + 1

Second access: Address = %eax + (%ebx * 1) - 2 = %eax + (1 * 1) - 2 = %eax - 1

Page 22

LEA Instruction

• The x86 architecture provides a special instruction called LEA (Load Effective Address)
  – This instruction computes the effective address that a memory operand refers to and loads that address (not the data stored there) into a given register.
  – Examples:
    • lea -1(%eax, %ebx, 2), %edi
    • lea (%eax, %ebx), %edi
    • lea -5(%eax), %edi

Page 23

LEA Example (Contd.)

• Here is an example of the LEA instruction

.text
    /* eax = addressOf(char1) */
    movl $char1, %eax
    movl $0, %ebx
    lea 1(%eax, %ebx, 2), %edi    /* edi = address of char2 */
    movb $'h', (%edi)             /* change 'e' to 'h' */
.data
char1: .byte 'H'
char2: .byte 'e'

Page 24

Strings

• Strings are represented as a sequence (or array) of characters in memory
  – Each character is stored at a consecutive memory address!
  – By convention, every string is terminated by a byte with the value 0
    • Represented as '\0' in assembly source

Page 25

Declaring Strings in Assembly

• Strings are defined using the .string directive
  – The assembler appends the terminating '\0' automatically

.text
    /* Instructions go here */
.data
msg1: .string "Hello\n"
msg2: .string "World!\n"

Page 26

Memory representation

• Given the previous example, the strings (msg1 and msg2) are stored in memory as shown below (addresses in hexadecimal):

msg1=20
Address   20   21   22   23   24   25   26
Value     H    e    l    l    o    \n   \0

msg2=27
Address   27   28   29   2A   2B   2C   2D   2E
Value     W    o    r    l    d    !    \n   \0

.text
    /* Instructions go here */
.data
msg1: .string "Hello\n"
msg2: .string "World!\n"
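The addresses in this layout follow from the string lengths plus one terminating byte each. The Java sketch below recomputes them; the class StringLayoutDemo and its methods asciz and nextAddress are hypothetical helpers, and the starting address 0x20 is the one assumed on the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringLayoutDemo {
    /** Returns the bytes a zero-terminated string occupies: the characters plus a final 0. */
    static byte[] asciz(String text) {
        byte[] chars = text.getBytes(StandardCharsets.US_ASCII);
        // copyOf pads the extra element with 0, which serves as the terminator.
        return Arrays.copyOf(chars, chars.length + 1);
    }

    /** Returns the first free address after a zero-terminated string stored at address. */
    static int nextAddress(int address, String text) {
        return address + asciz(text).length;
    }

    public static void main(String[] args) {
        int msg1 = 0x20;                            // assumed start address
        int msg2 = nextAddress(msg1, "Hello\n");    // msg2 follows msg1 directly
        int end = nextAddress(msg2, "World!\n");    // first address after msg2

        System.out.printf("msg1 = 0x%X (%d bytes)%n", msg1, asciz("Hello\n").length);
        System.out.printf("msg2 = 0x%X (%d bytes)%n", msg2, asciz("World!\n").length);
        System.out.printf("last byte of msg2 at 0x%X%n", end - 1);
    }
}
```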

Page 27

Displaying Strings

• Strings or characters can be displayed on standard output (analogous to System.out) using a system call (32-bit Linux convention):
  – Set eax to 4
    • To write characters to a file (stream)
    • Changing eax to 3 will cause reading characters instead!
  – Set ebx to 1
    • Destination stream is standard output
    • You may set ebx to 2 for standard error
    • A value of 0 in ebx indicates standard input (intended for reading, not writing)
  – Set ecx to the address of the message to display
  – Set edx to the number of characters to display
  – Invoke the system call with int $0x80

Page 28

Complete Example

/* Console output example */
.text
.global _start
_start:
    mov $4, %eax       /* System call to write to a file handle */
    mov $1, %ebx       /* File handle=1 implies standard output */
    mov $msg, %ecx     /* Address of message to be displayed */
    mov $14, %edx      /* Number of bytes to be displayed */
    int $0x80          /* Call OS to display the characters. */

    mov $1, %eax       /* The system call for exit (sys_exit) */
    mov $0, %ebx       /* Exit with return code of 0 (no error) */
    int $0x80

.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"

Note: the byte count (14) was calculated by hand! This can be cumbersome for large strings.

Page 29

Rewritten using Macro!

/* Console output example */
.text
.global _start
_start:
    mov $4, %eax       /* System call to write to a file handle */
    mov $1, %ebx       /* File handle=1 implies standard output */
    mov $msg, %ecx     /* Address of message to be displayed */
    mov $len, %edx     /* Number of bytes to be displayed */
    int $0x80          /* Call OS to display the characters. */

    mov $1, %eax       /* The system call for exit (sys_exit) */
    mov $0, %ebx       /* Exit with return code of 0 (no error) */
    int $0x80

.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"
.equ len, . - msg - 1  /* exclude the '\0' appended by .string */

Compute an assembler constant len by subtracting the address of msg from the current address, represented by the special symbol . (dot). Because .string appends a terminating '\0', 1 is subtracted so that len counts only the 14 characters to be displayed. Every use of $len is replaced with the resulting constant value.

Page 30

Compute string length

• The previous examples use fixed length strings
  – For strings that change values or change lengths, the string length must be computed using suitable assembly code.
  – The corresponding Java source is shown below:

public static int length(char[] str) {
    int i;
    for (i = 0; (str[i] != '\0'); i++);
    return i;
}

Page 31

Compute string length

_length:
    /* Let eax correspond to i */
    movl $0, %eax             /* eax = 0 */
    /* Let ebx correspond to str */
    movl $str, %ebx           /* ebx = address(str) */
loop:
    cmpb $0, (%ebx, %eax)     /* str[i] != '\0' */
    je done                   /* We have hit the '\0' in string */
    inc %eax                  /* i++ */
    jmp loop                  /* Continue the loop */
done:
    /* eax now holds the length of the string */

Base register = ebx
Index register = eax
Displacement (implicit) = 0
Scale value (implicit) = 1
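To connect the assembly loop back to Java, the sketch below traces the same steps with a byte array standing in for memory and local variables named after the registers. The class StrlenDemo is hypothetical, and the array offsets play the role of addresses; this is a model of the loop, not a translation produced by any tool.

```java
import java.nio.charset.StandardCharsets;

final class StrlenDemo {
    /**
     * Mirrors the assembly loop: ebx is the base address of the string,
     * eax is the index, and memory[ebx + eax] models the operand (%ebx, %eax).
     */
    static int length(byte[] memory, int ebx) {
        int eax = 0;                       // movl $0, %eax
        while (memory[ebx + eax] != 0) {   // cmpb $0, (%ebx, %eax) ; je done
            eax++;                         // inc %eax
        }                                  // jmp loop
        return eax;                        // done: eax holds the length
    }

    public static void main(String[] args) {
        // Two zero-terminated strings stored back to back, as on the earlier slide.
        byte[] memory = "Hello\n\0World!\n\0".getBytes(StandardCharsets.US_ASCII);

        System.out.println(length(memory, 0)); // string starting at offset 0
        System.out.println(length(memory, 7)); // string starting at offset 7
    }
}
```

As in the assembly version, the terminating zero byte is not counted, and a string without a terminator would make the loop run past the end of the data.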