Assembly Process

Upload: hunter-alan

Post on 15-Jan-2016


Page 1: Assembly Process. Machine Code Generation Assembling a program entails translating the assembly language into binary machine code This requires more than

Assembly Process

Page 2:

Machine Code Generation

Assembling a program entails translating the assembly language into binary machine code.

This requires more than simply mapping assembly instructions to machine instructions:
  - Each instruction is bound to an address
  - Labels are bound to addresses
  - Assembly instructions which refer to labels generate machine instructions which contain the label's address
  - Pseudo-instructions are translated into one or more machine instructions

Page 3:

Instruction Format

addi $13,$7,50

  opcode   rs       rt       immediate operand
  001000   00111    01101    0000 0000 0011 0010
  6 bits   5 bits   5 bits   16 bits

add $13,$7,$8

  opcode   rs       rt       rd       shamt    extended opcode
  000000   00111    01000    01101    00000    100000
  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
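The two bit layouts above can be reproduced with a few shifts and masks. The following is a minimal Python sketch; the helper names `encode_i` and `encode_r` are illustrative only, and `rs`, `rt`, `rd`, `shamt` are the conventional MIPS names for the register and shift-amount fields.

```python
def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an immediate-format word: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack a register-format word: opcode 0, rs, rt, rd, shamt, extended opcode."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# addi $13,$7,50: opcode 0b001000, source $7, destination $13
print(f"{encode_i(0b001000, 7, 13, 50):032b}")
# add $13,$7,$8: extended opcode 0b100000, destination $13
print(f"{encode_r(7, 8, 13, 0b100000):032b}")
```

Note that the destination register sits in a different field in the two formats (`rt` for addi, `rd` for add), which is why the operand order in the source does not match the field order in the word.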

Page 4:

The symbol table

The assembler scans the source code and generates the appropriate bit string for each line encountered.

The assembler must remember
  - what memory locations have been allocated
  - to which address each label is bound

A symbol table is a list of (label, address) pairs.

When the data and text segments have been generated, they are stored as an executable file.
  - The file is used by a program called the loader to initialize memory to the appropriate state before execution

Page 5:

Instructions

The .text directive tells the assembler that the lines which follow are instructions.
  - By default, the text segment starts at 0x00400000

In some cases, a symbol may not have an assigned address yet when the assembler scans a line that refers to it.
  - A second pass through the code can update instructions containing unresolved labels
  - Maintain a list of addresses in which each unresolved label appears
      When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label
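The list-of-unresolved-references idea can be sketched as follows. This is a simplified Python illustration, not the method of any particular assembler: the function name `assign_addresses`, the one-word-per-line assumption, and the input shape (one `(label, referenced_label)` pair per instruction) are all hypothetical.

```python
TEXT_BASE = 0x00400000  # default start of the text segment


def assign_addresses(lines):
    """lines: list of (label_or_None, referenced_label_or_None), one per instruction.

    Returns (symbols, resolved), where resolved[i] is the address that
    instruction i refers to, or None if it refers to no label.
    """
    symbols = {}    # label -> address
    pending = {}    # unresolved label -> indices of instructions waiting for it
    resolved = [None] * len(lines)

    for i, (label, target) in enumerate(lines):
        address = TEXT_BASE + 4 * i          # assumes one word per line
        if label is not None:
            symbols[label] = address
            # Update every earlier instruction that was waiting for this label.
            for j in pending.pop(label, []):
                resolved[j] = address
        if target is not None:
            if target in symbols:
                resolved[i] = symbols[target]
            else:
                pending.setdefault(target, []).append(i)

    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return symbols, resolved


program = [
    ("start", "done"),   # forward reference: 'done' is not yet in the table
    (None, "start"),     # backward reference: already in the table
    ("done", None),
]
symbols, resolved = assign_addresses(program)
print(symbols, resolved)
```

With this approach a single scan is enough, at the cost of keeping the pending lists in memory; a two-pass assembler instead builds the whole symbol table first and encodes on the second pass.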

Page 6:

Branch offset in the MIPS R2000

In machine code, the target address in a branch must be specified as an offset from the address of the branch.

During execution, this offset is simply added to the program counter to fetch the next instruction
  - PC contains the address
  - Offset is measured in words, not bytes

    PC_NEW = offset*4 + PC_OLD

To calculate the offset, the assembler uses the formula:

    offset = (target instruction address - branch instruction address) / 4

This formula ignores the PC increment: on MIPS hardware the PC has already advanced to the next instruction when the offset is added.
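The two formulas above are inverses of each other, which a short Python sketch makes easy to check. The function names are illustrative, and the code follows the text's convention of measuring from the branch instruction itself.

```python
def branch_offset(target_address: int, branch_address: int) -> int:
    """Word offset as defined in the text: (target - branch) / 4."""
    byte_distance = target_address - branch_address
    if byte_distance % 4 != 0:
        raise ValueError("instructions must be word aligned")
    return byte_distance // 4


def next_pc(pc_old: int, offset: int) -> int:
    """PC_NEW = offset*4 + PC_OLD."""
    return offset * 4 + pc_old


offset = branch_offset(0x00400008, 0x00400014)
print(offset, hex(next_pc(0x00400014, offset)))
```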

Page 7:

Branch offset calculation

The offset is stored in the instruction as a word offset rather than a byte offset.
  - Instructions are only stored at word boundaries
  - For both target and branch instruction, the two least significant bits of the address are zero

An offset may be negative
  - If the target instruction precedes the branch instruction

The offset is stored in the 16-bit immediate field
  - This means the branch can only jump about 2^15 instructions before or after the current address
  - 2^15 instructions (words) = 2^17 bytes
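As a rough illustration of the range limit, the check an assembler might perform before emitting a branch could look like this in Python. The names are hypothetical, and the offset is measured from the branch instruction as in the text.

```python
IMMEDIATE_BITS = 16
MIN_OFFSET = -(1 << (IMMEDIATE_BITS - 1))      # -2**15 words
MAX_OFFSET = (1 << (IMMEDIATE_BITS - 1)) - 1   # 2**15 - 1 words


def in_branch_range(target_address: int, branch_address: int) -> bool:
    """True if the signed word offset fits in the 16-bit immediate field."""
    offset = (target_address - branch_address) // 4
    return MIN_OFFSET <= offset <= MAX_OFFSET


# 2**15 words in either direction corresponds to 2**17 bytes.
print(MIN_OFFSET * 4, MAX_OFFSET * 4)
```

The range is "about" 2^15 in each direction because two's complement gives one more negative value than positive.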

Page 8:

Branch offset calculation

An entry in the SPIM instruction list:

[0x00400068] 0x1440ffe6 bne $2, $0, -104 [__start-0x00400068]; 44: bnez $v0, __start

  instruction address          0x00400068
  machine code                 0x1440ffe6
  stored offset                ffe6 = -26 = -104/4
  offset in bytes              __start - 0x00400068 = 0x00400000 - 0x00400068 = -104
  offset calculation           in bytes, ignores PC increment
  line number in source file   44
  original assembly code       bnez $v0, __start
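To see where the numbers in this listing entry come from, the machine word can be taken apart field by field. The Python sketch below is illustrative; `decode_branch` is a hypothetical helper, and the target is computed with the listing's own convention (no PC increment).

```python
def decode_branch(word: int) -> dict:
    """Split an immediate-format word into its fields."""
    immediate = word & 0xFFFF
    if immediate & 0x8000:          # sign-extend the 16-bit field
        immediate -= 0x10000
    return {
        "opcode": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "offset_words": immediate,
        "offset_bytes": immediate * 4,
    }


fields = decode_branch(0x1440FFE6)
target = 0x00400068 + fields["offset_bytes"]   # listing convention: no PC increment
print(fields, hex(target))
```

Opcode 5 is bne, and the two register fields hold 2 and 0, matching `bne $2, $0` in the listing.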

Page 9:

Jump target calculation

The jump instruction has two forms
  - Pseudo-direct, for j and jal
  - Register direct, for jr and jalr

jr and jalr specify a register containing the address to be loaded into the PC

j and jal specify most of the address of the target within the instruction.
  - However, they have a range of at most one-sixteenth of the memory space

[Figure: the memory space divided into sixteen regions, labeled f down to 0]

Page 10:

Jump target calculation

The target address is a 32 bit quantity
  - Since all word addresses are multiples of 4 there is no need to store the last two bits
  - The jump instruction format has 26 bits for the target address
  - The remaining 6 bits of the instruction are used for the opcode
  - The highest-order 4 bits of the target are taken from the address currently stored in the program counter

  instruction:      opcode (6) | jump target bits (26)
  target address:   highest 4 bits of PC | jump target bits (26) | 00
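The composition of the target address can be sketched in Python as follows. The function names are illustrative; the opcode value 0b000010 for `j` is the standard MIPS encoding, and the upper four bits are taken from the PC value passed in, as the text describes.

```python
J_OPCODE = 0b000010  # opcode of the j instruction


def encode_jump(target: int, opcode: int = J_OPCODE) -> int:
    """Drop the two low zero bits and keep 26 bits of the target."""
    if target % 4 != 0:
        raise ValueError("jump target must be word aligned")
    return (opcode << 26) | ((target >> 2) & 0x03FFFFFF)


def jump_target(word: int, pc: int) -> int:
    """Rebuild the 32-bit target: top 4 bits of the PC, 26 stored bits, then 00."""
    return (pc & 0xF0000000) | ((word & 0x03FFFFFF) << 2)


word = encode_jump(0x00400008)
print(f"{word:08x}", hex(jump_target(word, 0x00400014)))
```

Because the top four bits are never stored, the same instruction word decodes to a different target if it is executed from a different one-sixteenth of the address space.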

Page 11:

Jump Target Calculation

Jump instructions have a range of 2^26 words, or 2^26 x 2^2 = 2^28 bytes
  - This range is NOT symmetric about the jump instruction

[Figure: the memory space divided into sixteen regions, labeled f down to 0; a jump at 0x80000080 can reach -0x00000080 backward and +0x0fffff7c forward]
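The asymmetry follows directly from the fixed upper four bits: a jump can reach any word in its own 2^28-byte region, however close the instruction sits to either end. A small Python sketch (the function name `jump_reach` is made up for this example) reproduces the figure's numbers:

```python
def jump_reach(jump_address: int):
    """Return (backward_bytes, forward_bytes) reachable from a jump."""
    region_start = jump_address & 0xF0000000    # top 4 bits are fixed by the PC
    region_last = region_start | 0x0FFFFFFC     # last word in the same region
    return jump_address - region_start, region_last - jump_address


backward, forward = jump_reach(0x80000080)
print(hex(backward), hex(forward))
```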

Page 12:

Program relocation

It is possible that program modules are developed separately by individual programmers. When these programs are to be loaded into memory they should not be assigned overlapping memory space.

Thus, the modules have to be relocated
  - Relative addresses are relocatable
  - Any absolute references must be "fixed" by the loader

Use a logical base address known at load time
  - Absolute addresses are stored as offsets from this to-be-determined (TBD) base
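One way to picture the loader's fix-up step is the Python sketch below. It is a deliberately simplified model, not a real object-file format: the module is a list of words, and `relocations` is a hypothetical list of the word indices that hold an offset from the module's base.

```python
def relocate(words, relocations, base):
    """Return a copy of `words` with each absolute reference fixed up.

    words       -- memory image of the module, one integer per word
    relocations -- indices of words that hold an offset from the module base
    base        -- base address chosen at load time
    """
    loaded = list(words)
    for index in relocations:
        loaded[index] = (loaded[index] + base) & 0xFFFFFFFF
    return loaded


module = [0x00000008, 0x12345678, 0x00000000]
print([f"{w:08x}" for w in relocate(module, [0, 2], 0x10010000)])
```

Words that are not listed, such as relative branch offsets or plain data, are copied unchanged, which is why relative addresses need no fixing.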

Page 13:

From source to executable

[Figure: build pipeline]

  high-level source code -> compiler -> asm -> assembler -> obj
  asm -> assembler -> obj
  obj, obj, lib -> linker -> exe -> loader -> memory

Page 14:

Some examples of assembling code

        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la $6, a2
loop:   lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall

Page 15:

Some examples of assembling code

Symbol Table

  symbol    address
  a1        1000 0000
  a2        1000 0004
  a3        1000 0014
  __start   0040 0000
  loop      0040 0008

Memory map of data section

  address     contents
  1000 0000   0000 0003
  1000 0004   0000 0010
  1000 0008   0000 0010
  1000 000c   0000 0010
  1000 0010   0000 0010
  1000 0014   0000 0005

Source:

        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la $6, a2
loop:   lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall
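The symbol table and memory map can be derived mechanically by walking the two segments and advancing an address counter. The Python sketch below is an illustration under stated assumptions: the data segment is taken to start at 0x10000000 (as in the table), and `SIZE_IN_WORDS` is a hypothetical lookup giving how many machine instructions each mnemonic occupies in this example.

```python
DATA_BASE = 0x10000000
TEXT_BASE = 0x00400000

# Machine instructions occupied by each mnemonic. The sizes of the
# pseudo-instructions (la, mul) are assumptions chosen for this example.
SIZE_IN_WORDS = {"la": 2, "mul": 2, "lw": 1, "b": 1, "li": 1, "syscall": 1}

data = [("a1", [3]), ("a2", [16, 16, 16, 16]), ("a3", [5])]
text = [("__start", "la"), ("loop", "lw"), (None, "mul"),
        (None, "b"), (None, "li"), (None, "syscall")]


def build_tables(data, text):
    """Return (symbols, memory) for the data and text segments."""
    symbols, memory = {}, {}
    address = DATA_BASE
    for label, values in data:
        symbols[label] = address
        for value in values:
            memory[address] = value
            address += 4
    address = TEXT_BASE
    for label, mnemonic in text:
        if label is not None:
            symbols[label] = address
        address += 4 * SIZE_IN_WORDS[mnemonic]
    return symbols, memory


symbols, memory = build_tables(data, text)
for name, address in symbols.items():
    print(f"{name:8s} {address:08x}")
```

The label `loop` lands at 0x00400008 rather than 0x00400004 because `la` occupies two machine instructions.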

Page 16:

Translate pseudo-instructions

Source:

        la $6, a2
loop:   lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall

After translation:

        lui $6, 0x1000
        ori $6, $6, 0x0004
loop:   lw $7, 4($6)
        mult $10, $7
        mflo $9
        b loop
        ori $v0, $0, 10
        syscall
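The expansions used on this slide can be written as a small rewriting function. This Python sketch covers only the four pseudo-instructions in the example and mirrors the expansions shown here; a real assembler may choose different sequences (for instance for constants wider than 16 bits). The function name and the tuple representation are hypothetical.

```python
def expand(mnemonic, operands, symbols):
    """Return the list of machine-level instructions for one source line."""
    if mnemonic == "la":                       # la rd, label
        rd, label = operands
        address = symbols[label]
        return [("lui", [rd, address >> 16]),
                ("ori", [rd, rd, address & 0xFFFF])]
    if mnemonic == "mul":                      # mul rd, rs, rt
        rd, rs, rt = operands
        return [("mult", [rs, rt]), ("mflo", [rd])]
    if mnemonic == "li":                       # li rd, imm (16-bit unsigned only)
        rd, imm = operands
        if not 0 <= imm <= 0xFFFF:
            raise ValueError("this sketch only handles 16-bit constants")
        return [("ori", [rd, "$0", imm])]
    if mnemonic == "b":                        # b label
        return [("beq", ["$0", "$0", operands[0]])]
    return [(mnemonic, list(operands))]        # already a machine instruction


symbols = {"a2": 0x10000004}
for line in [("la", ["$6", "a2"]), ("mul", ["$9", "$10", "$7"]),
             ("b", ["loop"]), ("li", ["$v0", 10])]:
    print(line, "->", expand(*line, symbols))
```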

Page 17:

Translate to machine code

  address    contents    instruction
  00400000   3c06 1000   lui $6, 0x1000
  00400004   34c6 0004   ori $6, $6, 0x0004
  00400008   8cc7 0004   lw $7, 4($6)
  0040000c   0147 0018   mult $10, $7
  00400010   0000 4812   mflo $9
  00400014   1000 xxxx   b loop (beq)
  00400018   3402 000a   ori $v0, $0, 10
  0040001c   0000 000c   syscall
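The contents column can be recomputed from the instruction fields. The Python sketch below uses the standard MIPS opcode and function codes for these instructions; the helper names are illustrative, and the branch offset is left as zero because it has not been resolved at this stage. Note that with rs = 10 and rt = 7 the mult word comes out as 0147 0018.

```python
def i_type(opcode, rs, rt, imm):
    """opcode(6) rs(5) rt(5) immediate(16)"""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs, rt, rd, funct):
    """opcode 0, rs(5) rt(5) rd(5) shamt 0, function code(6)"""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


TEXT_BASE = 0x00400000
V0 = 2  # $v0 is register 2

words = [
    i_type(0x0F, 0, 6, 0x1000),   # lui  $6, 0x1000
    i_type(0x0D, 6, 6, 0x0004),   # ori  $6, $6, 0x0004
    i_type(0x23, 6, 7, 4),        # lw   $7, 4($6)
    r_type(10, 7, 0, 0x18),       # mult $10, $7
    r_type(0, 0, 9, 0x12),        # mflo $9
    i_type(0x04, 0, 0, 0),        # beq  $0, $0, loop (offset not yet resolved)
    i_type(0x0D, 0, V0, 10),      # ori  $v0, $0, 10
    0x0000000C,                   # syscall
]

for i, word in enumerate(words):
    print(f"{TEXT_BASE + 4 * i:08x}  {word >> 16:04x} {word & 0xFFFF:04x}")
```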

Page 18:

Resolve relative references

  address    contents    instruction
  00400000   3c06 1000   lui $6, 0x1000
  00400004   34c6 0004   ori $6, $6, 0x0004
  00400008   8cc7 0004   lw $7, 4($6)
  0040000c   0147 0018   mult $10, $7
  00400010   0000 4812   mflo $9
  00400014   1000 fffd   b loop (offset -3)
  00400018   3402 000a   ori $v0, $0, 10
  0040001c   0000 000c   syscall

(0x00400008 - 0x00400014)/4 = -12/4 = -3 = 0xfffd
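This last step amounts to writing a 16-bit two's-complement value into the low half of the branch word. A minimal Python sketch, using the same offset convention as the calculation above (measured from the branch instruction itself) and a hypothetical `resolve_branch` helper:

```python
def resolve_branch(memory, branch_address, target_address):
    """Fill the low 16 bits of the branch word with the word offset.

    Uses the convention of this walkthrough: offset = (target - branch) / 4.
    """
    offset = (target_address - branch_address) // 4
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("branch target out of range")
    memory[branch_address] = (memory[branch_address] & 0xFFFF0000) | (offset & 0xFFFF)
    return offset


memory = {0x00400014: 0x10000000}   # beq $0, $0 with an empty offset field
symbols = {"loop": 0x00400008}
offset = resolve_branch(memory, 0x00400014, symbols["loop"])
print(offset, f"{memory[0x00400014]:08x}")
```

Masking with 0xFFFF is what turns -3 into fffd; the upper half of the word, which holds the opcode and register fields, is left untouched.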