Assembly Process

Machine Code Generation
Assembling a program entails translating the assembly language into binary machine code.
This requires more than simply mapping assembly instructions to machine instructions:
- Each instruction is bound to an address
- Labels are bound to addresses
- Assembly instructions which refer to labels generate machine instructions which contain the label's address
- Pseudo-instructions are translated into one or more machine instructions

Instruction Format

addi $13,$7,50
    001000   00111    01101    0000 0000 0011 0010
    opcode   rs       rt       immediate operand
    6 bits   5 bits   5 bits   16 bits

add $13,$7,$8
    000000   00111    01000    01101    00000    100000
    opcode   rs       rt       rd       shamt    extended opcode
    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
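
The two field layouts above can be checked with a few shifts and masks. This is a minimal Python sketch; the helper names `encode_i` and `encode_r` are illustrative and do not come from any particular assembler.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I-type instruction: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack an R-type instruction: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# addi $13,$7,50: opcode 8, rs = $7, rt = $13, immediate = 50
print(f"{encode_i(0x08, 7, 13, 50):032b}")
# add $13,$7,$8: opcode 0, rs = $7, rt = $8, rd = $13, extended opcode 0x20
print(f"{encode_r(0x00, 7, 8, 13, 0, 0x20):032b}")
```

Note that the destination register comes first in the assembly text but sits in a later field of the machine word.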

The symbol table
The assembler scans the source code and generates the appropriate bit string for each line encountered.
The assembler must remember
- what memory locations have been allocated
- to which address each label is bound
A symbol table is a list of (label, address) pairs.
When the data and text segments have been generated, they are stored as an executable file.
The file is used by a program called the loader to initialize memory to the appropriate state before execution.
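
As a rough illustration of the bookkeeping, the sketch below builds a symbol table as a Python dictionary of (label, address) pairs. It assumes a simplified input in which each label sits on its own line and every other line is exactly one 4-byte machine instruction; `build_symbol_table` is a hypothetical helper, not part of any real assembler.

```python
def build_symbol_table(lines, start_address):
    """Bind each label to the address of the next word in the segment."""
    symbols = {}
    address = start_address
    for line in lines:
        line = line.strip()
        if line.endswith(":"):          # a label on its own line
            symbols[line[:-1]] = address
        elif line:                      # assumed: one 4-byte machine instruction
            address += 4
    return symbols


source = ["__start:", "lui $6, 0x1000", "ori $6, $6, 0x0004", "loop:", "lw $7, 4($6)"]
table = build_symbol_table(source, 0x00400000)
for label, address in table.items():
    print(f"{label:8s} {address:08x}")
```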

Instructions
The .text directive tells the assembler that the lines which follow are instructions.
- By default, the text segment starts at 0x00400000
In some cases, a symbol may not yet have an assigned address when the assembler scans a line that refers to it.
- A second pass through the code can update instructions containing unresolved labels
- Alternatively, maintain a list of the addresses at which each unresolved label appears
When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label.
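
The list-of-addresses approach can be sketched in a few lines of Python. The input format here is a toy invented for the illustration: `name:` defines a label, `use name` stands for any instruction that needs the address of `name`, and every other line is a plain word. The function name and data layout are assumptions.

```python
def assemble_with_fixups(lines, start_address):
    """One pass: record where unresolved labels are used, patch them once defined."""
    symbols = {}    # label -> address
    pending = {}    # unresolved label -> list of addresses that refer to it
    memory = {}     # address -> resolved label address (None for plain words)
    address = start_address
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            label = line[:-1]
            symbols[label] = address
            for location in pending.pop(label, []):
                memory[location] = address      # update every recorded location
            continue
        if line.startswith("use "):
            label = line.split()[1]
            if label in symbols:
                memory[address] = symbols[label]
            else:
                memory[address] = None
                pending.setdefault(label, []).append(address)
        else:
            memory[address] = None
        address += 4
    return symbols, memory, pending


symbols, memory, pending = assemble_with_fixups(
    ["use done", "nop", "use done", "done:", "nop"], 0x00400000
)
print(symbols, pending)
```

Any label still left in `pending` at the end of the pass was never defined, which a real assembler would report as an error or leave for the linker.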

Branch offset in the MIPS R2000
In machine code, the target address in a branch must be specified as an offset from the address of the branch.
During execution, this offset is simply added to the program counter to fetch the next instruction.
- PC contains the address of the branch instruction
- Offset is measured in words, not bytes
PC_NEW = offset*4 + PC_OLD
To calculate the offset, the assembler uses the formula:
offset = (target instruction address - branch instruction address) / 4
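
The two formulas are inverses of each other, which a short Python sketch makes easy to check. It follows the formula as given here, measuring from the branch instruction itself; on MIPS hardware the offset is applied to the address of the instruction after the branch, so the numbers match this transcript rather than every toolchain. The function names are illustrative.

```python
def branch_offset(target_address, branch_address):
    """Word offset per the formula above (the PC increment is ignored)."""
    byte_distance = target_address - branch_address
    if byte_distance % 4 != 0:
        raise ValueError("instructions must sit on word boundaries")
    return byte_distance // 4


def new_pc(offset, old_pc):
    """PC_NEW = offset*4 + PC_OLD."""
    return offset * 4 + old_pc


offset = branch_offset(0x00400020, 0x00400010)
print(offset, hex(new_pc(offset, 0x00400010)))
```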

Branch offset calculation
The offset is stored in the instruction as a word offset rather than a byte offset.
- Instructions are only stored at word boundaries
- For both the target and the branch instruction, the two least significant bits of the address are zero
An offset may be negative
- This happens when the target instruction precedes the branch instruction
The offset is stored in the 16-bit immediate field
- This means the branch can only jump about 2^15 instructions before or after the current address
- 2^15 instructions (words) = 2^17 bytes
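
To make the 16-bit limit concrete, the sketch below converts a signed word offset to and from its two's-complement field and rejects offsets that do not fit. The helper names are made up for this example.

```python
def to_immediate16(offset):
    """Return the 16-bit two's-complement field for a word offset."""
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("branch target out of range")
    return offset & 0xFFFF


def from_immediate16(field):
    """Sign-extend a 16-bit field back to a signed integer."""
    return field - (1 << 16) if field & 0x8000 else field


print(hex(to_immediate16(-26)))       # a backward branch of 26 words
print((1 << 15) * 4 == (1 << 17))     # 2^15 words span 2^17 bytes
```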

Branch offset calculation
An entry in the SPIM instruction list:

[0x00400068] 0x1440ffe6 bne $2, $0, -104 [__start-0x00400068]; 44: bnez $v0, __start

- instruction address: 0x00400068
- machine code: 0x1440ffe6
- stored offset: ffe6 = -26 = -104/4
- offset in bytes (__start = 0x00400000): 0x00400000 - 0x00400068 = -104
- offset calculation, in bytes: [__start-0x00400068] (ignores the PC increment)
- line number in source file: 44
- original assembly code: bnez $v0, __start
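
The fields of that entry can be recovered from the machine word directly. This is a small Python sketch with a hypothetical `decode_branch` helper; like the listing, it ignores the PC increment when computing the target.

```python
def decode_branch(word, address):
    """Split an I-type branch word and recover its target (PC increment ignored)."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    field = word & 0xFFFF
    offset_words = field - 0x10000 if field & 0x8000 else field
    offset_bytes = offset_words * 4
    return opcode, rs, rt, offset_words, offset_bytes, address + offset_bytes


opcode, rs, rt, words, nbytes, target = decode_branch(0x1440FFE6, 0x00400068)
print(opcode, rs, rt, words, nbytes, hex(target))
```

Opcode 5 is bne, and the register fields 2 and 0 correspond to $2 ($v0) and $0 in the listing.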

Jump target calculation
The jump instruction has two forms
- Pseudo-direct, for j and jal
- Register direct, for jr and jalr
jr and jalr specify a register containing the address to be loaded into the PC.
j and jal specify most of the address of the target within the instruction. However, they have a range of at most one-sixteenth of the memory space.
[Figure: the memory space divided into sixteen regions, labeled f down to 0]

Jump target calculation
The target address is a 32-bit quantity
- Since all word addresses are multiples of 4, there is no need to store the last two bits
- The jump instruction format has 26 bits for the target address
- The remaining 6 bits of the instruction are used for the opcode
- The highest-order 4 bits of the target are taken from the address currently stored in the program counter
[Figure: the instruction holds the opcode followed by the 26 jump target bits; the target address is formed from the top 4 bits of the PC, the 26 jump target bits, and two low-order 00 bits]
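
The assembly of the 32-bit target from its three parts can be sketched as follows. The function names are illustrative; opcode 2 is the MIPS opcode for j.

```python
def jump_target(pc, instruction):
    """Pseudo-direct target: top 4 bits of the PC, 26 stored bits, then 00."""
    target_bits = instruction & 0x03FFFFFF
    return (pc & 0xF0000000) | (target_bits << 2)


def encode_jump(opcode, target_address):
    """Store bits 27..2 of the target; the low two bits are always zero."""
    return (opcode << 26) | ((target_address >> 2) & 0x03FFFFFF)


word = encode_jump(0x02, 0x00400008)      # opcode 2 is j
print(hex(word), hex(jump_target(0x00400014, word)))
```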

Jump Target Calculation
Jump instructions have a range of 2^26 words, or 2^26 x 2^2 = 2^28 bytes.
This range is NOT symmetric about the jump instruction.
[Figure: memory regions labeled f down to 0; for a jump at 0x80000080 the reachable targets run from -0x00000080 to +0x0fffff7c relative to the jump]
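
The asymmetry follows from the fact that the reachable region is fixed by the top 4 bits of the PC, not centered on the jump. A short Python sketch, with a hypothetical `jump_reach` helper, reproduces the two distances in the figure:

```python
def jump_reach(jump_address):
    """Lowest and highest reachable targets, as signed distances from the jump."""
    region_start = jump_address & 0xF0000000       # top 4 bits come from the PC
    region_end = region_start | 0x0FFFFFFC         # last word in the 2^28-byte region
    return region_start - jump_address, region_end - jump_address


backward, forward = jump_reach(0x80000080)
print(hex(backward), hex(forward))
```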

Program relocation
It is possible that program modules are developed separately by individual programmers. When these programs are to be loaded into memory they should not be assigned overlapping memory space.
Thus, the modules have to be relocated
- Relative addresses are relocatable
- Any absolute references must be "fixed" by the loader
Use a logical base address known at load time
- Absolute addresses are stored as offsets from this TBD base
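
One way to picture the fix-up step is a loader that adds the base to every word marked as an absolute reference and leaves relative offsets untouched. The sketch below is only an illustration: the module representation, the list of slots to patch, and the name `relocate` are all assumptions, not a real object-file format.

```python
def relocate(words, absolute_slots, base):
    """Add the load-time base to every slot that holds an offset from the base.

    words          -- module contents, one integer per word
    absolute_slots -- indexes of words holding absolute references (as offsets)
    base           -- base address chosen at load time
    """
    loaded = list(words)
    for index in absolute_slots:
        loaded[index] = (loaded[index] + base) & 0xFFFFFFFF
    return loaded


module = [0x00000010, 0xFFFFFFFD, 0x00000004]   # word 1 is a relative offset
print([hex(w) for w in relocate(module, [0, 2], 0x10000000)])
```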

From source to executable
[Figure: high-level source code -> compiler -> asm -> assembler -> obj -> linker (together with other obj and lib files) -> exe -> loader -> memory]

Some examples of assembling code

        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la   $6, a2
loop:   lw   $7, 4($6)
        mul  $9, $10, $7
        b    loop
        li   $v0, 10
        syscall

Some examples of assembling code

Symbol Table
symbol     address
a1         1000 0000
a2         1000 0004
a3         1000 0014
__start    0040 0000
loop       0040 0008

Memory map of data section
address      contents
1000 0000    0000 0003
1000 0004    0000 0010
1000 0008    0000 0010
1000 000c    0000 0010
1000 0010    0000 0010
1000 0014    0000 0005
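
The data-segment addresses in the symbol table and the memory map come from the same running counter. A minimal Python sketch, assuming only `.word` directives and a data segment starting at 0x10000000 (the function name and input shape are made up for the example):

```python
def layout_data(directives, start_address):
    """Assign addresses to .word directives; returns (symbol table, memory map)."""
    symbols, memory = {}, {}
    address = start_address
    for label, values in directives:
        symbols[label] = address
        for value in values:
            memory[address] = value
            address += 4
    return symbols, memory


symbols, memory = layout_data(
    [("a1", [3]), ("a2", [16, 16, 16, 16]), ("a3", [5])], 0x10000000
)
for address, value in memory.items():
    print(f"{address:08x} {value:08x}")
```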

Translate pseudo-instructions

source                          translated
        la   $6, a2             lui  $6, 0x1000
                                ori  $6, $6, 0x0004
loop:   lw   $7, 4($6)          lw   $7, 4($6)
        mul  $9, $10, $7        mult $10, $7
                                mflo $9
        b    loop               b    loop
        li   $v0, 10            ori  $v0, $0, 10
        syscall                 syscall
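
The expansions used in this example can be written as a small rewriting function. This Python sketch covers only the three pseudo-instructions that appear here (la, mul, li) and assumes the li constant fits in 16 bits; `expand` is a hypothetical helper, and a real assembler handles many more cases.

```python
def expand(instruction, symbols):
    """Expand the pseudo-instructions used in the example; others pass through."""
    op, _, rest = instruction.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "la":
        address = symbols[args[1]]
        return [f"lui {args[0]}, 0x{address >> 16:04x}",
                f"ori {args[0]}, {args[0]}, 0x{address & 0xFFFF:04x}"]
    if op == "mul":
        return [f"mult {args[1]}, {args[2]}", f"mflo {args[0]}"]
    if op == "li":
        return [f"ori {args[0]}, $0, {args[1]}"]
    return [instruction]


program = ["la $6, a2", "lw $7, 4($6)", "mul $9, $10, $7", "b loop", "li $v0, 10", "syscall"]
for line in program:
    for real in expand(line, {"a2": 0x10000004}):
        print(real)
```

Because la expands to two instructions, the label loop ends up at 0x00400008 rather than 0x00400004.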

Translate to machine code

address     contents     instruction
00400000    3c06 1000    lui  $6, 0x1000
00400004    34c6 0004    ori  $6, $6, 0x0004
00400008    8cc7 0004    lw   $7, 4($6)
0040000c    0147 0018    mult $10, $7
00400010    0000 4812    mflo $9
00400014    1000 xxxx    b    loop (assembled as beq $0, $0, loop; offset not yet resolved)
00400018    3402 000a    ori  $v0, $0, 10
0040001c    0000 000c    syscall

Resolve relative references

address     contents          instruction
00400000    3c06 1000         lui  $6, 0x1000
00400004    34c6 0004         ori  $6, $6, 0x0004
00400008    8cc7 0004         lw   $7, 4($6)
0040000c    0147 0018         mult $10, $7
00400010    0000 4812         mflo $9
00400014    1000 fffd (-3)    b    loop
00400018    3402 000a         ori  $v0, $0, 10
0040001c    0000 000c         syscall

(0x00400008 - 0x00400014)/4 = -12/4 = -3 = 0xfffd
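
The final patching step can be sketched in Python as follows. It uses the same convention as the calculation above (offset measured from the branch instruction, PC increment ignored); `resolve_branch` is an illustrative name.

```python
def resolve_branch(word, branch_address, target_address):
    """Fill the 16-bit offset field of a branch word (PC increment ignored)."""
    offset = (target_address - branch_address) // 4
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("branch target out of range")
    return (word & 0xFFFF0000) | (offset & 0xFFFF)


# b loop at 0x00400014, with loop bound to 0x00400008 in the symbol table
patched = resolve_branch(0x10000000, 0x00400014, 0x00400008)
print(f"{patched:08x}")
```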