The Assembly Process: Basically, Why Does It All Work?
The Assembly Process
• A computer understands machine code.
• People (and compilers) write assembly language.
• An assembler is a program that translates each instruction to its binary machine code equivalent.
  • It is a relatively simple program.
  • There is a one-to-one or near one-to-one correspondence between assembly language instructions and machine language instructions.
• Assemblers now do some code manipulation:
  • Like MAL to TAL
  • Label resolution
  • A macro assembler can process simple macros like puts, or preprocessor directives.
Assembly source code → assembler → Machine code
MAL TAL
MAL is the set of instructions accepted by the assembler. TAL is a subset of MAL: the instructions that can be directly turned into machine code.
• There are many MAL instructions that have no single TAL equivalent.
• To determine whether an instruction is a TAL instruction or not:
  • Look in appendix C.
• The assembler takes MAL instructions that are not true MIPS instructions and synthesizes them into 1 or more MIPS instructions.
MAL TAL
  mul $8, $17, $20

Becomes

  mult $17, $20
  mflo $8

• MIPS has 2 registers for results from integer multiplication and division: HI and LO.
  • Each is a 32-bit register.
  • mult and multu place the least significant 32 bits of the result into LO, and the most significant 32 bits into HI.
• Multiplying two 32-bit numbers gives a 64-bit result:
  • (2^32 - 1)(2^32 - 1) = 2^64 - 2×2^32 + 1
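The HI/LO split can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of any real simulator; the function name multu_hi_lo is made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def multu_hi_lo(rs, rt):
    """Unsigned 32x32 multiply; return (HI, LO) halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    hi = (product >> 32) & MASK32   # most significant 32 bits
    lo = product & MASK32           # least significant 32 bits
    return hi, lo

# Largest possible operands: (2^32 - 1) squared.
hi, lo = multu_hi_lo(0xFFFFFFFF, 0xFFFFFFFF)
print(f"HI = 0x{hi:08x}, LO = 0x{lo:08x}")
```

A mul that only copies LO into the destination (mflo) silently drops whatever ended up in HI.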
MAL TAL
mflo, mtlo, mfhi, mthi

Reading the mnemonics: m = move, f = from / t = to, lo / hi = register LO or HI.

• Data is moved into or out of register HI or LO.
• One operand is needed to tell where the data is coming from or going to.
• For division (div or divu):
  • LO gets the quotient
  • HI gets the remainder
• Why aren't these just put in $0-$31 directly?
MAL TAL
TAL has only base displacement addressing.

  lw $8, label

Becomes:

  la $8, label
  lw $8, 0($8)

Which becomes:

  lui $8, 0xMSpart of label
  ori $8, $8, 0xLSpart of label
  lw $8, 0($8)
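To make the "MS part" and "LS part" concrete, here is a small Python sketch that splits a 32-bit label address into the two 16-bit immediates. The helper names are hypothetical, and the address 0x00400004 is borrowed from the a2 label used in the worked example later in the deck.

```python
def split_address(addr):
    """Return the (upper, lower) 16-bit halves of a 32-bit address."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF

def expand_la(reg, addr):
    """Expand 'la reg, label' into the lui/ori pair, given the label's address."""
    upper, lower = split_address(addr)
    return [f"lui {reg}, 0x{upper:04x}", f"ori {reg}, {reg}, 0x{lower:04x}"]

for line in expand_la("$6", 0x00400004):
    print(line)
```

ori is used for the lower half because it zero-extends its immediate, so the upper bits loaded by lui are left alone.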
MAL TAL
Instructions with immediates are synthesized with other instructions.

  add $sp, $sp, 4

Becomes:

  addi $sp, $sp, 4

For TAL:
• add requires 3 operands in registers.
• addi requires 2 operands in registers and one operand that is immediate.
• On the MIPS, immediate instructions include:
  • addi, addiu, andi, lui, ori, xori
  • Why not more?
MAL TAL
TAL implementation of I/O instructions:

  putc $18

Becomes:

  addi $2, $0, 11     # code for putc
  add  $4, $18, $0    # put character argument in $4
  syscall             # ask operating system to do a function
MAL TAL
  getc $11

Becomes:

  addi $2, $0, 12
  syscall
  add  $11, $0, $2

  puts $13

Becomes:

  addi $2, $0, 4
  add  $4, $0, $13
  syscall

  done

Becomes:

  addi $2, $0, 10
  syscall
MAL TAL
MAL                    TAL

move $4, $3            add $4, $3, $0

add $4, $3, 15         addi $4, $3, 15      # also andi, ori, etc.

mul $8, $9, $10        mult $9, $10         # HI || LO ← product, never overflows
                       mflo $8              # $8 ← LO, ignore HI!

div $8, $9, $10        div $9, $10          # LO ← quotient, HI ← remainder
                       mflo $8

rem $8, $9, $10        div $9, $10
                       mfhi $8

MAL branches: bltz, bgez, blez, bgtz, beqz, bnez, blt, bge, bgt, beq, bne
TAL branches: bltz, bgez, blez, bgtz, beq, bne
MAL TAL
MAL                    TAL

Branches:

beqz $4, loop          beq $4, $0, loop

blt $4, $5, target     slt $at, $4, $5      # $at is 1 if $4 < $5, 0 otherwise
                       bne $at, $0, target

I/O instructions: put, puts, putc, get, getc, done

Really "procedure call to OS". Assume $2 holds the call type and $4 holds the input parameters.

putc $12               addi $2, $0, 11      # putc is syscall 11, see page 262
                       add $4, $12, $0      # char to putc
                       syscall              # call OS

done                   addi $2, $0, 10      # done is syscall 10
                       syscall
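The table rows above are essentially a lookup from one MAL instruction to a short list of TAL instructions. A minimal Python sketch of that idea follows; it covers only a handful of the rows, works on plain strings, and the function name expand is an assumption made for illustration, not how any real assembler is organized.

```python
def expand(mal):
    """Expand one MAL instruction string into a list of TAL instruction strings."""
    op, _, rest = mal.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "move":
        return [f"add {args[0]}, {args[1]}, $0"]
    if op == "beqz":
        return [f"beq {args[0]}, $0, {args[1]}"]
    if op == "blt":
        # $at is the register reserved for the assembler's own use
        return [f"slt $at, {args[0]}, {args[1]}", f"bne $at, $0, {args[2]}"]
    if op == "putc":
        return ["addi $2, $0, 11", f"add $4, {args[0]}, $0", "syscall"]
    if op == "done":
        return ["addi $2, $0, 10", "syscall"]
    return [mal]  # assume anything else is already TAL

for instr in ["move $4, $3", "blt $4, $5, target", "done"]:
    print(instr, "->", expand(instr))
```

Note that one MAL line can grow into two or three machine instructions, which is why the assembler must expand before it assigns addresses.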
Assembly
The assembler will
• Assign addresses
• Generate machine code

If necessary, the assembler will
• Translate (synthesize) from the accepted assembly to the instructions available in the architecture
• Provide macros and other features
• Generate an image of what memory must look like for the program to be executed.
Assembly
A 2-pass assembler will
1. Create a complete symbol table, which is just a list of the labels (symbols) together with the addresses assigned to each label by the assembler.
2. Complete the machine code for instructions that didn't get finished in pass 1.
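The two passes can be sketched in Python as follows. This is a deliberately simplified model: it assumes the input is already TAL (one word per instruction), that a label, if present, is followed by a colon, and that a symbol only ever appears as the last operand. The function names and the start address 0x00800000 (taken from the worked example later in the deck) are illustrative choices.

```python
def pass_one(lines, base):
    """Assign an address to every instruction and record each label's address."""
    symbols, placed, addr = {}, [], base
    for line in lines:
        label, sep, instr = line.partition(":")
        if not sep:                      # no label on this line
            label, instr = "", line
        if label.strip():
            symbols[label.strip()] = addr
        if instr.strip():
            placed.append((addr, instr.strip()))
            addr += 4                    # every TAL instruction is one word
    return symbols, placed

def pass_two(symbols, placed):
    """Replace a trailing label operand with its address, now that all are known."""
    out = []
    for addr, instr in placed:
        head, _, last = instr.rpartition(" ")
        if last in symbols:
            instr = f"{head} 0x{symbols[last]:08x}"
        out.append((addr, instr))
    return out

program = [
    "__start: lui $6, 0x0040",
    "ori $6, $6, 0x0004",
    "loop: lw $7, 4($6)",
    "mult $9, $10",
    "beq $0, $0, loop",
]
symbols, placed = pass_one(program, 0x00800000)
for addr, instr in pass_two(symbols, placed):
    print(f"{addr:08x}  {instr}")
```

The example only has a backward reference (loop), but because pass 2 runs after the whole symbol table exists, a forward reference would be handled the same way.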
Assembler
What should the assembler do when it sees a directive?
• .data
• .text
• .space, .word, .byte
• org (HC11)
• equ (HC11)
How is the memory image formed?
Assembler
Example Data Declaration
• Assembler aligns data to word addresses unless told not to.
• Assembly is very sequential.

  .data
  a1: .word 3
  a2: .byte '\n'
  a3: .space 5

Address       Contents
0x00001000    0x00000003
0x00001004    0x??????0a
0x00001008    0x????????
0x0000100c    0x????????
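The addresses in the table follow from walking the declarations in order and rounding each one up to a word boundary. A short Python sketch, assuming every declaration is word-aligned as the slide states (the layout helper and its (label, size) input format are made up for illustration):

```python
def layout(decls, base):
    """Place each (label, size_in_bytes) at the next word-aligned address."""
    table, addr = {}, base
    for label, size in decls:
        addr = (addr + 3) & ~3       # round up to a multiple of 4
        table[label] = addr
        addr += size
    return table

# a1: .word 3 (4 bytes), a2: .byte '\n' (1 byte), a3: .space 5 (5 bytes)
addresses = layout([("a1", 4), ("a2", 1), ("a3", 5)], 0x00001000)
for label, addr in addresses.items():
    print(f"{label}: 0x{addr:08x}")
```

The 5 bytes reserved for a3 start at 0x00001008 and spill into the word at 0x0000100c, which is why the table shows four words.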
Assembler
Machine code generation from simple instructions:

Assembly language: addi $8, $20, 15

Machine code format (bit 31 on the left, bit 0 on the right):

  opcode | rs | rt | immediate

• Opcode is 6 bits: addi is defined to be 001000
• Rs is 5 bits: the encoding of 20 is 10100
• Rt is 5 bits: the encoding of 8 is 01000

The 32-bit instruction for addi $8, $20, 15 is:

  001000 10100 01000 0000000000001111

Or

  0x2288000f
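Packing the four fields is just shifting and OR-ing. The Python sketch below reproduces the addi $8, $20, 15 example; encode_i is a hypothetical helper name.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I-type instruction: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# addi $8, $20, 15: rt is the destination ($8), rs is the source ($20)
word = encode_i(0b001000, 20, 8, 15)
print(f"{word:032b}")
print(f"0x{word:08x}")
```

Masking the immediate with 0xFFFF also makes negative immediates come out in 16-bit two's complement.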
Instruction Formats
I-Type: instructions with 16-bit immediates

• ADDI, ORI, ANDI:   OPC:6 | rs1:5 | rd:5 | immediate:16
• LW, SW:            OPC:6 | rs1:5 | rs2/rd:5 | displacement:16
• BNE:               OPC:6 | rs1:5 | rs2:5 | distance(instr):16
Instruction Formats
J-Type: instructions with a 26-bit immediate

• J, JAL:   OPC:6 | 26 bits of jump address

R-Type: all other instructions

• ADD, AND, OR, JR, JALR, SYSCALL, MULT, MFHI, SLT:   OPC:6 | rs1:5 | rs2:5 | rd:5 | ALU function:11

(LUI takes a 16-bit immediate, so it is encoded as an I-type instruction.)
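The R-type and J-type layouts can be packed the same way as I-type. In this Python sketch the helper names are hypothetical; the field values come from instructions that appear elsewhere in the deck (mult $9, $10 with ALU function 0x18, and jal with opcode 000011 and target field 1048812).

```python
def encode_r(rs, rt, rd, alu_function, opcode=0):
    """Pack an R-type instruction: opcode(6) | rs(5) | rt(5) | rd(5) | function(11)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (alu_function & 0x7FF)

def encode_j(opcode, target_field):
    """Pack a J-type instruction: opcode(6) | target(26)."""
    return (opcode << 26) | (target_field & 0x03FFFFFF)

mult_word = encode_r(rs=9, rt=10, rd=0, alu_function=0x18)   # mult $9, $10
jal_word = encode_j(0b000011, 1048812)                       # jal 1048812
print(f"0x{mult_word:08x}")
print(f"0x{jal_word:08x}")
```

The 11-bit "ALU function" on the slide is the 5-bit shift amount followed by the 6-bit function code; for mult the shift amount is zero.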
Assembly Example
          .data
  a1:     .word 3
  a2:     .word 16:4
  a3:     .word 5

          .text
  __start:
          la $6, a2
  loop:   lw $7, 4($6)
          mult $9, $10
          b loop
          done
Assembly Example
Symbol Table
Symbol     Address
a1         0040 0000
a2         0040 0004
a3         0040 0014
__start    0080 0000
loop       0080 0008

Address      Contents (hex)   Contents (binary)
0040 0000    0000 0003        0000 0000 0000 0000 0000 0000 0000 0011
0040 0004    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0008    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 000c    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0010    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0014    0000 0005        0000 0000 0000 0000 0000 0000 0000 0101

Memory map of data section
Assembly Example
Translation to TAL code

           .text
  __start: lui $6, 0x0040        # la $6, a2
           ori $6, $6, 0x0004
  loop:    lw $7, 4($6)
           mult $9, $10
           beq $0, $0, loop      # b loop
           ori $2, $0, 10        # done
           syscall

Address      Contents (hex)   Contents (binary)
0080 0000    3c06 0040        0011 1100 0000 0110 0000 0000 0100 0000   (lui)
0080 0004    34c6 0004        0011 0100 1100 0110 0000 0000 0000 0100   (ori)
0080 0008    8cc7 0004        1000 1100 1100 0111 0000 0000 0000 0100   (lw)
0080 000c    012a 0018        0000 0001 0010 1010 0000 0000 0001 1000   (mult)
0080 0010    1000 fffd        0001 0000 0000 0000 1111 1111 1111 1101   (beq)
0080 0014    3402 000a        0011 0100 0000 0010 0000 0000 0000 1010   (ori)
0080 0018    0000 000c        0000 0000 0000 0000 0000 0000 0000 1100   (sys)

Memory map of text section
Assembly
Branch offset computation.
• At execution time: PC ← NPC + {sign-extended offset field, 00}
  • PC points to the instruction after the beq when the offset is added.
• At assembly time:

    Byte offset = target addr - (address of branch + 4)
                = 00800008 - (00800010 + 00000004)
                = FFFFFFF4 (-12)

• 3 important observations:
  • The offset is stored in the instruction as a word offset (here -12 / 4 = -3, which is FFFD in 16 bits).
  • An offset may be negative.
  • The field dedicated to the offset is 16 bits, so the range is limited.
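The assembly-time calculation can be written out in a few lines of Python. This sketch uses the addresses from the example (branch at 0x00800010, target at 0x00800008); the function name and the choice to raise an error on out-of-range targets are assumptions for illustration.

```python
def branch_offset_field(branch_addr, target_addr):
    """Return the 16-bit offset field for a branch at branch_addr to target_addr."""
    byte_offset = target_addr - (branch_addr + 4)   # relative to the next instruction
    word_offset = byte_offset >> 2                  # stored as a count of words
    if not -(1 << 15) <= word_offset < (1 << 15):
        raise ValueError("branch target out of range for a 16-bit offset")
    return word_offset & 0xFFFF                     # two's complement in 16 bits

field = branch_offset_field(0x00800010, 0x00800008)
print(f"0x{field:04x}")
```

The range check reflects the third observation: a target more than about 32K instructions away cannot be reached with a single branch.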
Assembly
Jump target computation.
• At execution time: PC ← {most significant 4 bits of PC, target field, 00}
• At assembly time:
  • Take the 32-bit target address
  • Eliminate the least significant 2 bits (since word aligned)
  • Eliminate the most significant 4 bits
  • What remains is 26 bits, and goes in the target field
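Both directions of the jump computation fit in two small Python functions. This is a sketch with hypothetical names; the addresses are borrowed from the loop label (0x00800008) and the branch location (0x00800010) in the earlier example, used here only as sample values.

```python
def jump_target_field(target_addr):
    """Assembly time: drop the low 2 bits and the high 4 bits, keep 26 bits."""
    return (target_addr >> 2) & 0x03FFFFFF

def jump_destination(pc, field):
    """Execution time: top 4 bits of PC, then the 26-bit field, then 00."""
    return (pc & 0xF0000000) | (field << 2)

field = jump_target_field(0x00800008)
print(f"field = 0x{field:07x}")
print(f"dest  = 0x{jump_destination(0x00800010, field):08x}")
```

Because the top 4 bits come from the PC, a jump can only reach targets in the same 256 MB region as the jump itself.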
Linking and Loading
Object file

Section            Contents
header             start/size of other parts
text               machine language
data               static data: size and initial values
relocation info    instructions and data with absolute addresses
symbol table       addresses of external labels
debugging info
Linking and Loading
Linker
• Search libraries
• Read object files
• Relocate code/data
• Resolve external references

Loader
• Create address spaces for text & data
• Copy text & data in memory
• Initialize stack and copy args
• Initialize regs (maybe)
• Initialize other things (OS)
• Jump to startup routine
  • And then to the address of __start
Linking and Loading
• The data section starts at 0x00400000 for the MIPS RISC processor.
• If the source code has

    .data
    a1: .word 15
    a2: .word -2

  then the assembler specifies the initial configuration of memory as

    address       contents
    0x00400000    0000 0000 0000 0000 0000 0000 0000 1111
    0x00400004    1111 1111 1111 1111 1111 1111 1111 1110

• Like the data, the code needs to be placed starting at a specific location to make it work.
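The bit patterns for 15 and -2 can be reproduced by masking to 32 bits, which gives the two's complement form for negative values. A small Python sketch (word_bits is a made-up helper name):

```python
def word_bits(value):
    """Format an integer as a 32-bit two's complement word in groups of 4 bits."""
    bits = format(value & 0xFFFFFFFF, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))

for address, value in [(0x00400000, 15), (0x00400004, -2)]:
    print(f"0x{address:08x}    {word_bits(value)}")
```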
Linking and Loading
•Consider the case where the assembly language code issplit across 2 files. Each is assembled separately.
File 1:

          .data
  a1:     .word 15
  a2:     .word -2

          .text
  __start: la $t0, a1
          add $t1, $t0, $s3
          jal proc5
          done

File 2:

          .data
  a3:     .word 0

          .text
  proc5:  lw $t6, a1
          sub $t2, $t0, $s4
          jr $ra
Linking and Loading
What happens to…
• a1
• a3
• __start
• proc5
• lw
• la
• jal
Linking and Loading
Problem: there are absolute addresses in the machine code.

Solutions:

1. Only allow a single source file
   • Why not?
2. Allow linking and loading to
   • Relocate pieces of data and code sections
   • Finish the machine code where symbols were left undefined
   • Basically makes an absolute address a relative address
Linking and Loading
• The assembler will
  • Start both data and code sections at address 0, for all files.
  • Keep track of the size of every data and code section.
  • Keep track of all absolute addresses within the file.
• Linking and loading will:
  • Assign starting addresses for all data and code sections, based on their sizes.
  • The blocks of data and code go at non-overlapping locations.
  • Fix all absolute addresses in the code.
  • Place the linked code and data in memory at the location assigned.
  • Start it up.
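The address-assignment step can be sketched in Python for the two-file example. Everything about the data structures here is an assumption made for illustration: each object file is modelled as a dict with section sizes and symbol offsets relative to 0, the text sizes are rough guesses at the expanded TAL length, and the base addresses are the 0x00400000 and 0x00800000 values used on earlier slides.

```python
DATA_BASE, TEXT_BASE = 0x00400000, 0x00800000   # bases assumed from earlier slides

def link(objects):
    """Assign non-overlapping base addresses and return final symbol addresses."""
    symbols, data_at, text_at = {}, DATA_BASE, TEXT_BASE
    for obj in objects:
        for name, (section, offset) in obj["symbols"].items():
            base = data_at if section == "data" else text_at
            symbols[name] = base + offset       # relocate: offset from 0 + base
        data_at += obj["data_size"]             # next file's data goes right after
        text_at += obj["text_size"]             # next file's code goes right after
    return symbols

# Offsets are what each assembler run would record with sections starting at 0.
file1 = {"data_size": 8, "text_size": 24,       # text size is an illustrative guess
         "symbols": {"a1": ("data", 0), "a2": ("data", 4), "__start": ("text", 0)}}
file2 = {"data_size": 4, "text_size": 20,       # text size is an illustrative guess
         "symbols": {"a3": ("data", 0), "proc5": ("text", 0)}}

final = link([file1, file2])
for name, addr in final.items():
    print(f"{name:8s} 0x{addr:08x}")
```

With the final table in hand, the remaining work is to patch every recorded absolute address (the la, lw, and jal in the example) with the relocated value.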
MIPS Example
Code levels of abstraction (from James Larus)
"C" code

  #include <stdio.h>
  int main (int argc, char *argv[])
  {
      int I;
      int sum = 0;

      for (I = 0; I <= 100; I++)
          sum += I * I;
      printf ("The sum 0..100=%d\n", sum);
  }

Compile this HLL into a machine's assembly language with the compiler.
MIPS Example
          .text
  main:
          subu $sp, 32
          sw $31, 20($sp)
          sw $4, 32($sp)
          sw $0, 24($sp)
          sw $0, 28($sp)
  loop:
          lw $14, 28($sp)
          mul $15, $14, $14
          lw $24, 24($sp)
          addu $25, $24, $15
          sw $25, 24($sp)
          addu $8, $14, 1
          sw $8, 28($sp)
          ble $8, 100, loop
          la $4, str
          lw $5, 24($sp)
          jal printf
          move $2, $0
          lw $31, 20($sp)
          addu $sp, 32
          jr $31

          .data
  str:    .asciiz "The sum 0..100=%d\n"
MIPS Assembly Language

Now resolve the labels…

  addiu $sp, $sp, -32
  sw $ra, 20($sp)
  sw $a0, 32($sp)
  sw $a1, 36($sp)
  sw $0, 24($sp)
  sw $0, 28($sp)
  lw $t6, 28($sp)
  lw $t8, 24($sp)
  multu $t6, $t6
  addiu $t0, $t6, 1
  slti $at, $t0, 101
  sw $t0, 28($sp)
  mflo $t7
  addu $t9, $t8, $t7
  bne $at, $0, -9
  sw $t9, 24($sp)
  lui $a0, 4096
  lw $a1, 24($sp)
  jal 1048812
  addiu $a0, $a0, 1072
  lw $ra, 20($sp)
  addiu $sp, $sp, 32
  jr $ra
  addu $v0, $0, $0

The assembler then translates this into binary machine code for instructions and data.
MIPS Machine language

  00100111101111011111111111100000
  10101111101111110000000000010100
  10101111101001000000000000100000
  10101111101001010000000000100100
  10101111101000000000000000011000
  10101111101000000000000000011100
  10001111101011100000000000011100
  10001111101110000000000000011000
  00000001110011100000000000011001
  00100101110010000000000000000001
  00101001000000010000000001100101
  10101111101010000000000000011100
  00000000000000000111100000010010
  00000011000011111100100000100001
  00010100001000001111111111110111
  10101111101110010000000000011000
  00111100000001000001000000000000
  10001111101001010000000000011000
  00001100000100000000000011101100
  00100100100001000000010000110000
  10001111101111110000000000010100
  00100111101111010000000000100000
  00000011111000000000000000001000
  00000000000000000001000000100001
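Going the other way, the first 32 bits of the machine language can be pulled apart into I-type fields to confirm that they match the first assembly instruction. A Python sketch with a hypothetical decode_i helper:

```python
def decode_i(word):
    """Split a 32-bit word into I-type fields, sign-extending the immediate."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # top bit set means a negative 16-bit value
        imm -= 0x10000
    return opcode, rs, rt, imm

first = int("00100111101111011111111111100000", 2)
opcode, rs, rt, imm = decode_i(first)
print(f"opcode={opcode:06b} rs=${rs} rt=${rt} imm={imm}")
```

Register 29 is $sp, and the decoded immediate is -32, matching the stack adjustment at the top of the listing.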