2 PASS ASSEMBLER FOR 8085

CONTENTS
1 ABSTRACT
2 INTRODUCTION
3 DESIGN ASPECTS OF ASSEMBLER
  A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE
  EXAMPLE
  THE TASKS OF AN ASSEMBLER
  LITERAL HANDLING
  ONE PASS ASSEMBLER
4 BRIEF DESCRIPTION OF 8085
  SOME OF 8085 INSTRUCTIONS
5 THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085
  MAIN FUNCTIONS USED IN TWO PASSES
6 RESULTS
7 CONCLUSION
8 REFERENCES

ABSTRACT:
An assembler is a program that translates programs from assembly language to machine
language. Before an assembly program can be executed on a computer platform, it must be
translated into the machine language of the target hardware, and the assembler performs
this translation. It takes as input a stream of assembly commands and generates as output
a stream of equivalent binary instructions. The resulting code can be loaded as-is into the
computer's memory and then executed by the hardware. An assembler is essentially a
text-processing program designed to provide translation services. It carries out the
following operations: parse each symbolic command into its underlying fields; for each
field, generate the corresponding bits in the machine language; replace all symbolic
references (if any) with numeric addresses of memory locations; and assemble the binary
codes into a complete machine instruction. The translation of symbols to numeric addresses
is done in two conceptual stages. First, the assembler creates a symbol table, associating
each symbol with a designated memory address. Next, the assembler uses the symbol table to
translate each occurrence of each symbol in the program to its allocated address. Inside
the 8085, instructions are stored as binary numbers, which are hard to read and extremely
difficult to decipher. An assembler lets you write instructions in a more or less
English-like form that is much more easily read and understood. It then converts, or
assembles, them into hex numbers and finally into binary numbers.
Pass 1:
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2:
• Assemble instructions.
• Generate data values defined by directives such as BYTE and WORD.
• Perform the processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.

INTRODUCTION:
Assemblers perform a one-to-one translation of symbolic source statements written in
assembly language into the corresponding machine language instructions. Assembly language
is intermediate between high-level languages and machine language. It is the symbolic
representation of the computer's binary encoding (machine language), and it is more
readable than machine language because it uses symbols instead of bits. The symbols in
assembly language name commonly occurring bit patterns, such as opcodes and register
specifiers, so that they can be read and remembered. In addition, assembly language permits
programmers to use labels to identify and name particular memory words that hold
instructions or data. An assembler reads a single assembly language source file and
produces an object file containing machine instructions and bookkeeping information that
helps combine several object files into a program. Fig. 1 illustrates how a program is
built. Most programs consist of several files, also called modules, that are written,
compiled and assembled independently. A program may also use prewritten routines supplied
in a program library. A module typically contains references to subroutines and data
defined in other modules and in libraries. The code in a module cannot be executed while it
contains unresolved references to labels in other object files or libraries. Another tool,
called a linker, combines a collection of object and library files into an executable
file, which a computer can run.
Fig.1: THE PROCESS THAT PRODUCES AN EXECUTABLE FILE

The operation field of a statement mainly contains one of the following three types of name:
Operation code (op-code): an easy-to-understand code name for a primitive machine instruction.
Assembler directive (pseudo-op): a symbolic directive that tells the assembler how to
translate a program but does not itself produce machine instructions.
Macro name: a symbolic name that represents a group of assembly statements. A macro lets a
programmer physically insert a set of instructions by means of a single symbolic name.
The operand field contains the address of the operand whose content is manipulated by the
op-code. It could also be the target label of a branch. In general, it could be a literal,
the name of a machine register or a simple expression involving addresses. In the case of
such an expression, the assembler calculates the effective address.
An assembly language program contains absolute, relative and externally defined entities.
Absolute entities, such as op-codes and fixed addresses, are independent of the storage
locations the machine code will eventually occupy. Relative entities, such as symbolic
references, are fixed only with respect to each other and can be stated relative to the
starting address of the program. Externally defined entities are used, but not defined,
within the module. The output of an assembler is primarily a file containing object code.
In addition to the object file, the assembler also produces other files to help the user
debug the program. The object file contains the machine codes for the mnemonics and the
addresses, each with an indication of whether it is relative, absolute or external.
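To make the absolute/relative distinction concrete, here is a minimal Python sketch of how a loader might use such tags. It assumes a hypothetical object-file record of the form (address, value, tag). Addresses are counted from the start of the program. Tag 'a' means absolute and tag 'r' means relative. Externally defined entities would need a linker, so this sketch rejects them.

```python
def relocate(records, load_address):
    """Build a memory image {address: value} for a program loaded at load_address.

    records: hypothetical object-file entries (address, value, tag), where
    tag 'a' = absolute (used as-is) and 'r' = relative (moves with the program).
    """
    memory = {}
    for address, value, tag in records:
        if tag == "r":
            value += load_address        # relative entity: shift by the load address
        elif tag != "a":
            raise ValueError(f"tag {tag!r} needs a linker (e.g. an external symbol)")
        memory[address + load_address] = value   # every word moves with the program
    return memory


# READ N from Example 1: op-code 10 is absolute, the address of N (35) is relative.
print(relocate([(0, 10, "a"), (1, 35, "r")], 100))   # {100: 10, 101: 135}
```

Only the words tagged 'r' change value. This is why the assembler must record the tag alongside each word rather than just the raw numbers.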
DESIGN ASPECTS OF ASSEMBLER
Implementation of an assembler can be viewed as having three logical stages. During the
first stage, the macro definitions are collected and the proper textual substitution is
made for macro calls. This stage is known as macro processing. The assembler proper forms
the remaining two stages. It first scans through the entire text to collect all the
symbols, associating them with addresses whenever possible. This stage is known as pass
one. In the second pass, the assembler replaces the mnemonics by machine codes and the
symbols by the corresponding addresses, and generates the output.
A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE
Let us consider a model of a hypothetical computer in order to appreciate the functions of
an assembler. For simplicity of explanation and ease of understanding, our model has a very
small set of assembly-level instructions. The instructions and their functions are detailed
below. In the examples, A and B are the names of two variables, L is a label and “ACC”
stands for the accumulator. We also include two pseudo-operations, DEFW and CONST. DEFW is
used to reserve one word of storage, and CONST defines a named word holding a constant value.
Mnemonic  Machine code  No. of operands  Length of instruction  Example   Meaning
ADD       01            1                2                      ADD A     ACC := ACC + A
SUB       02            1                2                      SUB A     ACC := ACC - A
MULT      03            1                2                      MULT A    ACC := ACC * A
JMP       04            1                2                      JMP L     GOTO L
JNEG      05            1                2                      JNEG L    if ACC < 0 GOTO L
JPOS      06            1                2                      JPOS L    if ACC > 0 GOTO L
JZ        07            1                2                      JZ L      if ACC = 0 GOTO L
LOAD      08            1                2                      LOAD A    ACC := A
STORE     09            1                2                      STORE A   A := ACC
READ      10            1                2                      READ A    A := input data
WRITE     11            1                2                      WRITE A   output A
STOP      12            0                1                      STOP      stop execution

EXAMPLE:
We present a small assembly language program in order to highlight the structure of a
typical assembly language program. Consider the problem of reading N numbers and finding
their sum. The program is listed below.
Line no  Label  Mnemonic  Operand
1               READ      N
2               LOAD      ZERO
3               STORE     COUNT
4               STORE     SUM
5        LOOP   READ      X
6               LOAD      X
7               ADD       SUM
8               STORE     SUM
9               LOAD      COUNT
10              ADD       ONE
11              STORE     COUNT
12              SUB       N
13              JZ        OUTER
14              JMP       LOOP
15       OUTER  WRITE     SUM
16              STOP
17              ENDP
18       ZERO   CONST     0
19       ONE    CONST     1
20       SUM    DEFW
21       COUNT  DEFW
22       N      DEFW
23       X      DEFW
24              END
THE TASKS OF AN ASSEMBLER:
1. Replace symbolic codes by machine instruction codes.
2. Replace symbolic references by numeric addresses.
3. Reserve storage to be occupied by instructions and data.
4. Translate the literals into their internal representations.
The translation of the program proceeds as follows. Each line typically has the form

Label  Op-code  Operand(s)

where one or more of the fields may be missing. We assume that the program will be stored
from word 0 of memory onwards.
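One way to split a line into these fields is sketched below in Python. As a simplifying assumption, the first token is treated as a label whenever it is not a known mnemonic or pseudo-op, so labels must not reuse those names. The sets OPCODES and PSEUDO_OPS are taken from the hypothetical language above, and parse_line is a hypothetical helper name.

```python
OPCODES = {"ADD", "SUB", "MULT", "JMP", "JNEG", "JPOS", "JZ",
           "LOAD", "STORE", "READ", "WRITE", "STOP"}
PSEUDO_OPS = {"DEFW", "CONST", "ENDP", "END"}


def parse_line(text):
    """Split 'Label opcode operand(s)' into (label, opcode, operands).

    Missing label/opcode fields are None; operands is a (possibly empty) list.
    """
    tokens = text.split()
    if not tokens:
        return (None, None, [])
    label = None
    if tokens[0].upper() not in OPCODES | PSEUDO_OPS:
        label, tokens = tokens[0], tokens[1:]      # the first field is a label
    opcode = tokens[0].upper() if tokens else None
    operands = tokens[1:]                          # everything after the op-code
    return (label, opcode, operands)


print(parse_line("LOOP READ X"))    # ('LOOP', 'READ', ['X'])
print(parse_line("STOP"))           # (None, 'STOP', [])
```

Returning the operands as a list, rather than a single operand, leaves the check for too many or too few operands to the MOT lookup.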
Line 1 contains the mnemonic READ and the operand ‘N’. It is easy to substitute the machine
code for the mnemonic. We can maintain a list of all mnemonics and their corresponding
machine codes in a table, usually known as the Machine Op-code Table (MOT). Whenever a
mnemonic is encountered, this table is searched to find the appropriate machine code. The
MOT also contains other important information, such as the number of operands and the
length of the instruction. This helps in checking that the appropriate number of operands
follows the mnemonic in an assembly language statement. For example, the entry for “READ”
in the MOT tells us that there should be exactly one operand (N in this case). Therefore,
the occurrence of more than one operand, or its absence, has to be treated as an error.
However, it is not yet possible to replace N by the address to which it refers, because the
assembler has not yet encountered any instruction that reserves storage for N. Moreover, as
the size of the assembly language code is still unknown, the part of memory available for
data cannot be fixed in advance. So the assembler notes that “N” could not be replaced and
postpones this action until the address of N is resolved. This is achieved by the use of a
Symbol Table (ST). Each entry in this table contains the name of a symbol, its address and
other attributes. Whenever a new symbol is encountered, its name and other known attributes
are inserted in the table. Hence, the assembler puts N in the symbol table. Note that the
other attributes of N still remain undefined.
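A minimal Python sketch of these two tables follows. The MOT entries (machine code, number of operands, length in words) follow the instruction table of the hypothetical machine. Representing the ST as a dictionary, with None for an address that is not yet known, is an assumption made for illustration. check_instruction is a hypothetical name.

```python
# Machine Op-code Table: mnemonic -> (machine code, number of operands, length in words)
MOT = {
    "ADD": (1, 1, 2), "SUB": (2, 1, 2), "MULT": (3, 1, 2), "JMP": (4, 1, 2),
    "JNEG": (5, 1, 2), "JPOS": (6, 1, 2), "JZ": (7, 1, 2), "LOAD": (8, 1, 2),
    "STORE": (9, 1, 2), "READ": (10, 1, 2), "WRITE": (11, 1, 2),
    "STOP": (12, 0, 1),
}


def check_instruction(mnemonic, operands, symbol_table):
    """Look up a mnemonic in the MOT, check its operand count and record new symbols.

    Returns (machine code, length). Symbols whose address is not yet known are
    entered in the symbol table with address None.
    """
    if mnemonic not in MOT:
        raise ValueError(f"unknown mnemonic {mnemonic}")
    code, n_operands, length = MOT[mnemonic]
    if len(operands) != n_operands:
        raise ValueError(f"{mnemonic} expects {n_operands} operand(s), got {len(operands)}")
    for name in operands:
        symbol_table.setdefault(name, {"type": "Id", "address": None})
    return code, length


st = {}
print(check_instruction("READ", ["N"], st))   # (10, 2)
print(st)                                     # {'N': {'type': 'Id', 'address': None}}
```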
Recall that we have assumed the program will be stored from word 0 onwards. The assembler
has to reserve two words of memory for line 1: word 0 for the op-code (READ) and word 1,
which will eventually hold the address of the operand (N). Consequently, the assembler now
finds word 2 onwards free for its use. The starting location of this free part of memory
changes as we proceed through the source code. The assembler keeps track of it using a
variable called the location counter (LC). After each instruction, LC is incremented by L,
the length of the machine code of that instruction. This is precisely why L is stored along
with the mnemonic in the MOT.
Processing of lines 2, 3 and 4 is similar. When we come to line 5, LC has become 8, and the
symbols ZERO, COUNT and SUM have been inserted in the ST along with N. The specialty of
line 5 is that it has a label field, “LOOP”. The label LOOP can be used elsewhere in the
program only to transfer control to line 5, i.e. to word 8 (the current value of the
location counter). To remember this information for subsequent use, the name LOOP and its
attribute (with value 8) are inserted in the symbol table. Proceeding in this manner, we can
continue up to line 17, inserting new symbols into the symbol table as they are encountered
and updating the location counter. The symbol table now contains the following information.
Symbol name  Type   Address  Other
N            Id     ---
ZERO         Id     ---
COUNT        Id     ---
SUM          Id     ---
X            Id     ---
LOOP         Label  08
ONE          Id     ---
OUTER        Label  28
Note that the type field identifies whether a symbol represents a label or an identifier.
The significance of the “other” field will be clarified later. The current value of LC is
clearly 31. In line 17, the assembler comes across the pseudo-op ENDP, so it next searches
a similar table called the Pseudo Operation Table (POT), which contains information about
all pseudo-ops. Unlike mnemonics for machine codes, a pseudo-op does not always alter the
value of LC. Hence LC remains 31 at the end of line 17. Seeing the ENDP assembler
directive, which signifies the physical end of the program segment, the assembler can
conclude that the remaining free memory words may be used to store the data.
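The following Python sketch replays this part of pass 1 over lines 1 to 17. It is a simplification. The program is assumed to be already split into (label, mnemonic, operand) triples. Only the instruction lengths are taken from the MOT, and ENDP is the only pseudo-op handled. scan_code is a hypothetical name.

```python
# Lengths (in words) of the instructions used in Example 1, as recorded in the MOT.
LENGTH = {"READ": 2, "LOAD": 2, "STORE": 2, "ADD": 2, "SUB": 2,
          "JZ": 2, "JMP": 2, "WRITE": 2, "STOP": 1}

# Lines 1-17 of Example 1 as (label, mnemonic, operand).
PROGRAM = [
    (None, "READ", "N"), (None, "LOAD", "ZERO"), (None, "STORE", "COUNT"),
    (None, "STORE", "SUM"), ("LOOP", "READ", "X"), (None, "LOAD", "X"),
    (None, "ADD", "SUM"), (None, "STORE", "SUM"), (None, "LOAD", "COUNT"),
    (None, "ADD", "ONE"), (None, "STORE", "COUNT"), (None, "SUB", "N"),
    (None, "JZ", "OUTER"), (None, "JMP", "LOOP"), ("OUTER", "WRITE", "SUM"),
    (None, "STOP", None), (None, "ENDP", None),
]


def scan_code(program):
    """Pass 1 over the code segment: returns (symbol table, LC after ENDP)."""
    st, lc = {}, 0
    for label, mnemonic, operand in program:
        if label is not None:
            st[label] = {"type": "Label", "address": lc}    # label gets the current LC
        if operand is not None and operand not in st:
            st[operand] = {"type": "Id", "address": None}   # address still unknown
        if mnemonic == "ENDP":
            break                                            # pseudo-op: LC unchanged
        lc += LENGTH[mnemonic]                               # advance by instruction length
    return st, lc


st, lc = scan_code(PROGRAM)
print(st["LOOP"]["address"], st["OUTER"]["address"], lc)   # 8 28 31
```

OUTER is first entered as an undefined identifier at line 13 (a forward reference). It is then overwritten with its label address when line 15 is reached.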
In line 18, the pseudo-op CONST indicates that the label ZERO is defined as a constant
having the value 0. The assembler searches for the entry of the symbol ZERO in the ST
(creating one if it is not found) and puts the LC value (31) as its address. It also keeps
a tag to remember that ZERO is a constant, so that any attempt to overwrite it can be
reported as an error. LC is then incremented by 1. When we come to line 20, a new pseudo-op,
DEFW, is found. Processing of DEFW is similar to that of CONST, except that the symbol is
marked as a variable identifier. Once we come to END in line 24, which indicates the end of
the source text, the symbol table will have been updated as shown below.
Symbol name  Type      Address  Other
N            Var Id    35
ZERO         Const Id  31
COUNT        Var Id    34
SUM          Var Id    33
X            Var Id    36
LOOP         Label     08
ONE          Const Id  32
OUTER        Label     28
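The data part of pass 1 (lines 18 to 23) can be sketched in Python in the same style. The function below assumes it is called with the LC value left after ENDP (31) and with the data lines given as (label, pseudo-op, operand) triples. allocate_data is a hypothetical name. It returns the reserved words, using None for the unspecified contents of DEFW words.

```python
def allocate_data(st, lc, data_lines):
    """Assign consecutive words to CONST/DEFW symbols, starting at lc.

    Returns (data, lc): data maps address -> initial value (None = undefined).
    """
    data = {}
    for label, pseudo_op, operand in data_lines:
        if pseudo_op not in ("CONST", "DEFW"):
            raise ValueError(f"unexpected pseudo-op {pseudo_op}")
        entry = st.setdefault(label, {})          # create the ST entry if missing
        if pseudo_op == "CONST":
            entry.update(type="Const Id", address=lc)
            data[lc] = int(operand)               # the constant value is known now
        else:
            entry.update(type="Var Id", address=lc)
            data[lc] = None                       # storage only, no initial value
        lc += 1                                   # each of them occupies one word
    return data, lc


lines = [("ZERO", "CONST", "0"), ("ONE", "CONST", "1"), ("SUM", "DEFW", None),
         ("COUNT", "DEFW", None), ("N", "DEFW", None), ("X", "DEFW", None)]
st = {}
data, lc = allocate_data(st, 31, lines)
print({name: e["address"] for name, e in st.items()})
# {'ZERO': 31, 'ONE': 32, 'SUM': 33, 'COUNT': 34, 'N': 35, 'X': 36}
```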
After scanning the entire source text, we are in a position to state the amount of storage
needed for the instructions, the storage required by the data segment and the address each
symbol should refer to. Recall that while scanning line 1, the assembler reserved one word
for the address of N. Since this address was not yet known, the content of word 1 had to
remain undefined until the address of N became available. Actually replacing such symbolic
references is deferred until the end of pass one, when the unresolved attributes of all
symbols are expected to be complete. The assembler is now equipped with all the information
it needs to generate the object code by going through the source code a second time. For
example, the object code of the program in Example 1 can easily be generated as follows:
Line  Source text        Word(s)  Output
1     READ N             0, 1     10 35
2     LOAD ZERO          2, 3     08 31
3     STORE COUNT        4, 5     09 34
4     STORE SUM          6, 7     09 33
5     LOOP READ X        8, 9     10 36
6     LOAD X             10, 11   08 36
7     ADD SUM            12, 13   01 33
8     STORE SUM          14, 15   09 33
9     LOAD COUNT         16, 17   08 34
10    ADD ONE            18, 19   01 32
11    STORE COUNT        20, 21   09 34
12    SUB N              22, 23   02 35
13    JZ OUTER           24, 25   07 28
14    JMP LOOP           26, 27   04 08
15    OUTER WRITE SUM    28, 29   11 33
16    STOP               30       12
17    ENDP
18    ZERO CONST 0       31       00
19    ONE CONST 1        32       01
20    SUM DEFW           33       XX
21    COUNT DEFW         34       XX
22    N DEFW             35       XX
23    X DEFW             36       XX
24    END
Note that the output has been written in decimal for ease of understanding; the assembler
actually generates the output in binary. The data generation part (lines 18 to 23) needs
some more explanation. Consider line 18. In the first pass, the assembler assigned address
31 to ZERO, and in the second pass it places the constant value 0 in that word. While
processing the directive “DEFW” in the second pass, only storage is reserved, and it is not
necessarily initialized with any specific value. To indicate this, the contents of words
33-36 have been marked XX. However, some assemblers might assign some value, possibly 0, to
those words. The program in Example 1 and its translated version identify the major tasks
of an assembler and also show how those tasks are performed. Let us now present an
assembler more formally, in algorithmic form.
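A Python sketch of this second pass for Example 1 follows. It assumes pass 1 has already produced the symbol table shown above, hard-coded here as ST. It also assumes the source is available as (mnemonic, operand) pairs for the code and (pseudo-op, value) pairs for the data. None stands for the XX words, and pass2 is a hypothetical name.

```python
# Machine codes from the MOT of the hypothetical machine.
MOT = {"ADD": 1, "SUB": 2, "JMP": 4, "JZ": 7, "LOAD": 8,
       "STORE": 9, "READ": 10, "WRITE": 11, "STOP": 12}
# Symbol table at the end of pass 1 (decimal word addresses).
ST = {"N": 35, "ZERO": 31, "COUNT": 34, "SUM": 33, "X": 36,
      "LOOP": 8, "ONE": 32, "OUTER": 28}

CODE = [("READ", "N"), ("LOAD", "ZERO"), ("STORE", "COUNT"), ("STORE", "SUM"),
        ("READ", "X"), ("LOAD", "X"), ("ADD", "SUM"), ("STORE", "SUM"),
        ("LOAD", "COUNT"), ("ADD", "ONE"), ("STORE", "COUNT"), ("SUB", "N"),
        ("JZ", "OUTER"), ("JMP", "LOOP"), ("WRITE", "SUM"), ("STOP", None)]
DATA = [("CONST", 0), ("CONST", 1), ("DEFW", None), ("DEFW", None),
        ("DEFW", None), ("DEFW", None)]


def pass2(code, data):
    """Return the object program as a list of words, word 0 first."""
    words = []
    for mnemonic, operand in code:
        words.append(MOT[mnemonic])           # op-code word
        if operand is not None:
            words.append(ST[operand])         # symbol replaced by its address
    for pseudo_op, value in data:
        # CONST words get their value; DEFW words are only reserved (None = XX).
        words.append(value if pseudo_op == "CONST" else None)
    return words


words = pass2(CODE, DATA)
print(words[:6])   # [10, 35, 8, 31, 9, 34]
```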
LITERAL HANDLING
An operand whose value is literally stated is called a literal. For example, consider the
following two ADD commands: (i) ADD A and (ii) ADD @21.
In (i), ‘A’ refers to some memory location whose content is to be added.
In (ii), the intention is to add the value 21 (not the content of some explicitly stated
memory address, as in (i)). So 21 is a literal. The special symbol @ informs the assembler
that what follows it is to be treated as a literal. Literals are very useful for writing
programs.
In the first pass, the assembler puts the literals in a table known as the Literal Table
(LT). The literal table consists of three fields: the literal, its corresponding address and
its value. The address and value fields are filled in at the end of pass 1. In pass 2, the
assembler can easily replace the literals by their addresses and generate the literal data.
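The literal table can be sketched in Python as below. The sketch makes three assumptions. Literals are written as @ followed by a decimal integer. Following the detailed algorithm given later, the i-th distinct literal is stored in the word at (end of program + i). The function names are hypothetical.

```python
def build_literal_table(operands, end_of_program):
    """Pass 1: collect '@' literals in order of first use.

    Returns {literal: (address, value)}; the i-th distinct literal is
    placed at end_of_program + i.
    """
    literals = []
    for operand in operands:
        if operand.startswith("@") and operand not in literals:
            literals.append(operand)
    return {lit: (end_of_program + i, int(lit[1:])) for i, lit in enumerate(literals)}


def replace_literal(operand, lt):
    """Pass 2: a literal operand is replaced by the address of the word holding its value."""
    return lt[operand][0] if operand in lt else operand


lt = build_literal_table(["A", "@21", "B", "@5", "@21"], 40)
print(lt)                          # {'@21': (40, 21), '@5': (41, 5)}
print(replace_literal("@21", lt))  # 40
```

A literal used twice (@21 above) gets a single entry, so both uses refer to the same word.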
ONE PASS ASSEMBLER
In pass 1, the assembler may fail to resolve symbolic references, mainly for two reasons:
(i) the program may contain forward references in jump instructions, and
(ii) the definitions of data become available only at the end of the program text.
If these facilities were restricted, pass 2 of the assembler could be eliminated. Forcing
the user to define the data at the beginning is not a major restriction, but forward
referencing in jump instructions cannot be given up. Even so, pass 2 of the assembler can
practically be eliminated. To implement the assembler in one pass, each undefined symbol,
along with the address of the operand of the corresponding jump statement, can be entered
into a table known as the Branch Table. Note that the table may hold many entries for the
same symbol, because several branches to the same symbol may occur in a program. At the
end of pass 1, this table and the ST can be consulted to settle the unresolved references.
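A minimal Python sketch of this one-pass scheme, often called backpatching, is shown below. It assumes the word-addressed hypothetical machine used earlier, in which an instruction is an op-code word followed by one address word if it has an operand. The branch table is a list of (symbol, word to patch) pairs, and one_pass is a hypothetical name.

```python
def one_pass(program, mot):
    """Assemble in one pass, patching forward references at the end.

    program: list of (label, mnemonic, operand); mot: mnemonic -> op-code.
    Each instruction is an op-code word followed by an address word if it
    has an operand.
    """
    memory, st, branch_table = [], {}, []
    for label, mnemonic, operand in program:
        if label is not None:
            st[label] = len(memory)                   # LC = number of words so far
        memory.append(mot[mnemonic])
        if operand is not None:
            if operand in st:
                memory.append(st[operand])            # backward reference: known already
            else:
                branch_table.append((operand, len(memory)))  # remember where to patch
                memory.append(None)                   # placeholder for the address
    # End of the pass: settle unresolved references (KeyError if a symbol is never defined).
    for symbol, where in branch_table:
        memory[where] = st[symbol]
    return memory


MOT = {"JZ": 7, "JMP": 4, "STOP": 12}
program = [(None, "JZ", "OUT"), ("BACK", "JMP", "OUT"),
           ("OUT", "JMP", "BACK"), (None, "STOP", None)]
print(one_pass(program, MOT))   # [7, 4, 4, 4, 4, 2, 12]
```

Two branches to OUT produce two branch-table entries, which matches the remark above that a symbol may appear several times in the table.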
BRIEF DESCRIPTION OF 8085
The 8085 is an 8-bit general-purpose microprocessor capable of addressing 64K bytes of
memory. The microprocessor requires a single +5 V power supply and can operate with a 3 MHz
single-phase clock. The functional diagram of the 8085 microprocessor is given in Fig. 2.
The Arithmetic Logic Unit (ALU) includes an 8-bit accumulator, a temporary register,
arithmetic and logic circuits and five flags. The 8085 has six general-purpose registers,
identified as B, C, D, E, H and L. They can be combined as the register pairs BC, DE and HL
in order to perform 16-bit operations. The accumulator is identified as A.
Fig.2: FUNCTIONAL ORGANIZATION OF THE 8085
(Block diagram: accumulator, temporary register, flag flip-flops and ALU; instruction
register, instruction decoder and timing and control; interrupt control and serial I/O
control; register array B, C, D, E, H, L, stack pointer, program counter,
incrementer/decrementer and address latch; address buffer and data/address buffer on the
8-bit internal data bus; control bus, address bus A15–A8 and multiplexed address/data bus
AD7–AD0.)
The instruction set of the 8085 may be classified into the following functional categories:
(i) data transfer, (ii) arithmetic operations, (iii) logical operations, (iv) branching
operations, (v) machine control operations and (vi) assembler directives.
A brief description of a subset of the instruction set is given in the table below. Each
instruction contains an op-code and may also have an operand. The op-code is 8 bits wide.
The operand may be an internal register, 8-bit or 16-bit data, a memory location, or an
8-bit or 16-bit address.
SOME OF 8085 INSTRUCTIONS
Opcode  Operands  Bytes  M/C code   Explanation
LDA     Addr      3      3A         ACC := (Addr)
STA     Addr      3      32         Addr := (ACC)
MOV     R1,R2     1      01DDDSSS   R1 := (R2)
LHLD    Addr      3      2A         HL := (Addr)
SHLD    Addr      3      22         Addr := (HL)
MOV     R,M       1      01DDD110   R := ((HL))
MOV     M,R       1      01110SSS   (HL) := (R)
ADD     R         1      10000SSS   ACC := (ACC) + (R)
ADD     M         1      86         ACC := (ACC) + ((HL))
SUB     R         1      10010SSS   ACC := (ACC) - (R)
SUB     M         1      96         ACC := (ACC) - ((HL))
INR     R         1      00DDD100   R := R + 1
DCR     R         1      00DDD101   R := R - 1
JZ      Label     3      CA         if (result = 0) then goto Label
JNZ     Label     3      C2         if (result != 0) then goto Label
JC      Label     3      DA         if (carry bit is 1) then goto Label
JNC     Label     3      D2         if (carry bit is 0) then goto Label
JMP     Label     3      C3         goto Label
HLT     ---       1      76         stop

Notes:
1 Addr       a valid 16-bit address
2 ACC        accumulator
3 R, R1, R2  register
4 (X)        content of X
5 DDD        destination
6 SSS        source
7 Label      target of a branch

Source/Destination  SSS/DDD
A                   111
B                   000
C                   001
D                   010
E                   011
H                   100
L                   101
M                   110
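The register codes occupy fixed bit positions, so the single-byte encodings in the table can be computed rather than stored. The Python sketch below covers only MOV, ADD, SUB and INR, and the function names are hypothetical helpers. The results can be checked against the object code in the RESULTS section (for example, MOV A, M gives 7E).

```python
# 3-bit register codes for the SSS/DDD fields (M = memory location addressed by HL).
REG = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
       "H": 0b100, "L": 0b101, "M": 0b110, "A": 0b111}


def mov(dst, src):
    """MOV R1,R2 = 01DDDSSS."""
    if dst == "M" and src == "M":
        raise ValueError("01110110 (76) is HLT, not MOV M,M")
    return 0b01000000 | (REG[dst] << 3) | REG[src]


def add(src):
    """ADD R = 10000SSS."""
    return 0b10000000 | REG[src]


def sub(src):
    """SUB R = 10010SSS."""
    return 0b10010000 | REG[src]


def inr(reg):
    """INR R = 00DDD100."""
    return (REG[reg] << 3) | 0b100


print(f"{mov('A', 'M'):02X} {add('M'):02X} {sub('M'):02X} {inr('C'):02X}")  # 7E 86 96 0C
```

The special case in mov reflects the fact that the pattern MOV M,M would produce is used by HLT (76) on the 8085.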
Some of the 8085 assembler directives are listed below.
Assembler directive   Example    Description
ORG                   ORG 0200   The next block of instructions is stored in memory starting at location 0200.
END                   END        End of assembly.
DB (define byte)      Y: DB 05   Reserves a byte, referred to symbolically as Y, and initializes it with 05.
DS (define storage)   L: DS 06   Reserves six bytes of memory for L.
The following data structures and databases are to be maintained to design the assembler for
8085.
1) A file containing the input source program.
2) The Machine Operation Table (MOT). An entry in MOT contains a
mnemonic, its machine code and the length of the instruction.
3) The Pseudo Operation Table (POT) which contains the list of all the
assembler directives.
4) The Symbol Table (ST).
5) The Literal Table (LT).
6) The intermediate file and the output file (.obj).
THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085.
Pass 1:
Step 1 : LC := 0;
Step 2 : Read a line from the input file.
Step 3 : Analyze the statement.
    We have seen that a statement may contain three fields: the label, the op-code and
    the operand. These parts are identified during the analysis of the statement. Let L,
    I and X denote the label, op-code and operand (if any) of the statement.
    Step 3.1 : if the statement contains a label L then
               begin
                   if (L is not found in ST) then insert L in ST;
                   the address field of L is set to LC;
               end
    Step 3.2 :
        Case 1 : I is found in MOT
                 LC := LC + l, where l is the length of I;
        Case 2 : I is found in POT
            Case 2.1 : I = ORG
                       LC := X;
            Case 2.2 : I = END
                       goto Step 5;
            Case 2.3 : I = DB
                       LC := LC + 1;
            Case 2.4 : I = DS
                       LC := LC + X;
    Step 3.3 : if (X is a literal) then
               begin
                   LC := LC + 1;
                   insert X in LT if it is new;
               end;
Step 4 : goto Step 2.
Step 5 : Set the address of the i-th literal in LT (i = 0, 1, ...) to LC + i.
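The Python sketch below follows Steps 1 to 4 of Pass 1 for the instructions used in the RESULTS section, and omits literal handling. Its assumptions are stated in the comments. Labels are written as "LABEL:", the ORG operand is hexadecimal and the DS operand is decimal. The LENGTH table stands in for the MOT, with lengths in bytes, and pass1 is a hypothetical name.

```python
# Instruction lengths in bytes for the 8085 mnemonics used in the RESULTS example.
LENGTH = {"LXI": 3, "MVI": 2, "MOV": 1, "INX": 1, "ADD": 1,
          "INR": 1, "JNC": 3, "HLT": 1}


def pass1(lines):
    """Return (symbol table, final LC) following Steps 1-4 of Pass 1, without literals."""
    st, lc = {}, 0                                    # Step 1
    for line in lines:                                # Step 2
        label = None
        if ":" in line:                               # Step 3: "LABEL: op operands"
            label, line = (part.strip() for part in line.split(":", 1))
        fields = line.split(None, 1)
        if label:                                     # Step 3.1
            st[label] = lc
        if not fields:
            continue                                  # blank line or label only
        op = fields[0].upper()
        operand = fields[1].strip() if len(fields) > 1 else None
        if op == "ORG":                               # Case 2.1 (operand assumed hex)
            lc = int(operand, 16)
        elif op == "END":                             # Case 2.2
            break
        elif op == "DB":                              # Case 2.3
            lc += 1
        elif op == "DS":                              # Case 2.4 (operand assumed decimal)
            lc += int(operand)
        else:                                         # Case 1: length from the MOT
            lc += LENGTH[op]
    return st, lc


SOURCE = ["LXI H, 0000", "MVI C, 00", "MOV A, M", "INX H", "ADD M", "JNC LAB1",
          "INR C", "LAB1: INX H", "MOV M, A", "INX H", "MOV M, C", "HLT"]
print(pass1(SOURCE))   # ({'LAB1': 12}, 17)
```

LAB1 comes out as 12 = 000C hexadecimal, which agrees with the SYMBOL TABLE in the RESULTS section.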
Pass 2:
Step 1 : LC := 0;
Step 2 : Read a line from the input file.
Step 3 : Analyze the statement.
    Step 3.1 :
        Case 1 : I is found in MOT
                 Write a line in the output file containing the current value of LC,
                 the machine code of I and the character ‘a’ (the last character
                 signifies that the content is ‘absolute’, which will be needed by
                 the loader).
        Case 2 : I is found in POT
            Case 2.1 : I = ORG
                       no action
            Case 2.2 : I = END
                       goto Step 5;
            Case 2.3 : I = DB
                       Write a line in the output file containing the current value
                       of LC, X and the character ‘a’.
    Step 3.2 :
        Case 1 : X is a literal
                 Write a line in the output file containing the current value of LC,
                 the address of X (as found from LT) and the character ‘r’ (for
                 “relative”).
        Case 2 : X is a symbol
                 Write a line in the output file containing the current value of LC,
                 the address of X (as found from ST) and the character ‘r’.
Step 4 : goto Step 2.
Step 5 : for i := 0 to (total number of literals - 1) do
             write a line containing LC + i, the i-th literal in LT and the character ‘a’.
Step 6 : stop.
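The following Python sketch implements a pass 2 for just the instructions used in the RESULTS section. It departs from the algorithm above in three ways. It re-reads the source text instead of the intermediate file. It writes a hex listing instead of lines tagged ‘a’/‘r’. It takes the symbol table produced by pass 1 as an argument. The opcode formulas for LXI, MVI and INX are standard 8085 encodings, while encode and pass2 are hypothetical names. The 16-bit addresses are emitted low byte first, which is why JNC LAB1 assembles to D2 0C 00.

```python
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}
PAIR = {"B": 0, "D": 1, "H": 2, "SP": 3}      # register-pair field of LXI/INX


def encode(op, args, st):
    """Return the object bytes of one instruction from the RESULTS subset."""
    if op == "MOV":
        return [0x40 | REG[args[0]] << 3 | REG[args[1]]]     # 01DDDSSS
    if op == "MVI":
        return [0x06 | REG[args[0]] << 3, int(args[1], 16)]  # 00DDD110, data byte
    if op == "LXI":
        value = int(args[1], 16)
        return [0x01 | PAIR[args[0]] << 4, value & 0xFF, value >> 8]
    if op == "INX":
        return [0x03 | PAIR[args[0]] << 4]
    if op == "INR":
        return [0x04 | REG[args[0]] << 3]                    # 00DDD100
    if op == "ADD":
        return [0x80 | REG[args[0]]]                         # 10000SSS
    if op == "JNC":
        address = st[args[0]]                                # resolved via the ST
        return [0xD2, address & 0xFF, address >> 8]          # low byte first
    if op == "HLT":
        return [0x76]
    raise ValueError(f"unsupported mnemonic {op}")


def pass2(lines, st):
    """Return the listing as (LC, object code in hex) pairs."""
    listing, lc = [], 0
    for line in lines:
        if ":" in line:
            line = line.split(":", 1)[1]          # label already handled in pass 1
        fields = line.split(None, 1)
        if not fields:
            continue
        op = fields[0].upper()
        args = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        code = encode(op, args, st)
        listing.append((lc, "".join(f"{b:02X}" for b in code)))
        lc += len(code)                           # advance LC by the bytes emitted
    return listing


SOURCE = ["LXI H, 0000", "MVI C, 00", "MOV A, M", "INX H", "ADD M", "JNC LAB1",
          "INR C", "LAB1: INX H", "MOV M, A", "INX H", "MOV M, C", "HLT"]
for lc, obj in pass2(SOURCE, {"LAB1": 0x000C}):
    print(f"{lc:04X} {obj}")
```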
MAIN FUNCTIONS USED IN TWO PASSES
We developed the following functions for the 8085 assembler.
• PASS 1:
    • READ1: reads the assembly source file.
    • STSTO: stores a label and its value into the ST (if the symbol is not already present in the ST).
    • POTGET: searches the POT for a match with the operation field.
    • MOTGET: searches the MOT for a match with the operation field.
    • LTSTO: stores a literal into the LT.
    • STGET: searches the ST for the entry corresponding to a specific symbol.
• PASS 2:
    • READ2: reads the assembly source file from the file copy.
    • LTGEN: generates code for literals.
    • DCGEN: processes the fields of the DC (data constant) pseudo-op to generate object code.
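The table-handling functions above map naturally onto dictionary and list operations. The Python sketch below shows STSTO, STGET, MOTGET, POTGET and LTSTO on that basis. The table contents are a small illustrative subset. The return conventions are assumptions of this sketch, not details of the original implementation. For example, STSTO returns False for a symbol that is already present, so that the caller can report a duplicate label.

```python
ST = {}          # symbol -> value
LT = []          # literals in order of first appearance
MOT = {"LDA": (0x3A, 3), "STA": (0x32, 3), "JNC": (0xD2, 3), "HLT": (0x76, 1)}
POT = {"ORG", "END", "DB", "DS"}


def ststo(symbol, value):
    """Store a label and its value unless the symbol is already in the ST."""
    if symbol in ST:
        return False                  # caller may report a duplicate label
    ST[symbol] = value
    return True


def stget(symbol):
    """Return the value recorded for symbol, or None if it is not in the ST."""
    return ST.get(symbol)


def motget(op):
    """Return (machine code, length) for a mnemonic, or None if it is not in the MOT."""
    return MOT.get(op.upper())


def potget(op):
    """Return True if the operation field is an assembler directive."""
    return op.upper() in POT


def ltsto(literal):
    """Store a literal once and return its index in the LT."""
    if literal not in LT:
        LT.append(literal)
    return LT.index(literal)


ststo("LAB1", 0x000C)
print(stget("LAB1"), motget("jnc"), potget("ORG"))   # 12 (210, 3) True
```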
RESULTS
1) INPUT FILE
LXI H, 0000
MVI C, 00
MOV A, M
INX H
ADD M
JNC LAB1
INR C
LAB1: INX H
MOV M, A
INX H
MOV M, C
HLT

2) SYMBOL TABLE
Index   Symbol name   Value
-----------------------------------------------------------------------
s1      LAB1          000C
3) INTERMEDIATE FILE
m80-r6-C-A16-0000
m23-r3-C-A8-00
m22-r1-C-r9
m20-r6
m3-r9
m70-l-s1
m19-r3
l-s1-cn-m20-r6
m22-r9-C-r1
m20-r6
m22-r9-C-r3
m17
4) OUTPUT FILE
PC     Opcode            OBJCODE
________________________________________
0000 LXI H , 0000 210000
0003 MVI C , 00 0E00
0005 MOV A , M 7E
0006 INX H 23
0007 ADD M 86
0008 JNC LAB1 D20C00
000B INR C 0C
000C LAB1 : INX H 23
000D MOV M , A 77
000E INX H 23
000F MOV M , C 71
0010 HLT 76
CONCLUSION
This project generates object code for the 8085 microprocessor. It takes an assembly
language program as input and generates object code as output, using two passes. During
pass 1, it generates the symbol table and an intermediate file. In pass 2, it takes the
intermediate file as input, updates the symbol table and generates the object code.
REFERENCES
[1] Donovan, J. J., “System Programming”, McGraw-Hill, New York, 1972.
[2] Barron, D. W., “Assemblers and Loaders”, 2nd ed., Elsevier, New York, 1972.
[3] Beck, L. L., “System Software: An Introduction to Systems Programming”, Addison-Wesley, 1985.
[4] Ullman, J. D., “Fundamental Concepts of Programming Systems”, Addison-Wesley, 1976.
[5] Nisan and Schocken, “The Digital Core”, 2003, www.idc.ac.il/csd.
[6] Early, G., “Functional Programming and the Two-Pass Assembler”, Southwest Texas State University, San Marcos, Texas.
[7] Beck, L. L., “System Software”.
[8] Salomon, D., “Assemblers and Loaders”.
[9] Wegner, P., “Programming Languages, Information Structure and Machine Organization”, McGraw-Hill, New York, 1968.