2 pass assembler

2 PASS ASSEMBLER FOR 8085

CONTENTS

1 ABSTRACT

2 INTRODUCTION

3 DESIGN ASPECTS OF ASSEMBLER

A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE

EXAMPLE

THE TASKS OF AN ASSEMBLER

LITERAL HANDLING

ONE PASS ASSEMBLER

4 BRIEF DESCRIPTION OF 8085

SOME OF 8085 INSTRUCTIONS

5 THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085

MAIN FUNCTIONS USED IN TWO PASSES

6 RESULTS

7 CONCLUSION

8 REFERENCES

ABSTRACT:

An assembler is a program that translates programs written in assembly language into machine language. Before an assembly program can be executed on a computer platform, it must be translated into the machine language of the target hardware, and the assembler performs this translation. It takes as input a stream of assembly commands and generates as output a stream of equivalent binary instructions. The resulting code can be loaded as-is into the computer's memory and then executed by the hardware. An assembler is essentially a text-processing program designed to provide translation services. It carries out the following operations: parse the symbolic command into its underlying fields; for each field, generate the corresponding bits in the machine language; replace all symbolic references (if any) with the numeric addresses of memory locations; and assemble the binary codes into a complete machine instruction. The translation of symbols to numeric addresses is done in two conceptual stages. First, the assembler creates a symbol table, associating each symbol with a designated memory address. Next, the assembler uses the symbol table to translate each occurrence of each symbol in the program to its allocated address. Inside the 8085, instructions are stored as binary numbers, which are hard to read and extremely difficult to decipher. An assembler lets you write instructions in a more or less English-like form that is much easier to read and understand, and then converts (assembles) them into hexadecimal numbers and finally into binary numbers.



Pass 1:

• Assign addresses to all statements in the program.

• Save the values assigned to all labels for use in Pass 2.

• Perform some processing of assembler directives.

Pass 2:

• Assemble instructions.

• Generate data values defined by BYTE, WORD.

• Perform processing of assembler directives not done in Pass 1.

• Write the object program and the assembly listing.

INTRODUCTION:

Assemblers perform a one-to-one translation of symbolic source statements written in assembly language into the corresponding machine language instructions. Assembly language is intermediate between high-level languages and machine language: it is the symbolic representation of a computer's binary encoding, machine language. Assembly language is more readable than machine language because it uses symbols instead of bits. The symbols name commonly occurring bit patterns, such as opcodes and register specifiers, so that they can be read and remembered. In addition, assembly language permits programmers to use labels to identify and name particular memory words that hold instructions or data. An assembler reads a single assembly language source file and produces an object file containing machine instructions and bookkeeping information that helps combine several object files into a program. Fig. 1 illustrates how a program is built. Most programs consist of several files, also called modules, that are written, compiled, and assembled independently. A program may also use prewritten routines supplied in a program library. A module typically contains references to subroutines and data defined in other modules and in libraries. The code in a module cannot be executed while it contains unresolved references to labels in other object files or libraries. Another tool, called a linker, combines a collection of object and library files into an executable file, which a computer can run.

Fig.1: THE PROCESS THAT PRODUCES AN EXECUTABLE FILE

The operation field of a statement contains mainly one of the following three types of names:

Operation code (op-code): an easy-to-understand code name for a primitive machine instruction.

Assembler directive (pseudo-op): a symbolic directive that tells the assembler how to translate a program but does not itself produce machine instructions.

Macro name: a symbolic name that represents a group of assembly statements. A macro allows a programmer to insert a set of instructions into the program by means of a single symbolic name.



The operand field contains the address of the operand whose content is manipulated by the op-code. It could also be the target label of a branch. In general, it can be a literal, the name of a machine register, or a simple expression involving addresses; in the case of an expression, the assembler calculates the effective address.

An assembly language program contains absolute, relative, and externally defined entities. Absolute entities, such as op-codes and fixed addresses, are independent of the storage locations the machine code will eventually occupy. Relative entities, such as symbolic references, are fixed only with respect to each other and can be stated relative to the starting address of the program. Externally defined entities are used but not defined within the module. The output of an assembler is primarily a file containing object code. In addition to the object file, the assembler also produces other files that help the user debug the program. The object file contains the machine codes for the mnemonics and the addresses, along with an indication of whether each is relative, absolute, or external.

DESIGN ASPECTS OF ASSEMBLER

Implementation of an assembler can be viewed as having three logical stages. During the first stage, the macro definitions are collected and the proper textual substitution is made for macro calls. This stage is known as macro processing. The assembler proper forms the remaining stages. It first scans through the entire text to collect all the symbols, associating them with addresses whenever possible. This stage is known as pass one. In the second pass, the assembler replaces the mnemonics by machine codes and the symbols by the corresponding addresses, and generates the output.

A HYPOTHETICAL AND SIMPLE ASSEMBLY LANGUAGE

Let us consider a model of a hypothetical computer in order to appreciate the functions of an assembler. For simplicity of explanation and ease of understanding, our model has a very small set of assembly-level instructions. The instructions and their functions are detailed below. In the examples, A and B are the names of two variables, L is a label, and "ACC" stands for the accumulator. We also include two pseudo-operations, DEFW and CONST: DEFW reserves one word of storage, and CONST defines a named constant by reserving one word that holds its value. The instruction set is shown in the following table.



Mnemonic  Machine code  No. of operands  Length of instruction  Example  Meaning
ADD       01            1                2                      ADD A    ACC := ACC + A
SUB       02            1                2                      SUB A    ACC := ACC - A
MULT      03            1                2                      MULT A   ACC := ACC * A
JMP       04            1                2                      JMP L    GOTO L
JNEG      05            1                2                      JNEG L   if ACC < 0 GOTO L
JPOS      06            1                2                      JPOS L   if ACC > 0 GOTO L
JZ        07            1                2                      JZ L     if ACC = 0 GOTO L
LOAD      08            1                2                      LOAD A   ACC := A
STORE     09            1                2                      STORE A  A := ACC
READ      10            1                2                      READ A   A := input data
WRITE     11            1                2                      WRITE A  output A
STOP      12            0                1                      STOP     stop execution

EXAMPLE:

We now present a small assembly language program to highlight the structure of a typical assembly language program. Consider the problem of reading N numbers and finding their sum. The program is listed below.

Line no  Label   Mnemonic code  Operand
1                READ           N
2                LOAD           ZERO
3                STORE          COUNT
4                STORE          SUM
5        LOOP    READ           X
6                LOAD           X
7                ADD            SUM
8                STORE          SUM
9                LOAD           COUNT
10               ADD            ONE
11               STORE          COUNT
12               SUB            N
13               JZ             OUTER
14               JMP            LOOP
15       OUTER   WRITE          SUM
16               STOP
17               ENDP
18       ZERO    CONST          0
19       ONE     CONST          1
20       SUM     DEFW
21       COUNT   DEFW
22       N       DEFW
23       X       DEFW
24               END

THE TASKS OF AN ASSEMBLER:

1. Replace symbolic codes by machine instruction codes.

2. Replace symbolic references by numeric addresses.

3. Reserve storage to be occupied by instruction and data.

4. Translate the literals into their internal representations.

The translation of the program proceeds as follows. Each line is typically of the form:

Label    Opcode    Operand(s)

where one or more of the fields may be missing. We assume that the program will be stored from word 0 of memory onwards.
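The field layout above can be illustrated with a minimal Python sketch. It assumes that fields are separated by whitespace and that a leading token which is not a known mnemonic or pseudo-op of the hypothetical machine is a label; `split_fields` and `KNOWN_OPS` are hypothetical names, not part of the implementation described in this report.

```python
from typing import Optional, Tuple

# Operation names of the hypothetical machine: machine mnemonics and pseudo-ops.
KNOWN_OPS = {
    "ADD", "SUB", "MULT", "JMP", "JNEG", "JPOS", "JZ",
    "LOAD", "STORE", "READ", "WRITE", "STOP",   # machine instructions
    "DEFW", "CONST", "ENDP", "END",             # pseudo-ops
}


def split_fields(line: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split a source line into (label, opcode, operand); missing fields are None."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = None
    if tokens[0] not in KNOWN_OPS:      # not an operation name, so treat it as a label
        label, tokens = tokens[0], tokens[1:]
    if not tokens:
        raise ValueError(f"missing opcode in {line!r}")
    if len(tokens) > 2:
        raise ValueError(f"too many fields in {line!r}")
    opcode = tokens[0]
    operand = tokens[1] if len(tokens) == 2 else None
    return label, opcode, operand


print(split_fields("LOOP READ X"))   # ('LOOP', 'READ', 'X')
print(split_fields("STOP"))          # (None, 'STOP', None)
```

One consequence of this choice is that a misspelled mnemonic would be taken for a label. Real assemblers usually mark labels explicitly, for example by column position or by a trailing colon as in the 8085 listing (`LAB1:`).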

Line 1 contains the mnemonic READ and the operand N. Substituting the machine code for the mnemonic is easy. We can maintain a list of all mnemonics and their corresponding machine codes in a table, usually known as the Machine Op-code Table (MOT). Whenever a mnemonic is encountered, this table is searched to find the appropriate machine code. The MOT also contains other important information, such as the number of operands and the length of the instruction, which helps in checking that the expected number of operands follows the mnemonic in an assembly language statement. For example, the entry for READ in the MOT tells us that there should be exactly one operand ("N" in this case). Therefore, the occurrence of more than one operand, or of none, has to be treated as an error. However, it is not yet possible to replace N by the address to which it refers: the assembler has not yet encountered any statement that reserves the storage denoted by N. Moreover, as the size of the assembly language code is still unknown, the part of memory where data can be stored cannot be fixed in advance. So the assembler notes that "N" could not be replaced and postpones this action until the address of N is resolved. This is achieved by the use of a Symbol Table (ST). Each entry in this table contains the name of a symbol, its address, and other attributes. Whenever a new symbol is encountered, its name and other known attributes are inserted in the table. Hence, the assembler puts N in the symbol table. Note that the other attributes of N still remain undefined.
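A minimal sketch of an MOT lookup with the operand-count check just described, written in Python under stated assumptions: the table holds the codes, operand counts and lengths of the hypothetical machine; STOP is given a length of one word, which is what the object code listing implies (STOP at word 30, data from word 31); and a symbol whose address is still unknown is recorded with `None`. `MotEntry` and `check_statement` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MotEntry:
    code: int       # machine code
    operands: int   # number of operands expected
    length: int     # instruction length in words


# Machine Op-code Table (MOT) of the hypothetical machine.
MOT: Dict[str, MotEntry] = {
    "ADD": MotEntry(1, 1, 2), "SUB": MotEntry(2, 1, 2), "MULT": MotEntry(3, 1, 2),
    "JMP": MotEntry(4, 1, 2), "JNEG": MotEntry(5, 1, 2), "JPOS": MotEntry(6, 1, 2),
    "JZ": MotEntry(7, 1, 2), "LOAD": MotEntry(8, 1, 2), "STORE": MotEntry(9, 1, 2),
    "READ": MotEntry(10, 1, 2), "WRITE": MotEntry(11, 1, 2), "STOP": MotEntry(12, 0, 1),
}


def check_statement(opcode: str, operands: List[str],
                    symtab: Dict[str, Optional[int]]) -> MotEntry:
    """Look up opcode in the MOT, check its operand count and note new symbols."""
    entry = MOT.get(opcode)
    if entry is None:
        raise ValueError(f"unknown mnemonic {opcode!r}")
    if len(operands) != entry.operands:
        raise ValueError(f"{opcode} expects {entry.operands} operand(s), got {len(operands)}")
    for symbol in operands:
        symtab.setdefault(symbol, None)   # address not known yet; resolved later
    return entry


symtab: Dict[str, Optional[int]] = {}
print(check_statement("READ", ["N"], symtab).code, symtab)   # 10 {'N': None}
```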

Recall that we have assumed the program will be stored from word 0 onwards. The assembler has to reserve two words of memory for line 1, namely word 0 for the opcode (READ) and word 1, which will eventually hold the address of the operand (N). Consequently, the assembler now finds word 2 onwards free for its use. The starting location of this free part of memory changes as we proceed through the source code. The assembler keeps track of it using a variable called the location counter (LC). After each instruction, LC is incremented by L, the length of the machine code of that instruction. This is precisely why L is stored along with the mnemonic in the MOT.

Processing of lines 2, 3, and 4 is similar. When we come to line 5, LC has become 8 and the symbols ZERO, COUNT and SUM have been inserted in the ST. The specialty of line 5 is that it has a label field, LOOP. The label LOOP can be used elsewhere in the program only to transfer control to line 5, that is, to word 8 (the current value of the location counter). To remember this information for subsequent use, the name LOOP and its attribute (with value 8) are inserted in the symbol table. Proceeding in this manner, we can continue up to line 17, inserting new symbols in the symbol table as they are encountered and updating the location counter. The symbol table now contains the following information.



Symbol Name  Type   Address  Other
N            Id     ---
ZERO         Id     ---
COUNT        Id     ---
SUM          Id     ---
X            Id     ---
LOOP         LABEL  08
ONE          Id     ---
OUTER        LABEL  28

Note that the type field identifies whether a symbol represents a label or an identifier. The significance of the "other" field will be clarified later. The current value of LC is clearly 31, since STOP, the last instruction, occupies word 30. In line 17, the assembler comes across the pseudo-op ENDP, so it next searches a similar table, called the Pseudo Operation Table (POT), which contains information regarding all pseudo-ops. Unlike the mnemonics for machine codes, a pseudo-op does not always alter the value of LC; hence LC remains 31 at the end of line 17. Seeing the ENDP assembler directive, which signifies the physical end of the program segment, the assembler can conclude that the remaining free memory words may be used to store the data.
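The location counter bookkeeping for lines 1 to 17 can be sketched as follows. This is an illustration, not the original implementation; it assumes two words per instruction except STOP, which takes one, and it records an operand symbol with an unknown (`None`) address until its label is seen. `assign_labels` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Instruction lengths in words: STOP occupies one word, all others two.
LENGTH = {"STOP": 1}
DEFAULT_LENGTH = 2

# Lines 1-17 of the example as (label, opcode, operand).
PROGRAM: List[Tuple[Optional[str], str, Optional[str]]] = [
    (None, "READ", "N"), (None, "LOAD", "ZERO"), (None, "STORE", "COUNT"),
    (None, "STORE", "SUM"), ("LOOP", "READ", "X"), (None, "LOAD", "X"),
    (None, "ADD", "SUM"), (None, "STORE", "SUM"), (None, "LOAD", "COUNT"),
    (None, "ADD", "ONE"), (None, "STORE", "COUNT"), (None, "SUB", "N"),
    (None, "JZ", "OUTER"), (None, "JMP", "LOOP"), ("OUTER", "WRITE", "SUM"),
    (None, "STOP", None), (None, "ENDP", None),
]


def assign_labels(program) -> Tuple[Dict[str, Optional[int]], int]:
    """First-pass scan of the code segment: returns the symbol table and the final LC."""
    symtab: Dict[str, Optional[int]] = {}
    lc = 0
    for label, opcode, operand in program:
        if label is not None:
            symtab[label] = lc                 # a label receives the current LC
        if operand is not None:
            symtab.setdefault(operand, None)   # operand symbol: address still unknown
        if opcode == "ENDP":                   # pseudo-op: LC is unchanged
            continue
        lc += LENGTH.get(opcode, DEFAULT_LENGTH)
    return symtab, lc


symtab, lc = assign_labels(PROGRAM)
print(symtab["LOOP"], symtab["OUTER"], lc)   # 8 28 31
```

OUTER first enters the table as an operand of line 13 and only receives its address at line 15; this forward reference is exactly what forces a second pass.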

In line 18, the pseudo-op CONST indicates that the label ZERO is defined as a constant having the value 0. The assembler then searches for the entry of the symbol ZERO in the ST (creating one if it is not found), puts the LC value (31) as its address, and keeps a tag to remember that it is a constant, so that any attempt to overwrite ZERO can be reported as an error. LC is then incremented by 1. When we come to line 20, a new pseudo-op, DEFW, is found. Processing of DEFW is similar to that of CONST, except that the symbol is marked as a variable identifier. Once we come to END in line 24, which indicates the end of the source text, the symbol table will have been updated as shown below.

Symbol Name  Type      Address  Other
N            Var Id    35
ZERO         CONST Id  31
COUNT        Var Id    34
SUM          Var Id    33
X            Var Id    36
LOOP         LABEL     08
ONE          CONST Id  32
OUTER        LABEL     28
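Storage allocation for the data lines (18 to 23) can be sketched as below, assuming each CONST or DEFW takes exactly one word and allocation starts at the LC value 31 left by ENDP. The function name `allocate_data` and the `(type, address, value)` layout are hypothetical; the sketch only records the CONST tag and does not itself check for overwrites.

```python
from typing import Dict, List, Optional, Tuple

# Lines 18-23 of the example as (label, pseudo-op, operand).
DATA_LINES: List[Tuple[str, str, Optional[str]]] = [
    ("ZERO", "CONST", "0"), ("ONE", "CONST", "1"), ("SUM", "DEFW", None),
    ("COUNT", "DEFW", None), ("N", "DEFW", None), ("X", "DEFW", None),
]


def allocate_data(lines, lc: int) -> Dict[str, Tuple[str, int, Optional[int]]]:
    """Give each CONST/DEFW symbol one word, starting at lc.

    Returns {name: (type, address, value)}; the value is None for DEFW,
    whose storage is reserved but not initialized.
    """
    table: Dict[str, Tuple[str, int, Optional[int]]] = {}
    for name, op, operand in lines:
        if op == "CONST":
            table[name] = ("CONST Id", lc, int(operand))   # constant tag kept in the type
        elif op == "DEFW":
            table[name] = ("Var Id", lc, None)
        else:
            raise ValueError(f"unexpected pseudo-op {op}")
        lc += 1                                            # one word per definition
    return table


for name, (kind, address, value) in allocate_data(DATA_LINES, 31).items():
    print(name, kind, address, "XX" if value is None else value)
```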

Now, after scanning the entire source text, we are in a position to state the amount of storage necessary for the instructions, the storage required by the data segment, and the address each symbol should refer to. Recall that while scanning line 1, the assembler reserved one word for the address of N. But since this address was not known, the content of word 1 had to remain undefined until the address of N became available. Replacing such symbolic references is therefore deferred until the end of pass one, when the unresolved attributes of all symbols are expected to be complete. The assembler is now equipped with all the necessary information, so it can start generating the object code by going through the source code a second time. For example, the object code of the example program can be generated as follows:

Line  Source text        Word  Output
1     READ N             0     10
                         1     35
2     LOAD ZERO          2     08
                         3     31
3     STORE COUNT        4     09
                         5     34
4     STORE SUM          6     09
                         7     33
5     LOOP READ X        8     10
                         9     36
6     LOAD X             10    08
                         11    36
7     ADD SUM            12    01
                         13    33
8     STORE SUM          14    09
                         15    33
9     LOAD COUNT         16    08
                         17    34
10    ADD ONE            18    01
                         19    32
11    STORE COUNT        20    09
                         21    34
12    SUB N              22    02
                         23    35
13    JZ OUTER           24    07
                         25    28
14    JMP LOOP           26    04
                         27    08
15    OUTER WRITE SUM    28    11
                         29    33
16    STOP               30    12
17    ENDP
18    ZERO CONST 0       31    00
19    ONE CONST 1        32    01
20    SUM DEFW           33    XX
21    COUNT DEFW         34    XX
22    N DEFW             35    XX
23    X DEFW             36    XX
24    END

Note that the output has been written in decimal for ease of understanding; actually, the assembler generates the output in binary. The data generation part (lines 18 to 23) needs some more explanation. Consider line 18. In the first pass, the assembler records the value of ZERO (0) along with its address (31); in the second pass, this value is generated in word 31. While processing the directive DEFW in the second pass, only storage is reserved; it is not necessarily initialized with any specific value. To indicate this, the contents of words 33 to 36 have been marked XX. However, some assemblers might assign some value, possibly 0, to those words. The example program and its translated version identify the major tasks of an assembler and also explain how those tasks are performed. Let us now present an assembler more formally, in algorithm form.
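A sketch of the second pass for this example, assuming the addresses of the final symbol table are already known. Because the program starts at word 0 and every word is emitted in order, the number of words generated so far serves as the location counter. `pass2` is a hypothetical name; values are kept in decimal as in the listing, and `None` is printed as XX.

```python
from typing import Dict, List, Optional, Tuple

# Machine codes of the hypothetical machine (decimal, as in the listing).
CODES = {"ADD": 1, "SUB": 2, "MULT": 3, "JMP": 4, "JNEG": 5, "JPOS": 6,
         "JZ": 7, "LOAD": 8, "STORE": 9, "READ": 10, "WRITE": 11, "STOP": 12}

# Symbol addresses at the end of pass 1, taken from the final symbol table.
SYMTAB = {"N": 35, "ZERO": 31, "COUNT": 34, "SUM": 33, "X": 36,
          "LOOP": 8, "ONE": 32, "OUTER": 28}

# The complete example program as (label, operation, operand).
PROGRAM: List[Tuple[Optional[str], str, Optional[str]]] = [
    (None, "READ", "N"), (None, "LOAD", "ZERO"), (None, "STORE", "COUNT"),
    (None, "STORE", "SUM"), ("LOOP", "READ", "X"), (None, "LOAD", "X"),
    (None, "ADD", "SUM"), (None, "STORE", "SUM"), (None, "LOAD", "COUNT"),
    (None, "ADD", "ONE"), (None, "STORE", "COUNT"), (None, "SUB", "N"),
    (None, "JZ", "OUTER"), (None, "JMP", "LOOP"), ("OUTER", "WRITE", "SUM"),
    (None, "STOP", None), (None, "ENDP", None),
    ("ZERO", "CONST", "0"), ("ONE", "CONST", "1"), ("SUM", "DEFW", None),
    ("COUNT", "DEFW", None), ("N", "DEFW", None), ("X", "DEFW", None),
    (None, "END", None),
]


def pass2(program, symtab: Dict[str, int]) -> List[Tuple[int, Optional[int]]]:
    """Return (word address, contents) pairs; None marks an uninitialized word (XX)."""
    words: List[Tuple[int, Optional[int]]] = []
    for _label, op, operand in program:
        if op in ("ENDP", "END"):                 # these pseudo-ops generate no words
            continue
        if op in CODES:
            words.append((len(words), CODES[op]))             # opcode word
            if operand is not None:
                words.append((len(words), symtab[operand]))   # operand address word
        elif op == "CONST":
            words.append((len(words), int(operand)))          # initialized data word
        elif op == "DEFW":
            words.append((len(words), None))                  # storage reserved only
        else:
            raise ValueError(f"unknown operation {op}")
    return words


for address, value in pass2(PROGRAM, SYMTAB):
    print(address, "XX" if value is None else f"{value:02d}")
```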

LITERAL HANDLING

An operand whose value is stated literally is called a literal. For example, consider the following two ADD commands: (i) ADD A and (ii) ADD @21.

In (i), A refers to some memory location whose content is to be added.

In (ii), the intention is to add the value 21 itself (not the content of some explicitly stated memory address, as in (i)). So 21 is a literal. The special symbol @ informs the assembler that what follows it is to be treated as a literal. Literals are very useful for writing programs.

In the first pass, the assembler puts the literals in a table known as the Literal Table (LT). The literal table consists of three fields: the literal, its corresponding address, and its value. The address field is filled in at the end of pass 1, when the data is also generated and the value field is initialized. In pass 2, the assembler can easily replace the literals by their addresses.
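The literal table can be sketched in Python as follows. The sketch assumes that literals are written as `@` followed by a decimal number, that identical literals share one word, and that literals are placed one per word starting at the first free word after the program; the value 40 used below and the address 12 for A are arbitrary example values, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def collect_literals(operands: List[Optional[str]]) -> List[str]:
    """Pass 1: record each distinct @-literal in order of first appearance."""
    literals: List[str] = []
    for operand in operands:
        if operand is not None and operand.startswith("@") and operand not in literals:
            literals.append(operand)
    return literals


def assign_literal_addresses(literals: List[str],
                             first_free: int) -> Dict[str, Tuple[int, int]]:
    """End of pass 1: the i-th literal gets address first_free + i.

    Returns {literal: (address, value)}, the value being the number after '@'.
    """
    return {lit: (first_free + i, int(lit[1:])) for i, lit in enumerate(literals)}


def resolve(operand: str, symtab: Dict[str, int],
            littab: Dict[str, Tuple[int, int]]) -> int:
    """Pass 2: replace a literal by the address of the word holding its value."""
    if operand.startswith("@"):
        return littab[operand][0]
    return symtab[operand]


littab = assign_literal_addresses(collect_literals(["A", "@21", None, "@5", "@21"]), 40)
print(littab)                             # {'@21': (40, 21), '@5': (41, 5)}
print(resolve("@21", {"A": 12}, littab))  # 40
```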

ONE PASS ASSEMBLER

In pass 1, the assembler may fail to resolve symbolic references mainly for two reasons:

(i) the program may contain forward references in jump instructions, and

(ii) the definition of data becomes available only at the end of the program text.

If these facilities are restricted, then pass 2 of the assembler can be eliminated. It is not a major restriction to force the user to define the data at the beginning. But forward referencing cannot be given up in jump instructions. Even so, pass 2 of the assembler can practically be eliminated. To implement the assembler in one pass, each undefined symbol, along with the address of the operand of the corresponding jump statement, can be entered into a table known as the Branch Table. Note that there may be many entries for the same symbol, because several branches to the same symbol may occur in a program. At the end of pass 1, this table and the ST can be consulted to settle the unresolved references.
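A minimal sketch of the Branch Table idea for the hypothetical machine. For simplicity it treats every reference to a not-yet-defined symbol (the jump target DONE as well as the variable X defined at the end) the same way: a placeholder word is emitted, its address is recorded in the branch table, and the placeholder is patched once the single pass is over. The program, the names and the choice to initialize DEFW words to 0 are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

CODES = {"ADD": 1, "SUB": 2, "MULT": 3, "JMP": 4, "JNEG": 5, "JPOS": 6,
         "JZ": 7, "LOAD": 8, "STORE": 9, "READ": 10, "WRITE": 11, "STOP": 12}

# A short program with forward references (DONE and X are defined later).
PROGRAM: List[Tuple[Optional[str], str, Optional[str]]] = [
    ("LOOP", "READ", "X"),    # words 0-1
    (None, "LOAD", "X"),      # words 2-3
    (None, "JZ", "DONE"),     # words 4-5
    (None, "JMP", "LOOP"),    # words 6-7, LOOP is already known
    ("DONE", "STOP", None),   # word 8
    ("X", "DEFW", None),      # word 9
]


def assemble_one_pass(program) -> List[int]:
    """Single pass: emit words, record forward references, patch them at the end."""
    memory: List[int] = []
    symtab: Dict[str, int] = {}
    branch_table: List[Tuple[str, int]] = []      # (symbol, address of operand word)
    for label, op, operand in program:
        if label is not None:
            symtab[label] = len(memory)           # current LC
        if op == "DEFW":
            memory.append(0)                      # reserve one word, set to 0 here
            continue
        memory.append(CODES[op])
        if operand is None:
            continue
        if operand in symtab:
            memory.append(symtab[operand])        # backward reference: address known
        else:
            branch_table.append((operand, len(memory)))
            memory.append(0)                      # placeholder, patched below
    for symbol, where in branch_table:            # settle the unresolved references
        if symbol not in symtab:
            raise ValueError(f"undefined symbol {symbol}")
        memory[where] = symtab[symbol]
    return memory


print(assemble_one_pass(PROGRAM))   # [10, 9, 8, 9, 7, 8, 4, 0, 12, 0]
```

Words 1, 3 and 5 are filled in from the branch table, while the backward reference to LOOP in word 7 is resolved immediately.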

BRIEF DESCRIPTION OF 8085

The 8085 is an 8-bit general-purpose microprocessor capable of addressing 64K bytes of memory. It requires a single +5 V power supply and can operate with a 3 MHz single-phase clock. The functional organization of the 8085 is given in Fig. 2. The Arithmetic Logic Unit (ALU) includes an 8-bit accumulator, a temporary register, arithmetic and logic circuits, and five flags. The 8085 has six general-purpose registers, identified as B, C, D, E, H and L. They can be combined as the register pairs BC, DE and HL in order to perform 16-bit operations. The accumulator is identified as A.
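How a register pair forms a 16-bit value can be shown with a short sketch: the first register of each pair (B, D or H) holds the high-order byte and the second (C, E or L) the low-order byte. The helper names are hypothetical.

```python
from typing import Tuple


def join_pair(high: int, low: int) -> int:
    """Combine two 8-bit registers (e.g. H and L) into a 16-bit value."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("registers are 8 bits wide")
    return (high << 8) | low


def split_pair(value: int) -> Tuple[int, int]:
    """Split a 16-bit value into (high, low), e.g. the contents of H and L."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("register pairs are 16 bits wide")
    return value >> 8, value & 0xFF


h, l = split_pair(0x2050)
print(hex(h), hex(l), hex(join_pair(h, l)))   # 0x20 0x50 0x2050
```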



[Figure: 8085 block diagram showing interrupt control, serial I/O control, accumulator, temporary register, flag flip-flops, ALU, instruction register and decoder, timing and control, registers B, C, D, E, H and L, stack pointer, program counter, incrementer/decrementer and address latch, address buffer, data/address buffer, 8-bit internal data bus, A15-A8 address bus and AD7-AD0 address/data bus]

Fig.2: FUNCTIONAL ORGANIZATION OF THE 8085

The instruction set of the 8085 may be classified into the following functional categories: (i) data transfer, (ii) arithmetic operations, (iii) logical operations, (iv) branching operations, (v) machine control operations, and (vi) assembler directives.

A subset of the instruction set is described briefly in the table below. Each instruction contains an op-code and may also have an operand. The op-code is 8 bits wide. The operand may be an internal register, 8-bit or 16-bit data, a memory location, or an 8-bit or 16-bit address.


SOME OF 8085 INSTRUCTIONS

Opcode  Operands  Bytes  M/C code  Explanation
LDA     Addr      3      3A        ACC := (Addr)
STA     Addr      3      32        Addr := (ACC)
MOV     R1,R2     1      01DDDSSS  R1 := (R2)
LHLD    Addr      3      2A        HL := (Addr)
SHLD    Addr      3      22        Addr := (HL)
MOV     R,M       1      01DDD110  R := ((HL))
MOV     M,R       1      01110SSS  (HL) := (R)
ADD     R         1      10000SSS  ACC := (ACC) + (R)
ADD     M         1      86        ACC := (ACC) + ((HL))
SUB     R         1      10010SSS  ACC := (ACC) - (R)
SUB     M         1      96        ACC := (ACC) - ((HL))
INR     R         1      00DDD100  R := R + 1
DCR     R         1      00DDD101  R := R - 1
JZ      Label     3      CA        if (result = 0) then goto Label
JNZ     Label     3      C2        if (result != 0) then goto Label
JC      Label     3      DA        if (carry bit is 1) then goto Label
JNC     Label     3      D2        if (carry bit is 0) then goto Label
JMP     Label     3      C3        goto Label
HLT     ---       1      76        Stop

1) Addr: a valid 16-bit address
2) ACC: accumulator
3) R, R1, R2: register
4) (X): content of X
5) DDD: destination
6) SSS: source
7) Label: target of a branch

Source/Destination  SSS/DDD
A                   111
B                   000
C                   001
D                   010
E                   011
H                   100
L                   101
M                   110
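The encodings in the table can be checked with a small Python sketch that fills the SSS/DDD fields from the register codes. The function names are hypothetical; the results agree with the object codes in the RESULTS section (for example MOV A,M = 7E, MOV M,A = 77, ADD M = 86 and INR C = 0C). INR and DCR are written with a DDD field, as in Intel's documentation; the bit positions are the same whichever name the field is given.

```python
# 3-bit register codes from the SSS/DDD table (M = memory location addressed by HL).
REG = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
       "H": 0b100, "L": 0b101, "M": 0b110, "A": 0b111}


def mov(dst: str, src: str) -> int:
    """MOV dst,src -> 01DDDSSS."""
    if dst == "M" and src == "M":
        # 01110110 (0x76) is the HLT opcode, so MOV M,M does not exist
        raise ValueError("MOV M,M is not a valid instruction")
    return 0b01000000 | (REG[dst] << 3) | REG[src]


def add(src: str) -> int:
    """ADD src -> 10000SSS."""
    return 0b10000000 | REG[src]


def sub(src: str) -> int:
    """SUB src -> 10010SSS."""
    return 0b10010000 | REG[src]


def inr(reg: str) -> int:
    """INR reg -> 00DDD100 (register field in bits 5-3)."""
    return (REG[reg] << 3) | 0b100


def dcr(reg: str) -> int:
    """DCR reg -> 00DDD101."""
    return (REG[reg] << 3) | 0b101


for text, code in [("MOV A,M", mov("A", "M")), ("ADD M", add("M")), ("INR C", inr("C"))]:
    print(f"{text:8} {code:02X}")
```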

Some of the 8085 assembler directives are listed below.

Assembler directive   Example    Description
ORG                   ORG 0200   The next block of instructions is stored in memory locations starting at 0200.
END                   END        End of assembly.
DB (Define Byte)      Y: DB 05   Reserves one byte, symbolically referred to as Y, and initializes it with 05.
DS (Define Storage)   L: DS 06   Reserves six bytes of memory locations for L.
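The effect of these directives on the location counter can be sketched as below, assuming numeric operands are hexadecimal (as in `ORG 0200`) and that DB defines a single byte. `next_lc` is a hypothetical helper.

```python
def next_lc(lc: int, directive: str, operand: str = "") -> int:
    """Location counter after an 8085 assembler directive.

    Assumes numeric operands are written in hexadecimal, as in ORG 0200.
    """
    if directive == "ORG":
        return int(operand, 16)        # the next block starts at the given address
    if directive == "DB":
        return lc + 1                  # one initialized byte
    if directive == "DS":
        return lc + int(operand, 16)   # reserve that many bytes, uninitialized
    if directive == "END":
        return lc                      # end of assembly: LC unchanged
    raise ValueError(f"unknown directive {directive}")


lc = next_lc(0, "ORG", "0200")   # LC = 0x200
lc = next_lc(lc, "DB", "05")     # Y is at 0x200, LC = 0x201
lc = next_lc(lc, "DS", "06")     # L is at 0x201, LC = 0x207
print(hex(lc))                   # 0x207
```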

The following data structures and files are maintained in the design of the assembler for the 8085:

1) A file containing the input source program.

2) The Machine Operation Table (MOT). An entry in MOT contains a

mnemonic, its machine code and the length of the instruction.

3) The Pseudo Operation Table (POT) which contains the list of all the

assembler directives.

4) The Symbol Table (ST).

5) The Literal Table (LT).

6) The intermediate file and the output file (.obj).



THE DETAILED VERSION OF THE ALGORITHM FOR AN ASSEMBLER FOR 8085.

Pass 1:

Step 1 : LC := 0;

Step 2 : Read a line from the input file.

Step 3 : Analyze the statement.

    We have seen that a statement may contain three fields: label, op-code and operand. These parts are identified during the analysis of the statement. Let L, I and X denote the label, op-code and operand (if any) of the statement.

Step 3.1 : if (the statement contains a label L) then
    begin
        if (L is not found in ST) then insert L in ST;
        set the address field of L to LC;
    end;

Step 3.2 :
    Case 1 : I is found in MOT
        LC := LC + l, where l is the length of I;
    Case 2 : I is found in POT
        Case 2.1 : I = ORG
            LC := X;
        Case 2.2 : I = END
            goto Step 5;
        Case 2.3 : I = DB
            LC := LC + 1;
        Case 2.4 : I = DS
            LC := LC + X;

Step 3.3 : if (X is a literal) then
    begin
        LC := LC + 1;
        insert X in LT if it is new;
    end;

Step 4 : goto Step 2;

Step 5 : Set the address of the ith literal in LT (i = 0, 1, 2, ...) to LC + i.
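A minimal Python sketch of Pass 1 applied to the input file of the RESULTS section. It follows Steps 1 to 3.2, with a few assumptions of its own: a label is written as `NAME:` at the start of the line, numeric operands are hexadecimal, the lengths of LXI, MVI and INX (3, 2 and 1 bytes) come from the 8085 instruction set rather than from the table above, and literals (Step 3.3) are not handled because the example contains none. The intermediate output is a simplified (LC, statement) list, not the coded format shown in RESULTS.

```python
from typing import Dict, List, Tuple

# Lengths in bytes of the 8085 instructions used in the RESULTS example.
LENGTH = {"LXI": 3, "MVI": 2, "MOV": 1, "INX": 1, "ADD": 1,
          "JNC": 3, "INR": 1, "HLT": 1}

SOURCE = """\
LXI H, 0000
MVI C, 00
MOV A, M
INX H
ADD M
JNC LAB1
INR C
LAB1: INX H
MOV M, A
INX H
MOV M, C
HLT
"""


def pass1(source: str) -> Tuple[Dict[str, int], List[Tuple[int, str]], int]:
    """Return the symbol table, the statements tagged with their LC, and the final LC."""
    lc = 0                                          # Step 1
    symtab: Dict[str, int] = {}
    intermediate: List[Tuple[int, str]] = []
    for raw in source.splitlines():                 # Step 2
        line = raw.strip()
        if not line:
            continue
        if ":" in line:                             # Step 3.1: "LABEL:" prefix
            label, _, line = line.partition(":")
            symtab[label.strip()] = lc
            line = line.strip()
        mnemonic, _, operand = line.partition(" ")
        if mnemonic == "ORG":                       # Case 2.1
            lc = int(operand, 16)
        elif mnemonic == "END":                     # Case 2.2
            break
        elif mnemonic == "DB":                      # Case 2.3
            intermediate.append((lc, line))
            lc += 1
        elif mnemonic == "DS":                      # Case 2.4
            lc += int(operand, 16)
        else:                                       # Case 1: I found in MOT
            if mnemonic not in LENGTH:
                raise ValueError(f"unknown mnemonic {mnemonic!r}")
            intermediate.append((lc, line))
            lc += LENGTH[mnemonic]
    return symtab, intermediate, lc


symtab, intermediate, lc = pass1(SOURCE)
print(symtab, hex(lc))   # {'LAB1': 12} 0x11
```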

Pass 2:

Step 1 : LC := 0;

Step 2 : Read a line from the input file.

Step 3 : Analyze the statement.

Step 3.1 :
    Case 1 : I is found in MOT
        Write a line in the output file containing the current value of LC, the machine code of I and the character 'a' (the last character signifies that the content is "absolute", which will be needed by the loader).
    Case 2 : I is found in POT
        Case 2.1 : I = ORG
            no action;
        Case 2.2 : I = END
            goto Step 5;
        Case 2.3 : I = DB
            Write a line in the output file containing the current value of LC, X and the character 'a'.

Step 3.2 :
    Case 1 : X is a literal
        Write a line in the output file containing the current value of LC, the address of X (as found from LT) and the character 'r' (for "relative").
    Case 2 : X is a symbol
        Write a line in the output file containing the current value of LC, the address of X (as found from ST) and the letter 'r'.

Step 4 : goto Step 2;

Step 5 : for i = 0 to (total number of literals - 1) do
    Write a line containing LC + i, the ith literal in LT and the letter 'a'.

Step 6 : Stop.
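A matching sketch of Pass 2 for the same example. It starts from a simplified intermediate form (LC plus statement text, with labels already removed) and the symbol table produced by Pass 1, encodes each statement, and stores 16-bit addresses and data low byte first, which is why `JNC LAB1` becomes D2 0C 00. The register-pair codes and the LXI, MVI and INX bit patterns are taken from the 8085 instruction encoding, not from the table above, and the 'a'/'r' tags of Step 3 are omitted to keep the sketch short.

```python
from typing import Dict, List, Tuple

REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}   # SSS/DDD codes
PAIR = {"B": 0, "D": 1, "H": 2, "SP": 3}                               # register-pair codes
JUMPS = {"JMP": 0xC3, "JZ": 0xCA, "JNZ": 0xC2, "JC": 0xDA, "JNC": 0xD2}

# Simplified output of Pass 1 for the RESULTS example: (LC, statement).
INTERMEDIATE = [
    (0x00, "LXI H, 0000"), (0x03, "MVI C, 00"), (0x05, "MOV A, M"), (0x06, "INX H"),
    (0x07, "ADD M"), (0x08, "JNC LAB1"), (0x0B, "INR C"), (0x0C, "INX H"),
    (0x0D, "MOV M, A"), (0x0E, "INX H"), (0x0F, "MOV M, C"), (0x10, "HLT"),
]
SYMTAB = {"LAB1": 0x000C}


def le16(value: int) -> List[int]:
    """A 16-bit address or data word as stored by the 8085: low byte first."""
    return [value & 0xFF, (value >> 8) & 0xFF]


def encode(line: str, symtab: Dict[str, int]) -> List[int]:
    """Translate one statement into its object bytes."""
    mnemonic, _, rest = line.partition(" ")
    ops = [o.strip() for o in rest.split(",")] if rest.strip() else []
    if mnemonic == "LXI":                      # 00RP0001, then 16-bit data
        return [0x01 | (PAIR[ops[0]] << 4)] + le16(int(ops[1], 16))
    if mnemonic == "MVI":                      # 00DDD110, then 8-bit data
        return [0x06 | (REG[ops[0]] << 3), int(ops[1], 16) & 0xFF]
    if mnemonic == "MOV":                      # 01DDDSSS
        return [0x40 | (REG[ops[0]] << 3) | REG[ops[1]]]
    if mnemonic == "INX":                      # 00RP0011
        return [0x03 | (PAIR[ops[0]] << 4)]
    if mnemonic == "ADD":                      # 10000SSS
        return [0x80 | REG[ops[0]]]
    if mnemonic == "INR":                      # 00DDD100
        return [0x04 | (REG[ops[0]] << 3)]
    if mnemonic in JUMPS:                      # opcode, then the label's address from ST
        return [JUMPS[mnemonic]] + le16(symtab[ops[0]])
    if mnemonic == "HLT":
        return [0x76]
    raise ValueError(f"unsupported statement {line!r}")


def pass2(intermediate, symtab) -> List[Tuple[int, str]]:
    """Return (PC, object code in hex) for every statement."""
    return [(lc, "".join(f"{b:02X}" for b in encode(line, symtab)))
            for lc, line in intermediate]


for pc, code in pass2(INTERMEDIATE, SYMTAB):
    print(f"{pc:04X} {code}")
```

The printed listing reproduces the object codes of the OUTPUT FILE in the RESULTS section.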



MAIN FUNCTIONS USED IN TWO PASSES

We developed the following functions for the 8085 assembler.

• PASS 1:

    • READ1: reads the assembly source file.

    • STSTO: stores a label and its value in the ST (if the symbol is not already present in the ST).

    • POTGET: searches the POT for a match with the operation field.

    • MOTGET: searches the MOT for a match with the operation field.

    • LTSTO: stores a literal in the LT.

    • STGET: searches the ST for the entry corresponding to a specific symbol.

• PASS 2:

    • READ2: reads the assembly source file from the file copy.

    • LTGEN: generates code for literals.

    • DCGEN: processes the fields of the DC (data constant) pseudo-op to generate object code.


RESULTS

1) INPUT FILE

LXI H, 0000
MVI C, 00
MOV A, M
INX H
ADD M
JNC LAB1
INR C
LAB1: INX H
MOV M, A
INX H
MOV M, C
HLT

2) SYMBOL TABLE

Index   Symbol name   Value
---------------------------
s1      LAB1          000C

3) INTERMEDIATE FILE

m80-r6-C-A16-0000

m23-r3-C-A8-00

m22-r1-C-r9

m20-r6

m3-r9

m70-l-s1

m19-r3

l-s1-cn-m20-r6

m22-r9-C-r1

m20-r6

m22-r9-C-r3

m17



4) OUTPUT FILE

PC Opcode OBJCODE ________________________________________

0000 LXI H , 0000 210000

0003 MVI C , 00 0E00

0005 MOV A , M 7E

0006 INX H 23

0007 ADD M 86

0008 JNC LAB1 D20C00

000B INR C 0C

000C LAB1 : INX H 23

000D MOV M , A 77

000E INX H 23

000F MOV M , C 71

0010 HLT 76



CONCLUSION

This project generates object code for the 8085 microprocessor. It takes an assembly language program as input and generates object code as output, in two passes. During pass 1, it generates the symbol table and an intermediate file. In pass 2, it takes the intermediate file as input, updates the symbol table and generates the object code.


REFERENCES

[1] Donovan J. J., “System Programming”, McGraw-Hill, New York, 1972.

[2] Barron D. W., “Assemblers and Loaders”, 2nd ed., Elsevier, New York, 1972.

[3] Beck L. L., “System Software: An Introduction to Systems Programming”, Addison-Wesley, 1985.

[4] Ullman J. D., “Fundamental Concepts of Programming Systems”, Addison-Wesley, 1976.

[5] Nisan N. and Schocken S., “The Digital Core”, 2003, www.idc.ac.il/csd.

[6] Early G., “Functional Programming and the Two-Pass Assembler”, Southwest Texas State University, San Marcos, Texas.

[7] Beck L. L., “System Software”.

[8] Salomon D., “Assemblers and Loaders”.

[9] Wegner P., “Programming Languages, Information Structure and Machine Organization”, McGraw-Hill, New York, 1968.
