assembly language - kirkwood community college · assembly language instructions • mnemonics:...

Assembly Language

part 1

Some terminology

• Assembly language: a low-level language

that is a little more human-friendly than

machine language

• assembler: a program that translates

assembly language source code into

executable form

• object code: machine language program

(assembler output)

Types of assemblers

• Resident assembler: an assembler written for its own platform, using the native instruction set

• Cross assembler: assemble run on one platform to produce object code for another

• Disassembler: program that attempts to recover source code from object code (not 100% successful)

Assembly language instructions

• Mnemonics: abbreviated words used instead of machine language hex code; – have one-to-one correspondence with underlying

instruction

– always possible to determine underlying machine language statement from assembly language mnemonic, but not vice-versa

• Pseudo-ops: assembly language statements used mostly for data declaration; do not correspond to specific machine language instructions

Pep/8 assembly language

• General syntax notes:

– one instruction per line of code

– comments start with semicolon, continue until end of line

– not case-sensitive

– Spacing: • at least one space required after each instruction

(mnemonic or pseudo-op)

• otherwise doesn’t matter

– last line of program must be .END pseudo-op

Pep/8 Assembly Language

• Mnemonic instruction format: – 2-6 letter instruction specifier (most or 3-4 letters)

– operand specifier, usually followed by a comma and

– 1-3 letter address mode specifier (most are 1)

• Examples: LDA 0x0014,i ; load hex value 14 to A

LDX 0x1110,d ; load data at address 1110 into x

• Entire Pep/8 assembly language instruction set is printed on the inside front cover (and on page 191) of your textbook


• Addressing mode specifiers:

– i: immediate

– d: direct

– n: indirect

– s: stack-relative

– sf: stack-relative deferred

– x: indexed

– sx: stack-indexed

– sxf: stack-indexed deferred


• Unimplemented opcodes – instructions available at assembly language level,

even though they are not (directly) available at the machine language level

– represent operations handled by the operating system

• They include: – NOPn: unary no operation trap

– NOP: non-unary NOP

– DECI: decimal input trap

– DECO: decimal output trap

– STRO: string output trap

Pseudo-ops

• .ADDRSS: used to crate labeled jump destinations

• .ASCII: specifies char string

• .BLOCK: allocates specified # of bytes, initializes whole set to zero

• .BURN: used for OS configuration

• .BYTE: allocates one byte; can specify hex or decimal content

• .END: stop code

• .EQUATE: equate symbol with literal value; like # define in C/C++

• .WORD: allocates one word of memory

Example program 1

; Program example 1

CHARO 0x0010 ,d

CHARO 0x0011 ,d

CHARO 0x0012 ,d

CHARO 0x0013 ,d

CHARO 0X0014 ,d

STOP

.ASCII "Arrr!"

.END

Comment

Instructions: each outputs one

character; starting address of program

is 0000, and each instruction (except

STOP) is 3 bytes long; STOP is one

byte

Data

Object code & assembler output

from program example 1

Program example 2

BR 0x0008

.ASCII "#?"

.ASCII "\n"

.BLOCK 2

CHARO 0x0003, d

CHARO 0x0004, d

DECI 0x0006, d

CHARO 0x0005, d

DECO 0x0006, d

STOP

.END

go past data – minimizes offset

calculations (contrast with data after code)

“constant” declarations

“variable” declaration

output prompt

read number

output newline character

output number

Data declaration & storage

• The previous example included two different types of data declaration instructions:

– the .ASCII pseudo-op is used to allocate a contiguous set of bytes large enough to hold the specified data; as with Java & C++, the backslash character (\) is used as the escape character for control codes, like newline

– the .BLOCK pseudo-op allocates the specified number of bytes and initializes their values to 0

Data declaration & storage

• Two other pseudo-ops provide data

storage:

– the .WORD instruction allocates two bytes,

suitable for storage of integers

– the .BYTE instruction allocates one byte,

suitable for storage of characters

– like .ASCII (and unlike .BLOCK), both of these

instructions allow the programmer to specify

initial values, as shown on next slide

Initialization examples

• .WORD 7 ; allocates 2 bytes, with

; decimal value 7

• .BYTE 0x2B ; allocate 1 byte, with hex

; value 2B (‘+’)

I/O instructions

• The DECI and DECO instructions considerably ease the process of reading and writing numbers

• Each one deals with word-size data, and represent instructions not available in the underlying machine language – thus they are part of the set of unimplemented op codes

• The actual I/O is performed by the operating system; the instructions generate program interrupts that allow the OS to temporarily take over to provide a service to the program

I/O instructions

• The CHARI and CHARO instructions are

simply assembly language versions of the

machine language input and output

instructions:

– read or write a byte of data

– data source (for output) and destination (for

input) are memory (not registers)

I/O instructions

• STRO is yet another example of an

unimplemented op code

• Outputs a string of data

– String can be predefined with the .ASCII

pseudo-op

– Predefined string must be terminated with a

null character: “\x00”

Arranging instructions and data

• In the first program example (see Monday’s

notes), as with all of the machine language

examples, instructions were placed first, ended

with a STOP code, and data followed

• Problems with this approach:

– requires address calculations based on the number of

instructions (which may not be known as you’re

writing a particular instruction)

– addresses may have to be adjusted if even minor

changes are made to the program

Putting the data first

• An easy solution to the problems described on the previous slide was illustrated by the program example; the solution is twofold:

– declare the data first

– place an unconditional branch instruction at the beginning of the program, pointing to the first instruction after the data

– the following example provides another illustration

Program example 3 br 0x0020 ; bypass data

.block 4 ; space for 2 ints

.ascii "Enter a number: \x00"

.ascii " + \x00"

.ascii " = \x00"

stro 0x0007,d ; prompt

deci 0x0003,d ; get 1st number

stro 0x0007,d ; prompt

deci 0x0005,d ; get 2nd number

deco 0x0003,d ; output 1st number

stro 0x0018,d ; output ascii string " + "

deco 0x0005,d ; output 2nd number

stro 0x001c,d ; output string " = "

lda 0x0003,d ; put the first # in A

adda 0x0005,d ; add 2nd # to first

sta 0x0003,d ; store sum

deco 0x0003,d ; output sum

stop

.end

Program example 4: using labels

br code

pirate: .ASCII "Arrr!\x00"

code: stro pirate ,d

STOP

.END

Symbols

• Symbols are assembler names for memory addresses

• Can be used to label data or instructions

• Syntax rules: – start with letter

– contain letter & digits

– 8 characters max

– CASE sensitive

• Define by placing symbol label at start of line, followed by colon

Symbol Table

• Assembler stores labels & corresponding addresses in lookup table called symbol table

• Value of symbol corresponds to 1st byte of memory address (of data or instruction)

• Symbol table only stores label & address, not nature of what is stored

• Instruction can still be interpreted as data, & vice versa

Example

Example program:

this: deco this, d

stop

.end

Output:

14592

What happened?

High Level Languages & Compilers

• Compilers translate high level language

code into low level language; may be:

– machine language

– assembly language

– for the latter an additional translation step is

required to make the program executable

C++/Java example

// C++ code:

#include <iostream.h>

#include <string>

string greeting =

“Hello world”;

int main ()

{

cout <<

greeting

<< endl;

return 0;

}

// Java code:

public class Hello {

static String greeting =

“Hello world”;

public static void main

(String [] args) {

System.out.print

(greeting);

System.out.print(‘\n’);

}

}

Assembly language (approximate)

equivalent

br main

greeting: .ASCII "Hello world \x00"

main: stro greeting, d

charo '\n', i

stop

.end

Data types

• In a high level language, such as Java or C++,

variables have the following characteristics:

– Name

– Value

– Data Type

• At a lower level (assembly or machine

language), a variable is just a memory location

• The compiler generates a symbol table to keep

track of high level language variables

Symbol table entries

The illustration drove shows a snippet of output

from the Pep/8 assembler. Each symbol table

entry includes:

• the symbol

• the value (of the symbol’s start address)

• the type (.ASCII in this case)

Pep/8 Branching instructions

• We have already seen the use of BR, the

unconditional branch instruction

• Pep/8 also includes 8 conditional branch

instructions; these are used to create

assembly language control structures

• These instructions are described on the

next couple of slides

Conditional branching instructions

• BRLE: – branch on less than or equal

– how it works: if N or Z is 1, PC = operand

• BRLT: – branch on less than

– how it works: if N is 1, PC = operand

• BREQ: – branch on equal

– how it works: if Z is 1, PC = operand

• BRNE: – branch on not equal

– how it works: if Z is 0, PC = operand

Conditional branching instructions

• BRGE: – branch on greater than or equal

– if N is 0, PC = operand

• BRGT: – branch on greater than

– if N and Z are 0, PC = operand

• BRV: – branch if overflow

– if V is 1, PC = operand

• BRC: – branch if carry

– if C is 1, PC = operand

Example

Pep/8 code:

br main

num: .block 2

prompt: .ascii "Enter a number: \x00"

main: stro prompt, d

deci num, d

lda num, d

brge endif

lda num, d

nega ; negate value in a

sta num, d

endif: deco num, d

stop

.end

HLL code:

int num;

Scanner kb = new Scanner();

System.out.print

(“Enter a number: ”);

num = kb.nextInt();

if (num < 0)

num = -num;

System.out.print(num);

Analysis of example Pep/8 code:

br main

num: .block 2

prompt: .ascii "Enter a number: \x00"


deci num, d

lda num, d

brge endif

lda num, d

nega ; negate value in a

sta num, d

endif: deco num, d

stop

.end

The if statement, if

translated back to Java,

would now be more like:

if (num >= 0);

else

num = -num;

This part requires a little

more explanation; see

next slide

Analysis continued

• A compiler must be programmed to translate assignment statements; a reasonable translation of x = 3 might be: – load a value into the accumulator

– evaluate the expression

– store result to variable

• In the case above (and in the assembly language code on the previous page), evaluation of the expression isn’t necessary, since the initial value loaded into A is the only value involved (the second load is really the evaluation of the expression)

Compiler types and efficiency

• An optimizing compiler would perform the necessary source code analysis to recognize that the second load is extraneous – advantage: end product (executable code) is shorter

& faster

– disadvantage: takes longer to compile

• So, an optimizing compiler is good for producing the end product, or a product that will be executed many times (for testing); a non-optimizing compiler, because it does the translation quickly, is better for mid-development

Another example HLL code:

final int limit = 100;

int num;

System.out.print(“Enter a #: ”);

if (num >= limit)

System.out.print(“high”);

else

System.out.print(“low”);

Pep/8 code:

br main

limit: .equate 100

num: .block 2

high: .ascii "high\x00"

low: .ascii "low\x00"

prompt: .ascii "Enter a #: \x00"


deci num, d

if: lda num, d

cpa limit, i

brlt else

stro high, d

br endif

else: stro low, d

endif: stop

.end

Compare instruction: cpr where

r is a register (a or x):

action same as subr except

difference (result) isn’t stored in

the register – just sets status bits –

if N or Z is 0, <= is true

Writing loops in assembly language

• As we have seen, an if or if/else structure in assembly language involves a comparison and then a (possible) branch forward to another section of code

• A loop structure is actually more like its high level language equivalent; for a while loop, the algorithm is: – perform comparison; branch forward if condition isn’t

met (loop ends)

– otherwise, perform statements in loop body

– perform unconditional branch back to comparison

Example

• The following example shows a C++

program (because this is easier to

demonstrate in C++ than in Java) that

performs the following algorithm:

– prompt for input (of a string)

– read one character

– while (character != end character (‘*’))

• write out character

• read next character

C++ code #include <iostream.h>

int main ()

{

char ch;

cout << “Enter a line of text, ending with *” << endl;

cin.get (ch);

while (ch != ‘*’)

{

cout << ch;

cin.get(ch);

}

return 0;

}

Pep/8 code ;am3ex5

br main

ch: .block 1

prompt: .ascii "Enter a line of text ending with *\n\x00"


chari ch, d ; initial read

lda 0x0000, i ; clear accumulator

while: ldbytea ch, d ; load ch into A

cpa '*', i

breq endW

charo ch, d

chari ch, d ; read next letter

br while

endW: stop

.end

Do/while loop

• Post-test loop: condition test occurs after iteration

• Premise of sample program:

– cop is sitting at a speed trap

– speeder drives by

– within 2 seconds, cop starts following, going 5 meters/second faster

– how far does the cop travel before catching up with the speeder?

C++ version of speedtrap #include <iostream.h>

#include <stdlib.h>

int main()

{

int copDistance = 0; // cop is sitting still

int speeder; // speeder's speed: entered by user

int speederDistance; // distance speeder travels from cop’s position

cout << "How fast is the driver going? (Enter whole #): ";

cin >> speeder;

speederDistance = speeder;

do

{

copDistance += speeder + 5;

speederDistance += speeder;

} while (copDistance < speederDistance);

cout << "Cop catches up to speeder in " << copDistance << " meters." << endl;

return 0;

}

Pep/8 version ;speedTrap

br main

cop: .block 2

drvspd: .block 2

drvpos: .block 2

prompt: .ascii "How fast is the driver going? (Enter whole #): \x00"

outpt1: .ascii "Cop catches up to speeder in \x00"

outpt2: .ascii " meters\n\x00"

main: lda 0, i

sta cop, d

stro prompt, d

deci drvspd, d ; cin >> speeder;

ldx drvspd, d ; speederDistance = speeder;

stx drvpos, d

Pep/8 version continued

do: lda 5, i ; copDistance += speeder + 5;

adda drvspd, d

adda cop, d

sta cop, d

addx drvspd, d ; speederDistance += speeder;

stx drvpos, d

while: lda cop, d ; while (copDistance <

cpa drvpos, d ; speederDistance);

brlt do

stro outpt1, d

deco cop, d

stro outpt2, d

stop

.end

For loops

• For loop is just a count-controlled while

loop

• Next example illustrates nested for loops

C++ version #include <iostream.h>

#include <stdlib.h>

int main()

{

int x,y;

for (x=0; x < 4; x++)

{

for (y = x; y > 0; y--)

cout << "* ";

cout << endl;

}

return 0;

}

Pep/8 version ;nestfor

br main

x: .word 0x0000

y: .word 0x0000

main: sta x, d

stx y, d

outer: adda 1, i

cpa 5, i

breq endo

sta x, d

ldx x, d

inner: charo '*', i

charo ' ', i

subx 1, i

cpx 0, i

brne inner

charo '\n', i

br outer

endo: stop

.end

Notes on control structures

• It’s possible to create “control structures”

in assembly language that don’t exist at a

higher level

• Your text describes such a structure,

illustrated and explained on the next slide

A control structure not found in

nature

• Condition C1 is tested; if true,

branch to middle of loop (S3)

• After S3 (however you happen to

get there – via branch from C1 or

sequentially, from S2) test C2

• If C2 is true, branch to top of loop

• No way to do this in C++ or Java

(at least, not without the dreaded

goto statement)

High level language programs vs.

assembly language programs

• If you’re talking about pure speed, a program in

assembly language will almost always beat one

that originated in a high level language

• Assembly and machine language programs

produced by a compiler are almost always

longer and slower

• So why use high level languages (besides the

fact that assembly language is a pain in the

patoot)

Why high level languages?

• Type checking:

– data types sort of exist at low level, but the

assembler doesn’t check your syntax to

ensure you’re using them correctly

– can attempt to DECO a string, for example

• Encourages structured programming

Structured programming

• Flow of control in program is limited to

nestings of if/else, switch, while, do/while

and for statements

• Overuse of branching instructions leads to

spaghetti code

Unstructured branching

• Advantage: can lead to faster, smaller programs

• Disadvantage: Difficult to understand

– and debug

– and maintain

– and modify

• Structured flow of control is newer idea than

branching; a form of branching with gotos by

another name lives on in the Java/C++

switch/case structure

Evolution of structured

programming • First widespread high level language was FORTRAN; it

introduced a new conditional branch statement: if (expression) GOTO new location

• Considered improvement over assembly language – combined CPr and BR statements

• Still used opposite logic: if (expression not true) branch else

// if-related statements here

branch past else

else:

// else-related statements here

destination for if branch

Block-structured languages

• ALGOL-60 (introduced in 1960 – hey, me

too) featured first use of program blocks

for selection/iteration structures

• Descendants of ALGOL include C, C++,

and Java

Structured Programming Theorem

• Any algorithm containing GOTOs can be written using only nested ifs and while loops (proven back in 1966)

• In 1968, Edsgar Dijkstra wrote a famous letter to the editor of Communications of the ACM entitled “gotos considered harmful” – considered the structured programming manifesto

• It turns out that structured code is less expensive to develop, debug and maintain than unstructured code – even factoring in the cost of additional memory requirements and execution time

assembly language - kirkwood community college · assembly language instructions • mnemonics:...

Documents