assembly language - kirkwood community college · assembly language instructions • mnemonics:...
TRANSCRIPT
Assembly Language
part 1
Some terminology
• Assembly language: a low-level language
that is a little more human-friendly than
machine language
• assembler: a program that translates
assembly language source code into
executable form
• object code: machine language program
(assembler output)
Types of assemblers
• Resident assembler: an assembler written for its own platform, using the native instruction set
• Cross assembler: assemble run on one platform to produce object code for another
• Disassembler: program that attempts to recover source code from object code (not 100% successful)
Assembly language instructions
• Mnemonics: abbreviated words used instead of machine language hex code; – have one-to-one correspondence with underlying
instruction
– always possible to determine underlying machine language statement from assembly language mnemonic, but not vice-versa
• Pseudo-ops: assembly language statements used mostly for data declaration; do not correspond to specific machine language instructions
Pep/8 assembly language
• General syntax notes:
– one instruction per line of code
– comments start with semicolon, continue until end of line
– not case-sensitive
– Spacing: • at least one space required after each instruction
(mnemonic or pseudo-op)
• otherwise doesn’t matter
– last line of program must be .END pseudo-op
Pep/8 Assembly Language
• Mnemonic instruction format: – 2-6 letter instruction specifier (most or 3-4 letters)
– operand specifier, usually followed by a comma and
– 1-3 letter address mode specifier (most are 1)
• Examples: LDA 0x0014,i ; load hex value 14 to A
LDX 0x1110,d ; load data at address 1110 into x
• Entire Pep/8 assembly language instruction set is printed on the inside front cover (and on page 191) of your textbook
Pep/8 Assembly Language
• Addressing mode specifiers:
– i: immediate
– d: direct
– n: indirect
– s: stack-relative
– sf: stack-relative deferred
– x: indexed
– sx: stack-indexed
– sxf: stack-indexed deferred
Pep/8 Assembly Language
• Unimplemented opcodes – instructions available at assembly language level,
even though they are not (directly) available at the machine language level
– represent operations handled by the operating system
• They include: – NOPn: unary no operation trap
– NOP: non-unary NOP
– DECI: decimal input trap
– DECO: decimal output trap
– STRO: string output trap
Pseudo-ops
• .ADDRSS: used to crate labeled jump destinations
• .ASCII: specifies char string
• .BLOCK: allocates specified # of bytes, initializes whole set to zero
• .BURN: used for OS configuration
• .BYTE: allocates one byte; can specify hex or decimal content
• .END: stop code
• .EQUATE: equate symbol with literal value; like # define in C/C++
• .WORD: allocates one word of memory
Example program 1
; Program example 1
CHARO 0x0010 ,d
CHARO 0x0011 ,d
CHARO 0x0012 ,d
CHARO 0x0013 ,d
CHARO 0X0014 ,d
STOP
.ASCII "Arrr!"
.END
Comment
Instructions: each outputs one
character; starting address of program
is 0000, and each instruction (except
STOP) is 3 bytes long; STOP is one
byte
Data
Object code & assembler output
from program example 1
Program example 2
BR 0x0008
.ASCII "#?"
.ASCII "\n"
.BLOCK 2
CHARO 0x0003, d
CHARO 0x0004, d
DECI 0x0006, d
CHARO 0x0005, d
DECO 0x0006, d
STOP
.END
go past data – minimizes offset
calculations (contrast with data after code)
“constant” declarations
“variable” declaration
output prompt
read number
output newline character
output number
Data declaration & storage
• The previous example included two different types of data declaration instructions:
– the .ASCII pseudo-op is used to allocate a contiguous set of bytes large enough to hold the specified data; as with Java & C++, the backslash character (\) is used as the escape character for control codes, like newline
– the .BLOCK pseudo-op allocates the specified number of bytes and initializes their values to 0
Data declaration & storage
• Two other pseudo-ops provide data
storage:
– the .WORD instruction allocates two bytes,
suitable for storage of integers
– the .BYTE instruction allocates one byte,
suitable for storage of characters
– like .ASCII (and unlike .BLOCK), both of these
instructions allow the programmer to specify
initial values, as shown on next slide
Initialization examples
• .WORD 7 ; allocates 2 bytes, with
; decimal value 7
• .BYTE 0x2B ; allocate 1 byte, with hex
; value 2B (‘+’)
I/O instructions
• The DECI and DECO instructions considerably ease the process of reading and writing numbers
• Each one deals with word-size data, and represent instructions not available in the underlying machine language – thus they are part of the set of unimplemented op codes
• The actual I/O is performed by the operating system; the instructions generate program interrupts that allow the OS to temporarily take over to provide a service to the program
I/O instructions
• The CHARI and CHARO instructions are
simply assembly language versions of the
machine language input and output
instructions:
– read or write a byte of data
– data source (for output) and destination (for
input) are memory (not registers)
I/O instructions
• STRO is yet another example of an
unimplemented op code
• Outputs a string of data
– String can be predefined with the .ASCII
pseudo-op
– Predefined string must be terminated with a
null character: “\x00”
Arranging instructions and data
• In the first program example (see Monday’s
notes), as with all of the machine language
examples, instructions were placed first, ended
with a STOP code, and data followed
• Problems with this approach:
– requires address calculations based on the number of
instructions (which may not be known as you’re
writing a particular instruction)
– addresses may have to be adjusted if even minor
changes are made to the program
Putting the data first
• An easy solution to the problems described on the previous slide was illustrated by the program example; the solution is twofold:
– declare the data first
– place an unconditional branch instruction at the beginning of the program, pointing to the first instruction after the data
– the following example provides another illustration
Program example 3 br 0x0020 ; bypass data
.block 4 ; space for 2 ints
.ascii "Enter a number: \x00"
.ascii " + \x00"
.ascii " = \x00"
stro 0x0007,d ; prompt
deci 0x0003,d ; get 1st number
stro 0x0007,d ; prompt
deci 0x0005,d ; get 2nd number
deco 0x0003,d ; output 1st number
stro 0x0018,d ; output ascii string " + "
deco 0x0005,d ; output 2nd number
stro 0x001c,d ; output string " = "
lda 0x0003,d ; put the first # in A
adda 0x0005,d ; add 2nd # to first
sta 0x0003,d ; store sum
deco 0x0003,d ; output sum
stop
.end
Program example 4: using labels
br code
pirate: .ASCII "Arrr!\x00"
code: stro pirate ,d
STOP
.END
Symbols
• Symbols are assembler names for memory addresses
• Can be used to label data or instructions
• Syntax rules: – start with letter
– contain letter & digits
– 8 characters max
– CASE sensitive
• Define by placing symbol label at start of line, followed by colon
Symbol Table
• Assembler stores labels & corresponding addresses in lookup table called symbol table
• Value of symbol corresponds to 1st byte of memory address (of data or instruction)
• Symbol table only stores label & address, not nature of what is stored
• Instruction can still be interpreted as data, & vice versa
Example
Example program:
this: deco this, d
stop
.end
Output:
14592
What happened?
High Level Languages & Compilers
• Compilers translate high level language
code into low level language; may be:
– machine language
– assembly language
– for the latter an additional translation step is
required to make the program executable
C++/Java example
// C++ code:
#include <iostream.h>
#include <string>
string greeting =
“Hello world”;
int main ()
{
cout <<
greeting
<< endl;
return 0;
}
// Java code:
public class Hello {
static String greeting =
“Hello world”;
public static void main
(String [] args) {
System.out.print
(greeting);
System.out.print(‘\n’);
}
}
Assembly language (approximate)
equivalent
br main
greeting: .ASCII "Hello world \x00"
main: stro greeting, d
charo '\n', i
stop
.end
Data types
• In a high level language, such as Java or C++,
variables have the following characteristics:
– Name
– Value
– Data Type
• At a lower level (assembly or machine
language), a variable is just a memory location
• The compiler generates a symbol table to keep
track of high level language variables
Symbol table entries
The illustration drove shows a snippet of output
from the Pep/8 assembler. Each symbol table
entry includes:
• the symbol
• the value (of the symbol’s start address)
• the type (.ASCII in this case)
Pep/8 Branching instructions
• We have already seen the use of BR, the
unconditional branch instruction
• Pep/8 also includes 8 conditional branch
instructions; these are used to create
assembly language control structures
• These instructions are described on the
next couple of slides
Conditional branching instructions
• BRLE: – branch on less than or equal
– how it works: if N or Z is 1, PC = operand
• BRLT: – branch on less than
– how it works: if N is 1, PC = operand
• BREQ: – branch on equal
– how it works: if Z is 1, PC = operand
• BRNE: – branch on not equal
– how it works: if Z is 0, PC = operand
Conditional branching instructions
• BRGE: – branch on greater than or equal
– if N is 0, PC = operand
• BRGT: – branch on greater than
– if N and Z are 0, PC = operand
• BRV: – branch if overflow
– if V is 1, PC = operand
• BRC: – branch if carry
– if C is 1, PC = operand
Example
Pep/8 code:
br main
num: .block 2
prompt: .ascii "Enter a number: \x00"
main: stro prompt, d
deci num, d
lda num, d
brge endif
lda num, d
nega ; negate value in a
sta num, d
endif: deco num, d
stop
.end
HLL code:
int num;
Scanner kb = new Scanner();
System.out.print
(“Enter a number: ”);
num = kb.nextInt();
if (num < 0)
num = -num;
System.out.print(num);
Analysis of example Pep/8 code:
br main
num: .block 2
prompt: .ascii "Enter a number: \x00"
main: stro prompt, d
deci num, d
lda num, d
brge endif
lda num, d
nega ; negate value in a
sta num, d
endif: deco num, d
stop
.end
The if statement, if
translated back to Java,
would now be more like:
if (num >= 0);
else
num = -num;
This part requires a little
more explanation; see
next slide
Analysis continued
• A compiler must be programmed to translate assignment statements; a reasonable translation of x = 3 might be: – load a value into the accumulator
– evaluate the expression
– store result to variable
• In the case above (and in the assembly language code on the previous page), evaluation of the expression isn’t necessary, since the initial value loaded into A is the only value involved (the second load is really the evaluation of the expression)
Compiler types and efficiency
• An optimizing compiler would perform the necessary source code analysis to recognize that the second load is extraneous – advantage: end product (executable code) is shorter
& faster
– disadvantage: takes longer to compile
• So, an optimizing compiler is good for producing the end product, or a product that will be executed many times (for testing); a non-optimizing compiler, because it does the translation quickly, is better for mid-development
Another example HLL code:
final int limit = 100;
int num;
System.out.print(“Enter a #: ”);
if (num >= limit)
System.out.print(“high”);
else
System.out.print(“low”);
Pep/8 code:
br main
limit: .equate 100
num: .block 2
high: .ascii "high\x00"
low: .ascii "low\x00"
prompt: .ascii "Enter a #: \x00"
main: stro prompt, d
deci num, d
if: lda num, d
cpa limit, i
brlt else
stro high, d
br endif
else: stro low, d
endif: stop
.end
Compare instruction: cpr where
r is a register (a or x):
action same as subr except
difference (result) isn’t stored in
the register – just sets status bits –
if N or Z is 0, <= is true
Writing loops in assembly language
• As we have seen, an if or if/else structure in assembly language involves a comparison and then a (possible) branch forward to another section of code
• A loop structure is actually more like its high level language equivalent; for a while loop, the algorithm is: – perform comparison; branch forward if condition isn’t
met (loop ends)
– otherwise, perform statements in loop body
– perform unconditional branch back to comparison
Example
• The following example shows a C++
program (because this is easier to
demonstrate in C++ than in Java) that
performs the following algorithm:
– prompt for input (of a string)
– read one character
– while (character != end character (‘*’))
• write out character
• read next character
C++ code #include <iostream.h>
int main ()
{
char ch;
cout << “Enter a line of text, ending with *” << endl;
cin.get (ch);
while (ch != ‘*’)
{
cout << ch;
cin.get(ch);
}
return 0;
}
Pep/8 code ;am3ex5
br main
ch: .block 1
prompt: .ascii "Enter a line of text ending with *\n\x00"
main: stro prompt, d
chari ch, d ; initial read
lda 0x0000, i ; clear accumulator
while: ldbytea ch, d ; load ch into A
cpa '*', i
breq endW
charo ch, d
chari ch, d ; read next letter
br while
endW: stop
.end
Do/while loop
• Post-test loop: condition test occurs after iteration
• Premise of sample program:
– cop is sitting at a speed trap
– speeder drives by
– within 2 seconds, cop starts following, going 5 meters/second faster
– how far does the cop travel before catching up with the speeder?
C++ version of speedtrap #include <iostream.h>
#include <stdlib.h>
int main()
{
int copDistance = 0; // cop is sitting still
int speeder; // speeder's speed: entered by user
int speederDistance; // distance speeder travels from cop’s position
cout << "How fast is the driver going? (Enter whole #): ";
cin >> speeder;
speederDistance = speeder;
do
{
copDistance += speeder + 5;
speederDistance += speeder;
} while (copDistance < speederDistance);
cout << "Cop catches up to speeder in " << copDistance << " meters." << endl;
return 0;
}
Pep/8 version ;speedTrap
br main
cop: .block 2
drvspd: .block 2
drvpos: .block 2
prompt: .ascii "How fast is the driver going? (Enter whole #): \x00"
outpt1: .ascii "Cop catches up to speeder in \x00"
outpt2: .ascii " meters\n\x00"
main: lda 0, i
sta cop, d
stro prompt, d
deci drvspd, d ; cin >> speeder;
ldx drvspd, d ; speederDistance = speeder;
stx drvpos, d
Pep/8 version continued
do: lda 5, i ; copDistance += speeder + 5;
adda drvspd, d
adda cop, d
sta cop, d
addx drvspd, d ; speederDistance += speeder;
stx drvpos, d
while: lda cop, d ; while (copDistance <
cpa drvpos, d ; speederDistance);
brlt do
stro outpt1, d
deco cop, d
stro outpt2, d
stop
.end
For loops
• For loop is just a count-controlled while
loop
• Next example illustrates nested for loops
C++ version #include <iostream.h>
#include <stdlib.h>
int main()
{
int x,y;
for (x=0; x < 4; x++)
{
for (y = x; y > 0; y--)
cout << "* ";
cout << endl;
}
return 0;
}
Pep/8 version ;nestfor
br main
x: .word 0x0000
y: .word 0x0000
main: sta x, d
stx y, d
outer: adda 1, i
cpa 5, i
breq endo
sta x, d
ldx x, d
inner: charo '*', i
charo ' ', i
subx 1, i
cpx 0, i
brne inner
charo '\n', i
br outer
endo: stop
.end
Notes on control structures
• It’s possible to create “control structures”
in assembly language that don’t exist at a
higher level
• Your text describes such a structure,
illustrated and explained on the next slide
A control structure not found in
nature
• Condition C1 is tested; if true,
branch to middle of loop (S3)
• After S3 (however you happen to
get there – via branch from C1 or
sequentially, from S2) test C2
• If C2 is true, branch to top of loop
• No way to do this in C++ or Java
(at least, not without the dreaded
goto statement)
High level language programs vs.
assembly language programs
• If you’re talking about pure speed, a program in
assembly language will almost always beat one
that originated in a high level language
• Assembly and machine language programs
produced by a compiler are almost always
longer and slower
• So why use high level languages (besides the
fact that assembly language is a pain in the
patoot)
Why high level languages?
• Type checking:
– data types sort of exist at low level, but the
assembler doesn’t check your syntax to
ensure you’re using them correctly
– can attempt to DECO a string, for example
• Encourages structured programming
Structured programming
• Flow of control in program is limited to
nestings of if/else, switch, while, do/while
and for statements
• Overuse of branching instructions leads to
spaghetti code
Unstructured branching
• Advantage: can lead to faster, smaller programs
• Disadvantage: Difficult to understand
– and debug
– and maintain
– and modify
• Structured flow of control is newer idea than
branching; a form of branching with gotos by
another name lives on in the Java/C++
switch/case structure
Evolution of structured
programming • First widespread high level language was FORTRAN; it
introduced a new conditional branch statement: if (expression) GOTO new location
• Considered improvement over assembly language – combined CPr and BR statements
• Still used opposite logic: if (expression not true) branch else
// if-related statements here
branch past else
else:
// else-related statements here
destination for if branch
Block-structured languages
• ALGOL-60 (introduced in 1960 – hey, me
too) featured first use of program blocks
for selection/iteration structures
• Descendants of ALGOL include C, C++,
and Java
Structured Programming Theorem
• Any algorithm containing GOTOs can be written using only nested ifs and while loops (proven back in 1966)
• In 1968, Edsgar Dijkstra wrote a famous letter to the editor of Communications of the ACM entitled “gotos considered harmful” – considered the structured programming manifesto
• It turns out that structured code is less expensive to develop, debug and maintain than unstructured code – even factoring in the cost of additional memory requirements and execution time