lesson 01

What is Assembly Language?

Introduction to the GNU/Linux assembler and linker for Intel Pentium processors

High-Level Language

• Most programming nowadays is done using so-called “high-level” languages (such as FORTRAN, BASIC, COBOL, PASCAL, C, C++, JAVA, SCHEME, Lisp, ADA, etc.)

• These languages deliberately “hide” from a programmer many details concerning HOW his problem actually will be solved by the underlying computing machinery

The BASIC language

• Some languages allow programmers to forget about the computer completely!

• The language can express a computing problem with a few words of English, plus formulas familiar from high-school algebra

• EXAMPLE PROBLEM: Compute 4 plus 5

The example in BASIC

1 LET X = 4

2 LET Y = 5

3 LET Z = X + Y

4 PRINT X; "+"; Y; "="; Z

5 END

Output: 4 + 5 = 9

The C language

• Other high-level languages do require a small amount of awareness by the program-author of how a computation is going to be processed

• For example, that:

-- the main program will get “linked” with a “library” of other special-purpose subroutines

-- instructions and data will get placed into separate sections of the machine’s memory

-- variables and constants get treated differently

-- data items have specific space requirements

Same example: rewritten in C

#include <stdio.h>	// needed for printf()

int x = 4, y = 5;	// initialized variables
int z;			// uninitialized variable

int main(void)
{
	z = x + y;
	printf( "%d + %d = %d \n", x, y, z );
	return 0;
}

“ends” versus “means”

• Key point: high-level languages let programmers focus attention on the problem to be solved, and not spend effort thinking about details of “how” a particular piece of electrical machinery is going to carry out the pieces of a desired computation

• Key benefit: their problem gets solved sooner (because their program can be written faster)

• Programmers don’t have to know very much about how a digital computer actually works

computer scientist vs. programmer

• But computer scientists DO want to know how computers actually work:

-- so we can fix computers if they break

-- so we can use the optimum algorithm

-- so we can predict computer behavior

-- so we can devise faster computers

-- so we can build cheaper computers

-- so we can pick one suited to a problem

A machine’s own language

• For understanding how computers work, we need familiarity with the computer’s own language (called “machine language”)

• It’s a LOW-LEVEL language (very detailed)

• It is specific to a machine’s “architecture”

• It is a language “spoken” using voltages

• Humans represent it with zeros and ones

Example of machine-language

Here’s what a program-fragment looks like:

10100001 10111100 10010011 00000100 00001000
00000011 00000101 11000000 10010011 00000100 00001000
10100011 11000000 10010100 00000100 00001000

It means: z = x + y;

Incomprehensible?

• Though possible, it is extremely difficult, tedious (and error-prone) for humans to read and write “raw” machine-language

• When unavoidable, a special notation can help (called hexadecimal representation):

A1 BC 93 04 08 03 05 C0 93 04 08 A3 C0 94 04 08

• But still this looks rather meaningless!

Hence: assembly language

• There are two key ideas:

-- mnemonic opcodes: we employ abbreviations of English language words to denote operations

-- symbolic addresses: we invent “meaningful” names for memory storage locations we need

• These make machine-language understandable to humans – if they know their machine’s design

• Let’s see our example-program, rewritten using actual “assembly language” for Intel’s Pentium

Simplified Block Diagram

[Diagram: a Central Processing Unit, Main Memory, and four I/O devices, all connected by the system bus]

Pentium’s visible “registers”

• Four general-purpose registers: eax, ebx, ecx, edx

• Four memory-addressing registers: esp, ebp, esi, edi

• Six memory-segment registers: cs, ds, es, fs, gs, ss

• An instruction-pointer and a flags register: eip, eflags

The “Fetch-Execute” Cycle

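[Diagram: main memory holds the program instructions (TEXT), the program variables (DATA), and temporary storage (STACK); the central processor holds EIP, which points into TEXT, ESP, which points into STACK, and registers such as EAX; memory and processor are connected by the system bus]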

Define symbolic constants

.equ device_id, 1

.equ sys_write, 4

.equ sys_exit, 1

our program’s ‘data’ section

.section .data
x: .int 4
y: .int 5
z: .int 0
fmt: .asciz "%d + %d = %d \n"
buf: .space 80
len: .int 0

our program’s ‘text’ section

.section .text

_start:

# comment: assign z = x + y

movl x, %eax

addl y, %eax

movl %eax, z

‘text’ section (continued)

# comment: prepare program’s output
pushl z		# arg 5
pushl y		# arg 4
pushl x		# arg 3
pushl $fmt	# arg 2
pushl $buf	# arg 1
call sprintf	# function-call
addl $20, %esp	# discard args
movl %eax, len	# save return-value

‘text’ section (continued)

# comment: request kernel assistance

movl $sys_write, %eax

movl $device_id, %ebx

movl $buf, %ecx

movl len, %edx

int $0x80

‘text’ section (concluded)

# comment: request kernel assistance

movl $sys_exit, %eax

movl $0, %ebx

int $0x80

# comment: make label visible to linker

.global _start

.end

program translation steps

[Diagram: the program source module (demo.s) is turned by assembly into the program object module (demo.o); linking then combines it with other object modules and object module libraries to produce the executable program (demo)]

The GNU Assembler and Linker

• With Linux you get free software tools for compiling your own computer programs

• An assembler (named ‘as’): it translates assembly language (called the ‘source code’) into machine language (called the ‘object code’)

$ as demo.s -o demo.o

• A linker (named ‘ld’): it combines ‘object’ files with function libraries (if you know which ones)

What must programmer know?

• Needed to use CPU register-names (eax)

• Needed to know space requirements (int)

• Needed to know how stack works (pushl)

• Needed to make symbol global (for linker)

• Needed to understand how to quit (sys_exit)

• And of course how to use system tools:

(e.g., text-editor, assembler, and linker)

Summary

• High-level programming (offers easy and speedy real-world problem-solving)

• Low-level programming (offers knowledge and power in utilizing machine capabilities)

• High-level language hides lots of details

• Low-level language reveals the workings

• High-level programs: readily ‘portable’

• Low-level programs: tied to specific CPU

In-class exercise #1

• Download the source-file for ‘demo1’, and compile it using the GNU C compiler ‘gcc’:

$ gcc demo1.c -o demo1

Website: http://cs.usfca.edu/~cruse/cs210/

• Execute this compiled application using:

$ ./demo1


• Download the two source-files needed for our ‘demo2’ application (i.e., ‘demo2.s’ and ‘sprintf.s’), and assemble them using:

$ as demo2.s -o demo2.o

$ as sprintf.s -o sprintf.o

• Link them using:

$ ld demo2.o sprintf.o -o demo2

• And execute this application using:

$ ./demo2


• Use your favorite text-editor (e.g., ‘vi’) to modify the ‘demo2.s’ source-file, by using different initialization-values for x and y

• Reassemble your modified ‘demo2.s’ file, and re-link it with the ‘sprintf.o’ object-file

• Run the modified ‘demo2’ application, and see if it prints out a result that is correct
