lesson 01
What is Assembly Language?
Introduction to the GNU/Linux assembler and linker
for Intel Pentium processors
High-Level Language
• Most programming nowadays is done using so-called “high-level” languages (such as
FORTRAN, BASIC, COBOL, PASCAL, C, C++, JAVA, SCHEME, Lisp, ADA, etc.)
• These languages deliberately “hide” from the programmer many details concerning HOW the problem will actually be solved by the underlying computing machinery
The BASIC language
• Some languages allow programmers to forget about the computer completely!
• The language can express a computing problem with a few words of English, plus formulas familiar from high-school algebra
• EXAMPLE PROBLEM: Compute 4 plus 5
The example in BASIC
1 LET X = 4
2 LET Y = 5
3 LET Z = X + Y
4 PRINT X, "+", Y, "=", Z
5 END
Output: 4 + 5 = 9
The C language
• Other high-level languages do require a small amount of awareness by the program-author of how a computation is going to be processed
• For example, that:
-- the main program will get “linked” with a “library” of other special-purpose subroutines
-- instructions and data will get placed into separate sections of the machine’s memory
-- variables and constants get treated differently
-- data items have specific space requirements
Same example: rewritten in C
#include <stdio.h> // needed for printf()
int x = 4, y = 5; // initialized variables
int z; // uninitialized variable
int main()
{
    z = x + y;
    printf( "%d + %d = %d \n", x, y, z );
}
“ends” versus “means”
• Key point: high-level languages let programmers focus attention on the problem to be solved, and not spend effort thinking about details of “how” a particular piece of electrical machinery is going to carry out the pieces of a desired computation
• Key benefit: their problem gets solved sooner (because their program can be written faster)
• Programmers don’t have to know very much about how a digital computer actually works
computer scientist vs. programmer
• But computer scientists DO want to know how computers actually work:
-- so we can fix computers if they break
-- so we can use the optimum algorithm
-- so we can predict computer behavior
-- so we can devise faster computers
-- so we can build cheaper computers
-- so we can pick one suited to a problem
A machine’s own language
• For understanding how computers work, we need familiarity with the computer’s own language (called “machine language”)
• It’s a LOW-LEVEL language (very detailed)
• It is specific to a machine’s “architecture”
• It is a language “spoken” using voltages
• Humans represent it with zeros and ones
Example of machine-language
Here’s what a program-fragment looks like:
10100001 10111100 10010011 00000100 00001000
00000011 00000101 11000000 10010011 00000100 00001000
10100011 11000000 10010100 00000100 00001000
It means: z = x + y;
Incomprehensible?
• Though possible, it is extremely difficult, tedious (and error-prone) for humans to read and write “raw” machine-language
• When unavoidable, a special notation can help (called hexadecimal representation):
A1 BC 93 04 08 03 05 C0 93 04 08 A3 C0 94 04 08
• But still this looks rather meaningless!
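The step from zeros and ones to hexadecimal is purely mechanical: each group of four bits becomes one hex digit. The following is a minimal sketch in C of that conversion; the function names `bits_to_byte` and `byte_to_hex` are invented here for illustration and are not part of the course materials.

```c
#include <stddef.h>

/* Convert one 8-character string of '0'/'1' characters into a byte value.
   Returns -1 if the string is not exactly eight binary digits.
   (Hypothetical helper, named for this sketch only.) */
int bits_to_byte(const char *bits)
{
    int value = 0;
    size_t i;

    for (i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;                      /* also catches short strings */
        value = (value << 1) | (bits[i] - '0');
    }
    return bits[8] == '\0' ? value : -1;    /* reject strings longer than 8 */
}

/* Write the two-digit uppercase hexadecimal form of a byte into out[3]. */
void byte_to_hex(int byte, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = digits[(byte >> 4) & 0xF];     /* high four bits */
    out[1] = digits[byte & 0xF];            /* low four bits  */
    out[2] = '\0';
}
```

Applied to the first group of the fragment, `bits_to_byte("10100001")` yields the value that `byte_to_hex` renders as `A1`, matching the first byte of the hexadecimal listing.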
Hence: assembly language
• There are two key ideas:
-- mnemonic opcodes: we employ abbreviations of English language words to denote operations
-- symbolic addresses: we invent “meaningful” names for memory storage locations we need
• These make machine-language understandable to humans – if they know their machine’s design
• Let’s see our example-program, rewritten using actual “assembly language” for Intel’s Pentium
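To make the idea of mnemonic opcodes concrete, here is a small sketch in C that turns the sixteen bytes of the earlier fragment back into readable text. It is not a real disassembler: it recognizes only the three encodings that occur in that fragment (opcode A1, opcode 03 followed by 05, and opcode A3), and the function names `read_le32` and `decode_one` are invented for this illustration. It prints raw numeric addresses; replacing those with invented names such as x, y and z is the second idea, symbolic addresses.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Read a 32-bit value stored low byte first (little-endian),
   which is how the Pentium stores addresses inside instructions. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode ONE instruction at code[] into text in out[].
   Only the three encodings from the slide's fragment are handled.
   Returns the instruction length in bytes, or 0 if unrecognized. */
size_t decode_one(const unsigned char *code, size_t avail,
                  char *out, size_t outsz)
{
    if (avail >= 5 && code[0] == 0xA1) {            /* load memory into eax  */
        snprintf(out, outsz, "movl 0x%08lx, %%eax",
                 (unsigned long)read_le32(code + 1));
        return 5;
    }
    if (avail >= 6 && code[0] == 0x03 && code[1] == 0x05) { /* add memory to eax */
        snprintf(out, outsz, "addl 0x%08lx, %%eax",
                 (unsigned long)read_le32(code + 2));
        return 6;
    }
    if (avail >= 5 && code[0] == 0xA3) {            /* store eax into memory */
        snprintf(out, outsz, "movl %%eax, 0x%08lx",
                 (unsigned long)read_le32(code + 1));
        return 5;
    }
    return 0;
}
```

Calling `decode_one` repeatedly, advancing by the returned length each time, walks through the fragment as three instructions of 5, 6 and 5 bytes.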
Simplified Block Diagram
[Diagram: a Central Processing Unit, Main Memory, and four I/O devices, all connected by the system bus]
Pentium’s visible “registers”
• Four general-purpose registers: eax, ebx, ecx, edx
• Four memory-addressing registers: esp, ebp, esi, edi
• Six memory-segment registers: cs, ds, es, fs, gs, ss
• An instruction-pointer and a flags register: eip, eflags
The “Fetch-Execute” Cycle
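[Diagram: main memory holds the Program Instructions (TEXT), Program Variables (DATA), and Temporary Storage (STACK); the central processor holds the EIP and ESP registers and general-purpose registers (shown as EAX); the system bus connects the central processor to main memory]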
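The fetch-execute cycle can be imitated in a few lines of C. The sketch below is a toy model, not a Pentium emulator: it fetches the byte at `eip`, executes one of the three instruction encodings from the earlier 16-byte fragment, advances `eip` past that instruction, and repeats. The base address and size of the simulated data area (`DATA_BASE`, `DATA_SIZE`) and the function name `run_fragment` are assumptions made for this illustration; the three memory addresses are the ones encoded in the fragment itself.

```c
#include <stdint.h>
#include <string.h>

#define DATA_BASE 0x08049000u   /* assumed start of the simulated DATA area */
#define DATA_SIZE 0x1000u       /* assumed size; covers all three addresses */

static unsigned char data_mem[DATA_SIZE];

/* Read a 32-bit little-endian value from raw bytes. */
static uint32_t fetch32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t load32(uint32_t addr)
{
    return fetch32(&data_mem[addr - DATA_BASE]);
}

static void store32(uint32_t addr, uint32_t v)
{
    unsigned char *p = &data_mem[addr - DATA_BASE];
    p[0] = v & 0xFF;         p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}

/* Place x and y in simulated memory, run the fragment, return z. */
int32_t run_fragment(int32_t x, int32_t y)
{
    static const unsigned char text[16] = {     /* the TEXT: z = x + y */
        0xA1, 0xBC, 0x93, 0x04, 0x08,
        0x03, 0x05, 0xC0, 0x93, 0x04, 0x08,
        0xA3, 0xC0, 0x94, 0x04, 0x08
    };
    uint32_t eip = 0;   /* offset of the next instruction within text[] */
    uint32_t eax = 0;   /* the accumulator register */

    memset(data_mem, 0, sizeof data_mem);
    store32(0x080493BCu, (uint32_t)x);
    store32(0x080493C0u, (uint32_t)y);

    while (eip < sizeof text) {
        unsigned char opcode = text[eip];               /* FETCH */
        if (opcode == 0xA1) {                           /* EXECUTE: load */
            eax = load32(fetch32(&text[eip + 1]));
            eip += 5;
        } else if (opcode == 0x03 && text[eip + 1] == 0x05) { /* add */
            eax += load32(fetch32(&text[eip + 2]));
            eip += 6;
        } else if (opcode == 0xA3) {                    /* store */
            store32(fetch32(&text[eip + 1]), eax);
            eip += 5;
        } else {
            break;                                      /* unknown opcode */
        }
    }
    return (int32_t)load32(0x080494C0u);
}
```

With the slide's values, `run_fragment(4, 5)` returns 9. The real processor does the same thing in hardware, with EIP holding an actual memory address rather than an array offset.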
Define symbolic constants
.equ device_id, 1
.equ sys_write, 4
.equ sys_exit, 1
our program’s ‘data’ section
.section .data
x: .int 4
y: .int 5
z: .int 0
fmt: .asciz "%d + %d = %d \n"
buf: .space 80
len: .int 0
our program’s ‘text’ section
.section .text
_start:
# comment: assign z = x + y
movl x, %eax
addl y, %eax
movl %eax, z
‘text’ section (continued)
# comment: prepare program’s output
pushl z # arg 5
pushl y # arg 4
pushl x # arg 3
pushl $fmt # arg 2
pushl $buf # arg 1
call sprintf # function-call
addl $20, %esp # discard args
movl %eax, len # save return-value
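For comparison, the call sequence above corresponds roughly to a single statement in C. The sketch below assumes that the course's `sprintf` routine behaves like the C library function of the same name; it substitutes the library's `snprintf` so the 80-byte buffer cannot be overrun, and the function name `format_sum` is invented for this illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Format "x + y = z" into buf and return the number of characters
   written (not counting the terminating zero byte), which is the
   value the assembly code saves in 'len'.
   Hypothetical helper; snprintf is used in place of sprintf. */
int format_sum(char *buf, size_t bufsize, int x, int y)
{
    int z = x + y;

    /* Five arguments in the assembly version: buf, fmt, x, y, z.
       They are pushed in reverse order (z first, buf last), and five
       4-byte pushes explain the 'addl $20, %esp' that discards them. */
    return snprintf(buf, bufsize, "%d + %d = %d \n", x, y, z);
}
```

With x = 4 and y = 5 the buffer receives `4 + 5 = 9 ` followed by a newline, and the returned length is 11.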
‘text’ section (continued)
# comment: request kernel assistance
movl $sys_write, %eax
movl $device_id, %ebx
movl $buf, %ecx
movl len, %edx
int $0x80
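The four register loads followed by `int $0x80` have a direct counterpart in C: the POSIX `write()` function, which asks the kernel for the same service. The sketch below is a rough equivalent rather than the course's own code; the wrapper name `write_to_stdout` is invented for this illustration.

```c
#include <unistd.h>

/* Ask the kernel to write len bytes from buf to standard output.
   STDOUT_FILENO is 1, matching 'device_id' in the assembly program;
   buf and len play the roles of the values placed in ecx and edx.
   Returns the number of bytes written, or -1 on error. */
ssize_t write_to_stdout(const char *buf, size_t len)
{
    return write(STDOUT_FILENO, buf, len);
}
```

In the assembly version the service number (`sys_write`) goes in eax and the kernel's return value comes back in eax; in C the library wrapper hides both details.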
‘text’ section (concluded)
# comment: request kernel assistance
movl $sys_exit, %eax
movl $0, %ebx
int $0x80
# comment: make label visible to linker
.global _start
.end
program translation steps
[Diagram: assembly translates the program source module (demo.s) into the program object module (demo.o); linking combines demo.o with other object modules and object module libraries to produce the executable program (demo)]
The GNU Assembler and Linker
• With Linux you get free software tools for compiling your own computer programs
• An assembler (named ‘as’): it translates assembly language (called the ‘source code’) into machine language (called the ‘object code’)
$ as demo.s -o demo.o
• A linker (named ‘ld’): it combines ‘object’ files with function libraries (if you know which ones)
What must programmer know?
• Needed to use CPU register-names (eax)
• Needed to know space requirements (int)
• Needed to know how stack works (pushl)
• Needed to make symbol global (for linker)
• Needed to understand how to quit (sys_exit)
• And of course how to use system tools:
(e.g., text-editor, assembler, and linker)
Summary
• High-level programming (offers easy and speedy real-world problem-solving)
• Low-level programming (offers knowledge and power in utilizing machine capabilities)
• High-level language hides lots of details
• Low-level language reveals the workings
• High-level programs: readily ‘portable’
• Low-level programs: tied to specific CPU
In-class exercise #1
• Download the source-file for ‘demo1’, and compile it using the GNU C compiler ‘gcc’:
$ gcc demo1.c -o demo1
Website: http://cs.usfca.edu/~cruse/cs210/
• Execute this compiled application using:
$ ./demo1
In-class exercise #2
• Download the two source-files needed for our ‘demo2’ application (i.e., ‘demo2.s’ and ‘sprintf.s’), and assemble them using:
$ as demo2.s -o demo2.o
$ as sprintf.s -o sprintf.o
• Link them using:
$ ld demo2.o sprintf.o -o demo2
• And execute this application using:
$ ./demo2
In-class exercise #3
• Use your favorite text-editor (e.g., ‘vi’) to modify the ‘demo2.s’ source-file, by using different initialization-values for x and y
• Reassemble your modified ‘demo2.s’ file, and re-link it with the ‘sprintf.o’ object-file
• Run the modified ‘demo2’ application, and see if it prints out a result that is correct