Computer Systems & Programming (CS 367)
Section 002: Prof. Elizabeth White
Spring 2010
Course Goals
Previous courses:
• CS 112, CS 211: high-level programming in Java/C++
• CS 262: C programming, commonly used for “low-level” systems programming and embedded systems
Theme of the course: strip away the abstractions provided by high-level languages such as Java so you can understand what goes on “under the hood”
Course Goals cont’d
Most CS courses emphasize abstraction
• Abstract data types
• Asymptotic analysis
Abstractions have limits
• Especially in the presence of bugs
• Need to understand underlying implementations
Useful outcomes
• Become more effective programmers
  – Able to find and eliminate bugs efficiently
  – Able to tune program performance
• Prepare for later “systems” classes in CS & ECE
  – Compilers, Operating Systems, Networks, Computer Architecture
Great Reality #1
Ints are not Integers, Floats are not Reals
Examples
• Is x² ≥ 0?
  – Floats: Yes!
  – Ints:
    – 40000 * 40000 --> 1600000000
    – 50000 * 50000 --> ??
• Is (x + y) + z = x + (y + z)?
  – Unsigned & Signed Ints: Yes!
  – Floats:
    – (1e20 + -1e20) + 3.14 --> 3.14
    – 1e20 + (-1e20 + 3.14) --> ??
Computer Arithmetic
Does not generate random values
• Arithmetic operations have important mathematical properties
Cannot assume “usual” properties
• Due to finiteness of representations
• Integer operations satisfy “ring” properties
  – Commutativity, associativity, distributivity
• Floating point operations satisfy “ordering” properties
  – Monotonicity, values of signs
Observation
• Need to understand which abstractions apply in which contexts
• Important issues for compiler writers and serious application programmers
Great Reality #2
You’ve got to know assembly
Chances are, you’ll never write a program in assembly
• Compilers are much better & more patient than you are
Understanding assembly is key to the machine-level execution model
• Tuning program performance
  – Understanding optimizations done/not done by the compiler
  – Understanding sources of program inefficiency
• Implementing system software
  – Compiler has machine code as target
  – Operating systems must manage process state
• Creating / fighting malware
  – x86 assembly is the language of choice!
Great Reality #3
Memory Matters
Memory is not unbounded
• It must be allocated and managed
• Many applications are memory dominated
Memory referencing bugs are especially pernicious
• Effects are distant in both time and space
Memory performance is not uniform
• Cache and virtual memory effects can greatly affect program performance
• Adapting a program to the characteristics of the memory system can lead to major speed improvements
Memory Referencing Bug Example
    double fun(int i)
    {
        volatile double d[1] = {3.14};
        volatile long int a[2];
        a[i] = 1073741824; /* Possibly out of bounds */
        return d[0];
    }
fun(0) --> 3.14
fun(1) --> 3.14
fun(2) --> 3.1399998664856
fun(3) --> 2.00000061035156
fun(4) --> 3.14, then segmentation fault
Stack layout (the number is the value of i for which a[i] lands on that location):
    Saved State    4
    d7 … d4        3
    d3 … d0        2
    a[1]           1
    a[0]           0
Memory Referencing Errors
C and C++ do not provide any memory protection
• Out of bounds array references
• Invalid pointer values
• Abuses of malloc/free
Can lead to nasty bugs
• Whether or not a bug has any effect depends on system and compiler
• Action at a distance
  – Corrupted object logically unrelated to the one being accessed
  – Effect of bug may be first observed long after it is generated
How can I deal with this?
• Program in Java, Lisp, or ML
• Understand what possible interactions may occur
• Use or develop tools to detect referencing errors
Memory Performance Example
Hierarchical memory organization
Performance depends on access patterns
• Including how we step through a multi-dimensional array

    void copyij(int src[2048][2048], int dst[2048][2048])
    {
        int i, j;
        for (i = 0; i < 2048; i++)
            for (j = 0; j < 2048; j++)
                dst[i][j] = src[i][j];
    }

    void copyji(int src[2048][2048], int dst[2048][2048])
    {
        int i, j;
        for (j = 0; j < 2048; j++)
            for (i = 0; i < 2048; i++)
                dst[i][j] = src[i][j];
    }

copyji is 21 times slower than copyij (Pentium 4)
Great Reality #4
There’s more to performance than asymptotic complexity
Constant factors matter too!
And even exact op count does not predict performance
• Easily see 10:1 performance range depending on how the code is written
• Must optimize at multiple levels: algorithm, data representations, procedures, and loops
Must understand system to optimize performance
• How programs are compiled and executed
• How to measure program performance and identify bottlenecks
• How to improve performance without destroying code modularity and generality
Great Reality #5
Computers do more than execute programs
They need to get data in and out
• I/O system critical to program reliability and performance
They communicate with each other over networks
• Many system-level issues arise in the presence of networks
  – Concurrent operations by autonomous processes
  – Coping with unreliable media
  – Cross-platform compatibility
  – Complex performance issues
Course Perspective
Most Systems Courses are Builder-Centric
• Computer Architecture
  – Design pipelined processor
• Operating Systems
  – Implement portions of operating system
• Compilers
  – Write compiler for simple language
• Networking
  – Implement and simulate network protocols
Course Perspective (Cont.)
This Course is Programmer-Centric
• Purpose is to show how, by knowing more about the underlying system, one can be more effective as a programmer
• Enable you to
  – Write programs that are more reliable and efficient
  – Incorporate features that require hooks into OS
    – E.g., concurrency, signal handlers
• Not just a course for dedicated hackers
  – We bring out the hidden hacker in everyone
• Cover material in this course that you won’t see elsewhere
  – Linking, loading, signals
  – If nothing else, this course will teach you how to make effective use of debuggers such as gdb
Relationship to other courses
Course Prerequisites
You must have completed the following classes with a C or better grade:
• CS 262 – Introduction to Low-level Programming, or CS 222 – Computer Programming for Engineers
• ECE 301 – Digital Electronics
The programming assignments in CS 367 involve machine and assembly language (x86) and programming in C
Things you should already know about C:
• C data types
• C Standard I/O library
• Bit-level operators
• Pointers & structures
• Dynamic memory allocation and deallocation
• How to use makefiles
• Unix commands
Textbook
• Randal E. Bryant and David R. O’Hallaron, “Computer Systems: A Programmer’s Perspective”, Prentice Hall, 2003.
  – http://csapp.cs.cmu.edu
This book really matters for the course!
  – How to solve labs
  – Practice problems typical of exam problems
• Optional book: a good reference on C, such as Brian Kernighan and Dennis Ritchie, “The C Programming Language, Second Edition”, Prentice Hall, 1988
Logistics
Lectures
Homework [15% of the course grade]
• Simple program in C
• Written exercises
Labs [35% of the course grade]
• The heart of the course
• 2 or 3 weeks each
• Provides an in-depth understanding of an aspect of systems
• Programming and measurement
You can work in groups of up to two on homework and labs
Logistics
More about Labs
• All programs must be submitted and tested on zeus.ite.gmu.edu
• You need to obtain an IT&E Unix cluster account.
• See the syllabus for further information.
• Even if it works on your laptop or PC, it doesn’t count unless it runs on the IT&E cluster.
Making Deadlines
• Assignments are due at the stated time and day.
• Some (not all) assignments can be turned in late with a penalty for each late day.
Midterm and Final exams [each is 25% of the course grade]
• Test your understanding of concepts & mathematical principles
• There are NO make-ups for missing an exam.
Cheating
What is cheating?
• Sharing code: either by copying, retyping, looking at, or supplying a copy of a file.
• Coaching: helping your friend to write a lab, line by line.
• Copying code from a previous course or from elsewhere on the WWW
What is NOT cheating?
• Explaining how to use systems or tools.
• Helping others with high-level design issues.
Penalty for cheating:
• Removal from course with failing grade.
Getting Help
Class Web Page
• http://cs.gmu.edu/~white/cs367
• Copies of lectures, homework, labs, handouts, suggested readings
Blackboard
• http://courses.gmu.edu
• Discussion board monitored by our UTAs
• Homework and exam grades posted here
• Up-to-date information on the course
  – Cancellation notices
  – Extensions on assignments
  – Changes to assignments
  – Solutions to homework
Course Outline
Week 1: Overview of Computer Systems (Ch 1)
Weeks 2 - 3: Representing & Manipulating Information (Ch 2)
Weeks 4 - 9: Machine-level Representation of Programs (Ch 3)
Week 10: Linking (Ch 7)
Weeks 11 - 12: Exceptional Control Flow (Ch 8)
Week 13: Memory (Ch 10)
Lab Rationale
• Each lab has a well-defined goal such as solving a puzzle or winning a contest.
• Doing a lab should result in new skills and concepts.
• We try to use competition in a fun and healthy way.
  – Set a reasonable threshold for full credit.
  – Post intermediate results (anonymized) on the Web page for glory!
Machine-level Representation of Data and Programs
Topics
• Bit operations, arithmetic, assembly language programs, representation of C control and data structures
• Includes aspects of architecture and compilers
Assignments
• Homework, including a datalab: Manipulating bits
• Lab 1 (bomblab): Defusing a binary bomb
• Lab 2 (buflab): Hacking a buffer bomb
Linking and Exceptional Control Flow
Topics
• Object files, static & dynamic linking, libraries, loading
• Hardware exceptions, processes, process control, Unix signals, nonlocal jumps
• Includes aspects of compilers, OS, and architecture
Assignments
• Lab 3 (tshlab): Writing your own shell with job control
Memory
Topics
• Virtual memory, address translation, dynamic storage allocation
• Includes aspects of architecture and OS