cs 107 midterm notes


12/4/2009 - the remaining 3 languages; cs107 wrapup

1. what is java?

a. Your program is like a mouse running in the field.

b. Compile-time typing sets some boundaries. And it's at compile time, which is before runtime! You are constraining the future!

c. C has a full compile-time type system and nothing at runtime.

d. Python has full runtime typing and nothing at compile time (no compile step)

e. Java has both: compile time and runtime. Every heap object is still tagged.

i. Having runtime typing prevents writing malware in Java.

ii. You can't just lie and say that something is a string that you're overwriting. Java checks.

f. Java --> bytecode.

i. Not ELF. (Julie's well-known fixation with the ELF format.)

ii. It's portable between OSes, which is why MS hates it so much

iii. Old: interpreter runs bytecode. Has a big while loop and switch statement. 5-10x slowdown.

iv. Modern: just-in-time compiler (JIT) --> native code. HotSpot - GPL open source from Sun. HotSpot does its optimizations at runtime. It usually starts in interpreter mode, observes what is running the most, optimizes that function, and swaps that in for the code. This is the future of code optimization.

g. Firefox has a JIT also, for javascript. But it's a lot easier in a compile-time-checked language.

h. Startup time is bad - has to start up everything. Tons of memory used. HotSpot has a couple copies of the code. But later on, it'll run fairly fast.

2. pros / cons of compile time typing

a. advantages

i. detects errors

ii. better performance because the decision isn't deferred to runtime (ie, it's different if a+b is an int or a string)

iii. better tool refactoring and auto-complete.

iv. +/- readable / verbose.

1. but python can be beautiful because there isn't a lot of extra typing stuff distracting you from codez

v. allows better jit compiling

b. disads

i. extra stuff to key in, more verbose

ii. may be hard to express some ideas in type system, though they will actually work at runtime. Maintaining type info can get in the way.

c. Demo: javabat.com

3. pros/cons of dynamic typing

a. advs

i. less to type in / less to get in the way

ii. lang not limited to what can be expressed within static typing

1. python has lots of features

2. code is short; defer to runtime is simple to implement.

b. disads

i. hard to read bc type info not there

1. you might find yourself adding it in the variable names

2. type info can be useful to reader

ii. worse performance bc defer decisions

iii. worse compile time error detection

1. compensate with unit tests

iv. worse compile time tool refactoring, autocomplete.

4. language choice precepts

a. working source code is high mass, high cost.

b. legacy. Google does a ton of things in c++ because they did it in c++ originally and it would be a big pain to change.

c. Therefore, avoid building your system on top of locked-in, proprietary infrastructure.

i. Once your system develops high mass, you are screwed.

d. Precept 2: engineers can get heated about language choice.

i. you know the Stockholm syndrome? This explains C to me.

ii. Important meme: bikeshed painting principle

1. if you talk about geopolitics, people won't talk.

2. if you project a picture of a bikeshed, because it's trivial, everyone can form an opinion.

iii. Team idea: shutting-up skills. Consider remaining silent. Whoever's implementing should do it however they want. Only intervene if their choice is sooooo bad.

5. three lang choices.

a. Language tools today are fantastic. features = lang features + libs

b. c/c++

i. fast. Small memory use. Low dependency -- if you're programming for a $12 chip, you don't want to have to install Java on it.

ii. legacy

iii. features are pretty weak

iv. fit: small stuff good. Performance sensitive. Not big, complex stuff with lots of people.

v. c++0x project -- making c++ catch up

c. Java

i. Static typing, good performance because of the JIT, lots of features. The lang features are small. The libraries are probably the best of any language. C++ has the horrible .h files.

ii. Fit: large or complex project, team project, tolerate verbose code.

d. python

i. dynamic typing, worst performance, large features and flexibility and short code

ii. Fit: small projects, where simplicity shines. If you're just doing it in 2 or 3 pages at once, you don't need to add type info. Feels quick.

iii. Bad if there are lots of files and classes and people on one project.

6. things worthwhile in 107

a. gdb / bomb

b. vim

c. remote programming / repos

i. I'm going to datamine the repos. There are interesting commit messages: "adams are done." "No, now I'm done." "No, now I'm really done."

d. labs pointed out good pitfalls

e. testing

f. data type reps in memory / bit reps

g. making my code faster

h. stack / heap. Digging around and getting the args.

i. Computational thinking. "how did the prog make you feel?"

j. Lab team

k. Optimization

l. Pointer arithmetic and using memory blobs

m. Lots of diagrams in lectures

n. Detail given to assn descriptions

o. Buggy c code / code that works even despite bugs

p. linux

7. advice to next year's 107 class

a. code early; code often

b. read assn descriptions and the header files and such.

c. Don't assume compiling means working. Compiling --> not much of a milestone.

d. Be ok with deleting code.

e. Hg is your friend

f. Learn unix

g. Write your own tests

h. Doing the reading in B+O

i. Know the pointers. They're important.

j. Talk with each other

k. Learn your editor

l. Don't be shy about using the cs107 email.

m. Valgrind -- teach optimization early

n. Code before thinking

8. logistics

a. assn7 due tonight. You can use a late day or two though.

b. grading: argh. 4 and 5 done. Half of 6 done. All of assn1/2 reduxes done.

c. Final is in this room on Friday at 8:30am.

d. Sample final is good. We'll add a light level of python.

9. classes following cs107

a. cs110. obvious choice. Next systems core class. More efficiency, memory hierarchy, caching, performance. Scalability. If you are taking cs110 eventually, take it asap, because it follows from cs107; otherwise you'll have to relearn.

b. cs108. awesome elective. Intense programming. Higher level. Design issues. App design, gui dev, software design patterns, group project. Java. "You can't make java hard enough."

c. cs103, cs109. they're good to take.

d. The new major is flexible, so you can find a track that lets you avoid what you dislike.

e. Section leading! Very good at debugging other people's code (which is a very different skill from debugging your own code). 6th week of any quarter.

11/30/2009 - python!

1. intro

a. guest lecture (Nick Parlante?)

b. Nick: "Am I louder than you, Julie?" Julie: "Yes. And probably more charismatic, too."

c. Lecture 18: Python notes on Courseware (http://www.stanford.edu/class/cs107/other/nick_python/python-introduction.html) are fancy.

2. how many programming langs are there? 3!

a. C / c++. Everything must be resolved at compile time. The other langs are built on c

b. There is the python space: javascript, perl, ruby, scheme. Dynamic typing. Everything is deferred until the last moment.

c. Java. It has a dynamic typing system, but it tries to resolve stuff at compile time for debugging.

3. how do scripting languages (dynamic languages) evaluate expressions?

a. Every var is a pointer that points to a tagged value.

b. Evaluating is kind of like a big switch statement. Are these two ints? Then I'll do int addition. Are they two strings? Then I'll concatenate. Thus, each line can be used for multiple things.

4. python rocks

a. misc

i. foss

ii. ms hates it.

iii. Boilerplate at the bottom

1. if __name__ == '__main__': main()

iv. Can you overload stuff? Sure. Python is customizable. Just don't do it. C++ has shown us how horrible that is.

v. help(functionName) gives you info on it

vi. for line in f -- reads the file line by line

vii. text = f.read() -- reads the whole file into one string.

b. interpreter fun

i. To quit, type quit. Then, it tells you to use quit() or ctrl-D.

ii. You can type in a line of code to figure out what it does.

iii. Interpreter is in a read-eval-print loop. Or, you can do ...

c. Variables, types, etc

i. Vars can be retyped. Everything is just a pointer.

ii. functions, along with vars, are all in the same namespace. Just pointers.

d. import -- lets you use a module. Like library code.

i. import sys, then sys.bla()

ii. searches for modules in your path

iii. open source --> there's a module for everything. There are language features that make it easier to share code.

e. Boolean expressions

i. == works. For everything. For lists. For strings.

ii. It uses or and and and not rather than the really intuitive || or && or !.

f. Python uses indentation, not curly braces. Made by Guido van Rossum (at Google now).

i. Arg: good programmers would have curly braces and indentation be consistent. Having two things that programmers have to keep consistent manually is stupid.

ii. Like, does anyone update their .c file and then jump for joy when they think "Oh, now I have to update my .h!"? "Ok, I'll stop bagging on C now."

1. but he never did stop.

iii. Indentation is more visible. Let's just use that.

iv. To span multiple lines, use gratuitous parens.

1. (foo(), bar())

2. works

g. Three most common python errors by new folks

i. Forgetting to put in the colon before a code block

ii. Indentation is off by one.

iii. Using parens on if statements. They're not necessary. It'll work, but python people will make fun of you. Watch out for that.

h. Drawbacks to python

i. There is no tab completion in python, because python has no clue what an object is.

ii. Errors not caught until it executes a line of code. Nothing predictive in the code. So it can't know what code is bad until it runs the code. This means you need good test coverage in python code to avoid these errors.

i. tips

i. in python, there is nothing to tell you filename versus content of file, or queue or string or whatnot. So you really need good variable names. Is it WORD or WORDS?

ii. Test each few lines of code as you go. Print your data structures. Call sys.exit(0) to exit. Keep iterating through your code and printing your data structures to see if it works.

5. strings

a. string is single quotes. Or you can use double quotes for the sake of putting single quotes in it.

b. str is the name of the string class. So str(2) turns the int 2 into the string '2'.

c. len - length. You can get the length of strings or arrays or anything.

d. Square bracket array notation works for just about everything.

e. Strings are immutable like in java. You can make new strings if you want to. Anything that changes a string just returns a new string.

f. Str.upper()

g. Str.isalpha() -- tests every character in the string.

h. Triple quotes (""") mean a string can span lines

6. list

a. a = [1,2,3]

b. len

c. square bracket.

d. Lists can contain anything. List of strings and such.

e. Lists don't need to be uniformly typed

f. in: tests whether an item is in a list. Returns True.

g. slices

i. a[1:] gets the list starting at index 1

ii. a[:2] goes up to that number

iii. a[-1] is the rightmost element. a[-2] is the next one in.

h. reverse() method of list

7. loops

a. python does have while and for loops and such. But the only one you'll use is the foreach-style loop

b. for VAR in LIST:

c. a = ['x', 'y', 'z']

d. for letter in a: print letter

e. this works for hash tables, lists, everything. Don't bother with the index.

8. functions

a. def main():

i. args = sys.argv[1:]

9. hashmap

a. curly brace delimited.

b. d = {}

c. d['a'] = 'alpha'

d. d = {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

e. you can always print your data structure using print d

f. d['a'] returns 'alpha'.

g. if x in d

h. d.keys()

i. for k in sorted(d.keys()):

i. print k, '-->', d[k]

j. d.values()

k. d.items() pulls out both key and value and puts them in a list of 2-tuples. Does this for better time, because it doesn't have to hit the list twice.

i. Tuples are like little lists. They have a length and you can pull items out of them using square brackets.

l. Deleting stuff: del d['a']

10. custom sorting

a. the default will do alphabetical or numerical, if it's ints or strings.

b. To sort with a custom order, you give a function of one arg. That gives each item a proxy value. Then, it compares using the proxy values.

c. Python functions can have optional named args.

i. sorted(['aa', 'a'], key=len)

1. pass in len function as a function pointer. Now, you get custom sorting by length.

d. def second(s):

i. return s[1]

e. sorted(['zb', 'az'], key=second)

f. to sort in descending order: sorted(bla, key=second, reverse=True)

11. questions

a. do chars exist, or is everything strings?

i. No, chars don't exist. Just strings of length 1.

b. What's the difference between # comments, ## comments, and ''' comments?

i. ## comments don't exist. It's just two #s. # is the single-line version of comments.

ii. ''' allows multiline. And it's the convention for the javadoc-style thing that python does.

11/20/2009 - notes from Zahan

11/18/2009 section

Memory Layout:

Globals are in the data segment, below the heap

You can only use static locals in that function, but it persists

String constants in data segment too. Above the other stuff?

You dont get more than one copy.

If you write to a string constant? Crash.

Code is in text segment, below the data segment.

Library functions are between heap and stack, with symbols and links to dynamic libraries.

Functions are at different places in different runs.

Writing to functions crashes.

Stack starts at the top

Where does your heap start? Make one call to malloc.

How much can your heap allocate? As much as your OS gives to you.
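A minimal sketch (mine, not from the section) to see this layout: print the address of one thing from each region and compare. Exact addresses vary from run to run.

    #include <stdio.h>
    #include <stdlib.h>

    int global;                      /* data segment */

    int main(void)
    {
        int local;                   /* stack */
        void *heap = malloc(16);     /* heap */
        const char *str = "hello";   /* string constant, data segment */

        printf("code:   %p\n", (void *)main);
        printf("string: %p\n", (const void *)str);
        printf("global: %p\n", (void *)&global);
        printf("heap:   %p\n", heap);
        printf("stack:  %p\n", (void *)&local);
        free(heap);
        return 0;
    }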

Malloc

If you underrun, usually not that much happens

If you overrun, you pwn your memory

Free nonheap pointer, segfault

Free stack ptr: invalid pointer check.

Realloc non-heap ptr: same.

Free twice: double free err msg

Free ptr to middle of heap block: invalid pointer

Access after freed: it zeros the first four bytes. So most of the memory is still there, except the first 4 bytes

11/16/2009 - Memory Optimization

1. intro

a. segbus error?

i. You can fake any signal you want with kill

ii. Segbus is hard to get, because segfault is used for addresses that appear to be valid but are outside your segment.

iii. Bus error usually comes from a misaligned access (odd address), but ia32 lets you get those addresses.

iv. So, you won't actually ever get a real segbus error. Just emulate it with kill.

b. Use a vector?

i. Sure. But it might not help you, because cvector is a pain to use.

c. Not too many lines of code, but you need to know what happens at compile time, what's available at runtime, what the stack looks like, etc.

d. Can we call malloc before the program crashes?

i. You want to do minimal stuff once the program crashes. That might fail because of the crash.

ii. So you should already know symbols, name, info, etc.

iii. Just look through backtrace and be done.

iv. Do as much as possible at the init phase. The only thing you can't do is figure out what symbols are on the stack.

v. You don't even want the function to complete. You want to troll the stack and then exit.

e. What happens if someone wrote on top of our heap data for the crash reporter? Do we have to account for that?

i. No, you can't account for that.

f. Can we use the stack after crash?

i. Light use is cool.

ii. Don't use big system library stuff that touches a lot of things, like malloc and free.

iii. We hope that the corruption has not touched the stack.

iv. But printing a line is good. Decomp is fine.

g. How do we examine the registers?

i. Signal handler shows you how to get to eip, which is the one you need.

2. Memory + Memory Hierarchy

a. Most programs are not cpu bound -- that's programs with tons of numerical analysis that don't even need a lot of data. Except in that case, there's probably a lot of downtime just waiting for memory.

b. Main memory - cpu registers. Connection, through the bus, to ram. Bus traffic has a profound impact on performance bc the cpu runs at ~3GHz. Memory runs at ~800MHz.

c. Memory Hierarchy

i. Registers

ii. On-chip L1 cache (SRAM). Holds cache lines retrieved from the L2 cache. 1 to 2 cycles. Not shared between processors. Write-through, usually.

iii. Off-chip L2 cache (SRAM). Holds cache lines retrieved from memory. Maybe 10 cycles away. Usually shared between processors. Write-back, usually.

iv. L3: Main memory (DRAM). Fairly slow - about 100 cycles to get memory across the bus and to the chip. But it's cheap in terms of dollars.

v. L4: Local secondary storage (hard disks). Holds disk blocks retrieved from local disks.

vi. L5: remote secondary storage. Distributed file systems, web servers.

d. The introduction of L1 and L2 cache (some machines have 3 caches) was because of the growing gap between cpu and bus speeds. Cpu and bus used to be about the same speed.

e. core memory -- from magnetic storage from big room-sized computers. Ie, core dump. Now, there is no magnetic core. Awwww.

3. caching

a. two forms of locality in terms of access to memory in most progs:

i. temporal locality: you use one variable a lot in one place, and you don't use it much in other places. Ie, local vars in a function.

ii. Spatial locality: you're likely to use memory that's around one piece of memory at around the same time. Ie, looking at a string, you'll probably look at the entire string at once.

1. this means that the heap is slower because it can't be cached as well. Ie, arrays versus linked lists.

2. so, sometimes, with big programs, you'll ask malloc for a big block of memory, and then you'll control it yourself so that you can force the locality.
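A sketch of the spatial-locality point (example mine): both functions sum the same matrix, but the row-major loop touches memory in order and gets far fewer cache misses than the column-major one. N is an arbitrary size.

    #define N 1024
    static int m[N][N];

    long sum_row_major(void)         /* good locality: adjacent accesses */
    {
        long sum = 0;
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                sum += m[r][c];
        return sum;
    }

    long sum_col_major(void)         /* poor locality: stride of N ints */
    {
        long sum = 0;
        for (int c = 0; c < N; c++)
            for (int r = 0; r < N; r++)
                sum += m[r][c];
        return sum;
    }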

b. the cache is like your desk. You can only have certain things on your desk at once. If you go over to the bookshelf to get another textbook, you have to clear something else off.

c. cache hit = what you need is already in the cache.

d. cache miss = you need to grab the data from ram.

e. If we have a 97% hit rate (each hit is 1 cycle) and misses are 100 cycles, then 0.97*1 + 0.03*100, so access time is about 4 cycles on average. If we improve the hit rate to 99%, avg access is about 2 cycles (0.99*1 + 0.01*100). Typically, cache stats are discussed in miss rate.

f. Figuring out which part of memory goes into which cache block: it's a very simple mod relation. If we have 4 blocks of cache, every RAM block with RAMBLOCK % 4 == 0 goes to CACHEBLOCK 0. (A small sketch follows this list.)

i. Easy to implement

ii. Tends to work fairly well with the access patterns of most progs.

iii. There are fancier forms, but eh.

g. Write policy? When you look for memory, you'll always search its corresponding cache block first, so you won't touch ram if it's cached.

i. When you write to the cache, you could write through (write-through cache), which means you write to ram as you update the cache. This can be efficient because you don't need the ram immediately, so it's fine that it's slow.

ii. Or, you could have a write-back cache, where you copy cache to ram when flushed.

4. Virtual memory:

a. you have 4GB addressable, but not 4GB of ram (usually). Basically, RAM is cached to HDD.

b. Each process has its own address space. Its own virtual addresses.

c. Other processes have similar (or exactly the same) virtual addresses.

d. Map, called the page table, that maps virtual addresses to physical addresses (which correspond to a place on a DRAM chip).

e. Things go off and on DRAM chips in pages.

f. If not resident in memory -- if you're not using the memory -- then it's resident on disk (in the swapfile) rather than ram.

g. When swapping gets too high: thrashing. Disk is very slow. Thousands or millions of cycles to get a page and bring it in.

h. Avoid this: have more ram, use less ram, keep things together on one page, prefetch.

5. numbers everyone should know, according to Jeff Dean, king of large distributed systems at Google

a. L1 cache reference: 0.5 ns

b. L2 cache reference: 7 ns

c. mutex lock/unlock: 25 ns

d. main memory reference: 100 ns

e. compress 1K bytes with Zippy: 3,000 ns

f. send 2K bytes over a 1 Gbps network: 20,000 ns

g. read 1 MB sequentially from memory: 250,000 ns

h. round trip within the same datacenter: 500,000 ns

i. disk seek: 10,000,000 ns

j. read 1 MB sequentially from disk: 20,000,000 ns

k. send packet CA -> Netherlands -> CA: 150,000,000 ns

6. etc

a. storing v caching

i. if I can recreate the data faster than I can write it down somewhere else, I won't bother storing it.

ii. compressing data: compressed data can be read from memory more quickly, even though it takes time to uncompress.

b. valgrind

i. valgrind --tool=callgrind --simulate-cache=yes

1. simulates the L1, not L2, cache. It's a lie when it says it knows about L2.

2. Counts cache hits and cache misses and reports.

c. old programmers optimize for cpu, not memory, but memory is killer. + a lot of the tools tell you about cpu, but not memory.

i. Link ordering makes a big difference.

7. Friday

a. Software-level concurrency.

8. questions

a. how do ssds compare to ram?

i. Orders of magnitude faster than HDD. Still orders of magnitude slower than ram.

b. is there a structural limit to the size of each level of cache, or would it just be really expensive to get a ton of cache?

i. Space limit

ii. You have to search the cache, and being small lets them be faster.

iii. So, consumer cache isn't too much different than high-end cache

c. why hasn't ram / the bus gotten faster as much as the cpu? Is it just an industry effect, or is it harder to optimize ram?

i. Dunno?

d. Windows page file that's greater than the current ram usage: does that mean that a page is in use but the program is reporting "I'm not using that entire page"?

i. Probably difference between virtual and physical memory. Always more virtual than physical memory.

e. when you say 100 cycles to pull from memory, how are those cycles optimized?

i. It can pipeline, but 100 is the response time.

f. Why are virtual addresses 0x80 if it could be any arbitrary number?

i. Completely arbitrary

g. What does it look like when the cpu is waiting for memory? Bc I often see programs pwn the cpu to 100%.

h. does compiler need to know about the cache, or does the processor automatically deal with that?

i. what kind of miss rate in cache is typical?

i. Less than .1% ideal?

11/13/2009 - Optimization, GCC, Fancy Processors

1. intro

a. use email -- ask us questions! Use the forum if you need questions in public, but we want you to ask questions. Last spring, we got about 200 questions per week. This quarter, we have about 400-500 questions total.

2. optimization - don't

a. "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Don Knuth

i. Do it simply first, then figure out where you need to optimize.

b. "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity" - W.A. Wulf

c. "Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is" - Rob Pike

i. Don't guess. Measure.

3. Algos + Data Structures Matter Most

a. Big O: no tool will turn an n^2 algo into an n log n algo / data structure. Optimizations will lower the coefficients, but you have to make sure to get the best big O.

4. how does GCC work?

a. optimizing compilers

i. gcc -O0 no optimizations

ii. -O1: moderate things that are known to work

iii. -O2: aggressive and well documented to behave well in most cases

iv. -O3: things that GCC is experimenting with that might help but might not and might hurt.

v. More optimization --> more compile time. GCC is trying to understand code.

vi. Sometimes, the code gets bigger: GCC will split it into different cases, and optimize one case, but have to use multiple blocks of code. Maybe.

b. specific optimizations

i. these might not be generally useful. In specific cases, they will be.

ii. -fomit-frame-pointer. Doesn't use ebp; addresses everything off esp. ebp can be a general-purpose register. Also makes backtraces harder.

iii. -funroll-loops

iv. Lots of other optimizations. These are all

c. GCC knows the processor

i. Superscalar - the computer can do multiple things in each cycle.

d. GCC can fold constants

i. 4*8*a versus 32*a. These constants may come out of a derived expression -- ie, getting an index of a struct. Or named constants.

e. Or if there are values that won't change within a loop, it won't have to recalculate them each time. Common subexpression.

f. Looks locally for low-hanging fruit. Won't be able to look at the whole code.

g. It will be conservative. It cannot change what the code does. For buggy code, behavior might change, but correct code should work the same.

h. Strength reduction.

i. Change divide to multiply.

ii. Or multiply to add -- if you keep adding 4, that's easier than re-multiplying by 4 each time. (A small sketch follows.)
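A sketch of strength reduction (example mine): the indexed version computes a + i*4 on every iteration to form the address; the reduced version is roughly what the compiler produces, replacing the multiply with a pointer that just advances.

    /* what you write */
    void double_all(int *a, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = a[i] * 2;         /* address is a + i*4 each time */
    }

    /* roughly what gcc turns it into */
    void double_all_reduced(int *a, int n)
    {
        int *end = a + n;
        for (; a < end; a++)         /* the index multiply is now an add */
            *a *= 2;                 /* and *2 itself becomes a shift/add */
    }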

i. If you write your code normally, GCC will be able to recognize it as an idiomatic pattern. If you write esoteric code, GCC will leave it alone.

j. Code Motion

i. There's an intel chip cycle counter if you want fine-grained timing. How many machine cycles did it take? Unoptimized vs optimized, in millions of cycles:

ii. Matrix mult: 2M -> 0.38M (about 6x faster; strength reduction)

iii. Quicksort: 2M -> 2M (qsort was already compiled optimized in the library)

iv. Selection sort: 973M -> 557M (about 40% reduction)

v. Recursive factorial: 1.6M -> 0.3M (tail-end recursion - it's a constant factor away)

vi. Iterative factorial: 0.8M -> 0.3M

vii. Reassemble: 776M -> 716M, about 10% (for programs with mixed character, not a ton of leverage from optimizations)

k. Won't optimize out function calls -- ie, it won't pull a strlen(s) out of a loop, because it doesn't know that s won't change. The function call might change it! Or might change a global variable. C doesn't have a way to say that, given the same input, a function will always give the same output. Ie, rand doesn't take any parameters, and it gives a new output every time.

5. how to test

a. you'll have code for the cycle counter for when you implement malloc

b. time - very crude measure.

c. valgrind -- it doesn't just do memory usage.

d. valgrind --tool=callgrind

i. Maps it back to source code, so you can see how many cycles were spent on each line of source code.

ii. callgrind.out.(processnumber)

iii. callgrind_annotate --auto=yes

e. Other tool: gprof

6. Processors: superscalar, ICU + EU

a. At the hardware level, code isn't linear.

b. The chip can do more than one instruction at a time

c. In execution unit, 6 channels: unit dedicated to loading, to storing, two for floating point, two int units.

d. In one cycle, the ICU can delegate 6 tasks to get started.

e. Not all of these take the same amount of cycles to finish. While its still doing a floating point divide, it can do other things

f. Pipelining:

i. The part of the machinery that does floating point add, which takes 3 cycles, can always be working.

ii. The first cycle handles exp, second handles sign, third rounds.

iii. After you finish one floating point exp handling, you can start on the next even though you still havent added the first number.

iv. Latency (ie, 5-cycle latency to load or store; issue time = 1 cycle). Time to complete.

v. issue time: # cycles before you can start the next op. If issue < latency --> pipelined.

vi. Divide: no pipelining; 18 latency, 18 issue.

vii. Out of order scheduling

g. You don't need to know about this. The compiler and chip worry about that.

h. instruction level parallelism

i. Sometimes, it has to speculatively execute code. It will sometimes do extra work bc it might need it if the units will otherwise be idle and unscheduled, no reason not to.

7. how to write code that takes advantage of instruction level parallelism

a. don't make it so that each instruction depends on the previous instruction. You need to have both operands ready to pipeline. (See the sketch after this list.)

b. Particularly useful when there's a massive number of calculations.

c. Compiler can optimize out associativity for integers, but it CAN'T do that for floating point, because floating point isn't associative
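A sketch of the point in a (example mine): the first loop's adds form one long dependency chain; the second keeps two independent accumulators, so the floating point unit can pipeline them. The compiler won't do this rewrite for you here, because it would change the order of floating point adds (item c).

    double sum1(const double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];               /* each add waits on the previous one */
        return s;
    }

    double sum2(const double *a, int n)
    {
        double s0 = 0.0, s1 = 0.0;
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            s0 += a[i];              /* two independent dependency chains */
            s1 += a[i + 1];
        }
        if (i < n)
            s0 += a[i];              /* leftover element when n is odd */
        return s0 + s1;
    }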

8. how might buggy code change?

a. Uninitialized variables

b. Dangling pointers

9. do you have the whole processor?

a. In the model of abstraction, you do.

10. atom and arm processors are in-order execution, though.

11/11/2009 - section

GCC won't search the current directory if you put the include in angle brackets.

NULL: #define NULL (void*)0

Macros suxxorz: you have to parenthesize everything, you can't do good stuff like passing in ++x, and they will repeat function calls.

The linker will look through the library symbol table even if you don't explicitly include a header file. So you can still use qsort even if you don't include stdlib! Versus assert, which is a macro and needs to be taken care of in preprocessing, because there is no symbol for it in the library symbol table.

11/9/2009 - The Heap

1. intro

a. next class:

i. take 110! It's lower level and systemsy. It's required.

ii. It wouldn't be terrible to take 108. It's an optional mixin. It's not required. More higher level.

b. Assignment regrade

i. No late days

ii. Its posted

iii. Monday of thanksgiving break

iv. Get back 75% of the points you lost

c. Binary bomb

i. Woot woot

ii. GDB = r0x0rz. You can use it rather than printfs now.

2. It's like laundry

a. You throw it into a pile. It's unordered. You just have to search through and grab something from it.

b. Malloc and free

c. Other stuff

i. Realloc

ii. Calloc

d. It's C code. The stack is just one assembly instruction to move stuff; malloc and free are in C libraries.

e. Relies on a low-level OS allocator to get big chunks of memory. 1 page -- 4k, or 8k. That allocator is not appropriate for individual calls. Malloc gets big chunks of memory and divvies them up for you.

f. Pages don't even have to be contiguous

g. In-use list and free list

h. Sizeof is a compile-time operator. But malloc knows how much space you're using on the heap!

i. malloc_usable_size() - goes into malloc and figures out the usable size that you allocated. But it isn't standard C.

3. data structures

a. How to track in use and free?

i. In-use will be sorted by address, and you can do a log(n) binary search. You could also use a hash.

ii. Free needs to be quickly accessed, sorted by size.

b. More likely way it's tracked: embed the housekeeping into the heap

i. Every node in the heap will have info stored left of the pointer itself.

ii. The size of the node and its free status will be embedded.

iii. When we free, we can just take that pointer and go back 4 bytes.

iv. But finding a new node is O(number of nodes)

c. Their storage - the payload

d. Pre-node header: size of node and freed status.

e. The implicit free list functions as an implicit linked list. When you have a free node, you can make it point to the next free node. Probably a doubly linked list. Plus, you know that each node is at least 8 bytes.
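A sketch of the embedded-header idea (layout mine; real allocators differ): the size and free bit live in the 4 bytes just before the payload, so free(p) can back up from p to find them.

    typedef struct header {
        unsigned size : 31;          /* payload size in bytes */
        unsigned free : 1;           /* is this block available? */
    } header;

    /* given the pointer malloc handed out, recover the housekeeping */
    header *block_header(void *payload)
    {
        return (header *)((char *)payload - sizeof(header));
    }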

4. optimizing

a. Our malloc probably gives you extra memory

i. If you ask for less than 8, it gives you 8. If you ask for more, it might give you a number divisible by 4 or 8.

ii. If you ask for 16, it might give you 20 if it has a 20-byte block of memory, because it can't use the extra memory.

b. Two competing interests

i. Throughput - runtime performance

ii. Density - low fragmentation. Cluster nodes with little space between them. Holes mean we'll use extra memory.

c. Criteria

i. It must fit

ii. Do you want the first fit? That means if you have a bunch of small nodes at the start, you'll have to re-search them a lot.

iii. Next fit? Start where you left off after the last one.

iv. Best fit? A node that's exactly the size requested. Least extra overhead.

v. Do you want to split at the end?

d. Rejoining blocks? Coalescing?

i. When you call free, you might track fragmentation (number of nodes and number of pages). When that gets too big, coalesce nodes together.

e. How OS knows valid region? It asks if an address is in a mapped page.

f. Knuth's idea: make it a doubly linked list

i. Put a header at the end of the node also. That makes it a doubly linked list bc you can back up.

5. malloc16 and corrupting the heap

a. some guy called tech support saying that malloc didn't work for small sizes, so he wrote a malloc16 function that mallocs 16 extra bytes. Turns out, he was concatenating a string onto a malloc'ed region.

b. if you use more space than you have in a region, you're likely to corrupt the heap, because you'll wipe out the next header.

c. If you free something that isn't a pointer that was returned by malloc, it will corrupt the heap, because it will interpret the previous 4 bytes as a header, which means it might say "I have 2 million bytes free!"

6. memory leaks

a. gcc has tons of leaks, bc it knows it's run once and done, so leaks don't matter.

b. Apache can't have any leaks, because it gets millions of requests and never reboots.

c. Valgrind has its own version of the heap. It does a lot more work.

7. realloc

a. is there already extra room at the end? Cool.

b. Is there an adjacent free node? Cool. Detach it from the free list and eat it up.

c. Anything else? Malloc a new region, copy over the memory, free the old block.

8. malloc overhead?

a. A few global variables.

b. Maybe a free list or two (ie, a segregated-fit free list - here are 12B free nodes, here are 16-32B nodes, ...)

11/6/2009 - Assembly wrapup, Make, Preprocessor, Linker

1. intro

a. we'll regrade one of your assignments 1, 2, 3. You can rework it, and you'll get back 75% of the points.

b. We'll try to get assn3 back this week

2. register saving

a. caller saved: eax, ecx, edx

b. callee saved: esi, edi, ebx

3. what about context switching within OS?

a. Freeze dry and restore.

b. Every core has its own registers

4. how code REALLY gets compiled?

a. Makefile

b. preprocessor

c. Compiling

d. Linking

5. make

a. not something that only works for C. I used make to assemble my websites!

b. Idea: there are dependencies

c. On the left: target

d. On the right of the colon: depends on other files existing

e. By default, it knows that .o files probably come from associated .c files.

f. If it lacks a dependency, it will look for how to create that dependency

g. $ = variable

h. $@: the name of the target

i. No one ever makes a new makefile. They just copy an old one and slightly edit it.
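A minimal makefile sketch (file names hypothetical): the target on the left of the colon, its dependencies on the right, the command below, with $@ standing in for the target's name. Commands must be indented with a tab.

    CC = gcc
    CFLAGS = -g -Wall

    demo: demo.o util.o
    	$(CC) $(CFLAGS) -o $@ demo.o util.o

    # make already knows .o files come from matching .c files, so rules
    # like this one are often unnecessary:
    demo.o: demo.c demo.h
    	$(CC) $(CFLAGS) -c demo.c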

6. preprocessor

a. what is preprocessed?

i. #define (constants / macros)

ii. #include

iii. #ifdef

iv. #ifndef

v. Removing comments

vi. String concatenation: "abc" "def" --> "abcdef".

vii. __LINE__ (string version of names?)

viii. Whitespace rearrangement.

b. We tend to use all uppercase for #define things.

c. #define is totally text find-and-replace. You give a token -- whitespace delimited -- and it replaces that with the rest of the line.

d. gcc -E

i. Run the preprocessor, output, and stop

ii. It has to add line numbers so that it can give correct error messages.

e. Make sure you don't have errors in your #defines. Like having semicolons at the end.

f. Macros

i. Faster / more efficient than functions. We don't need to save state, move around, shuffle registers.

ii. And since it's just a find-and-replace, you don't need to deal with type rigidness. Functions don't work for multiple types, but macros do.

iii. But function call overhead is not that big.

iv. You can also do this with inline functions

v. But it's also really easy to make bugs there. And it's easy to recalculate stuff a lot.

vi. When writing macros, you need to put parens around every instance of x so that longer expressions don't completely fail.

vii. Whole macro needs to be parenthesized.

viii. Ie,

1. #define ABS(x) ((x) > 0 ? (x) : -(x))

ix. But there's no way to get around the reevaluation.
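A sketch of both macro rules using ABS: every x is parenthesized and so is the whole body, but the argument still gets evaluated twice, which bites when it has side effects.

    #include <stdio.h>

    #define ABS(x) ((x) > 0 ? (x) : -(x))

    int main(void)
    {
        int i = -5;
        printf("%d\n", ABS(i));      /* 5, as expected */

        i = -5;
        /* expands to ((++i) > 0 ? (++i) : -(++i)): i gets incremented
           twice, so this prints 3, not the 4 a function would give */
        printf("%d\n", ABS(++i));
        return 0;
    }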

g. #ifndef

i. We don't want something to be defined twice. Which would be a problem bc everything includes stdio.

ii. C doesn't care about redefinition of prototypes

iii. C DOES care about redefinition of types, which is a problem.

iv. C DOES care if you redeclare a variable, which is a problem.

v. _DEMO_H is the general convention for guard names
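The whole pattern, for a hypothetical header demo.h: the second time the file is included, _DEMO_H is already defined and the preprocessor skips the body, so types don't get redefined.

    #ifndef _DEMO_H
    #define _DEMO_H

    typedef struct node {
        int value;
        struct node *next;
    } node;

    void process(node *list);

    #endif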

7. modules

a. The preprocessor runs on a per-module basis.

b. There is independent module compilation

c. It compiles each thing individually

d. That way, only the ones that change have to be recompiled.

e. Any dependencies might also need changes.

f. In the .h file is everything that is part of the public interface.

g. Extern: global. It's extern by default, so people won't say it. Functions are also extern by default. Structures are also extern.

h. Static: private to this module. TOTALLY DIFFERENT from C++ static. You want to declare stuff static unless you know you want it to be used everywhere.

i. There's nothing aside from extern and static. Either global or private.

j. Don't pollute the namespace.
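A sketch of the extern/static split within one hypothetical module: static names are private to this .c file; everything else is extern and visible to the linker.

    static int call_count;           /* private to this module */

    static int helper(int x)         /* private too */
    {
        return x * x;
    }

    int module_api(int x)            /* extern by default: the public part */
    {
        call_count++;
        return helper(x);
    }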

k. nm: shows the symbol table.

l. size demo.o tells you how big each segment is.

8. linking

a. taking together two object files

b. just tidies up references between different object files

c. undefined references or multiply-defined references are the only errors the linker can throw.

d. The linker doesn't take the whole of the system library code and reduplicate it in every executable. It leaves behind breadcrumbs saying where it needs to come from. Every process that runs concurrently using the same system library functions can share those instructions. Ie, printf@@GLIBC_2.0

e. Compiling as static forces this redundancy

f. ld is the link editor. collect is a part of ld. And this will tell you about multiply-defined or undefined references.

11/4/09 - section

x/10wx $esp

at breakpoint, you can say up and go up a stack level.

stack smashing protection -- puts arrays far away from stored ebp. That means that an off by one error wont mess up your stack by quite as much.

Gdb: watch

Gdb: display

leave moves stack pointer up to saved ebp and then pops it

pop just pops the ebp.

limit

11/2/2009 - The Stack

1. intro

a. we're sorry that we're bad at getting assignments back to you. We'll try!

b. Midterms graded.

c. I wanted to give you all a big hug!

d. I hate giving exams

e. I learned that you all dont know IA32.

f. lab problems are like my children. I love them. And its hard for me to cut them. I really just need to find my least favoriteg. turns out there is a big correlation between people who got the floating point question in lab and the people who got the floating point question on the midterm. For the others, it was a matter of I didnt get this in lab. I dont get it any more now than I didnt get it in labh. Extra space in output shouldnt be like MINUS 40: if were going to grade 140 assignments in a timely manner -- which, apparently, we arent, but if we were -- we need autograders. And inserting extra space can screw that up. Thus, sanity check.

2. now, I'm going to let you read about the switch table

3. stack

a. grows top-down, starting at the middle of your ram

b. deep recursion lets stack frame grow deep.

c. Contains parameters, local vars, and housekeeping stuff related to knowing where control is going.

i. Backtrace

ii. Other scratch space -- ie, computing temporary results

d. Fast place to use memory. push just adds something to the stack and adjusts the stack pointer. Push and pop just take one arg. They just subtract from the stack pointer.

e. Convention about where it puts parameters and how to return to the stack after a fn call (which means reinstating info, getting back the stack context so that I know where everything is)

f. Parameters are pushed right to left. Binky(3, 8) pushes 8, then 3. This is important for printf, because that way it knows where the first parameter is.

i. Either push $8, push $3

ii. Or subtract 8 from the stack ptr

iii. Call Binky

g. Call

i. Changes eip to point to binky

ii. Saves current value of eip -- the instruction right after the call -- and pushes it to the stack.

iii. Aka return address

h. Ebp points into the middle of the stack frame, beneath parameters and before local vars. The sp changes a lot as we use the stack for scratch. The bp doesn't change as much. We keep both bc otherwise we would need to keep track of how much was on the stack.

i. Binky saves the old value of ebp (push %ebp -- main's base pointer)

j. Sets ebp to the current value of sp. So, the first parameter (from the left) is at ebp + 8, the next parameter at ebp + 12, the saved bp at ebp, and the return address at ebp + 4.

k. When done

i. Unmake space on stack

ii. Set ebp to what it's pointing to -- the saved ebp: mov (%ebp), %ebp

iii. Pop things off of the stack / add the offset back to ebp

l. Nothing will be below esp, bc esp marks the bottom of the stack. Unless you do screwy stuff.

m. Something similar for main, too!!

n. Stack might make a big chunk of space rather than changing a bunch of times to make space as it goes

o. Parameters have to be consistent bc caller and callee have to know what they're doing. But locals have more variation bc only the inside function needs to know. This means that padding has to be consistent.

p. "where" in gdb does a backtrace

q. fr 5; info frame

r. Doesn't record the old stack pointer, bc where the ebp points is where the stack pointer was when I got here.

s. Can't return the address of a stack var, bc the space is deallocated. But it's deallocated by changing the pointer, so it might just leave the contents there if no one else writes over it.

i. At NeXT, we had a bug like this for years: a big char buffer that was declared locally, and it all worked bc the stack grows down, bc you'll only use the lower stuff at the buffer.

ii. Changing to HPPA architecture makes the stack grow up. Which broke it. Bc the hot activity is at the lower indices.

iii. Valgrind isn't very good at the stack, so it might not catch this either. The stack is harder to track bc everything is contiguous. Accessing past the end of an array means that it's hard to tell if the access is unintentional.

4. questions

a. how does using registers rather than memory work with state saving and such?

i. Register passing rather than memory passing might be a bit faster, but there needs to be agreement, and libraries aren't compiled that way, so it wouldn't work very well.

b. How do registers work when you have a new function? Ie, what if youre using eac for something before?

i. 3 registers are reserved for the function that gets called. That means that you need to back them up before you call a function if you want them to stay the same.

ii. 3 registers are reserved for the function that makes calls. That means that you need to back them up if you use them within a function, and reset them before you return.

c. How does the stack work considering that other programs are using ram too? You don't know that the stack is completely yours, do you?

i. Multiple stacks. One for each thread.

ii. Each stack has a maximum amount of space that it can use. In other words, a lot of space is allocated for it ahead of time.

10/28/2009 - section

Pipe: output of one command --> input of another.

Output of strings --> input of grep:

grep Warn --count

10/26/2009 - Assembly + Control

1. intro

a. midterm Friday. Still no room. I'll post to the website and email you.

b. Assn 4 tomorrow night

c. No man page for open? Need to use man 2 open.

2. Working in assembly

a. ptr = &arr[3]

i. leal -24(%ebp, 3, 4), %eax (it could collapse it down to -12(%ebp) with some optimization, but not without it)

ii. mov %eax, -8(%ebp)

1. load the address of arr[3] and store it in ptr, which is at -8(%ebp)

b. *ptr

i. movl -8(%ebp), %eax

ii. movl (%eax), %eax

iii. (only allowed to do one memory read per instruction, so you can't do a double-dereference)

c. *ptr + 31

i. add $0x1f, %eax

ii. ($ is an immediate constant)

d. Num

i. Mov -4(%ebp), %ecx

e. Arr[num] = *ptr + 31

i. movl %eax, -24(%ebp, %ecx, 4)

f. What happens when arr[num] segfaults?

i. the OS says heeeyyy -- no memory for you!

ii. Calculating bad address is fine. Read/write causes error.

3. Typecast

a. Rather than mov (%eax), %eax to dereference, it would movb (%eax), %al and would change the offset scaling to 1.

b. If it needs to do a datatype conversion, that's just one assembly instr

4. control structure

a. label (outdented; a word followed by a colon. In human-readable assembly, it'll be ...)

i. Loop:

ii. incl %eax // increments eax

iii. jmp Loop

b. Conditional jumps: first do an operation (usually a compare -- a cmp). There are condition flags: the carry flag, overflow flag, zero flag, etc. Cmp records the result in these flags. Then, the j instructions use them. The flags are kind of like a register (not implemented the same).

i. cmp %eax, %edx // subtracts eax from edx

ii. jl // jump if edx is less than eax

iii. je // jump if equal

iv. Jne //jump if not equal

v. Jump label/target.

vi. Jns (no sign bit set)

vii. j<condition> -- there are tons of them.

viii. Jz -- result is zero

c. Ie

i. Cmp a, b

ii. If a != b, jump over instr

d. Loops

i. Usually go through body and jump around a lot.

ii. Jump down to test

iii. Do stuff

iv. Jump up to instructions

v. Jump up to top

vi. --> one jump per loop and one outside.

vii. Valgrind err: conditional cmp with uninit val (if you read uninit val)

e. Switch

i. Series of cascading if/elses and lots of unconditional jumps whenever there's a break statement.

5. the other switch next time: makes a switch table. If many options are close together, you can sort of treat the value like an index and imagine an array of jump targets.

6. gdb

a. disp

b. disass

c. p $eax

d. info reg

7. q

a. how do the registers work, given multitasking?

10/23/2009 - assembly data layout, assembly operations, alu basics

1. intro

a. assn1 grading

i. email us if you have a question

ii. come to office hrs if you have lots of questions

iii. we can show you sample code.

b. Midterm

i. Practice midterm (last spring's midterm) is a handout now.

ii. We might also do an ia32 question

iii. Don't take late days for assn4, because the midterm matters more. You may, though.

c. This week and next, you should read the text. Because its dense.

2. data layout

a. ie,

i. disp(base, index, scale)

1. *(base+disp+index*scale)

2. -16(%ebp, 2, 4) == -8(%ebp)

3. Base -- the start of the stack frame

4. Disp -- where the variable lives on the stack

5. Index*scale -- going into the variable.

b. void binky(int a)

i. int b, c;

c. Assembly of binky

i. Ebp is a pointer to the function's frame. Base pointer.

ii. Parameters are stored at positive offsets (parameter 1 at +8, param2 at +12; +4 holds the return address). Stack variables are at -4, -8, etc.

iii. Parens in assembly are like * -- dereference.

d. void ArrayPtr() {

i. int *ptr, num, arr[10];

num = 8

e. ebp

i. ebp at 0

ii. ptr at -4

iii. num at -8

iv. arr[0] at -48, arr[1] at -44, etc., because arrays are always stored with the lowest index at the lowest spot in memory. That way, you can always add to an array to get subsequent indexes.

v. Often, padding to the wordsize of the machine. So char, int, char will have char at -4, int at -8, and char at -16, even though there's just wasted space.

f. Generate assembly.

i. gcc -m32 -S demo.c

1. Capital S.

2. Produces demo.s, which is the assembly emission.

ii. objdump (it has a man page). Can use objdump on an executable. Can see the disassembly. If compiled with debug info in, you can extract assembly right next to the source code.

iii. objdump -S -d

ii. Objdump (it has a man page). Can use objdump on an executable. Can see disassembly. If compiled with debug info in, you can extract assembly right next to source codeiii. Objdump -S -d

g. Looking at assembly

i. 0: push %ebp

ii. Number of bytes offset

h. lea

i. Load effective address

ii. Like a move without an indirection.

iii. Base + disp + index*scale. Don't load that; just put it in the register. Store it in eax, for instance. Deals with pointers.

iv. leal offset(base, index, scale), destination register

v. Sometimes used to compute simple polynomials. Integer arithmetic. Faster than the arithmetic logic unit for simple stuff.

i. movl

i. Value to push, offset(base, index, scale)

j. shl

i. Does a left shift. Bitshift left. Same as multiplying by 2. shl $2, %eax multiplies eax by 4.

ii. imul multiplies

k. Assembly stuff always operates on itself. Stores in the last arg. I think.

l. Scale must be 1,2,4, or 8.

m. Structs

i. Struct binky {

1. int num

2. char letter;

3. int *ptr;

ii. Void Structs()

1. Struct binky *ptr, b;

iii. You can turn off padding in gcc, but if you don't, binky will be 12 bytes: 4B for num, 4B for letter even though only 1B is used, and 4B for ptr. (A sketch follows below.)

iv. Assembly

1. movb moves a single byte.
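A sketch that checks the padding claim with sizeof and offsetof (the printed values assume IA32 with default padding): letter gets 3 bytes of padding so ptr stays 4-byte aligned.

    #include <stdio.h>
    #include <stddef.h>

    struct binky {
        int num;                     /* offset 0 */
        char letter;                 /* offset 4, then 3 padding bytes */
        int *ptr;                    /* offset 8 */
    };

    int main(void)
    {
        printf("%u\n", (unsigned)sizeof(struct binky));        /* 12 */
        printf("%u\n", (unsigned)offsetof(struct binky, ptr)); /* 8 */
        return 0;
    }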

n. Movl v mov

i. Mov infers from operand how much youll move.

ii. Movl is move long

o. "you do know how to hotwire your car, but you should still use the key"

i. Just because you know exactly how far in front of the struct the data is doesn't mean you should use it. Use C, not assembly.

ii. Cs110 told us that we fail. Because the old cs107 paradigm was "have void*, will travel." My reputation depends on your void* usage.

p. Gcc convention: eax is the return value

q. Eax is return / scratch

r. Ecx is scratch

s. ebp is base pointer

t. Esp is stack pointer

u. Others are less used. If you need more scratch space, you might use others, but you have to save that stuff, so it's less convenient.

v. Accessing the cache?

i. Totally opaque to the programmer. You just say get the memory.

ii. The chip knows.

3. alu - arithmetic logic unit.

a. imul (integer multiply) a, b, dest

b. To multiply: load into a register, multiply, store.

c. Sometimes it won't even write to the stack; it will just leave the value in a register, bc it doesn't need to take it out.

d. add a, (to) dest

e. sub (you can use a memory address or constant), (from) dest

f. subl $1, dest (you can use a memory operand, like -8(%ebp)) // b--

g. shll -0x4(%ebp) // thing at ebp-4 *= 2

h. not dest

i. And src, dest

4. control structure

5. function call protocol (later)

10/21/2009 - section

Chars are always converted to ints when you do math

Float:

Sign bit (1 if negative)

8 bits of exponent, biased: stored value x means exponent x - 127

23-bit mantissa/significand: 1.xxxxxxx = 1 + 1/2 + 1/4 + 1/8 + ... + 1/2^23 (whichever bits are set)

value = (sign) * (mantissa) * 2^exp

Bit extraction: unsigned int bits = *(unsigned int*)&f
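A sketch of that bit extraction (example mine; the shifts and masks are the standard IEEE 754 single-precision layout): reinterpret the float's bytes as an unsigned int, then shift and mask out the three fields.

    #include <stdio.h>

    int main(void)
    {
        float f = 5.0f;              /* 1.01 * 2^2 */
        unsigned int bits = *(unsigned int *)&f;

        unsigned int sign = bits >> 31;              /* 1 bit */
        unsigned int exp  = (bits >> 23) & 0xFF;     /* 8 bits, biased */
        unsigned int mant = bits & 0x7FFFFF;         /* low 23 bits */

        /* prints sign=0 exp=2 mant=0x200000 */
        printf("sign=%u exp=%d mant=0x%X\n", sign, (int)exp - 127, mant);
        return 0;
    }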

SIGN MASK = 1 << 31

64 bits: a chance to make a new instr set. So, Intel moved to a RISC model. AMD used the old style and beat Intel to the marketplace, so Intel sort of failed.

4. what does assembly look like?

a. Storage on the chip itself: registers. In IA32, there are 8. They are very fast.

i. EIP: instruction pointer. A register dedicated to the current instruction. Fetches an instruction and then executes it.

b. On the cpu: condition codes. Did the last operation end in 0? Did the last op ... Has lots of info about the last op.

c. Cache: faster than memory.

d. In memory:

i. object code.

ii. Program data

iii. Runtime stack

e. The 8 registers

i. %eSOMETHING

1. the e is a legacy thing -- e stands for extended, bc registers changed from 2 bytes to 4 bytes.

ii. Esp: stack pointer

iii. Ebp: base pointer

iv. Esi: source index

v. Edi: destination index

vi. Eax: accumulator register. Used for arithmetic. Used for the return val of a function

vii. Ecx

viii. Edx

ix. Ebx

x. They might have come from a specific thing, but they're fairly general now.

xi. Number of registers is very cramped.

5. turning c into object code

a. make --> gcc (which isn't a compiler: it's a compiler driver. It invokes other things, like cc, the C compiler). (cvector.c)

b. C source --> asm source, using the compiler. (cvector.s)

c. Asm --> object code, using the assembler (cvector.o)

d. Object --> executable, using the linker (cvector-test)

e. Ie, the sum function listed on the slides.

6. assembly characteristics

a. minimal datatypes. very little evidence of datatype

i. integer data of 1, 2, or 4 bytes

ii. pointers are unsigned ints

iii. floating point data of 4, 8, or 10 bytes

iv. no explicit aggregate types such as arrays or structures. Just constructed from primitives laid out in sequence.

b. Primitive ops

i. Performs arithmetic function on register or memory data

ii. Transfer data between memory and register

1. load data from mem into reg

2. store reg data into mem

iii. transfer control

1. unconditional jumps to/from procedures

2. conditional branches

7. moving data

a. b = 1 byte, w = 2 bytes, l = 4 bytes

i. "word" = 2 bytes is just a legacy, because the words that we use are 4 bytes.

ii. l = long

iii. b = byte

b. the mov instruction - one of the most common ones.

c. General form: movx src, dst

i. Intel version reverses dst and src.

ii. Different conventions on brackets and parens.

d. Movb $65, %al

i. $ = it's a constant. That's encoded as part of the instruction itself.

ii. Take the byte 65 and put it in the lower byte of eax (that's %al).

e. Immediate

i. Constant data prefixed with $

ii. Number not prefixed with $ is interpreted as fixed memory address.

f. Register

i. register name prefixed with %

g. memory

i. register enclosed in parens, or a fixed address. Like dereferencing a pointer.

h. can't move from memory to memory. Can move from immediate to register or memory, from register to register or memory, from memory to register.

8. addressing modes

a. (cisc nature)

b. Direct: fixed memory address. Global/static.

c. Indirect: register holds memory address. Pointer.

d. Base + displacement

i. Constant value or fixed address

ii. Used to access data with offset from base: struct field, locals/parameters (expressed as offset within stack), some array access and pointer arithmetic.

iii. Ie, 8(%ebp)

e. Base + scaled index + displacement

i. -8(%ebp, %esi, 4)

1. displacement(base, index, scale)

2. like an array: base + index * 4

ii. displacement can be constant or fixed

iii. base and index must be registers

iv. scale can only be 1 2 4 8. units of char, short, long, double, and nothing else.

f. Special cases

i. -8(%ebp, %esi)

1. no scale

2. used for char array elem, struct field

ii. (%ebp, %esi, 4)

1. used for array elems, pointer arith.

9. Friday: manual translation.

10/16/2009

Zahan's Notes

- Unsigned is used wrongly by programmers to eke out a larger range from a byte. - don't - acts weird in extreme cases

- Correctly used for bit masks, individual bit manipulation

- short, int, long - default signed

- char - no default

- when converting from smaller to larger type, tries to preserve sign of value

- -1 in two's complement notation - short - 11111111

- convert to int - it will replicate the signed bit - 11111111 11111111 -> -1 in two's complement

- converting larger type to smaller

- 255 - int- 00000000 11111111

- conv to short - 11111111 -> -1

- #define EOF -1

- when will ((ch = getc(fp)) != EOF) not work?

- We don't know whether ch will be signed, or unsigned

- value returned by getc is truncated to char and then promoted back to int for comparison

- if ch is unsigned - the rest of ch during promotion is zero-filled, 00000000 11111111, does not hit EOF

- if ch is signed - it works - -1 - rest of ch is one-filled - preserves sign

- Fix for this is not to hope that machine makes char signed by default

- just use int ch
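A sketch of the fix: with int ch, the EOF value -1 stays distinct from a genuine byte 255 read from the file.

    #include <stdio.h>

    long count_chars(FILE *fp)
    {
        long count = 0;
        int ch;                      /* int, NOT char */
        while ((ch = getc(fp)) != EOF)
            count++;
        return count;
    }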

- Fractional representation

- fixed point

- can use bits for powers of two

- x x x x y y y y

- 8 4 2 1 .5 .25 .125 .0625

- 11111111 - 15 + 15/16

- no negative numbers

- cannot represent exact real numbers - 1/3?

- floating point

- use bits for powers again

- rep significant number and powers separately

- 32 bit float

- MSB used for sign bit

- 8 bits for exponent -> -128 to 127

- exp bits all zeros - denormalised numbers

- exp bits all ones - special: inf, NaN, div by zero

- 23 remaining bits used for significant bits

- to rep 5 - the number is normalised down to 1.01*2^2

- exponent stored as the bias + whatever - the mantissa shows up as .01 (* 2^2) - the leading 1. is implicit

- denormalised numbers - where exponent is assumed to be zero

- Next week

- Machine instructions, IA32

10/14/2009 section

memcpy is faster, but if there's overlap it can fail

memmove won't fail with overlap

casting primitives converts (ie, int to float). Casting pointers just reads straight bits.

10/12/2009 - binary, memory structure, data representations, bitwise operators, conversions

1. binary + memory structure

a. why don't we use base 10? Hard to distinguish between 10 signals, esp with noise and vacuum tubes. Instead, we just have to distinguish between 2.

b. Bit: binary digit

c. Can't get far with one bit, so we use sequences. 8 bits = byte. Smallest addressable unit of memory.

d. Memory goes from 0 to 4GB. Each part of memory has a unique address.

e. Different parts of your program go to different parts of your memory completely differently. Once you get familiar with addresses, you'll learn what looks good for heap v stack.

i. Your code, in binary, is read from fairly low addresses. Global variables (read-only and read-write sections) are just a little bit above that.

ii. Halfway through the memory will be your stack. The stack typically grows downward (on intel architecture).

iii. The heap starts low (above your globals) and grows up. Not very ordered, though.

iv. Big space between stack and heap.

v. Large section above the stack that's unmapped. The heap can jump over the stack and go up to the top. And some lower-level allocator functions other than malloc can specifically use that space.

f. There is no runtime tagging of addresses. If some bits store an int and you read it as a double, it won't be what you want.

g. Pages (typically 4k) of allocated memory.

h. Your code is the TEXT segment. Global data is the DATA segment. The stack is the STACK segment. If your access is within a segment, you get whatever data is there. Else, seg fault.

2. Char

a. Chars are 1B. 256 possible patterns.

b. Most to least significant bit

c. 0-255

d. Ascii is the one (there used to be another one: EBCDIC). Lower ascii is totally standard (only using the last 7 bits - 0-127). Upper ascii - extended - is nonstandard. Stuff with accents. There was not widespread agreement. Thus, the standard is that you say what character set you use. Line Feed (10). Carriage Return (13). We don't use teletype terminals. Unix puts in a 10. Old (pre-NeXT) Macs used 13; then, with the move to unix, they transitioned to 10. PC uses both: CRLF. Lots of programs will try to cope with nonstandard line endings.

3. Char (1), Short (2), int (4), long (4-8), long long. The only ANSI requirement is the order between these -- that ints are at least as many bytes as shorts, for instance.

a. Endian. Which is the most significant byte? The least significant bit in each byte is to the right.

b. Intel is Little Endian. That means that the lowest address holds the smallest part of the number.

c. Network order is big endian.

d. Printing a byte will always print that byte in big-endian format even though bits might not have any sane ordering.
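A sketch (mine) that shows the byte order: store a known int and look at its first byte. On a little-endian machine like intel, the lowest address holds the least significant byte.

    #include <stdio.h>

    int main(void)
    {
        int x = 0x12345678;
        unsigned char *p = (unsigned char *)&x;
        /* prints 78 on little-endian, 12 on big-endian */
        printf("%02X\n", p[0]);
        return 0;
    }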

4. Decimal

a. Decimal doesn't have a nice mapping.

b. Hex does. 0-F covers 1/2 byte. 0x11 = 16+1 = 17.

5. Gdb

a. List lists code

b. p ch -- if it's typed, it knows how to print it. Prints ascii val and non-ascii.

c. p/x ch -- prints in hex

d. p/t ch -- prints in binary

e. x &i -- examine. Prints the actual memory rather than the typed interpretation.

f. x/4bt &i -- examine 4 bytes. Display them individually. Don't reorder them.

g. Uses a ? whenever it sees a high-order ascii char.

h. x/10i main -- prints 10 instructions starting at main. Push, move, pop, call

6. bitwise ops

a. &, |, ^ (XOR), ~ (inverse)

b. Only work on integer types (char, short, int, long).

c.
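A sketch of the bit-mask use these operators exist for (example mine): set, test, and clear one bit with |, &, and ~.

    #include <stdio.h>

    int main(void)
    {
        unsigned int flags = 0;
        unsigned int mask = 1u << 3;          /* bit 3 */

        flags |= mask;                        /* set bit 3 */
        printf("%d\n", (flags & mask) != 0);  /* test: prints 1 */
        flags &= ~mask;                       /* clear bit 3 */
        printf("%d\n", (flags & mask) != 0);  /* prints 0 */
        return 0;
    }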