D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University http://www.cs.duke.edu/~chase/cps210


Post on 16-Jan-2016

Page 1: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

D u k e S y s t e m s

CPS 210: Unix and Beyond

Jeff Chase, Duke University

http://www.cs.duke.edu/~chase/cps210

Page 2: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

“Just make it”

• To get started on the heap manager, download the files and type “make”.

– Provides a script to build the heap manager test programs on Linux or MacOS.

• This lab is just a taste of system programming in C.

• The classic text is CS:APP.

• Also see the PDF “What every computer systems student should know about computers” on the course website.

• You may think of it as notes from CS:APP. It covers background from Computer Architecture and also some material for this class.

[Figure: cover of CS:APP, “a classic”: http://csapp.cs.cmu.edu]

Page 3: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

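64 bytes: 3 ways

[Figure: one region of memory starting at address p, with offsets marked from p + 0x0 to 0x1f, drawn three times: as char p[] (char *p), as int p[] (int* p), and as char* p[] (char** p).]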

Pointers (addresses) are 8 bytes on a 64-bit machine.
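The three views can be sketched in C with a union, so the same 64 bytes are shared without casting pointers between types. This is a minimal sketch, not code from the lab: the names region64, elements_in_region, and element_offset are made up for illustration, and the element counts in the comments assume a typical 64-bit machine with 4-byte int and 8-byte pointers.

```c
#include <stddef.h>

/* One 64-byte region that can be viewed three ways. */
union region64 {
    char  bytes[64];                     /* char p[]:  64 elements            */
    int   ints[64 / sizeof(int)];        /* int p[]:   16 if int is 4 bytes   */
    char *ptrs[64 / sizeof(char *)];     /* char* p[]: 8 if pointers are 8 B  */
};

/* Number of elements of a given size that fit in the region. */
size_t elements_in_region(size_t elem_size) {
    return sizeof(union region64) / elem_size;
}

/* Byte offset of element i when the region is viewed as elem_size-byte
 * items: the arithmetic behind p[i] or p + i. */
size_t element_offset(size_t i, size_t elem_size) {
    return i * elem_size;
}
```

For example, element_offset(3, sizeof(int)) gives the byte offset of p[3] in the int view, which is why p + 3 means different addresses for char *p and int *p.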

Page 4: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

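Alignment

[Figure: the three views of memory at p from the previous slide (char p[] / char *p, int p[] / int* p, char* p[] / char** p), offsets 0x0 to 0x1f, with some positions marked X.]

The machine requires that an n-byte value be aligned on an n-byte boundary, where n = 2^i.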
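Because alignments are powers of two, the alignment rule reduces to bit masking. The helpers below are an illustrative sketch; align_up and is_aligned are hypothetical names, not functions from the lab code.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align; align must be a power of two. */
size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Nonzero if addr is a multiple of align (align must be a power of two). */
int is_aligned(uintptr_t addr, size_t align) {
    return (addr & (align - 1)) == 0;
}
```

A heap manager might use align_up(size, 8) to round every request to an 8-byte multiple, so that each block it hands out starts on an 8-byte boundary.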

Page 5: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Heap allocation

[Figure: allocated heap blocks for structs or objects. Align!]

The heap is a contiguous chunk of memory obtained from the OS kernel, e.g., with the Unix sbrk() system call.

A runtime library obtains the block and manages it as a “heap” for use by the programming language environment, to store dynamic objects, e.g., with the Unix malloc and free library calls.
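As a rough illustration of a library carving blocks out of one contiguous chunk, here is a deliberately simple "bump" allocator. It is a sketch under stated assumptions, not the lab's heap manager: a static array stands in for memory obtained with sbrk(), the names arena_alloc, ARENA_SIZE, and ALIGNMENT are invented, and there is no free, so it never reuses space.

```c
#include <stddef.h>

#define ARENA_SIZE 4096
#define ALIGNMENT  8            /* assumed alignment for every block */

/* Stand-in for the contiguous chunk a real heap manager gets from the kernel. */
static _Alignas(8) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Return an aligned block of at least size bytes, or NULL when the arena
 * cannot satisfy the request. Blocks are handed out front to back. */
void *arena_alloc(size_t size) {
    size_t rounded = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    if (rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;            /* overflow or out of space */
    void *block = &arena[arena_used];
    arena_used += rounded;
    return block;
}
```

Supporting free is what makes a real heap manager interesting: freed blocks of different sizes leave holes, which leads to the fragmentation discussed on the next slides.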

Page 6: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Variable Partitioning

Variable partitioning is the strategy of parking differently sized cars along a street with no marked parking space dividers.

Wasted space: external fragmentation

[Figure: cars parked along a street, labeled 2, 3, 1.]

Page 7: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Alternative: block maps

[Figure: a block map.]

The storage in a heap block is contiguous in the VAS. C and other PL environments require this.

That complicates the heap manager because the heap blocks may be different sizes.

Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations.

The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable.

Example: page tables that implement a VAS.
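The indirection idea can be sketched as a lookup that splits a logical offset into a map index and an offset within a fixed-size slot. Everything here (SLOT_SIZE, NUM_SLOTS, struct block_map, map_lookup) is a hypothetical toy, far smaller and simpler than a real page table or inode.

```c
#include <stddef.h>

#define SLOT_SIZE 16            /* fixed slot size, chosen for illustration */
#define NUM_SLOTS 8

/* Pool of interchangeable fixed-size slots. */
static unsigned char storage[NUM_SLOTS][SLOT_SIZE];

/* A storage object is a map from logical block number to slot number. */
struct block_map {
    size_t nblocks;
    int    slot[NUM_SLOTS];
};

/* Translate a logical byte offset into a location in storage,
 * or return NULL if the offset is beyond the object. */
unsigned char *map_lookup(const struct block_map *m, size_t offset) {
    size_t lblock = offset / SLOT_SIZE;   /* index into the map */
    size_t within = offset % SLOT_SIZE;   /* position inside the slot */
    if (lblock >= m->nblocks)
        return NULL;
    return &storage[m->slot[lblock]][within];
}
```

With a map of {5, 1}, logical offsets 0 to 15 land in slot 5 and offsets 16 to 31 land in slot 1: the object looks contiguous to its user even though its slots are scattered.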

Page 8: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Indirection

Page 9: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Fixed Partitioning

Wasted space: internal fragmentation

Page 10: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Post-note

• We spent much of the class talking about some general issues for naming, illustrated in Unix.

• Block maps and other indexed maps are a common structure to implement “machine” name spaces:

– sequences of logical blocks, e.g., virtual address spaces, files

– process IDs, etc.

– For sparse block spaces we may use a tree hierarchy of block maps (e.g., inode maps or 2-level page tables, later).

– Storage system software is full of these maps.

• Symbolic name spaces use different kinds of maps.

– They are sparse and require matching, which is more expensive.

– Trees of maps create nested namespaces, e.g., the file tree.

Page 11: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Files: hierarchical name space

[Figure: a file tree, with labels: root directory; mount point; user home directory; external media volume or network storage; applications etc.]

Page 12: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

File I/O

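    char buf[BUFSIZE];
    int fd;
    ssize_t n;

    /* Parenthesize the assignment so fd gets the descriptor, not the comparison result. */
    if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {
        perror("open failed");
        exit(1);
    }
    /* Copy standard input to the file, writing only the bytes actually read. */
    while ((n = read(0, buf, BUFSIZE)) > 0) {
        if (write(fd, buf, n) != n) {
            perror("write failed");
            exit(1);
        }
    }
    if (n == -1) {
        perror("read failed");
        exit(1);
    }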

Pathnames are translated through the directory tree, starting at the root directory or current directory.

Every system call should check for errors and handle appropriately.

The file grows as the process writes to it, so the system must allocate space dynamically.

System finds the physical disk locations of the file’s logical blocks by indexing a block map (the file’s index node or “inode”).

Page 13: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

A filesystem on disk

[Figure: a toy file system layout. At fixed locations on disk: inode 0, the allocation bitmap file, whose blocks hold bits (111000100010110110111101 ...), and inode 1, the root directory, whose directory blocks hold the entries 0; rain: 32; hail: 48; 0; wind: 18; snow: 62. A regular file (inode) points to file blocks holding text (“once upon a time/n in a l”, “and far far away, lived th”).]

This is a toy example (Nachos).

Page 14: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Names and layers

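[Figure: a stack of layers (User view, Application, File System, Disk Subsystem), annotated with the names used at each level: notes in notebook file; notefile; fd, byte range*; device, block #; surface, cylinder, sector. The interfaces between layers are labeled bytes, fd, and block#.]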

Add more layers as needed.

Page 15: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Directories

[Figure: a directory inode and a directory block (lblock 32) holding the entries 0; rain: 32; hail: 48; 0; wind: 18; snow: 62.]

Entries or free slots are typically found by a linear scan.

Note: implementations vary. Large directories are problematic.

A creat operation must scan the directory to ensure that creates are exclusive.

There can be no duplicate names: the name mapping is a function.
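The linear scan and the exclusive-create check can be sketched with a small fixed-size directory. This is a toy model with invented names (struct dirent_toy, dir_lookup, dir_create), and it assumes that inode number 0 marks a free slot, mirroring the 0 entries in the figure.

```c
#include <string.h>

#define DIR_SLOTS    8
#define NAME_MAX_LEN 15

/* Toy directory entry: inum == 0 means the slot is free (an assumption). */
struct dirent_toy {
    int  inum;
    char name[NAME_MAX_LEN + 1];
};

/* Linear scan for name; returns its inode number, or 0 if absent. */
int dir_lookup(const struct dirent_toy *dir, const char *name) {
    for (int i = 0; i < DIR_SLOTS; i++)
        if (dir[i].inum != 0 && strcmp(dir[i].name, name) == 0)
            return dir[i].inum;
    return 0;
}

/* Exclusive create: returns -1 if the name already exists, the arguments
 * are unusable, or no slot is free; otherwise fills the first free slot. */
int dir_create(struct dirent_toy *dir, const char *name, int inum) {
    if (inum == 0 || strlen(name) > NAME_MAX_LEN || dir_lookup(dir, name) != 0)
        return -1;
    for (int i = 0; i < DIR_SLOTS; i++) {
        if (dir[i].inum == 0) {
            dir[i].inum = inum;
            strcpy(dir[i].name, name);
            return 0;
        }
    }
    return -1;
}
```

Both operations cost time proportional to the number of slots, which is one reason large directories are problematic for this simple layout.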

Page 16: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Operations on Directories (UNIX)

• Link - make entry pointing to file
• Unlink - remove entry pointing to file
• Rename
• Mkdir - create a directory
• Rmdir - remove a directory

Page 17: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Links

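[Figure: a directory tree with usr at the top and subdirectories Lynn and Marty, holding files foo and bar, annotated with two command sequences:]

ln /usr/Lynn/foo bar
unlink foo
creat foo

ln -s /usr/Marty/bar bar
unlink bar
creat bar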

Page 18: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix File Naming (Hard Links)

[Figure: directory A (entries 0; rain: 32; hail: 48) and directory B (entries 0; wind: 18; sleet: 48) both name inode 48, whose inode link count = 2.]

A Unix file may have multiple names.

Each directory entry naming the file is called a hard link.

Each inode contains a reference count showing how many hard links name it.

Illustrates: garbage collection by reference counting.

link system call: link(existing name, new name)
– create a new name for an existing file
– increment inode link count

unlink system call (“remove”): unlink(name)
– destroy directory entry
– decrement inode link count
– if count == 0 and file is not in active use, free blocks (recursively) and on-disk inode
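The link-count bookkeeping above can be modeled in a few lines. This is an in-memory sketch only: struct inode_toy and its functions are invented for illustration, and "active use" is assumed to mean a nonzero count of open descriptors.

```c
#include <stdbool.h>

/* Toy in-memory inode with just the fields the slide mentions. */
struct inode_toy {
    int  link_count;   /* number of hard links (directory entries) naming it */
    int  open_count;   /* nonzero while the file is in active use (assumed)  */
    bool freed;        /* set once blocks and inode have been reclaimed      */
};

/* link: a new directory entry names the inode, so bump the count. */
void inode_link(struct inode_toy *ino) {
    ino->link_count++;
}

/* Reclaim storage once no name and no open descriptor refers to the inode. */
static void inode_maybe_free(struct inode_toy *ino) {
    if (ino->link_count == 0 && ino->open_count == 0)
        ino->freed = true;
}

/* unlink: a directory entry is destroyed, so drop the count. */
void inode_unlink(struct inode_toy *ino) {
    ino->link_count--;
    inode_maybe_free(ino);
}

/* close: the last close of an already-unlinked file triggers reclamation. */
void inode_close(struct inode_toy *ino) {
    ino->open_count--;
    inode_maybe_free(ino);
}
```

In this model a file that has been unlinked but is still open survives until its last close, which matches the "not in active use" condition on the slide.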

Page 19: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix Symbolic (Soft) Links

A soft link is a file containing a pathname of some other file.

[Figure: directory A (entries 0; rain: 32; hail: 48) names inode 48, inode link count = 1. Directory B (entries 0; wind: 18; sleet: 67) names inode 67, whose contents are “../A/hail/0”.]

The target of the link may be removed at any time, leaving a dangling reference.

How should the kernel handle recursive soft links?

symlink system call: symlink(existing name, new name)
– allocate a new file (inode) with type symlink
– initialize file contents with existing name
– create directory entry for new file with new name
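One common answer to the question about recursive soft links is to cap the number of links followed during a single resolution; POSIX defines the ELOOP error for this case. The sketch below models that idea over a toy name table. The table layout, the MAX_HOPS value, and the function names are assumptions made for illustration, not how any particular kernel does it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HOPS 8   /* illustrative limit; real kernels choose their own */

/* Toy name table: target is NULL for a regular file, otherwise it is the
 * name stored in the symlink. */
struct name_toy {
    const char *name;
    const char *target;
};

static const struct name_toy *find_name(const struct name_toy *tab, size_t n,
                                        const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Follow symlinks starting at name. Returns the name of the regular file
 * reached, or NULL for a dangling link or too many hops (likely a cycle). */
const char *resolve(const struct name_toy *tab, size_t n, const char *name) {
    for (int hops = 0; hops <= MAX_HOPS; hops++) {
        const struct name_toy *e = find_name(tab, n, name);
        if (e == NULL)
            return NULL;          /* dangling reference */
        if (e->target == NULL)
            return e->name;       /* reached a regular file */
        name = e->target;         /* follow one more link */
    }
    return NULL;                  /* gave up: treat as a loop */
}
```

Note that the binding happens at resolve time: the link stores only a name, so removing or replacing the target changes what the link resolves to.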

Page 20: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Concepts

• Reference counting and reclamation
• Redirection/indirection
• Dangling reference
• Binding time (create time vs. resolve time)
• Referential integrity

Page 21: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Processes and the kernel

[Figure: programs, each with its own data, above a protected OS kernel, connected by protected system calls and upcalls (e.g., signals).]

Programs run as independent processes.

The protected OS kernel mediates access to shared resources.

Threads enter the kernel for OS services.

Each process has a private virtual address space and one thread.

The kernel is a separate component/context with enforced modularity.

The kernel syscall interface supports processes, files, pipes, and signals.

Page 22: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

GS4. Layered systems

Garlan and Shaw, An Introduction to Software Architecture, 1994.

Page 23: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Processes: A Closer Look

[Figure: a process = virtual address space + thread (with its stack) + process descriptor (PCB). The PCB holds the user ID, process ID, parent PID, sibling links, children, and resources.]

Each process has a thread bound to the VAS.

The thread has a stack addressable through the VAS.

The kernel can suspend/restart the thread wherever and whenever it wants.

The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status.

The address space is a private name space for a set of memory segments used by the process.

The kernel must initialize the process memory for the program to run.

Page 24: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

VAS example (32-bit)

[Figure: an address space from 0x0 to 0x7fffffff divided into segments: Text (code), Static data, Dynamic data (heap/BSS), Stack, and Reserved.]

• An addressable array of bytes…

• Containing every instruction the process thread can execute…

• And every piece of data those instructions can read/write…

– i.e., read/write == load/store

• Partitioned into logical segments with distinct purpose and use.

• Every memory reference by a thread is interpreted in its VAS context.

– Resolve to a location in machine memory

• A given address in different VAS may resolve to different locations.

Page 25: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

A Peek Inside a Running Program

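[Figure: a CPU with registers R0 to Rn, PC, and SP, beside “memory”: an address space (virtual or physical) running from 0 to high and holding your program (code), the code library (common runtime), your data, the heap, and the stack, with locations x and y marked.]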

Page 26: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix File Descriptors Illustrated

[Figure: in user space, file descriptors; in the kernel, the process file descriptor table points into the open file table, whose entries refer to a pipe, a file, a socket, or a tty.]

Disclaimer: this drawing is oversimplified.

Processes may share open files (“objects”), but the binding of file descriptors to objects is specific to each process. E.g., see the dup system call.
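A small experiment shows that a descriptor is only a binding to an underlying object: after dup, two descriptor numbers refer to the same pipe. The function below is an illustrative sketch (dup_demo is a made-up name) that uses only the standard pipe, dup, read, write, and close calls.

```c
#include <unistd.h>

/* Write "h" and "i" through two descriptors bound to the same pipe write
 * end, then read them back from the read end. Returns the number of bytes
 * read, or -1 on any error. */
ssize_t dup_demo(char *out, size_t outlen) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int copy = dup(p[1]);              /* second descriptor, same open object */
    if (copy == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ssize_t n = -1;
    if (write(p[1], "h", 1) == 1 && write(copy, "i", 1) == 1)
        n = read(p[0], out, outlen);   /* both bytes arrive on the one pipe */
    close(p[0]);
    close(p[1]);
    close(copy);
    return n;
}
```

The two write calls use different descriptor numbers, yet the reader sees a single byte stream, because both numbers are bound to the same open object in the kernel.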

Page 27: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Networking

[Figure: node A and node B, each with an endpoint (port), joined by a channel (binding, connection).]

Some IPC mechanisms allow communication across a network. E.g.: sockets using Internet communication protocols (TCP/IP).

Each endpoint on a node (host) has a port number.

Each node has one or more interfaces, each on at most one network.

Each interface may be reachable on its network by one or more names, e.g., an IP address and an (optional) DNS name.

Operations:
– advertise (bind)
– listen
– connect (bind)
– close
– write/send
– read/receive

Page 28: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Networking stack

Page 29: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

What is a distributed system?

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." -- Leslie Lamport


Page 30: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Example: browser

Page 31: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

GS6. Interpreter

Garlan and Shaw, An Introduction to Software Architecture, 1994.

Page 32: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Interpreter: example

An interpreter controls how a program executes and what it sees.

An interpreter can “sandbox” a program for isolation.

Page 33: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Processes in the browser

Page 34: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Threads: a familiar metaphor

Page links and back button navigate a “stack” of pages in each tab.

Each tab has its own stack. One tab is active at any given time.

You create/destroy tabs as needed. You switch between tabs at your whim.

Similarly, each thread has a separate stack. The OS switches between threads at its whim.

One thread is active per CPU core at any given time.

[Figure: three threads (1, 2, 3) taking turns along a time axis.]

Page 35: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Fork

• Child can’t be an exact copy
• Is distinguished by one variable (the return value of fork)

if (fork() == 0) {
    /* child */
    execute new program
} else {
    /* parent */
    carry on
}
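The pattern above can be filled in with standard calls. The following is a sketch, not the course's code: run_and_wait is a hypothetical helper, the choice to make the parent wait (rather than carry on) is an assumption, and 127 is used as the child's exit status when exec fails, following a common shell convention.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that runs the program at path with no extra arguments, and
 * return the child's exit status. Returns -1 if fork or wait fails, or if
 * the child did not exit normally. */
int run_and_wait(const char *path) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* child: fork returned 0 */
        execl(path, path, (char *)NULL);
        _exit(127);                    /* reached only if exec failed */
    }
    /* parent: fork returned the child's pid */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only difference the two copies see at first is the return value of fork, exactly as the slide says; everything after that follows from the branch each one takes.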

Page 36: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Memory and fragmentation

Page 37: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

An advantage of address spaces

Page 38: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Enforced modularity

Page 39: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Concept: garbage collection

Page 40: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Managing the pointers

Page 41: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Post-note: understand garbage collection

• Garbage collection: the language runtime system calls the underlying heap manager to free unused heap blocks automatically; the program itself does not have to do it.

– Java does it for you, but C does not.

• A heap block is “garbage” only when there are no references to the block, e.g., no pointers to the object that lives in that block.

– A reference is a stored name. The garbage collector counts these references, and marks a block as garbage when all references to it are gone. To do that it must find/identify all stored references.

• Java knows the types of all of a program’s data objects, so it can find stored references and identify their targets.

• A language that supports garbage collection may also move objects around to compact the heap to reduce fragmentation.

• Weakly typed languages like C cannot do this for you.

• Q: can a file system garbage collect or compact stored data on disk?
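Since C will not track references for you, a C program that wants counting has to do it by hand. The sketch below shows the manual discipline with invented names (struct obj, obj_new, obj_retain, obj_release); it is ordinary manual reference counting, not an automatic collector, and it fails silently if the program forgets a retain or release.

```c
#include <stdlib.h>

/* A heap object with an explicit reference count. */
struct obj {
    int refs;
    int value;
};

static int live_objects = 0;   /* bookkeeping for this example only */

/* Allocate an object; the caller holds the first reference. */
struct obj *obj_new(int value) {
    struct obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refs = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Call whenever another reference to o is stored somewhere. */
struct obj *obj_retain(struct obj *o) {
    o->refs++;
    return o;
}

/* Call whenever a stored reference goes away; frees the block at zero. */
void obj_release(struct obj *o) {
    if (--o->refs == 0) {
        live_objects--;
        free(o);
    }
}

int obj_live_count(void) {
    return live_objects;
}
```

A typed runtime such as Java's can find every stored reference on its own; here the programmer is the one who must know where the references are.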

Page 42: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Post-note

• Next slide gives more detail on fork/exit.

• We will discuss kernel protection and kernel entry and exit more later.

Page 43: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Mode Changes for Fork/Exit

• Syscall traps and “returns” are not always paired.
• Fork “returns” (to child) from a trap that “never happened”
• Exit system call trap never returns
• System may switch processes between trap and return

[Figure: timelines for parent and child. Parent: Fork call, Fork return, Wait call, Wait return. Child: Fork entry to user space, Exit call. Arrows mark transitions from user to kernel mode (callsys) and from kernel to user mode (retsys).]

Exec enters the child by doctoring up a saved user context to “return” through.

Page 44: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Example: System Call Traps

• Programs in C, C++, etc. invoke system calls by linking to a standard library of procedures written in assembly language.
– the library defines a stub or wrapper routine for each syscall
– stub executes a special trap instruction (e.g., chmk or callsys or int)
– syscall arguments/results passed in registers or user stack

read() in Unix libc.a Alpha library (executes in user mode):

#define SYSCALL_READ 27        # op ID for a read system call

move arg0…argn, a0…an          # syscall args in registers A0..AN
move SYSCALL_READ, v0          # syscall dispatch index in V0
callsys                        # kernel trap
move r1, _errno                # errno = return status
return

Alpha CPU architecture
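On Linux with glibc, the same dispatch-by-number idea is visible from C through the generic syscall() wrapper, which takes the system call number as its first argument. This sketch assumes that platform; raw_getpid is a made-up name, and other systems may not offer syscall() or SYS_getpid in this form.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the getpid system call by number through the generic syscall()
 * wrapper instead of the dedicated getpid() stub. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

Both raw_getpid() and getpid() should report the same process ID, since the dedicated stub ends up trapping into the kernel with the same system call number.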

Page 45: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Representing a File On Disk

[Figure: an “inode” holding file attributes and a block map. The block map is indexed by logical block number; its entries point to logical block 0 (“once upon a time/nin a l”), logical block 1 (“and far far away,/nlived t”), and logical block 2 (“he wise and sage wizard.”).]

Physical block pointers in the block map are sector IDs or physical block numbers.

File attributes may include owner, access control list, time of create/modify/access, etc.

Page 46: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Post-note

• The following slides were presented in the next class (on Android) as intro to motivate Android.

• Android keeps the Unix (Linux) kernel, but replaces the entire application framework.

– Shell is gone. App execution is controlled by a trusted system-wide server process, which is part of the system TCB.

– Pipes are gone. Apps interact through system events (intents) and service bindings (binder RPC).

– There is only one user, but each app has its own userID.

– Each app has at most one instance, with its private files.

– Terminals are gone: user opens screens (activities) to interact with apps. The system keeps an activity stack with a “back” button.

• foreground and background activities?

– System launches app components and reclaims them at suitable times. They don’t “exit”.

Page 47: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix, looking backward: UI+IPC

• Conceived around keystrokes and byte streams

– User-visible environment is centered on a text-based command shell.

• Limited view of how programs interact

– files: byte streams in a shared name space

– pipes: byte streams between pairs of sibling processes

Page 48: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix, looking backward: upcalls

• Limited view of how programs interact with the OS.

– The kernel directs control flow into user process at a fixed entry point: e.g., entry for exec() is _crt0 or “main”.

– Process may also register signal handlers for events relating to the process, (generally) signalled by the kernel.

– Process lives until it exits voluntarily or fails

• “receives an unhandled signal that is fatal by default”.

[Figure: programs, each with its own data, above the kernel, connected by protected system calls and upcalls (e.g., signals).]

Page 49: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

X Windows (1985)

Big change: GUI.
1. Windows
2. Window server
3. App events
4. Widget toolkit

Page 50: D u k e S y s t e m s CPS 210 Unix and Beyond Jeff Chase Duke University chase/cps210

Unix, looking backward: security

• Presumes multiple users sharing a machine.

• Each user has a userID.

– UserID owns all files created by all programs user runs.

– Any program can access any file owned by userID.

• Each user trusts all programs it chooses to run.

– We “deputize” every program.

– Some deputies get confused.

– Result: decades of confused deputy security problems.

• Contrary view: give programs the privileges they need, and nothing more.

– Principle of Least Privilege