Advanced Operating Systems Lecture
TRANSCRIPT
Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 1
Advanced Operating Systems: Lecture Notes
Clifford Neuman, Katia Obraczka
University of Southern California
Information Sciences Institute
CSci555: Advanced Operating Systems
Lecture 1 - September 3, 1999
Dr. Clifford Neuman
University of Southern California
Information Sciences Institute
Some things an operating system does
• Memory management
• Scheduling / resource management
• Communication
• Protection and security
• File management / I/O
• Naming
• Synchronization
• User interface
Progression of Operating Systems
• Primary goal of a distributed system: sharing.
• Progression over past years:
  – Dedicated machines
  – Batch processing
  – Time sharing
  – Workstations and PCs
  – Distributed systems
Structure of Distributed Systems
• Kernel
  – Basic functionality and protection.
• Application level
  – Does the real work.
• Servers
  – Service and support functions needed by applications.
  – Many functions that used to be in the kernel are now in servers.
Structure of Distributed Systems
[Diagram: two machines side by side, each split into user space and kernel; user processes (UP) and servers (SVR) run in user space above the kernel.]
Characteristics of a Distributed System
• Basic characteristics:
  – Multiple computers
  – Interconnections
  – Shared state
Why Distributed Systems are Hard
• Scale:
  – Numeric
  – Geographic
  – Administrative
• Loss of control over parts of the system
• Unreliability of messages
• Parts of the system down or inaccessible
  – Lamport: "You know you have a distributed system when the crash of a computer you have never heard of stops you from getting any work done."
CSci555: Advanced Operating Systems
Lecture 2 - September 10, 1999
Dr. Katia Obraczka
University of Southern California
Information Sciences Institute
Administration 1
• Diagnostic exams:
  – Everyone should have gotten their grades and our recommendation by e-mail.
  – Talk to us if you haven't heard anything.
• Academic Integrity Policy:
  – NOT A JOKE! We'll strictly enforce it.
  – Read it and talk to us if you have questions.
Administration 2
• Reading report #1 assigned.
  – Get it from the class Web page.
  – Look at the guidelines for writing and submitting the report.
• Office:
  – SAL 236, phone (213) 740-4777.
  – Information on the Web page has been updated.
Last Class
• How to read a technical paper.
• What a distributed system is.
• Motivation, applications, some history.
• More: textbook chapters 1 and 2.
  – Including a chart with historical facts, for more detail on distributed systems history (pages 23 and 24).
Outline
• E2E argument [Saltzer et al.]
• Communication models:
  – Message passing.
  – Distributed shared memory (DSM).
  – Remote procedure call (RPC) [Birrell et al.]
    · Lightweight RPC [Bershad et al.]
  – DSM case studies:
    · IVY [Li et al.]
    · Linda [Carriero et al.]
End-to-End Argument 1
• QUESTION: Where should distributed systems functions be placed?
• Layered system design:
  – Different levels of abstraction, for simplicity.
  – A lower layer provides service to the layer above it.
  – Very well-defined interfaces.
E2E Argument 2
• The E2E argument paper argues that functions should be moved closer to the application that uses them.
• Rationale:
  – Some functions can be completely and correctly implemented only with the application's knowledge.
    · Example: reliable message delivery.
  – Applications that don't need certain functions should not have to pay for them.
E2E Argument 3
• Counterargument: performance.
  – Example: file transfer.
    · Reliability checks at lower layers detect problems earlier.
    · Abort the transfer and retry without having to wait until the whole file is transmitted.
• Bottom line: "spread out" functionality across layers.
E2E Argument 4
• Another perspective: why pay for something you don't need?
  – Example 1: the Internet.
  – Example 2: the trend in kernel design is to take as much functionality as possible out of the kernel.
• We'll revisit the E2E argument throughout the course.
• The paper has more examples.
Communication Models
• Support for processes to communicate among themselves.
• Traditional (centralized) OSs provide local (within a single machine) communication support.
• Distributed OSs must provide support for communication across machine boundaries.
  – Over a LAN or WAN.
Communication Paradigms
• 2 paradigms:
  – Message passing (MP).
  – Distributed shared memory (DSM).
• Message passing:
  – Processes communicate by sending messages.
• Distributed shared memory:
  – Communication through a "virtual shared memory".
Message Passing
• Basic communication primitives:
  – Send message.
  – Receive message.
• Modes of communication:
  – Synchronous versus asynchronous.
• Semantics:
  – Reliable versus unreliable.
[Diagram: Send appends messages to a sending queue; they travel to a receiving queue, from which Receive removes them.]
Synchronous Communication
• Blocking send: blocks until the message is transmitted (out of the send queue).
• Blocking receive: blocks until the message arrives in the receive queue.
Asynchronous Communication
• Non-blocking send: the sending process continues as soon as the message is queued.
• Blocking or non-blocking receive:
  – Blocking: with a timeout, or in a dedicated thread.
  – Non-blocking: the process proceeds while waiting for the message.
    · The message is queued upon arrival.
    · The process needs to poll or be interrupted.
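The send/receive modes above can be sketched with a shared queue; this is a minimal illustration (assumed, not from the lecture) contrasting a non-blocking receive that polls with a blocking receive that waits:

```python
import queue
import threading
import time

# Assumed sketch: the Queue stands in for the receiving queue in the diagram.
mailbox = queue.Queue()              # the receiving queue

def sender():
    time.sleep(0.1)                  # simulate transmission delay
    mailbox.put("hello")             # non-blocking send: enqueue and continue

threading.Thread(target=sender).start()

# Non-blocking receive: poll, and keep working if nothing has arrived yet.
try:
    msg = mailbox.get_nowait()
except queue.Empty:
    msg = None                       # no message yet; process continues

# Blocking receive (with a timeout, as the slide suggests).
if msg is None:
    msg = mailbox.get(timeout=1.0)   # blocks until the message arrives

print(msg)
```

The same `Queue` supports both modes, which is why the choice between them is a policy at the receiver, not a property of the channel.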
Reliability 1
• Unreliable communication:
  – Aka best effort, i.e., "send and hope for the best".
  – No ACKs or retransmissions.
  – Application programmers must design their own reliability mechanisms.
  – Example: User Datagram Protocol (UDP).
    · Applications using UDP either don't need reliability or build their own (e.g., UNIX NFS and DNS (both UDP and TCP)).
Reliability 2
• Reliable communication:
  – Different degrees of reliability.
  – Processes have some guarantee that messages will be delivered.
  – Example: Transmission Control Protocol (TCP).
  – Reliability mechanisms:
    · Positive acknowledgments (ACKs).
    · Negative acknowledgments (NACKs).
  – It is possible to build reliability atop an unreliable service (E2E argument).
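Building reliability atop an unreliable service can be sketched as stop-and-wait with ACKs, retransmission, and duplicate filtering; this is an assumed toy model (the drop probability and function names are illustrative, not from any real protocol stack):

```python
import random

random.seed(1)

def lossy_send(channel, item, loss_rate=0.5):
    """Unreliable 'network': drops the item with probability loss_rate."""
    if random.random() >= loss_rate:
        channel.append(item)

def reliable_send(data, loss_rate=0.5, max_tries=50):
    """Retransmit each message until the receiver's ACK comes back."""
    delivered = []
    for seq, payload in enumerate(data):
        for _ in range(max_tries):
            net = []
            lossy_send(net, (seq, payload), loss_rate)    # data may be dropped
            if net:                                       # receiver got it
                if not delivered or delivered[-1][0] != seq:
                    delivered.append((seq, payload))      # duplicate filtering
                ack = []
                lossy_send(ack, seq, loss_rate)           # ACK may be dropped
                if ack:                                   # sender saw the ACK
                    break                                 # next message
    return [p for _, p in delivered]

print(reliable_send(["a", "b", "c"]))   # delivered in order, duplicates filtered
```

Note the E2E point: the channel stays unreliable; all the reliability lives at the endpoints (sender retransmits, receiver filters duplicates).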
Distributed Shared Memory
• Motivated by the development of shared-memory multiprocessors, which do share memory.
• An abstraction used for sharing data among processes running on machines that do not share memory.
• Processes think they read from and write to a "virtual shared memory".
DSM 2
• Primitives: read and write.
• The OS ensures that all processes (possibly on different machines) see all updates.
  – This happens transparently to the processes.
DSM and MP
• DSM is an abstraction!
  – Gives programmers the flavor of a centralized memory system, which is a well-known programming environment.
  – No need to worry about communication and synchronization.
• But it is implemented atop MP.
  – There is no real shared memory.
  – The OS takes care of the required communication.
Caching in DSM
• For performance, DSM caches data locally.
  – More efficient access (locality).
  – But caches must be kept consistent.
  – Pages are cached in the case of page-based DSM.
• Issues:
  – Page size.
  – Consistency mechanism.
Approaches to DSM 1
• Hardware-based:
  – Multiprocessor architectures with processor-memory modules connected by a high-speed LAN (e.g., Stanford's DASH).
  – Specialized hardware handles reads and writes and performs the required consistency mechanisms.
Approaches to DSM 2
• Page-based:
  – Example: IVY.
  – DSM is implemented as a region of the processor's virtual memory; it occupies the same address-space range for every participating process.
  – The OS keeps DSM data consistent as part of page-fault handling.
Approaches to DSM 3
• Library-based (or language-based):
  – Example: Linda.
  – A language, or extensions to a language.
  – The compiler inserts appropriate library calls whenever processes access DSM items.
  – The library calls access local data and communicate when necessary.
Remote Procedure Call
• Builds on MP.
• Main idea: extend the traditional (local) procedure call to perform transfer of control and data across the network.
• Easy to use: analogous to local calls.
• But the procedure is executed by a different process, probably on a different machine.
• Fits very well with the client-server model.
RPC Mechanism
1. Invoke RPC.
2. Calling process suspends.
3. Parameters passed across the network to the target machine.
4. Procedure executed remotely.
5. When done, results passed back to the caller.
6. Caller resumes execution.
Is this synchronous or asynchronous?
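The six steps above can be sketched end to end; this is a minimal assumed illustration over a local TCP socket (JSON marshalling and the `add` procedure are stand-ins, not the Birrell-Nelson implementation):

```python
import json
import socket
import threading

# Assumed sketch of the RPC steps: marshal, send, execute remotely, reply.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(5)

def serve_forever(sock):
    while True:
        conn, _ = sock.accept()
        req = json.loads(conn.recv(4096))          # unmarshal the request
        if req["proc"] == "add":                   # dispatch to the procedure
            result = sum(req["args"])              # step 4: execute remotely
        conn.sendall(json.dumps(result).encode())  # step 5: results back
        conn.close()

threading.Thread(target=serve_forever, args=(listener,), daemon=True).start()

def add_stub(a, b):
    """Client stub: marshals arguments and blocks until the result returns."""
    c = socket.create_connection(listener.getsockname())
    c.sendall(json.dumps({"proc": "add", "args": [a, b]}).encode())  # steps 1-3
    result = json.loads(c.recv(4096))              # step 6: caller resumes
    c.close()
    return result

print(add_stub(2, 3))   # prints 5
```

The stub blocks between steps 2 and 6, which answers the slide's question: plain RPC is synchronous from the caller's point of view.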
RPC Advantages
• Easy to use.
• Well-known mechanism.
• Abstract data type:
  – Client-server model.
  – Server as a collection of exported procedures on some shared resource.
  – Example: a file server.
• Reliable.
RPC Semantics 1
• Related to delivery guarantees.
• "Maybe" call:
  – Clients cannot tell for sure whether the remote procedure was executed or not, due to message loss, server crash, etc.
  – Usually not acceptable.
RPC Semantics 2
• "At-least-once" call:
  – The remote procedure is executed at least once, but maybe more than once.
  – Retransmissions, but no duplicate filtering.
  – OK for idempotent operations; e.g., reading data that is read-only.
RPC Semantics 3
• "At-most-once" call:
  – Most appropriate for non-idempotent operations.
  – The remote procedure is executed 0 or 1 times, i.e., exactly once or not at all.
  – Uses retransmissions and duplicate filtering.
  – Example: the Birrell et al. implementation.
    · Uses probes to check whether the server has crashed.
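Duplicate filtering for at-most-once semantics can be sketched with a table of already-seen call IDs; this assumed example (the bank-balance server is purely illustrative) shows why it matters for non-idempotent operations:

```python
# Assumed sketch: the server filters duplicate call IDs so a retransmitted
# request does not re-execute a non-idempotent procedure.
class AtMostOnceServer:
    def __init__(self):
        self.seen = {}            # call_id -> cached reply
        self.balance = 0

    def deposit(self, call_id, amount):
        if call_id in self.seen:          # duplicate: replay the cached reply,
            return self.seen[call_id]     # do NOT execute again
        self.balance += amount            # non-idempotent operation
        self.seen[call_id] = self.balance
        return self.balance

srv = AtMostOnceServer()
srv.deposit("c1", 100)
srv.deposit("c1", 100)    # retransmission of the same call: filtered
print(srv.balance)        # prints 100, not 200
```

Under at-least-once semantics (no `seen` table) the retransmission would deposit twice, which is exactly why at-most-once is preferred for operations like this.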
RPC Implementation 1
RPC Implementation 2
• The RPC runtime mechanism is responsible for retransmissions and acknowledgments.
• Stubs are responsible for data packaging and unpackaging, also known as marshalling and unmarshalling: putting data in a form suitable for transmission.
  – Example: Sun's XDR.
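Marshalling can be sketched with a fixed, canonical byte layout; this assumed example uses Python's `struct` module in the spirit of XDR (it is not Sun's actual XDR library):

```python
import struct

# Assumed sketch of marshalling: pack typed values into a canonical,
# architecture-independent byte layout; unmarshal on the other side.
def marshal(seq_num, a, b):
    # '!' = network byte order; 'iii' = three signed 32-bit integers.
    return struct.pack("!iii", seq_num, a, b)

def unmarshal(wire_bytes):
    return struct.unpack("!iii", wire_bytes)

wire = marshal(7, 2, 3)
assert len(wire) == 12     # fixed size, regardless of the machine
print(unmarshal(wire))     # prints (7, 2, 3)
```

The point of the canonical form is that a big-endian server and a little-endian client both interpret the same 12 bytes identically.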
Binding
• How do we determine where the server is, and which procedure to call?
  – The "resource discovery" problem.
    · Name service: advertises servers and services.
    · Example: Birrell et al. use Grapevine.
• Early versus late binding.
  – Early: the server address and procedure name are hardcoded in the client.
  – Late: go to the name service.
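Late binding can be sketched with a lookup table standing in for the name service; everything here is an assumed illustration (Grapevine itself worked differently, and the service names and addresses are made up):

```python
# Assumed sketch of late binding: resolve the server through a name service
# at call time, instead of hardcoding its address (early binding).
name_service = {}                 # service name -> (address, exported procedures)

def advertise(service, address, procedures):
    name_service[service] = (address, procedures)

def bind(service, proc_name):
    address, procedures = name_service[service]   # resolved at call time
    return procedures[proc_name]                  # stand-in for a remote stub

# The server advertises itself and its exported procedures.
advertise("math-server", ("10.0.0.7", 8000), {"add": lambda a, b: a + b})

add = bind("math-server", "add")  # late binding: look up, then call
print(add(2, 3))                  # prints 5
```

Because the address is resolved per lookup, the server can move or be replaced without recompiling clients, which is the practical argument for late binding.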
RPC Performance
• Sources of overhead:
  – Data copying.
  – Scheduling and context switches.
• Lightweight RPC:
  – Showed that most invocations took place on a single machine.
  – LW-RPC: improve RPC performance for the local case.
  – Optimizes data copying and thread scheduling for the local case.
LW-RPC 1
• Argument copying:
  – RPC: 4 copies (2 on call and 2 on return); copying between kernel and user space.
  – LW-RPC: a common data area (A-stack) shared by client and server, used to pass parameters and results; accessed by client or server, one at a time.
LW-RPC 2
• The A-stack avoids copying between kernel and user space.
• Client and server share the same thread: fewer context switches (like regular calls).
DSM Case Studies: IVY
• Environment: "loosely coupled" multiprocessor.
  – Memory is physically distributed.
  – Memory-mapping managers (in the OS kernel):
    · Map local memories to the shared virtual space.
    · Local memory acts as a cache of the shared virtual space.
    · A memory reference may cause a page fault; the page is retrieved and consistency is handled.
IVY
• Issues:
  – Read-only versus writable data.
  – Locality of reference.
  – Granularity (1-Kbyte page size).
    · Bigger pages versus smaller pages.
IVY
• Memory coherence strategies:
  – Page synchronization:
    · Invalidation.
    · Write broadcast.
  – Page ownership:
    · Fixed: a page is always owned by the same processor.
    · Dynamic.
IVY Page Synchronization
• Invalidation:
  – On a write fault, invalidate all copies; give the faulting process write access; it gets a copy of the page if it doesn't already have one.
  – Problem: invalidated readers must fetch the page again on subsequent reads.
• Write broadcast:
  – On a write fault, the fault handler writes to all copies.
  – Expensive!
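The invalidation strategy can be sketched for a single page; this is an assumed toy model (node names and the cache representation are illustrative, not IVY's data structures):

```python
# Assumed sketch of write-invalidate coherence: a write invalidates all other
# cached copies; invalidated readers re-fetch the page on their next access.
class Page:
    def __init__(self):
        self.value = 0
        self.copies = set()       # which nodes currently cache the page

class InvalidateDSM:
    def __init__(self, nodes):
        self.page = Page()
        self.cache = {n: None for n in nodes}   # node -> cached value or None

    def read(self, node):
        if self.cache[node] is None:            # read fault: fetch the page
            self.cache[node] = self.page.value
            self.page.copies.add(node)
        return self.cache[node]

    def write(self, node, value):
        for other in list(self.page.copies):    # write fault: invalidate copies
            if other != node:
                self.cache[other] = None
                self.page.copies.discard(other)
        self.page.value = value                 # faulting node gets write access
        self.cache[node] = value
        self.page.copies.add(node)

dsm = InvalidateDSM(["n1", "n2"])
dsm.write("n1", 42)
print(dsm.read("n2"))   # n2 takes a read fault, fetches 42
```

Write broadcast would instead push the new value into every cached copy on each write, avoiding the read faults at the cost of a broadcast per write.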
IVY Memory Coherence
• The paper discusses approaches to memory coherence in page-based DSM:
  – Centralized: a single manager, residing on a single processor, manages all pages.
  – Distributed: multiple managers on multiple processors, each managing a subset of the pages.
DSM Case Studies: Linda
• Language-based approach to DSM.
• Environment:
  – Similar to IVY, i.e., loosely coupled machines connected via a fast broadcast bus.
  – Instead of a shared address space, processes make library calls, inserted by the compiler, when accessing DSM.
  – The libraries access local data and communicate to maintain consistency.
Linda
• DSM: the tuple space.
• Basic operations:
  – out (data): data is added to the tuple space.
  – in (data): removes matching data from the TS; destructive.
  – read (data): same as "in", but the tuple remains in the TS (non-destructive).
Linda Primitives: Examples
• out ("P", 5, false): the tuple ("P", 5, false) is added to the TS.
  – "P" is the name.
  – The other components are data values.
• Implementation reported in the paper:
  – Every node stores a complete copy of the TS.
  – out (data) causes the data to be broadcast to every node.
Linda Primitives: Examples
• in ("P", int i, bool b): a tuple ("P", 5, false) is removed from the TS.
  – If a matching tuple is found locally, the local kernel tries to delete the tuple on all nodes.
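The three primitives can be sketched against a single local tuple space; this is an assumed single-node model (the real Linda runtime replicates and broadcasts, and `in_` is renamed because `in` is a Python keyword). Formal arguments (types) match by type; actuals match by value:

```python
# Assumed sketch of a Linda tuple space with out / in / read.
tuple_space = []

def out(*tup):
    tuple_space.append(tup)                 # add the tuple to the space

def _matches(tup, pattern):
    if len(tup) != len(pattern):
        return False
    for field, pat in zip(tup, pattern):
        if isinstance(pat, type):           # formal argument: match by type
            if not isinstance(field, pat):
                return False
        elif field != pat:                  # actual argument: match by value
            return False
    return True

def read(*pattern):
    for tup in tuple_space:                 # non-destructive
        if _matches(tup, pattern):
            return tup
    return None

def in_(*pattern):                          # destructive ('in' is a keyword)
    tup = read(*pattern)
    if tup is not None:
        tuple_space.remove(tup)
    return tup

out("P", 5, False)
print(read("P", int, bool))    # ('P', 5, False); the tuple stays
print(in_("P", int, bool))     # ('P', 5, False); the tuple is removed
print(read("P", int, bool))    # None
```

In the paper's implementation, `out` would broadcast the tuple to every node's copy of the TS, and a successful local `in_` would trigger deletion on all nodes.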
CSci555: Lecture 3 - September 17, 1999
Concurrency, Deadlock, Transactions, Time
Dr. Katia Obraczka
University of Southern California
Information Sciences Institute
Administration
• Reading assignment due next class.
  – Concerns regarding grading.
• Tanenbaum book:
  – Recommended reading for background material.
  – Not required (i.e., you don't have to buy it if you don't want to).
Last Class
• E2E argument.
• Communication models:
  – MP.
  – DSM.
  – RPC.
  – IVY.
  – Linda.
Today
• Linda.
• Concurrency and synchronization [Hauser et al.]
• Transactions [Spector et al.]
• Distributed deadlocks [Chandy et al.]
• Time in distributed systems [Lamport and Jefferson]
• Replication [Birman and Gifford]
Concurrency Control and Synchronization
• How do we control and synchronize possibly conflicting operations on shared data by concurrent processes?
• First, some terminology:
  – Processes.
  – Threads.
  – Tasks.
Processes
• Textbook, page 165:
  – A processing activity associated with an execution environment, i.e., an address space and resources (such as communication and synchronization resources).
Threads
• OS abstraction of an activity/task.
  – Execution environments are expensive to create and manage.
  – Multiple threads share a single execution environment.
  – A single process may spawn multiple threads.
  – Maximize the degree of concurrency among related activities.
  – Example: multi-threaded servers allow concurrent processing of client requests.
Other Terminology
• Process versus task:
  – Process: heavyweight unit of execution.
  – Task/thread: lightweight unit of execution.
  – A table with other terminology is on page 166 of the textbook.
Threads Case Study 1
• Hauser et al.
• Examine the use of user-level threads in 2 OSs:
  – Xerox PARC's Cedar (research).
  – GVX (commercial version of Cedar).
• Study dynamic thread behavior:
  – Number of threads.
  – Thread lifetime.
Thread Paradigms
• Different categories of usage:
  – Defer work: a thread does work not vital to the main activity.
    · Examples: printing a document, sending mail.
  – Pumps: used in pipelining; take the output of one thread as input and produce output to be consumed by another task.
  – Sleepers: tasks that repeatedly wait for an event before executing; e.g., check for network connectivity every x seconds.
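The "pump" paradigm can be sketched with two queues and a worker thread; this is an assumed illustration (the doubling transformation and the `None` shutdown sentinel are made up for the example):

```python
import queue
import threading

# Assumed sketch of a pump: a thread that consumes from an upstream queue
# and produces transformed items into a downstream queue.
upstream = queue.Queue()
downstream = queue.Queue()

def pump():
    while True:
        item = upstream.get()        # input from the previous pipeline stage
        if item is None:             # sentinel: shut the pump down
            break
        downstream.put(item * 2)     # output for the next pipeline stage

t = threading.Thread(target=pump)
t.start()
for n in [1, 2, 3]:
    upstream.put(n)
upstream.put(None)                   # tell the pump to stop
t.join()

results = [downstream.get() for _ in range(3)]
print(results)                       # prints [2, 4, 6]
```

Chaining several such pumps, each with its own thread and queue, gives exactly the pipelining usage the slide describes.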
Synchronization
• So far: how one defines/activates concurrent activities.
• But how do we control access to shared data and still get work done?
• Synchronization via:
  – Shared data [DSM model].
  – Communication [MP model].
Synchronization by Shared Data
• Primitives (ranging from more flexibility to more structure):
  – Semaphores.
  – Conditional critical regions.
  – Monitors.
Synchronization by MP
• Explicit communication.
• Primitives: send and receive.
• Mailboxes.
Transactions 1
• A database term:
  – The execution of a program that accesses a database.
• In distributed systems:
  – Concurrency control in the client/server model.
  – From the client's point of view: a sequence of operations executed by the server in servicing the client's request.
Transaction Atomicity
• "All or nothing."
• The sequence of operations to service the client's request is performed in one step, i.e., either all of them are executed or none are.
• Issues:
  – Multiple concurrent clients: "isolation".
  – Server failures: "failure atomicity".
Transaction Features
• Recoverability: the server should be able to "roll back" to the state before transaction execution.
• Serializability: transactions have the same effect whether executed concurrently or sequentially.
• Durability: the effects of transactions are permanent.
Local versus Distributed Transactions
• Local transactions:
  – All transaction operations are executed by a single server.
• Distributed transactions:
  – Involve multiple servers.
• Both local and distributed transactions can be simple or nested.
  – Nesting increases the level of concurrency.
Concurrency Control
• Maintains transaction serializability.
• Server operations: read or write.
• 3 mechanisms:
  – Locks.
  – Optimistic concurrency control.
  – Timestamp ordering.
Locks 1
• Order transactions' access to shared data based on the order of access to the data.
• Lock granularity affects the level of concurrency.
  – 1 lock per shared data item.
  – Read (shared) and write locks.
  – Lock compatibility (page 381).
Lock Implementation
• Server lock manager:
  – Maintains a table of locks for the server's data items.
  – Lock and unlock operations.
  – Clients wait on a lock for given data until the data is released; then the client is signalled.
  – Each client request runs as a separate server thread.
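The lock manager above can be sketched with a lock table and a condition variable for the wait/signal behavior; this is an assumed minimal version (shared read locks, exclusive write locks, no deadlock handling):

```python
import threading

# Assumed sketch of a server lock manager: one entry per data item,
# shared read locks, exclusive write locks, waiters signalled on release.
class LockManager:
    def __init__(self):
        self.cond = threading.Condition()
        self.locks = {}    # item -> (mode, {holder transaction ids})

    def _compatible(self, item, mode, tid):
        if item not in self.locks:
            return True
        kind, holders = self.locks[item]
        if holders == {tid}:                       # sole holder may proceed
            return True
        return kind == "read" and mode == "read"   # only reads are shared

    def acquire(self, tid, item, mode):
        with self.cond:
            while not self._compatible(item, mode, tid):
                self.cond.wait()                   # client waits for release
            kind, holders = self.locks.get(item, (mode, set()))
            holders.add(tid)
            self.locks[item] = ("write" if mode == "write" else kind, holders)

    def release(self, tid, item):
        with self.cond:
            kind, holders = self.locks[item]
            holders.discard(tid)
            if not holders:
                del self.locks[item]
            self.cond.notify_all()                 # signal waiting clients

lm = LockManager()
lm.acquire("T1", "x", "read")
lm.acquire("T2", "x", "read")     # shared: both readers hold the lock
lm.release("T1", "x")
lm.release("T2", "x")
lm.acquire("T3", "x", "write")    # exclusive, once the readers are gone
print(sorted(lm.locks["x"][1]))   # prints ['T3']
```

With each client request running in its own server thread, `Condition.wait` is what makes a conflicting requester block, and `notify_all` on release is the "client is signalled" step from the slide.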
Deadlock
• The use of locks can lead to deadlock.
• Deadlock: each transaction waits for another transaction to release a lock, forming a wait cycle.
• Deadlock condition: a cycle in the wait-for graph.
• Deadlock prevention and detection.
• Deadlock resolution: lock timeout.
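The cycle condition can be checked directly by depth-first search over the wait-for graph; a minimal sketch (the graph representation as a dict of adjacency lists is an assumption of this example):

```python
# Sketch of deadlock detection: an edge T1 -> T2 means "T1 waits for a lock
# held by T2"; a cycle in this wait-for graph means deadlock.
def has_cycle(wait_for):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in on_stack:             # back edge: a wait cycle
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in list(wait_for) if n not in visited)

# T1 waits for T2 and T2 waits for T1: the classic deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))    # prints True
print(has_cycle({"T1": ["T2"], "T2": []}))        # prints False
```

Lock timeouts avoid building this graph at all, but, as the later slide notes, they can abort transactions that were merely slow, not deadlocked.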
Optimistic Concurrency Control 1
• Assumes that most of the time the probability of conflict is low.
• Transactions are allowed to proceed in parallel until they are ready to commit.
• When a client issues a close-transaction request, check whether there was a conflict; if so, some transactions are aborted.
Optimistic Concurrency 2
• Read phase:
  – A transaction keeps tentative versions of the data items it accesses.
  – Tentative versions allow transactions to abort without making their effects permanent.
• Validation phase:
  – Executed upon close transaction.
  – Checks serial equivalence.
  – If validation fails, conflict resolution decides which transaction(s) to abort.
Optimistic Concurrency 3
• Write phase:
  – If a transaction is validated, all of its tentative versions are made permanent.
  – Read-only transactions commit immediately.
  – Write transactions commit only after their tentative versions are recorded in permanent storage.
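The three phases can be sketched in one small model; this is an assumed illustration (it uses backward validation against the write sets of transactions that committed in the meantime, one of several validation schemes):

```python
# Assumed sketch of optimistic concurrency control: read phase on tentative
# versions, validation at close, write phase only if validation succeeds.
store = {"x": 0, "y": 0}
committed_writes = []            # write sets of committed transactions, in order

class Transaction:
    def __init__(self):
        self.start = len(committed_writes)
        self.tentative = {}      # tentative versions of written items
        self.read_set = set()

    def read(self, key):
        self.read_set.add(key)
        return self.tentative.get(key, store[key])

    def write(self, key, value):
        self.tentative[key] = value    # not visible to other transactions yet

    def commit(self):
        # Validation: did any transaction that committed since we started
        # write something we read?
        for wset in committed_writes[self.start:]:
            if wset & self.read_set:
                return False           # conflict: abort; tentative data dropped
        store.update(self.tentative)   # write phase: versions made permanent
        committed_writes.append(set(self.tentative))
        return True

t1, t2 = Transaction(), Transaction()
t1.write("x", t1.read("x") + 1)
t2.write("y", t2.read("x") + 10)   # t2 reads x, which t1 writes
assert t1.commit() is True
print(t2.commit(), store["x"])     # prints: False 1  (t2 aborted; t1's x stuck)
```

Note how aborting t2 costs nothing durable: its tentative versions were never applied to `store`, which is exactly the point of the read phase.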
Timestamp Ordering
• Uses timestamps to order transactions accessing the same data items according to their starting times.
• Assigning timestamps:
  – Clock-based.
  – A monotonically increasing counter.
Distributed Transactions 1
[Diagram: a simple distributed transaction, in which client c makes calls 1, 2, and 3 directly to servers s1, s2, and s3; and a nested distributed transaction, in which client c calls s1, whose subtransactions s11 and s12 in turn call s111, s112, and s121.]
Distributed Transactions 2
• Transaction coordinator:
  – The first server contacted by the client.
  – Responsible for aborting/committing.
  – Adds workers.
• Workers:
  – The other servers involved.
Atomicity in Distributed Transactions
• Harder: several servers are involved.
• Atomic commit protocols:
  – 1-phase commit:
    · Example: the coordinator sends "commit" or "abort" to the workers and keeps re-broadcasting until it gets an ACK from all of them that the request was performed.
    · Inefficient.
    · The client makes the decision to commit, which may not be feasible.
2-Phase Commit 1
• First phase: voting.
  – Each server votes to commit or abort the transaction.
• Second phase: carrying out the joint decision.
  – If any server votes to abort, the joint decision is to abort.
2-Phase Commit 2
• Issues:
  – Ensure all servers vote and reach the same decision despite failures.
  – When a server votes to commit, ensure it will commit even in the event of failure.
[Diagram: coordinator-worker message exchange: 1. coordinator asks "Can commit?" (prepared to commit?); 2. workers answer "Yes" (prepared to commit); 3. coordinator sends "Do commit" (committed); 4. workers reply "Have committed"; 5. done.]
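The two phases can be sketched in a failure-free model; this is an assumed illustration of the voting and decision rounds only (real 2PC also logs state and handles crashes and message loss, which this sketch omits):

```python
# Assumed sketch of two-phase commit without failures: phase 1 collects
# votes; phase 2 carries out the joint decision (abort if ANY worker says no).
class Worker:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "active"

    def vote(self):                       # phase 1: "prepared to commit?"
        return "yes" if self.can_commit else "no"

    def do_commit(self):                  # phase 2: carry out the decision
        self.state = "committed"

    def do_abort(self):
        self.state = "aborted"

def two_phase_commit(workers):
    votes = [w.vote() for w in workers]            # voting phase
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for w in workers:                              # decision phase
        w.do_commit() if decision == "commit" else w.do_abort()
    return decision

ws = [Worker(True), Worker(True), Worker(True)]
print(two_phase_commit(ws))               # prints commit

ws2 = [Worker(True), Worker(False)]       # one "no" vote aborts everyone
print(two_phase_commit(ws2))              # prints abort
print([w.state for w in ws2])             # prints ['aborted', 'aborted']
```

The hard part the slide's "issues" point to is precisely what the sketch leaves out: once a worker votes "yes" it must be able to commit later even across a crash, which is why real implementations force votes and decisions to stable storage.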
Concurrency Control in Distributed Transactions 1
Y Locks
– Each server manages locks for its own data.
– Decides whether to grant a lock or make the transaction wait.
– Locks cannot be released until the transaction has committed or aborted on all servers involved.
– Distributed deadlocks!
Concurrency Control in Distributed Transactions 2
Y Timestamp Ordering
– Globally unique timestamps.
X The first server issues a globally unique TS and passes it around.
X TS: <server id, local TS>
– Servers must ensure the same order of operations on all servers.
– Clock synchronization issues.
Concurrency Control in Distributed Transactions 3
Y Optimistic concurrency control
– Arguably makes even more sense in a distributed setting.
– The validation phase occurs during the first phase of 2-phase commit.
Camelot [Spector et al.]
Y Supports execution of distributed transactions.
Y Specialized functions:
– Disk management
X Allocation of large contiguous chunks.
– Recovery management
X Transaction abort and failure recovery.
– Transaction management
X Abort, commit, and nest transactions.
Distributed Deadlock 1
Y When locks are used, deadlock can occur.
Y A circular wait in the wait-for graph means deadlock.
Y Centralized deadlock detection, prevention, and resolution schemes.
– Examples:
X Detection of a cycle in the wait-for graph.
X Lock timeouts: the timeout value is hard to set, and transactions may be aborted unnecessarily.
Distributed Deadlock 2
Y Much harder to detect, prevent, and resolve. Why?
– No global view.
– No central agent.
– Communication-related problems
X Unreliability.
X Delay.
X Cost.
Distributed Deadlock Detection
Y A cycle in the global wait-for graph.
Y The global graph can be constructed from local graphs: hard!
– Servers need to communicate to find cycles.
Y Example from book (page 425).
Distributed Deadlock Detection Algorithms 1
Y [Chandy et al.]
Y Message sequencing is preserved.
Y Resource versus communication models.
– Resource model
X Processes, resources, and controllers.
X A process requests a resource from its controller.
– Communication model
X Processes communicate directly via messages (request, grant, etc.) requesting resources.
Resource versus Communication Models
Y In the resource model, controllers are the deadlock detection agents; in the communication model, the processes themselves are.
Y In the resource model, a process cannot continue until all requested resources are granted; in the communication model, a process cannot proceed until it can communicate with at least one process it’s waiting for.
Y Different models, different detection algorithms.
Distributed Deadlock Detection Schemes
Y Graph-theory based.
Y Resource model: deadlock when there is a cycle among dependent processes.
Y Communication model: deadlock when there is a knot (all vertices that can be reached from i can also reach i) of waiting processes.
Deadlock Detection in Resource Model
Y Use probe messages to follow the edges of the wait-for graph (aka edge chasing).
Y A probe carries transaction wait-for relations representing a path in the global wait-for graph.
Deadlock Detection Example
1. Server 1 detects that transaction T is waiting for U, which is waiting for data from server 2.
2. Server 1 sends probe T->U to server 2.
3. Server 2 gets the probe and checks whether U is also waiting; if so (say for V), it adds V to the probe: T->U->V. If V is waiting for data from server 3, server 2 forwards the probe.
4. Paths are built one edge at a time. Before forwarding a probe, the server checks for a cycle (e.g., T->U->V->T).
5. If a cycle is detected, a transaction is aborted.
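The steps above can be sketched by collapsing every server's local wait-for edges into one table and following them the way forwarded probes would (the function and variable names are illustrative):

```python
# Edge-chasing sketch: follow wait-for edges, extending the probe one edge
# at a time; a repeated transaction in the probe means a cycle (deadlock).
def find_deadlock(wait_for, start):
    """wait_for: dict mapping a transaction to the transaction it waits for."""
    probe = [start]                    # probe initiated on behalf of `start`
    t = start
    while t in wait_for:               # each step models one probe forwarding
        nxt = wait_for[t]
        if nxt in probe:               # cycle detected, e.g. T->U->V->T
            return probe + [nxt]
        probe.append(nxt)              # extend the path by one edge
        t = nxt
    return None                        # no cycle reachable from `start`

# Union of the servers' local edges: T waits for U, U for V, V for T.
edges = {"T": "U", "U": "V", "V": "T"}
print(find_deadlock(edges, "T"))       # ['T', 'U', 'V', 'T'] -> abort one txn
```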
Replication 1
Y Keep more than one copy of a data item.
Y A technique for improving performance in distributed systems.
Y In the context of concurrent access to data, replicate data for increased availability:
– Improved response time.
– Improved availability.
– Improved fault tolerance.
Replication 2
Y But nothing comes for free.
Y What’s the tradeoff?
– Consistency maintenance.
Y Consistency maintenance approaches, from weaker to stronger consistency:
– Lazy consistency (gossip approach).
– Quorum consensus.
– Process group.
Quorum Consensus
Y Goal: prevent partitions from producing inconsistent results.
Y Quorum: a subgroup of replicas whose size gives it the right to carry out operations.
Y Quorum consensus replication:
– An update propagates successfully to a subgroup of replicas.
– Other replicas will have outdated copies but will be brought up to date off-line.
Weighted Voting [Gifford] 1
Y Every copy is assigned a number of votes (a weight assigned to that particular replica).
Y Read: must obtain a read quorum of R votes, then read from an up-to-date copy in the quorum.
Y Write: must obtain a write quorum of W votes before performing the update.
Weighted Voting 2
Y W > 1/2 of the total votes; R + W > total votes.
Y Ensures a non-null intersection between every read quorum and every write quorum.
Y Every read quorum is guaranteed to contain a current copy.
Y Freshness is determined by version numbers.
Weighted Voting 3
Y On read:
– Try to find enough copies, i.e., total votes no less than R. Not all copies need to be current.
– Since the read quorum overlaps every write quorum, at least one copy is current.
Y On write:
– Try to find a set of up-to-date replicas whose votes total no less than W.
– If there is no sufficient quorum, replace out-of-date copies with current ones, then update.
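A minimal sketch of these read and write rules, assuming three copies with one vote each and R = W = 2 (so W > total/2 and R + W > total both hold); the copy layout and function names are illustrative:

```python
# Weighted-voting (Gifford) sketch: copies carry votes and version numbers.
copies = [                      # each copy: votes, version number, value
    {"votes": 1, "version": 3, "value": "new"},
    {"votes": 1, "version": 3, "value": "new"},
    {"votes": 1, "version": 2, "value": "old"},   # stale replica
]
R, W = 2, 2                     # 2 + 2 > 3 total votes, and 2 > 3/2

def read(copies):
    quorum, votes = [], 0
    for c in copies:            # gather any copies until we hold >= R votes
        quorum.append(c)
        votes += c["votes"]
        if votes >= R:
            break
    # overlap with every write quorum guarantees a current copy is present
    return max(quorum, key=lambda c: c["version"])["value"]

def write(copies, value):
    current = max(c["version"] for c in copies)
    quorum = [c for c in copies if c["version"] == current]
    if sum(c["votes"] for c in quorum) < W:       # not enough current copies:
        val = read(copies)                        # bring stale ones up to date
        for c in copies:
            c["version"], c["value"] = current, val
        quorum = copies
    for c in quorum:                              # then apply the update
        c["version"], c["value"] = current + 1, value

write(copies, "newer")
print(read(copies))             # the read quorum sees the new version
```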
ISIS 1
Y Goal: provide a programming environment for the development of distributed systems.
Y Assumptions:
– The distributed system is a set of processes with disjoint address spaces, communicating over a LAN via message passing.
– Processes and nodes can crash.
– Partitions may occur.
ISIS 2
Y Distinguishing feature: group communication mechanisms
– Process group: processes cooperating in implementing a task.
– A process can belong to multiple groups.
– Dynamic group membership.
Virtual Synchrony
Y Real synchronous systems
– Events (e.g., message delivery) occur in the same order everywhere.
– Expensive and not very efficient.
Y Virtually synchronous systems
– Illusion of synchrony.
– Weaker ordering guarantees when applications allow it.
ISIS Features
Y Atomic multicast
– One or none.
– As long as one member is up, the message will be delivered to the remaining members.
– FBCAST: unordered multicast.
– CBCAST: causally ordered multicast.
X “Happened before” order.
Time in Distributed Systems
Y The notion of time is critical.
Y The “happened before” notion.
– Example: concurrency control using timestamps.
Atomic Multicast 1
Y One or none.
Y As long as one member is up, the message will be delivered to the remaining members.
Y FBCAST: unordered multicast.
Y CBCAST: causally ordered multicast.
– “Happened before” order.
– Messages from a given process are delivered in order.
– Implemented over UDP with ACKs and retransmissions.
Atomic Multicast 2
Y ABCAST: totally ordered multicast.
– Ordered delivery of messages across processes.
– Uses a token site to serialize messages.
Other Features
Y Process groups
– Group membership management.
Y Broadcast and group RPC
– RPC-like interface to the CBCAST, ABCAST, and FBCAST protocols.
– Delivery guarantees
X The caller indicates how many responses are required.
– No responses: asynchronous.
– One or more: synchronous.
Implementation
Y A set of library calls on top of UNIX.
Y Commercially available.
Y The paper gives an example of a distributed DB implementation using ISIS.
Y HORUS: an extension to WANs.
Time in Distributed Systems
Y The notion of time is critical.
Y The “happened before” notion.
– Example: concurrency control using timestamps.
– The “happened before” relation is not straightforward in distributed systems.
X No guarantee of synchronized clocks.
X Communication latency.
Event Ordering
Y Lamport defines a partial ordering:
1. If two events occurred in the same process, then they occurred in the order observed.
2. Whenever a message is sent between two processes, the event of sending the message occurred before the event of receiving it.
3. If X->Y and Y->Z, then X->Z.
Causal Ordering
Y “Happened before” is also called causal ordering.
Y Example: textbook page 298.
Y In summary, it is possible to draw a happened-before relationship between two events if they happen in the same process or there is a chain of messages between them.
Logical Clocks
Y A monotonically increasing counter.
Y No relation to the real clock.
Y Each process keeps its own logical clock Cp, used to timestamp events.
Causal Ordering and Logical Clocks
1. Cp is incremented before each event: Cp = Cp + 1.
2. When p sends message m, it piggybacks t = Cp.
3. When q receives (m, t), it computes Cq = max(Cq, t) before timestamping the message-receipt event (which itself applies rule 1).
Example: textbook page 299.
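The three rules map directly onto a small clock class (the class and method names are illustrative):

```python
# Lamport logical clocks, following the three update rules above.
class Process:
    def __init__(self):
        self.clock = 0                 # Cp: monotonically increasing counter

    def event(self):                   # rule 1: increment before each event
        self.clock += 1
        return self.clock              # the event's timestamp

    def send(self):                    # rule 2: piggyback t = Cp on the message
        return self.event()

    def receive(self, t):              # rule 3: Cq = max(Cq, t), then rule 1
        self.clock = max(self.clock, t)
        return self.event()

p, q = Process(), Process()
t = p.send()                           # p's send event is timestamped 1
q.event(); q.event()                   # q is ahead: its clock reaches 2
r = q.receive(t)                       # max(2, 1) + 1 = 3
print(t, r)                            # 1 3: send timestamp < receipt timestamp
```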
Total Ordering
Y Extending the partial order to a total order.
Y Global timestamps:
– (Ta, pa), where Ta is the local TS and pa is the process id.
– (Ta, pa) < (Tb, pb) iff Ta < Tb, or Ta = Tb and pa < pb.
– The total order is consistent with the partial order.
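In Python, the (Ta, pa) rule is exactly lexicographic tuple comparison, so global timestamps sort into a total order that respects the clock order:

```python
# Global timestamps (T, pid): compare T first, break ties by process id.
a = (5, 1)          # event with local TS 5 at process 1
b = (5, 2)          # concurrent event with the same local TS at process 2
c = (4, 9)          # earlier event at process 9

assert c < a        # 4 < 5: the partial (clock) order is preserved
assert a < b        # equal TS broken by process id: now a total order
print(sorted([b, a, c]))   # [(4, 9), (5, 1), (5, 2)]
```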
Virtual Time [Jefferson] 1
Y Time warp mechanism.
Y May or may not have a connection with real time.
Y Uses an optimistic approach, i.e., events and messages are processed in the order received: “look-ahead”.
Virtual Time 2
Y The process virtual clock (local virtual clock) is set to the TS of the next message in the input queue.
Y If the next message’s TS is in the past, roll back!
– This can happen due to different computation rates and communication latency.
Rolling Back
Y The process goes back to the TS of the late (straggler) message.
Y Cancels all intermediate effects.
Y Then executes forward again.
Y Rolling back is expensive!
– Messages may have been sent to other processes, causing them to send messages, and so on.
Anti-Messages 1
Y For every message, there is an anti-message with the same content but the opposite sign.
Y When a message is sent, it goes to the receiver’s input queue and a copy with a “-” sign is enqueued in the sender’s output queue.
Y The copy is retained for use in case of rollback.
Anti-Message 2
Y A message plus its anti-message cancel (sum to 0) when in the same queue.
Y Processes must keep a log to “undo” operations.
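The annihilation rule can be sketched with signed tuples in a queue (the representation is illustrative):

```python
# Anti-message sketch: a message and its anti-message (same content,
# opposite sign) annihilate when they land in the same queue.
def enqueue(queue, ts, sign, content):
    anti = (ts, -sign, content)
    if anti in queue:                    # message + anti-message = 0
        queue.remove(anti)               # both vanish from the queue
    else:
        queue.append((ts, sign, content))

q = []
enqueue(q, 10, +1, "update x")   # normal delivery into the input queue
enqueue(q, 10, -1, "update x")   # rollback sends the anti-message
print(q)                         # []: the unprocessed message was cancelled
```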
Implementation
Y Local control.
Y Global control
– How to make sure the system as a whole progresses.
– Handling errors and I/O when rollback occurs.
– Avoiding running out of memory.
Global Virtual Clock
Y A snapshot of the system at a given real time.
Y The minimum of all local virtual times.
Y A lower bound on how far processes roll back.
Y State before the GVT can be purged.
Y The GVT is computed concurrently with the rest of the time warp mechanism.
CSci555: Advanced Operating Systems
Lecture 4 - September 24, 1999
Dr. Clifford Neuman
University of Southern California
Information Sciences Institute
Naming Concepts
Y Name
– What you call something
Y Address
– Where it is located
Y Route
– How one gets to it
Y But it is not that clear anymore; it depends on perspective. A name from one perspective may be an address from another.
– Perspective means layer of abstraction
What is http://www.isi.edu/people/bcn ?
What are the things we name
Y Users
– To direct, and to identify
Y Hosts (computers)
– High level and low level
Y Services
– Service and instance
Y Files and other “objects”
– Content and repository
Y Groups
– Of any of the above
How we name things
Y Host-Based Naming
– The host name is a required part of the object name
Y Global Naming
– Must look up a name in a global database to find its address
– Name transparency
Y User/Object-Centered Naming
– The namespace is centered around the user or object
Y Attribute-Based Naming
– Objects identified by unique characteristics
– Related to resource discovery / search / indexes
Namespace
Y A name space maps:
Σ* → x ∈ O
at a particular point in time (strings over an alphabet Σ to objects x in a set O).
Y The rest of the definition, and even some of the above, is open to discussion/debate.
Y What is a “flat namespace”?
– An implementation issue
Case Studies
Y Host Table
– Flat namespace (?)
– Global namespace (?)
Y Domain name system
– Arbitrary depth
– Iterative or recursive (chained) lookup
– Multi-level caching
Y Grapevine
– Two-level, iterative lookup
– Clearinghouse: 3 levels

GetHostByName(usc.arpa)
{ scan(host file); return(matching entry); }
Case Studies
Y Host Table
– Flat namespace (?)
– Global namespace (?)
Y Domain name system
– Arbitrary depth
– Iterative or recursive (chained) lookup
– Multi-level caching
Y Grapevine
– Two-level, iterative lookup
– Clearinghouse: 3 levels

GetHostByName(host.sc)
[Diagram: the lookup is resolved iteratively through Grapevine registries (gv, sc, es).]
Case Studies
Y Host Table
– Flat namespace (?)
– Global namespace (?)
Y Domain name system
– Arbitrary depth
– Iterative or recursive (chained) lookup
Y Grapevine
– Two-level, iterative lookup
– Clearinghouse: 3 levels

GetHostByName(aludra.usc.edu)
[Diagram: the lookup proceeds iteratively through the root (/), edu, and usc.edu name servers.]
Scalability of naming
Y Scalability
– The ability to continue to operate efficiently as a system grows large, either numerically, geographically, or administratively.
Y Affected by
– Frequency of update
– Granularity
– Evolution/reconfiguration
Y DNS characteristics
– Multi-level implementation
– Replication of root and other servers
– Multi-level caching
Other implementations of naming
Y Broadcast
– Limited scalability, but faster local response
Y Prefix tables
– Essentially a form of caching
Y Capabilities
– Combines security and naming
– Traditional name service built over capability-based addresses
Closure
Y Closure binds an object to the namespace within which names embedded in the object are to be resolved.
– The namespace is dynamic
– The “object” may be as small as the name itself
Advanced Name Systems
Y DEC’s Global Naming
– Support for reorganization is the key idea
– Little coordination needed in advance
Y Prospero Directory Service
– Multiple namespaces, each centered around a “root” node that is specific to that namespace.
– Closure binds objects to this “root” node.
– Used today as an embedded directory service.
Non Hierarchical Naming
Y Examples
– Prospero
– The Web
Y User-centered naming
– Prospero
– Tilde
– Quicksilver
Resource Discovery
Y Similar to naming
– Browsing is related to directory services
– Indexing and search are similar to attribute-based naming
Y Attribute-based naming
– Profile
– Multi-structured naming
Y Search engines
Resource Discovery
Y Similar to naming
– Browsing is related to directory services
– Indexing and search are similar to attribute-based naming
Y Object handles
– Uniform Resource Locators (URLs)
X Is it a name or an address?
– Uniform Resource Names (URNs)
X Is a directory service required?
CSci555: Advanced Operating Systems
Lecture 5 - October 1, 1999
Dr. Clifford Neuman
University of Southern California
Information Sciences Institute
The Web
Y Object handles
– Uniform Resource Locators (URLs)
X Is it a name or an address?
– Uniform Resource Names (URNs)
X Is a directory service required?
– How URLs are misused
Y XML
– Definitions provide a form of closure
X At the conceptual level rather than the “namespace” level.
Security Goals
Y Protection
– Enforced by hardware
– Depends on a trusted kernel
Y Authentication
– Determining the identity of a principal
Y Integrity
– Authenticity of a document
– That it hasn’t changed
Y Privacy
– That inappropriate information is not disclosed
Security Policy
Y Access Matrix
– Implemented as:
X Capabilities, or
X Access Control Lists

Subject     OBJ1  OBJ2
bcn         RW    R
gost-group  RW    -
obraczka    R     RW
tyao        R     R
Csci555     R     -
Access Control Lists
Y Advantages
– Easy to see who has access
– Easy to change/revoke access
Y Disadvantages
– Time-consuming to check access
Y Extensions to ease management
– Groups
– EACLs
Extended Access Control Lists
Y Conditional authorization
– Implemented as restrictions on ACL entries and embedded as restrictions in authentication and authorization credentials

Principal             Rights  Conditions
bcn                   RW      HW-Authentication; Retain Old Items
gost-group            RW      TIME: 9AM-5PM
authorization server  R       Delegated-Access
*                     R       Load Limit 8; Use: Non-Commercial
*                     R       Payment: $Price
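Evaluating such conditional entries can be sketched as below; the entry layout and the TIME-window check are invented for this example, not the actual EACL encoding:

```python
# Sketch of checking a right against EACL entries with conditions,
# e.g. "TIME: 9AM-5PM" for gost-group as in the table above.
from datetime import time

eacl = [
    {"principal": "bcn",        "rights": "RW", "conditions": []},
    {"principal": "gost-group", "rights": "RW",
     "conditions": [("TIME", (time(9, 0), time(17, 0)))]},
]

def check(eacl, principal, right, now):
    for entry in eacl:
        if entry["principal"] != principal or right not in entry["rights"]:
            continue
        ok = all(start <= now <= end            # every condition must hold
                 for name, (start, end) in entry["conditions"]
                 if name == "TIME")
        if ok:
            return True                         # matching entry, conditions met
    return False                                # no entry grants the right

print(check(eacl, "gost-group", "W", time(10, 30)))   # True: inside 9AM-5PM
print(check(eacl, "gost-group", "W", time(20, 0)))    # False: outside window
```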
Generic Authorization and Access-control API
Y Allows applications to use the security infrastructure to implement security policies.
– gaa_get_object_eacl: called before other GAA API routines, each of which requires a handle to the object EACL to identify the EACLs on which to operate.
– gaa_check_authorization: tells the application whether the requested operation is authorized, or whether additional application-specific checks are required.
[Diagram: the application passes (SC, obj_id, op) as input to the GAA API, which invokes gaa_get_object_eacl and gaa_check_authorization and returns yes, no, or maybe.]
Capabilities
Y Advantages
– Easy and efficient to check access
– Easily propagated
Y Disadvantages
– Hard to protect capabilities
– Easily propagated
– Hard to revoke
Y Hybrid approach
– EACLs/proxies
Protecting capabilities
Y Stored in the TCB
– Only protected calls manipulate them
Y Limitations?
– Works in centralized systems
Y Distributed systems
– Tokens with random or special coding
– Possibly protect through encryption
– How does Amoeba do it? (claimed)
Basic Security Services
Y Authentication
Y Authorization
Y Accounting
Y Audit
Y Assurance
Y Payment
Y Protection
Y Policy
Y Privacy
Y Confidentiality
Network Threats
– Unauthorized release of data
– Unauthorized modification of data
– Spurious association initiation
X Impersonation
– Denial of use
– Traffic analysis
Y Attacks may be
– Active or passive
Likely points of attack (location)
Likely points of attack (module)
Y Against the protocols
– Sniffing for passwords and credit card numbers
– Interception of data returned to the user
– Hijacking of connections
Y Against the server
– The commerce protocol is not the only way in
– Once an attacker is in, all bets are off
Y Against the client’s system
– You have little control over the client’s system
Network Attacks
Eavesdropping
– Listening for passwords or credit card numbers
Message stream modification
– Changing links and data returned by the server
Hijacking
– Killing the client and taking over the connection
[Diagram: attacker positioned on the network between client C and server S.]
Network Attack Countermeasures
Don’t send anything important
– Not everything needs to be protected
Encryption
– For everything else
– Mechanism limited by client-side software
[Diagram: attacker positioned between client C and server S.]
Encryption for confidentiality and integrity
Y Encryption is used to scramble data:

PLAINTEXT -> ENCRYPTION(KEY) -> CIPHERTEXT -> DECRYPTION(KEY) -> PLAINTEXT
Authentication
Y Proving knowledge of the encryption key
– Nonce = non-repeating value

C -> S: {Nonce or timestamp}Kc
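A sketch of this challenge-response, using an HMAC over the nonce as a stand-in for encryption under the shared key Kc (the key value and helper names are illustrative):

```python
# Challenge-response sketch: the client proves knowledge of the shared key Kc
# by keying a MAC over a fresh nonce; the server recomputes and compares.
import hmac, hashlib, os

Kc = b"shared-secret-between-C-and-S"     # illustrative shared key

def respond(key, nonce):                  # client side: produce {Nonce}Kc
    return hmac.new(key, nonce, hashlib.sha256).digest()

def verify(key, nonce, proof):            # server side: recompute and compare
    return hmac.compare_digest(respond(key, nonce), proof)

nonce = os.urandom(16)                    # non-repeating value: foils replay
proof = respond(Kc, nonce)
print(verify(Kc, nonce, proof))           # True: the client knows Kc
print(verify(b"wrong-key", nonce, proof)) # False: wrong key fails
```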
CSci555: Advanced Operating Systems
Lecture 6 - October 8, 1999
Dr. Clifford Neuman
University of Southern California
Information Sciences Institute
Key distribution
Y Conventional cryptography
– A single key shared by both parties
Y Public key cryptography
– The public key is published to the world
– The private key is known only by its owner
Y A third party certifies or distributes keys
– Certification infrastructure
– Authentication
Authentication w/ Conventional Crypto
Y Kerberos, or Needham-Schroeder
[Diagram: client C exchanges messages 1 and 2 with the KDC, then messages 3 (and optionally 4 and 5) with server S.]
Authentication w/ PK Crypto
Y Based on public key certificates
[Diagram: client C obtains certificates from directory service DS (messages 1 and 2), then authenticates to server S (message 3).]
Kerberos
Y Third-party authentication service
– Distributes session keys for authentication, confidentiality, and integrity

1. C -> KDC: Req
2. KDC -> C: T + {Reply}Kc
3. C -> TGS: TgsReq
4. TGS -> C: Ts + {Reply}Kt
5. C -> S: Ts + {ts}Kcs
Public Key Cryptography (revisited)
Y Key Distribution
– Confidentiality not needed for the public key
– Solves the n² problem
Y Performance
– Slower than conventional cryptography
– Implementations use public key crypto for key distribution, then conventional crypto for data encryption
Y A trusted third party is still needed
– To certify the public key
– To manage revocation
– In some cases, the third party may be off-line
Certificate-Based Authentication
Y Certification authorities issue signed certificates
– Banks, companies, & organizations like Verisign act as CAs
– Certificates bind a public key to the name of a user
– The public key of a CA is certified by higher-level CAs
– Root CA public keys are configured in browsers & other software
– Certificates provide key distribution
Y Authentication steps
– The verifier provides a nonce, or a timestamp is used instead.
– The principal selects a session key and sends it to the verifier with the nonce, encrypted with the principal’s private key and the verifier’s public key, and possibly with the principal’s certificate.
– The verifier checks the signature on the nonce and validates the certificate.
Secure Sockets Layer (and TLS)
Encryption support provided between browser and web server, below the HTTP layer.
The client checks the server certificate.
Works as long as the client starts with the correct URL.
Key distribution is supported through the certificate steps; authentication is provided by the verify steps.

C -> S: Hello
S -> C: Hello + CertS
C -> S: {PMKey}Ks [CertC + VerifyC]
S -> C: VerifyS
Trust models for certification
Y X.509 Hierarchical
– Single root (original plan)
– Multi-root (better accepted)
– SET has banks as CAs and a common SET root
Y PGP Model
– “Friends and family approach” - S. Kent
Y Other representations for certification
Y No certificates at all
– Out-of-band key distribution
– SSH
Global Authentication Service
Y Pair-wise trust in a hierarchy
– A name is derived from the path followed
– Shortcuts are allowed, but change the name
– Exposure of the path is important for security
Y Compared to Kerberos
– The transited field in Kerberos doesn’t change the name
Y Compared with X.509
– X.509 has a single path from the root
– X.509 is for public key systems
Y Compared with PGP
– PGP evaluates the path at the end, but may have name conflicts
Capability Based Systems - Amoeba
“Authentication is not an end in itself”
Y Theft of capabilities is an issue
– Claims that there is no direct access to the network
– Replay is an issue
Y Modification of capabilities is a problem
– One-way functions provide a good solution
Y Where to store capabilities for convenience
– In the user-level naming system/directory
– 3 columns
Y Where is authentication in Amoeba?
– Used to obtain the initial capability
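The one-way-function protection can be sketched as follows, in the spirit of Amoeba's check fields: the server hashes the object, rights, and a per-object secret, so a holder cannot forge or amplify the rights bits. The field layout here is illustrative, not Amoeba's actual capability format:

```python
# Capability protection with a one-way function: the check field is
# hash(object || rights || server_secret), verifiable only by the server.
import hashlib

SECRET = {"obj42": b"server-private-random"}      # known only to the server

def make_cap(obj, rights):
    h = hashlib.sha256(obj.encode() + rights.encode() + SECRET[obj]).hexdigest()
    return (obj, rights, h)                       # capability: (obj, rights, check)

def valid(cap):
    obj, rights, check = cap
    return make_cap(obj, rights)[2] == check      # recompute the check field

cap = make_cap("obj42", "R")
print(valid(cap))                                 # True: genuine capability
print(valid(("obj42", "RW", cap[2])))             # False: amplified rights rejected
```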
Capability Directories in Amoeba
[Diagram: Amoeba capability directories with User/Group/Other columns. Directory “BCN” holds File1, File2, and users; directory “Katia” holds File3, bcn, and users; directory “users” holds BCN, Katia, and tyao. Login capabilities exist for BCN, Katia, and tyao.]
Security Architectures

- DSSA
  - Delegation is the important issue
    - Workstation can act as user
    - Software can act as workstation - if given key
    - Software can act as developer - if checksum validated
  - Complete chain needed to assume authority
  - Roles provide limits on authority - new sub-principal
- Proxies - also based on delegation
  - Limits on authority explicitly embedded in the proxy
  - Works well with access control lists
Distributed Authorization

- It must be possible to maintain authorization information separately
  from the end servers
  - Less duplication of the authorization database
  - Less need for specific prior arrangement
  - Simplified management
- Based on restricted proxies, which support
  - Authorization servers
  - Group servers
  - Capabilities
  - Delegation
Proxies

- A proxy allows a second principal to operate with the rights and
  privileges of the principal that issued the proxy
  - Existing authentication credentials
  - Too much privilege and too easily propagated
- Restricted proxies
  - By placing conditions on the use of proxies, they form the basis
    of a flexible authorization mechanism
Restricted Proxies

- Two kinds of proxies
  - Proxy key needed to exercise a bearer proxy
  - Restrictions limit use of a delegate proxy
- Restrictions limit authorized operations
  - Individual objects
  - Additional conditions

Proxy certificate: Grantor + Proxy + Conditions
(Example conditions: use between 9AM and 5PM; grantee is user X;
netmask is 128.9.x.x; must be able to read this fine print, can you?)
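The example conditions above can be sketched as a check the end server runs before honoring a proxy. This is a minimal illustration, not a real proxy format; all field names are invented for the sketch:

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

def check_proxy(proxy: dict, requester: str, addr: str,
                now: datetime) -> bool:
    """Evaluate a restricted proxy's conditions at the end server."""
    c = proxy["conditions"]
    if not (c["start_hour"] <= now.hour < c["end_hour"]):
        return False   # outside the 9AM-5PM window
    if requester != c["grantee"]:
        return False   # proxy was granted to someone else
    if ip_address(addr) not in ip_network(c["netmask"]):
        return False   # request from the wrong network
    return True

proxy = {"operation": "read file F",
         "conditions": {"start_hour": 9, "end_hour": 17,
                        "grantee": "userX",
                        "netmask": "128.9.0.0/16"}}
ok = check_proxy(proxy, "userX", "128.9.1.5",
                 datetime(1999, 10, 15, 10))
bad = check_proxy(proxy, "userY", "128.9.1.5",
                  datetime(1999, 10, 15, 10))
```

The point of the design is that the conditions travel inside the signed proxy, so the issuer limits authority without the end server needing prior arrangement.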
Authorization and Group Services

1. Client C sends an authenticated authorization request (operation X)
   to the authorization server R.
2. R returns [operation X only]R, {Kproxy}Ksession.
3. C presents [operation X only]R to the end server S, authenticating
   using Kproxy.
Central Authorization

- Authorization server uses extended ACLs
  - Conditions are not evaluated, but instead attached to credentials
- Groups implemented by the authorization server
  - Server grants the right to assert group membership
- Application servers configured to use the authorization server
  - Minimal local ACL
  - Can use multiple authorization servers
Applied Security

- Electronic commerce
  - SSL applies authentication and encryption
  - NetCheque applies proxies
  - SET applies certification
  - End-system security a major issue
- What we have today
  - Firewalls
  - Web passwords, encryption, certificates
  - Windows 2000 uses Kerberos
CSci555: Advanced Operating Systems
Lecture 7 - October 15, 1999
File Systems

Dr. Katia Obraczka
University of Southern California
Information Sciences Institute
Outline

- Time in distributed systems.
- File systems.
File Systems

- Provide a set of primitives that abstracts users from the details of
  storage access and management.
Distributed File Systems

- Promote sharing across machine boundaries.
- Transparent access to files.
- Make diskless machines viable.
- Increase disk space availability by avoiding duplication.
- Balance load among multiple servers.
Sun Network File System 1

- De facto standard:
  - Mid 80's.
  - Widely adopted in academia and industry.
- Provides transparent access to remote files.
- Uses Sun RPC and XDR.
  - NFS protocol defined as a set of procedures and corresponding
    arguments.
  - Synchronous RPC:
    - Client blocks until it gets results from the server.
Sun NFS 2

- Stateless server:
  - Remote procedure calls are self-contained.
  - Servers don't need to keep state about previous requests.
    - Flush all modified data to disk before returning from an RPC
      call.
  - Robustness.
    - No state to recover.
    - Clients retry.
Location Transparency

- Client's file name space includes remote files.
  - Shared remote files are exported by the server.
  - They need to be remote-mounted by the client.

[Figure: example name spaces. Client /root holds vmunix and usr (with
staff, students); Server 1 /root holds export/users (joe, bob);
Server 2 /root holds nfs/users (ann, eve).]
Achieving Transparency 1

- Mount service.
  - Mounts remote file systems in the client's local file name space.
  - Mount service process runs on each node to provide an RPC interface
    for mounting and unmounting file systems at the client.
  - Runs at system boot time or user login time.
Achieving Transparency 2

- Automounter.
  - Dynamically mounts file systems.
  - Runs as a user-level process (daemon) on clients.
  - Resolves references to unmounted pathnames that it manages by
    mounting them on demand.
  - Maintains a table of mount points and the corresponding server(s);
    sends probes to the server(s).
  - Primitive form of replication.
Transparency?

- Early binding.
  - Mount system call attaches a remote file system to a local mount
    point.
  - Client deals with the host name once.
  - But, the mount needs to happen before remote files become
    accessible.
Other Functions

- NFS file and directory operations:
  - read, write, create, delete, getattr, etc.
- Access control:
  - File and directory access permissions.
- Path name translation:
  - Lookup for each path component.
  - Caching.
Implementation

[Figure: a client process issues system calls into the client's Unix
kernel, where the VFS layer routes each call to either the local Unix
FS or the NFS client; the NFS client talks via RPC to the NFS server
in the server's Unix kernel, which reaches the server's Unix FS
through its own VFS layer.]
Virtual File System 1

- VFS added to the UNIX kernel.
  - Location-transparent file access.
  - Distinguishes between local and remote access.
- At the client:
  - Processes file system calls to determine whether access is local
    (passes it to the UNIX FS) or remote (passes it to the NFS client).
- At the server:
  - NFS server receives the request and passes it to the local FS
    through VFS.
VFS 2

- If local, translates the file handle to internal file ids (UNIX
  i-nodes).
- V-node:
  - If the file is local, reference to the file's i-node.
  - If the file is remote, reference to the file handle.
- File handle: uniquely distinguishes the file.

  File system id | I-node # | I-node generation #
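The three-field file handle and the v-node's local/remote dispatch can be sketched as follows (field and class names are illustrative, not the actual kernel structures):

```python
from collections import namedtuple

# An NFS file handle is opaque to the client but lets the stateless
# server re-find the file. The generation number detects a stale
# handle whose i-node has been recycled for a new file.
FileHandle = namedtuple("FileHandle",
                        ["fs_id", "inode_no", "generation"])

class VNode:
    """References either a local i-node or a remote file handle."""
    def __init__(self, local_inode=None, remote_handle=None):
        # exactly one of the two must be set
        assert (local_inode is None) != (remote_handle is None)
        self.local_inode = local_inode
        self.remote_handle = remote_handle

    @property
    def is_remote(self) -> bool:
        return self.remote_handle is not None

h = FileHandle(fs_id=7, inode_no=1234, generation=2)
v = VNode(remote_handle=h)
```

Because the handle carries everything needed to locate the file, the server can serve each RPC with no memory of previous requests, which is what makes NFS's stateless design work.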
NFS Caching

- File contents and attributes.
- Client versus server caching: caches on both the client and the
  server side.
Server Caching

- Read:
  - Same as UNIX FS.
  - Caching of file pages and attributes.
  - Cache replacement uses LRU.
- Write:
  - Write-through (as opposed to the delayed writes of conventional
    UNIX FS). Why?
  - [Delayed writes: modified data written to disk when buffer space
    is needed, on a sync operation (every 30 sec), or at file close.]
Client Caching 1

- Timestamp-based cache invalidation.
- Read:
  - Cached entries have a timestamp with the last-modified time.
  - Blocks assumed to be valid for a TTL.
    - TTL specified at mount time.
    - Typically 3 sec for files.
Client Caching 2

- Write:
  - Modified pages marked and flushed to the server at file close or
    sync (every 30 sec).
- Consistency?
  - Not always guaranteed!
  - E.g., a client modifies a file; there is a delay before the
    modification reaches the servers, plus a 3-sec window before
    clients sharing the file revalidate their caches.
Cache Validation

- Validation check performed when:
  - First reference to the file after the TTL expires.
  - File open or new block fetched from the server.
- Done for all files (even if not being shared).
- Expensive!
  - Potentially, every 3 sec get file attributes.
  - If needed, invalidate all blocks.
  - Fetch a fresh copy when the file is next accessed.
The Sprite File System 1

- Main memory caching on both client and server.
- Write-sharing consistency guarantees.
- Variable size caches.
  - VM and FS negotiate the amount of memory needed.
  - Cache size changes according to caching needs.
Sprite 2

- Sprite supports concurrent writes by disabling caching of
  write-shared files.
  - If a file is write-shared, the server notifies the client that has
    the file open for writing to write its modified blocks back to the
    server.
  - The server notifies all clients that have the file open for
    reading that the file is no longer cacheable; clients discard all
    cached blocks, so access goes through the server.
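The write-sharing protocol above can be sketched as server-side bookkeeping (class and method names are illustrative, not Sprite's actual interface):

```python
class SpriteServer:
    """Sketch: disable caching when a file becomes write-shared."""
    def __init__(self):
        self.readers = {}        # path -> set of clients reading
        self.writers = {}        # path -> client writing
        self.uncacheable = set()

    def open(self, client, path, for_write=False):
        if for_write:
            if self.readers.get(path) or path in self.writers:
                self._disable_caching(path)  # now write-shared
            self.writers[path] = client
        else:
            if path in self.writers:
                self._disable_caching(path)  # reader joins a writer
            self.readers.setdefault(path, set()).add(client)

    def _disable_caching(self, path):
        # In Sprite: the writer is told to flush modified blocks back;
        # readers are told to discard cached blocks. Afterward every
        # access goes through the server, which keeps it consistent.
        self.uncacheable.add(path)

s = SpriteServer()
s.open("clientA", "/shared/f", for_write=False)
s.open("clientB", "/shared/f", for_write=True)  # becomes write-shared
```

Note that this only works because the server already knows who has each file open, which is exactly the statefulness discussed on the next slide.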
Sprite 3

- Sprite servers are stateful.
  - Need to keep state about current accesses.
  - Centralized points for cache consistency.
    - Bottleneck?
    - Single point of failure?
- Tradeoff: consistency versus performance.
Andrew

- Distributed computing environment developed at CMU.
- Campus-wide computing system.
  - Designed for between 5 and 10K workstations.
  - 1991: ~800 workstations, 40 servers.
Andrew FS

- Goals:
  - Information sharing.
  - Scalability.
    - Key strategy: caching of whole files at the client.
    - Whole-file serving: entire file transferred to the client.
    - Whole-file caching: local copy of the file cached on the
      client's local disk; survives client reboots and server
      unavailability.
Whole File Caching

- Local cache contains the several most recently used files.

[Figure: (1) client C issues open<file>; (2) the open is forwarded to
server S; (3)-(5) the server returns the file; (6) a local copy is
stored at C.]

- Subsequent operations on the file are applied to the local copy.
- On close, if the file was modified, it is sent back to the server.
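The open/close flow above can be sketched with ordinary file copies standing in for the fetch and store RPCs (class, method, and path names are all illustrative):

```python
import os
import shutil
import tempfile

class WholeFileCache:
    """Sketch of AFS-style whole-file serving and caching."""
    def __init__(self, server_dir, cache_dir):
        self.server_dir = server_dir   # stands in for the Vice server
        self.cache_dir = cache_dir     # client's local disk cache

    def open(self, name):
        # Whole-file fetch: the entire file comes over on open.
        src = os.path.join(self.server_dir, name)
        dst = os.path.join(self.cache_dir, name)
        shutil.copyfile(src, dst)
        return dst                     # all later ops hit the copy

    def close(self, name, modified):
        # Write back the whole file, but only if it was modified.
        if modified:
            shutil.copyfile(os.path.join(self.cache_dir, name),
                            os.path.join(self.server_dir, name))

server = tempfile.mkdtemp()
cache = tempfile.mkdtemp()
with open(os.path.join(server, "f"), "w") as fp:
    fp.write("v1")

fs = WholeFileCache(server, cache)
local = fs.open("f")                  # fetch whole file
with open(local, "w") as fp:
    fp.write("v2")                    # operate on the local copy
fs.close("f", modified=True)          # ship it back on close
with open(os.path.join(server, "f")) as fp:
    result = fp.read()
```

The scalability payoff is that between open and close the server is not contacted at all, so one server can support many clients.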
Implementation 1

- Network of workstations running Unix BSD 4.3 and Mach.
- Implemented as 2 user-level processes:
  - Vice: runs at each Andrew server.
  - Venus: runs at each Andrew client.
Implementation 2

- Modified BSD 4.3 Unix kernel.
  - At the client, intercepts file system calls (open, close, etc.)
    and passes them to Venus when they refer to shared files.
- File partition on the local disk used as cache.
- Venus manages the cache.
  - LRU replacement policy.
  - Cache large enough to hold 100's of average-sized files.

[Figure: on the client, the user program and Venus run above the Unix
kernel; on the server, Vice runs above the Unix kernel; client and
server communicate over the network.]
File Sharing

- Files are shared or local.
  - Shared files
    - Utilities (/bin, /lib): infrequently updated; or files accessed
      by a single user (user's home directory).
    - Stored on servers and cached on clients.
    - Local copies remain valid for a long time.
  - Local files
    - Temporary files (/tmp) and files used for start-up.
    - Stored on the local machine's disk.
File Name Space

- Regular UNIX directory hierarchy.
- "cmu" subtree contains shared files.
- Local files stored on the local machine.
- Shared files reached via symbolic links into the "cmu" subtree.

[Figure: / contains tmp, bin, vmunix (local) and cmu (shared), with
bin under cmu.]
AFS Caching 1

- AFS-1 uses timestamp-based cache invalidation.
- AFS-2 and later use callbacks.
  - When serving a file, the Vice server promises to notify the Venus
    client when the file is modified.
  - Stateless servers?
  - Callback stored with the cached file.
    - Valid.
    - Canceled: when the client is notified by the server that the
      file has been modified.
AFS Caching 2

- Callbacks implemented using RPC.
- When accessing a file, Venus checks whether the file exists and
  whether the callback is valid; if canceled, it fetches a fresh copy
  from the server.
- Failure recovery:
  - When restarting after a failure, Venus checks each cached file by
    sending a validation request to the server.
  - Also periodic checks in case of communication failures.
AFS Caching 3

- At file close time, Venus on the client modifying the file sends the
  update to the Vice server.
- The server updates its own copy and sends callback cancellations to
  all clients caching the file.
- Consistency?
- Concurrent updates?
AFS Replication

- Read-only replication.
  - Only read-only files are allowed to be replicated at several
    servers.
Coda

- Evolved from AFS.
- Goal: constant data availability.
  - Improved replication.
    - Replication of read-write volumes.
  - Disconnected operation: mobility.
    - Extension of AFS's whole-file caching mechanism.
- Access to the shared file repository (servers) versus relying on
  local resources when the server is not available.
Replication in Coda

- Replication unit: file volume (set of files).
- Set of replicas of a file volume: volume storage group (VSG).
- Subset of replicas available to a client: AVSG.
  - Different clients, different AVSGs.
  - AVSG membership changes as server availability changes.
  - On write: when a file is closed, copies of the modified file are
    broadcast to the AVSG.
Optimistic Replication

- The goal is availability!
- Replicated files are allowed to be modified even in the presence of
  partitions or during disconnected operation.
Disconnected Operation

- AVSG = { }.
- Network/server failures, or host on the move.
- Rely on the local cache to serve all needed files.
- Loading the cache:
  - User intervention: list of files to be cached.
  - Learning usage patterns over time.
- Upon reconnection, cached copies are validated against the server's
  files.
Normal and Disconnected Operation

- During normal operation:
  - Coda behaves like AFS.
  - Cache miss transparent to the user; only a performance penalty.
  - Load balancing across replicas.
  - Cost: replica consistency + cache consistency.
- Disconnected operation:
  - No replicas are accessible; a cache miss prevents further
    progress; need to load the cache before disconnection.
Replication and Caching

- Coda integrates server replication and client caching.
  - On cache hit and valid data: Venus does not need to contact the
    server.
  - On cache miss: Venus gets data from an AVSG server, i.e., the
    preferred server (PS).
    - PS chosen at random or based on proximity and load.
  - Venus also contacts the other AVSG servers and collects their
    versions; if there is a conflict, the operation is aborted; if
    replicas are stale, they are updated off-line.