advanced operating systems lecture...

216
Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 1 Advanced Operating Systems Lecture notes Clifford Neuman, Katia Obraczka University of Southern California Information Sciences Institute

Upload: others

Post on 22-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 1

Advanced Operating SystemsLecture notes

Clifford Neuman, Katia ObraczkaUniversity of Southern CaliforniaInformation Sciences Institute

Page 2: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 2

CSci555: Advanced Operating SystemsLecture 1 - September 3, 1999

Dr. Clifford NeumanUniversity of Southern CaliforniaInformation Sciences Institute

Page 3: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 3

Some things an operating system does

Y Memory ManagementY Scheduling / Resource management

Y CommunicationY Protection and Security

Y File Management - I/OY Naming

Y SynchronizationY User Interface

Page 4: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 4

Progression of Operating Systems

Primary goal of a distributed system:– Sharing

Progression over past years– Dedicated machines– Batch Processing

– Time Sharing– Workstations and PC’s– Distributed Systems

Page 5: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 5

Structure of Distributed Systems

Y Kernel– Basic functionality and protection

Y Application Level– What does the real work

Y Servers– Service and support functions needed by

applications– Many functions that used to be in Kernel

are now in servers.

Page 6: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 6

Structure of Distributed Systems

User Space

Kernel

UP

SVR

User Space

Kernel

SVR

Page 7: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 7

Characteristics of a Distributed System

Y Basic characteristics:– Multiple Computers– Interconnections

– Shared State

Page 8: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 8

Why Distributed Systems are Hard

Y Scale:– Numeric– Geographic– Administrative

Y Loss of control over parts of the system

Y Unreliability of Messages

Y Parts of the system down or inaccessible– Lamport: You know you have a distributed system when

the crash of a computer you have never heard of stops youfrom getting any work done.

Page 9: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 9

CSci555: Advanced Operating SystemsLecture 2 - September 10, 1999

Dr. Katia ObraczkaUniversity of Southern CaliforniaInformation Sciences Institute

Page 10: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 10

Administration 1

Y Diagnostic exams:– Everyone should have gotten their grades

and our recommendation by e-mail.– Talk to us if you haven’t heard anything.

Y Academic Integrity Policy:– NOT A JOKE!– We’ll strictly enforce it.

– Read it and talk to us if you havequestions.

Page 11: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 11

Administration 2

Y Reading report #1 assigned.– Get it from the class Web page.

– Look at guidelines for writing andsubmitting report.

Y Office:– SAL 236, phone (213) 740-4777.– Information on Web page updated.

Page 12: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 12

Last Class

Y How to read a technical paper.

Y What is a distributed system.

Y Motivation, applications, some history.

Y More: textbook chapters 1 and 2.– Including a chart with historic facts for

more details on distributed systems history(pages 23 and 24).

Page 13: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 13

Outline

Y E2E Argument [Saltzer et al.]

Y Communication Models.– Message passing.– Distributed shared memory (DSM).

– Remote procedure call (RPC) [Birrel et al.]X Light-weight RPC [Bershad et al.]

– DSM case studiesX IVY [Li et al.]X Linda [Carriero et al.]

Page 14: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 14

End-to-End Argument 1

Y QUESTION: Where to place distributedsystems functions?

Y Layered system design:– Different levels of abstraction for simplicity.

– Lower layer provides service to upperlayer.

– Very well defined interfaces.

Page 15: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 15

E2E Argument 2

Y E2E argument paper argues that functionsshould be moved closer to the applicationthat uses them.

Y Rationale:– Some functions can only be completely and

correctly implemented with app’s knowledge.X Example: Reliable message delivery.

– App’s that don’t need certain functions shouldnot have to pay for them.

Page 16: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 16

E2E Argument 3

Y Counter argument:– Performance.

– Example: File transferX Reliability checks at lower layers detect

problms earlier.X Abort transfer and re-try without having to wait

till whole file is transmitted.

Y Bottom line: “spread out” functionalityacross layers.

Page 17: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 17

E2E Argument 4

Y Another perspective:– Why pay for something you don’t need.

X Example 1: the Internet.X Example 2: trend in kernel design - take away

from kernel as much functionality as possible.

Y We’ll revisit the e2e argumentthroughout the course.

Y Paper has more examples.

Page 18: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 18

Communication Models

Y Support for processes to communicateamong themselves.

Y Traditional (centralized) OS’s:– Provide local (within single machine)

communication support.– Distributed OS’s: must provide support for

communication across machineboundaries.X Over LAN or WAN.

Page 19: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 19

Communication ParadigmsY 2 paradigms

– Message Passing (MP)

– Distributed Shared Memory (DSM)

Y Message Passing– Processes communicate by sending

messages.

Y Distributed Shared Memory– Communication through a “virtual shared

memory”.

Page 20: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 20

Message PassingY Basic communication primitives:

– Send message.

– Receive message.

Y Modes of communication:– Synchronous versus asynchronous.

Y Semantics:– Reliable versus unreliable.

...Send

Sending Q

...

Receiving QReceive

Page 21: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 21

Synchronous Communication

Y Blocking send: blocks until message istransmitted (out of the send queue).

Y Blocking receive: blocks until messagearrives in the receive queue.

Page 22: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 22

Asynchronous Communication

Y Non-blocking send: sending processcontinues as soon message is queued.

Y Blocking or non-blocking receive:– Blocking: timeout or threads.

– Non-blocking: proceeds while waiting formessage.X Message is queued upon arrival.X Process needs to poll or be interrupted.

Page 23: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 23

Reliability 1

Y Unreliable communication:– Aka, best effort, ie, “send and hope for the

best”.– No ACKs or retransmissions.– Application programmer must design their

own reliability mechanism.– Example: User Datagram Protocol (UDP)

X Applications using UDP either don’t need reliabilityor build their own (e.g., UNIX NFS and DNS (bothUDP and TCP))

Page 24: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 24

Reliability 2Y Reliable communication:

– Different degrees of reliability.– Processes have some guarantee that

messages will be delivered.– Example: Transmission Control Protocol

(TCP)– Reliability mechanisms:

X Positive acknowledgments (ACKs).X Negative Acknowledgments (NACKs).

– Possible to build reliability atop unreliableservice (E2E argument).

Page 25: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 25

Distributed Shared Memory

Y Motivated by development of shared-memory multiprocessors which doshare memory.

Y Abstraction used for sharing dataamong processes running on machinesthat do not share memory.

Y Processes think they read from andwrite to a “virtual shared memory”.

Page 26: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 26

DSM 2

Y Primitives: read and write.

Y OS ensures that all processes (may bein different machines) sees all updates.– Happens transparently to processes.

Page 27: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 27

DSM and MPY DSM is an abstraction!

– Gives programmers the flavor of a centralizedmemory system, which is a well-knownprogramming environment.

– No need to worry about communication andsynchronization.

Y But, it is implemented atop MP.– No real shared memory.

– OS takes care of required communication.

Page 28: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 28

Caching in DSMY For performance, DSM caches data

locally.– More efficient access (locality).

– But, must keep caches consistent.– Caching of pages in case of page-based

DSM.

Y Issues:– Page size.

– Consistency mechanism.

Page 29: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 29

Approaches to DSM 1

Y Hardware-based:– Multi-processor architectures with

processor-memory modules connected byhigh-speed LAN (E.g., Stanford’s DASH).

– Specialized hardware to handle reads andwrites and perform required consistencymechanisms.

Page 30: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 30

Approaches to DSM 2

Y Paged-based:– Example: IVY.

– DSM implemented as region of processor’svirtual memory; occupies same addressspace range for every participatingprocess.

– OS keeps DSM data consistency as part ofpage fault handling.

Page 31: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 31

Approaches to DSM 3

Y Library-based:– Or language-based.

– Example: Linda.– Language or language extensions.

– Compiler inserts appropriate library callswhenever processes access DSM items.

– Library calls access local data andcommunicate when necessary.

Page 32: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 32

Remote Procedure Call

Y Builds on MP.

Y Main idea: extend traditional (local)procedure call to perform transfer ofcontrol and data across network.

Y Easy to use: analogous to local calls.

Y But, procedure is executed by adifferent process, probably on adifferent machine.

Y Fits very well with client-server model.

Page 33: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 33

RPC Mechanism1. Invoke RPC.2. Calling process suspends.3. Parameters passed across network to

target machine.4. Procedure executed remotely.5. When done, results passed back to caller.6. Caller resumes execution.Is this synchronous or asynchronous?

Page 34: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 34

RPC Advantages

Y Easy to use.

Y Well-known mechanism.

Y Abstract data type– Client-server model.– Server as collection of exported

procedures on some shared resource.– Example: file server.

Y Reliable.

Page 35: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 35

RPC Semantics 1

Y Related to delivery guarantees.

Y “Maybe call”:– Clients cannot tell for sure whether remote

procedure was executed or not due tomessage loss, server crash, etc.

– Usually not acceptable.

Page 36: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 36

RPC Semantics 2

Y “At-least-once” call– Remote procedure executed at least once,

but maybe more than once.– Retransmissions but no duplicate filtering.– Idempotent operations OK; e.g., reading

data that is read-only.

Page 37: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 37

RPC Semantics 3

Y “At-most-once” call– Most appropriate for non-idempotent

operations.– Remote procedure executed 0 or 1 time,

ie, exactly once or not at all.

– Use of retransmissions and duplicatefiltering.

– Example: Birrel et al. Implementation.X Use of pobes to check if server crashed.

Page 38: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 38

RPC Implementation 1

Page 39: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 39

RPC Implementation 2

Y RPC runtime mechanism responsiblefor retransmissions, acknowledgments.

Y Stubs responsible for data packagingand unpackaging; also known asmarshalling and unmarshalling: puttingdata in form suitable for transmission.Example: Sun’s XDR.

Page 40: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 40

BindingY How to determine where server is? Which

procedure to call?– “Resource discovery” problem

X Name service: advertises servers and services.X Example: Birrel et al uses Grapevine.

Y Early versus late binding.– Early: server address and procedure name

hardcoded in client.

– Late: go to name service.

Page 41: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 41

RPC PerformanceY Sources of overhead

– data copying

– scheduling and context switch.

Y Light-Weight RPC– Shows that most invocations took place on a

single machine.– LW-RPC: improve RPC performance for local

case.

– Optimizes data copying and threadscheduling for local case.

Page 42: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 42

LW-RPC 1

Y Argument copying– RPC: 4 times (2 on call and 2 on return);

copying between kernel and user space.

– LW-RPC: common data area (A-stack)shared by client and server and used topass parameters and results; access byclient or server, one at a time.

Page 43: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 43

LW-RPC 2

Y A-stack avoids copying between kerneland user spaces.

Y Client and server share the same thread:less context switch (like regular calls).

A

Page 44: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 44

DSM Case Studies: IVY

Y Environment:”loosely coupled”multiprocessor.– Memory is physically distributed.

– Memory mapping managers (OS kernel):X Map local memories to shared virtual space.X Local memory as cache of shared virtual

space.X Memory reference may cause page fault; page

retrieved and consistency handled.

Page 45: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 45

IVY

Y Issues:– Read-only versus writable data.

– Locality of reference.– Granularity (1 Kbyte page size).

X Bigger pages versus smaller pages.

Page 46: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 46

IVY

Y Memory coherence strategies:– Page synchronization

X InvalidationX Write broadcast

– Page ownershipX Fixed: page always owned by same processorX Dynamic

Page 47: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 47

IVY Page Synchronization

Y Invalidation:– On write fault, invalidate all copies; give

faulting process write access; gets copy ofpage if not already there.

– Problem: must update page on reads.

Y Write broadcast:– On write fault, fault handler writes to all

copies.

– Expensive!

Page 48: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 48

IVY Memory Coherence

Y Paper discusses approaches to memorycoherence in page-based DSM.– Centralized: single manager residing on a

single processor managing all pages.

– Distributed: multiple managers on multipleprocessors managing subset of pages.

Page 49: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 49

DSM Case Studies: LindaY Language-based approach to DSM.

Y Environment:– Similar to IVY, ie, loosely coupled

machines connected via fast broadcastbus.

– Instead of shared address space,processes make library calls inserted bycompiler when accessing DSM.

– Libraries access local data andcommunicate to maintain consistency.

Page 50: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 50

Linda

Y DSM: tuple space.

Y Basic operations:– out (data): data added to tuple space.– in (data): removes matching data from TS;

destructive.

– read (data): same as “in”, but tuple remainsin TS (non-destructive).

Page 51: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 51

Linda Primitives: Examples

Y out (“P”, 5, false) : tuple (“P”, 5, false)added to TS.– “P” : name

– Other components are data values.– Implementation reported on the paper:

every node stores complete copy of TS.

– out (data) causes data to be broadcast toevery node.

Page 52: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 52

Linda Primitives: Examples

Y in (“P”, int I, bool b): tuple (“P”, 5, false)removed from TS.– If matching tuple found locally, local kernel

tries to delete tuple on all nodes.

Page 53: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 53

CSci555: Lecture 3 September 17, 1999Concurrency, Deadlock, Transactions, Time.

Dr. Katia ObraczkaUniversity of Southern CaliforniaInformation Sciences Institute

Page 54: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 54

Administration

Y Reading assignment due next class.– Concerns regarding grading.

Y Tanenbaum book.– Recommended reading for background

material.

– Not required (ie, you don’t have to buy it ifyou don’t want to).

Page 55: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 55

Last Class

Y E2E argument.

Y Communication Models.– MP.– DSM.

– RPC.– IVY.

·Linda.

Page 56: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 56

Today

Y Linda.

Y Concurrency and Synchronization [Hauseret al.]

Y Transactions [Spector et al.]

Y Distributed Deadlocks [Chandy et al.]

Y Time in Distributed Systems [Lamport andJefferson]

Y Replication [Birman and Gifford]

Page 57: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 57

DSM Case Studies: Linda

Y Language-based approach to DSM.Y Environment:

– Similar to IVY, ie, loosely coupledmachines connected via fast broadcastbus.

– Instead of shared address space,processes make library calls inserted bycompiler when accessing DSM.

– Libraries access local data andcommunicate to maintain consistency.

Page 58: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 58

Linda

Y DSM: tuple space.

Y Basic operations:– out (data): data added to tuple space.– in (data): removes matching data from TS;

destructive.

– read (data): same as “in”, but tuple remainsin TS (non-destructive).

Page 59: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 59

Linda Primitives: Examples

Y out (“P”, 5, false) : tuple (“P”, 5, false)added to TS.– “P” : name

– Other components are data values.– Implementation reported on the paper:

every node stores complete copy of TS.

– out (data) causes data to be broadcast toevery node.

Page 60: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 60

Linda Primitives: Examples

Y in (“P”, int I, bool b): tuple (“P”, 5, false)removed from TS.– If matching tuple found locally, local kernel

tries to delete tuple on all nodes.

Page 61: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 61

Concurrency Control and Synchronization

Y How to control and synchronize possiblyconflicting operations on shared data byconcurrent processes?

Y First, some terminology.– Processes.

– Threads.– Tasks.

Page 62: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 62

Processes

Y Text book, page 165– Processing activity associated with an

execution environment, ie, address spaceand resources (such as communication andsynchronization resources).

Page 63: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 63

ThreadsY OS abstraction of an activity/task.

– Execution environment expensive to createand manage.

– Multiple threads share single executionenvironment.

– Single process may spawn multiple threads.– Maximize degree of concurrency among

related activities.

– Example: multi-threaded servers allowconcurrent processing of client requests.

Page 64: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 64

Other Terminology

Y Process versus task.– Process: heavy-weight unit of execution.

– Task/thread: light-weight unit of execution.– Table with other terminology page 166 of

textbook.

Page 65: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 65

Threads Case Study 1

Y Hauser et al.

Y Examine use of user-level threads in 2OS’s:– Xerox Parc’s Cedar (research).

– GVX (commercial version of Cedar).

Y Study dynamic thread behavior.– Number of threads.– Thread lifetime.

Page 66: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 66

Thread Paradigms

Y Different categories of usage:– Defer work: thread does work not vital to the

main activity.X Examples: printing a document, sending mail.

– Pumps: used in pipelining; use output of athread as input and produce output to beconsumed by another task.

– Sleepers: tasks that repeatedly wait for anevent to execute; e.g., check for networkconnectivity every x seconds.

Page 67: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 67

Synchronization

Y So far, how one defines/activatesconcurrent activities.

Y But how to control access to shareddata and still get work done?

Y Synchronization via:– Shared data [DSM model].– Communication [MP model].

Page 68: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 68

Synchronization by Shared Data

Y Primitives– Semaphores.

– Condition critical regions.– Monitors.

flexibilitystructure

Page 69: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 69

Synchronization by MP.

Y Explicit communication.

Y Primitives: send and receive.

Y Mailboxes.

Page 70: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 70

Transactions 1

Y Database term.– Execution of program that accesses a

database.

Y In distributed systems,– Concurrency control in the client/server

model.– From client’s point of view, sequence of

operations executed by server in servicingclient’s request.

Page 71: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 71

Transaction Atomicity

Y “All or nothing”.

Y Sequence of operations to serviceclient’s request are performed in onestep, ie, either all of them are executedor none are.

Y Issues:– Multiple concurrent clients: “isolation”.

– Server failures: “failure atomicity”.

Page 72: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 72

Transaction Features

Y Recoverability: server should be able to“roll back” to state before transactionexecution.

Y Serializability: transactions have sameeffect whether executed concurrently orsequentially.

Y Durability: effects of transactions arepermanent.

Page 73: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 73

Local versus Distributed Transactions

Y Local transactions:– All transaction operations executed by

single server.

Y Distributed transactions:– Involve multiple servers.

Y Both local and distributed transactionscan be simple or nested.– Nesting: increase level of concurrency.

Page 74: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 74

Concurrency Control

Y Maintain transaction serializability.

Y Server operations: read or write.

Y 3 mechanisms:– Locks.– Optimistic concurrency control.

– Timestamp ordering.

Page 75: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 75

Locks 1

Y Order transactions accessing shareddata based on order of access to data.

Y Lock granularity: affects level ofconcurrency.– 1 lock per shared data item.

– Read (shared) and write locks.– Lock compatibility (page 381).

Page 76: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 76

Lock Implementation

Y Server lock manager– Maintains table of locks for server data

items.– Lock and unlock operations.– Clients wait on a lock for given data until

data is released; then client is signalled.– Each client’s request runs as separate

server thread.

Page 77: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 77

Deadlock

Y Use of locks can lead to deadlock.

Y Deadlock: each transaction waits foranother transaction to release a lockforming a wait cycle.

Y Deadlock condition: cycle in the wait-forgraph.

Y Deadlock prevention and detection.

Y Deadlock resolution: lock timeout.

Page 78: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 78

Optimistic Concurrency Control 1

Y Assume that most of the time,probability of conflict is low.

Y Transactions allowed to proceed inparallel until ready to commit.

Y When client issues close transaction,check whether there was conflict; if so,some transactions aborted.

Page 79: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 79

Optimistic Concurrency 2Y Read phase

– Transactions have tentative version of dataitems it accesses.

– Tentative versions allow transactions to abortwithout making their effect permanent.

Y Validation phase– Executed upon close transaction.– Checks serially equivalence.

– If validation fails, conflict resolution decideswhich transaction(s) to abort.

Page 80: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 80

Optimistic Concurrency 3

Y Write phase– If transaction is validated, all of its tentative

versions are made permanent.– Read-only transactions commit

immediately.

– Write transactions commit only after theirtentative versions are recorded inpermanent storage.

Page 81: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 81

Timestamp Ordering

Y Uses timestamps to order transactionsaccessing same data items according totheir starting times.

Y Assigning timestamps:– Clock based.

– Monotonically increasing counter.

Page 82: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 82

Distributed Transactions 1

c

s1

s2

s3

1

2

3

Simple DistributedTransaction

c

s1 s112

s121

s111

s11

s12

Nested DistributedTransaction

Page 83: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 83

Distributed Transactions 2

Y Transaction coordinator– First server contacted by client.

– Responsible for aborting/commiting.– Adding workers.

Y Workers– Other servers involved.

Page 84: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 84

Atomicity in Distributed Transactions

Y Harder: several servers involved.

Y Atomic commit protocols– 1-phase commit

X Example: coordinator sends “commit” or “abort”to workers; keeps re-broadcasting until it getsACK from all of them that request wasperformed.

X Inefficient.X Client makes decision to commit which may not

be feasible.

Page 85: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 85

2-Phase Commit 1

Y First phase: voting– Each server votes to commit or abort

transaction.

Y Second phase: carrying out jointdecision.– If any server votes to abort, joint decision is

to abort.

Page 86: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 86

2-Phase Commit 2

Y Issues:– Ensure all servers vote and reach same

decision despite of failures.– When vote to commit, ensure server will

commit even in the event of failure.Coordinator Workers

1. Prepared to commit?2. Prepared tocommit.3. Committed.

4. Committed.5. Done.

Can commit?

YesDo commit

Have committed

Page 87: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 87

Concurrency Control in Distributed Transactions 1

Y Locks– Each server manages locks for its own

data.– Decides whether to grant lock or make

transaction wait.

– Locks cannot be released until transactioncommitted or aborted on all serversinvolved.

– Distributed deadlocks!

Page 88: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 88

Concurrency Control in Distributed Transactions 2

Y Timestamp Ordering– Globally unique timestamps.

X First server issues globally unique TS andpasses it around.

X TS: <server id, local TS>

– Servers must ensure same order ofoperations on all servers.

– Clock synchronization issues

Page 89: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 89

Concurrency Control in Distributed Transactions 3

Y Optimistic concurrency control– Makes even more sense?

– Validation phase occurs during 1st. phaseof 2-phase commit.

Page 90: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 90

Camelot [Spector et al.]

Y Supports execution of distributedtransactions.

Y Specialized functions:– Disk management

X Allocation of large contiguous chunks.

– Recovery managementX Transaction abort and failure recovery.

– Transaction managementX Abort, commit, and nest transactions.

Page 91: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 91

Distributed Deadlock 1

Y When locks are used, deadlock can occur.

Y Circular wait in wait-for graph meansdeadlock.

Y Centralized deadlock detection, prevention,and resolutions schemes.– Examples:

X Detection of cycle in wait-for graph.X Lock timeouts: hard to set TO value, aborting

unnecessarily.

Page 92: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 92

Distributed Deadlock 2

Y Much harder to detect, prevent, andresolve. Why?– No global view.

– No central agent.– Communication-related problems

X Unreliability.X Delay.X Cost.

Page 93: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 93

Distributed Deadlock Detection

Y Cycle in the global wait-for graph.

Y Global graph can be constructed fromlocal graphs: hard!– Servers need to communicate to find

cycles.

Y Example from book (page 425).

Page 94: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 94

Distributed Deadlock Detection Algorithms 1

Y [Chandy et al.]

Y Message sequencing is preserved.

Y Resource versus communication models.– Resource model

X Processes, resources, and controllers.X Process requests resource from controller.

– Communication modelX Processes communicate directly via messages

(request, grant, etc) requesting resources.

Page 95: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 95

Resource versus Communication Models

Y In resource model, controllers aredeadlock detection agents; incommunication model, processes.

Y In resource model, process cannotcontinue until all requested resourcesgranted; in communication model, processcannot proceed until it can communicatewith at least one process it’s waiting for.

Y Different models, different detection alg’s.

Page 96: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 96

Distributed Deadlock Detection Schemes

Y Graph-theory based.

Y Resource model: deadlock when cycleamong dependent processes.

Y Communication model: deadlock whenknot (all vertices that can be reachedfrom i can also reach i) of waitingprocesses.

Page 97: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 97

Deadlock Detection in Resource Model

Y Use probe messages to follow edges ofwait-for graph (aka edge chasing).

Y Probe carries transaction wait-forrelations representing path in globalwait-for graph.

Page 98: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 98

Deadlock Detection Example1. Server 1 detects transaction T is waiting for U,

which is waiting for data from server 2.2. Server 1 sends probe T->U to server 2.3. Server 2 gets probe and checks if U is also waiting;

if so (say for V), it adds V to probe T->U->V. If V iswaiting for data from server 3, server 2 forwardsprobe.

4. Paths are built one edge at a time.Before forwarding probe, server checks for cycle (e.g.,

T->U->V->T).5. If cycle detected, a transaction is aborted.

Page 99: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 99

Replication 1Y Keep more than one copy of data item.

Y Technique for improving performance indistributed systems.

Y In the context of concurrent access todata, replicate data for increaseavailability.– Improved response time.

– Improved availability.– Improved fault tolerance.

Page 100: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 100

Replication 2

Y But nothing comes for free.

Y What’s the tradeoff?– Consistency maintenance.

Y Consistency maintenance approaches:– Lazy consistency (gossip approach).

– Quorum consensus.– Process group.

Stronger consistency

Page 101: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 101

Quorum Consensus

Y Goal: prevent partitions from fromproducing inconsistent results.

Y Quorum: subgroup of replicas whose sizegives it the right to carry out operations.

Y Quorum consensus replication:– Update will propagate successfully to a

subgroup of replicas.– Other replicas will have outdated copies but

will be updated off-line.

Page 102: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 102

Weighted Voting [Gifford] 1

Y Every copy assigned a number of votes(weight assigned to a particular replica).

Y Read: Must obtain R votes to read fromany up-to-date copy.

Y Write: Must obtain write quorum of Wbefore performing update.

Page 103: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 103

Weighted Voting 2

Y W > 1/2 total votes, R+W > total votes.

Y Ensures non-null intersection betweenevery read quorum and write quorum.

Y Read quorum guaranteed to havecurrent copy.

Y Freshness is determined by versionnumbers.

Page 104: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 104

Weighted Voting 3Y On read:

– Try to find enough copies, ie, total votes noless than R. Not all copies need to be current.

– Since it overlaps with write quorum, at leastone copy is current.

Y On write:– Try to find set of up-to-date replicas whose

votes no less than W.

– If no sufficient quorum, current copies replaceold ones, then update.

Page 105: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 105

ISIS 1

Y Goal: provide programmingenvironment for development ofdistributed systems.

Y Assumptions:– DS as a set of processes with disjoint

address spaces, communicating over LANvia MP.

– Processes and nodes can crash.

– Partitions may occur.

Page 106: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 106

ISIS 2

Y Distinguishing feature: groupcommunication mechanisms– Process group: processes cooperating in

implementing task.

– Process can belong to multiple groups.– Dynamic group membership.

Page 107: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 107

Virtual Synchrony

Y Real synchronous systems– Events (e.g., message delivery) occur in

the same order everywhere.– Expensive and not very efficient.

Y Virtual synchronous systems– Illusion of synchrony.– Weaker ordering guarantees when

applications allow it.

Page 108: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 108

ISIS Features

Y Atomic multicast– One or none.

– As long as 1 member is up, message willbe delivered to remaining members.

– FBCAST: unordered multicast.

– CBCAST: causally ordered multicast.X “Happen before” order.

Page 109: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 109

Time in Distributed Systems

Y Notion of time is critical.

Y “Happen before” notion.– Example: concurrency control using TS’s.

Page 110: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 110

Atomic Multicast 1

Y One or none.

Y As long as 1 member is up, message willbe delivered to remaining members.

Y FBCAST: unordered multicast.

Y CBCAST: causally ordered multicast.– “Happened before” order.

– Messages from given process in order.– UDP+ACKs+reXmissions.

Page 111: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 111

Atomic Multicast 2

Y ABCAST: totally ordered multicast.– Ordered delivery of messages across

processes.– Uses token site to serialize messages.

Page 112: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 112

Other Features

Y Process groups– Group membership management.

Y Broadcast and group RPC– RPC-like interface to CBCAST, ABCAST,

and FBCAST protocols.

– Delivery guaranteesX Caller indicates how many responses required.

– No responses: asynchronous.– 1 or more: synchronous.

Page 113: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 113

Implementation

Y Set of library calls on top of UNIX.

Y Commercially available.

Y In the paper, example of distributed DBimplementation using ISIS.

Y HORUS: extension to WANs.

Page 114: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 114

Time in Distributed Systems

Y Notion of time is critical.

Y “Happened before” notion.– Example: concurrency control using TS’s.– “Happened before” notion is not

straightforward in distributed systems.X No guarantees of synchronized clocks.X Communication latency.

Page 115: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 115

Event Ordering

Y Lamport defines partial ordering:1. If 2 events occurred in the same process,

then they occurred in the order observed.2. Whenever message sent between 2

processes, event sending messageoccurred before event receiving message.

3. If X->Y and Y->Z, then X->Z.

Page 116: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 116

Causal Ordering

Y “Happened before” also called causalordering.

Y Example: text book page 298.

Y In summary, possible to drawhappened-before relationship betweenevents if they happen in same processor there’s chain of messages betweenthem.

Page 117: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 117

Logical Clocks

Y Monotonically increasing counter.

Y No relation with real clock.

Y Each process keeps its own logicalclock Cp used to timestamp events.

Page 118: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 118

Causal Ordering and Logical Clocks

1. Cp incremented before each event. Cp=Cp+1.2. When p sends message m, it

piggybacks t=Cp.3. When q receives (m, t), it computes: Cq=max(Cq, t) before timestamping

message receipt event.Example: text book page 299.

Page 119: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 119

Total Ordering

Y Extending partial to total order.

Y Global timestamps:– (Ta, pa), where Ta is local TS and pa is the

process id.– (Ta, pa) < (Tb, pb) iff Ta < Tb or

Ta=Tb and pa<pb

– Total order consistent with partial order.

Page 120: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 120

Virtual Time [Jefferson] 1

Y Time warp mechanism.

Y May or may not have connection withreal time.

Y Uses optimistic approach, ie, eventsand messages are processed in theorder received: “look-ahead”.

Page 121: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 121

Virtual Time 2

Y Process virtual clock (local virtual clock)set to TS of next message in inputqueue.

Y If next message’s TS is in the past,rollback!– Can happen due to different computation

rates and communication latency.

Page 122: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 122

Rolling Back

Y Process goes back to TS(latestmessage).

Y Cancels all intermediate effects.

Y Then, executes forward.

Y Rolling back is expensive!– Messages may have been sent to other

processes causing them to sendmessages, etc.

Page 123: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 123

Anti-Messages 1Y For every message, there is an anti-

message with same content but differentsign.

Y When sending message, message goesto receiver input queue and a copy with “-”sign is enqueued in the sender’s outputqueue.

Y Message is retained for use in case of rollback.

Page 124: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 124

Anti-Message 2

Y Message + its antimessage = 0 when inthe same queue.

Y Processes must keep log to “undo”operations.

Page 125: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 125

Implementation

Y Local control.

Y Global control– How to make sure system as a whole

progresses.– Errors and I/O when roll back occurs.

– Avoid running out of memory.

Page 126: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 126

Global Virtual Clock

Y Snapshot of system at given real time.

Y Minimum of all local virtual times.

Y Lower bound on how far processes rollback.

Y Purge state before GVT.

Y GVT computed concurrently with rest oftime warp mechanism.

Page 127: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 127

CSci555: Advanced Operating SystemsLecture 4 - September 24, 1999

Dr. Clifford NeumanUniversity of Southern CaliforniaInformation Sciences Institute

Page 128: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 128

Naming Concepts

Y Name– What you call something

Y Address– Where it is located

Y Route– How one gets to it

Y But it is not that clear anymore, it depends onperspective. A name from one perspectivemay be an address from another.– Perspective means layer of abstraction

What is http://www.isi.edu/people/bcn ?

Page 129: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 129

What are the things we name

Y Users– To direct, and to identify

Y Hosts (computers)– High level and low level

Y Services– Service and instance

Y Files and other “objects”– Content and repository

Y Groups– Of any of the above

Page 130: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 130

How we name things

Y Host-Based Naming– Host-name is required part of object name

Y Global Naming– Must look-up name in global database to find address– Name transparency

Y User/Object Centered Naming– Namespace is centered around user or object

Y Attribute-Based Naming– Object identified by unique characteristics– Related to resource discovery / search / indexes

Page 131: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 131

Namespace

Y A name space maps:

Σ∗ → X ε O

At a particular point in time.

Y The rest of the definition, and even some ofthe above, is open to discussion/debate.

Y What is a “flat namespace”– Implementation issue

Page 132: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 132

Case Studies

Y Host Table– Flat namespace (?)– Global namespace (?)

Y Domain name system– Arbitrary depth– Iterative or recursive(chained) lookup– Multi-level caching

Y Grapevine– Two-level, iterative lookup– Clearinghouse 3 level

GetHostByName(usc.arpa){ scan(host file); return(matching entry); }

Page 133: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 133

Case Studies

Y Host Table– Flat namespace (?)– Global namespace (?)

Y Domain name system– Arbitrary depth– Iterative or recursive(chained) lookup– Multi-level caching

Y Grapevine– Two-level, iterative lookup– Clearinghouse 3 level

GetHostByName(host.sc)

esgv

gvsc

Page 134: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 134

Case Studies

Y Host Table– Flat namespace (?)– Global namespace (?)

Y Domain name system– Arbitrary depth– Iterative or recursive(chained) lookup

Y Grapevine– Two-level, iterative lookup– Clearinghouse 3 level

GetHostByName(aludra.usc.edu)

/ edu usc.edu

Page 135: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 135

Scalability of naming

Y Scalability– Ability to continue to operate efficiently as a system grows large,

either numerically, geographically, or administratively.

Y Affected by– Frequency of update– Granularity– Evolution/reconfiguration

Y DNS characteristics– Multi-level implementation– Replication of root and other servers– Multi-level caching

Page 136: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 136

Other implementations of naming

Y Broadcast– Limited scalability, but faster local response

Y Prefix tables– Essentially a form of caching

Y Capabilities– Combines security and naming– Traditional name service built over capability

based addresses

Page 137: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 137

Closure

Y Closure binds an object to the namespace withinwhich names embedded in the object are to beresolved.– Namespace is dynamic– “Object” may as small as the name itself

Page 138: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 138

Advanced Name Systems

Y DEC’s Global Naming– Support for reorganization the key idea– Little coordination needed in advance

Y Prospero Directory Service– Multiple namespace centered around a “root” node that

is specific to each namespace.– Closure binds objects to this “root” node.– Used today as an embedded directory service.

Page 139: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 139

Non Hierarchical Naming

Y Examples– Prospero– The Web

Y User centered naming– Prospero– Tilde– Quicksilver

Page 140: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 140

Resource Discovery

Y Similar to naming– Browsing related to directory services– Indexing and search similar to attribute based naming

Y Attribute based naming– Profile– Multi-structured naming

Y Search engines

Page 141: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 141

Resource Discovery

Y Similar to naming– Browsing related to directory services– Indexing and search similar to attribute based naming

Y Object handles– Uniform Resource Locators (URL’s)

X Is it a name or an address?

– Uniform Resource Names (URN’s)X Is a directory service required

Page 142: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 142

CSci555: Advanced Operating SystemsLecture 5 - October 1, 1999

Dr. Clifford NeumanUniversity of Southern CaliforniaInformation Sciences Institute

Page 143: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 143

The Web

Y Object handles– Uniform Resource Locators (URL’s)

X Is it a name or an address?

– Uniform Resource Names (URN’s)X Is a directory service required

– How URL’s are misused

Y XML– Definitions provide a form of closure

X Conceptual level rather than the “namespace” level.

Page 144: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 144

Security Goals

Y Protection– Enforced by hardware– Depends on trusted kernel

Y Authentication– Determining identity of principal

Y Integrity– Authenticity of document– That it hasn’t changes

Y Privacy– That inappropriate information is not disclosed

Page 145: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 145

Security Policy

YAccess Matrix

–implemented as:XCapabilities or

XAccess Control list

Subject O BJ1 O BJ2b cn RW Rgos t-g rou p RW -ob raczk a R RWtyao R RCsci555 R -

Page 146: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 146

Access Control Lists

YAdvantages– Easy to see who has access– Easy to change/revoke access

YDisadvantages– Time consuming to check access

YExtensions to ease management– Groups– EACLs

Page 147: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 147

Extended Access Control ListsY Conditional authorization

– Implemented as restrictions on ACL entries andembedded as restrictions in authentication andauthorization credentials

P rincipa l R ight s Cond it ionsb cn RW H W -A u th en t ica tion

Reta in O ld Item sgos t-g rou p RW TIM E: 9A M -5PM

au th o rizationserve r

R D eleg a ted -A ccess

* R Load Lim it 8U se: N on -C om m ercia l

* R Paym en t: $P r ice

Page 148: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 148

Generic Authorization and Access-control API

� Allows applications to use the securityinfrastructure to implement security policies.

gaa_get_object_eacl function called before other GAA APIroutines, each of which requires a handle to object EACL to identifyEACLs on which to operate.

gaa_check_authorization function tells application whetherrequested operation is authorized, or if additional application specificchecks are required

Application

GAA API

input

output

gaa_get_ object_eacl

gaa_check_authorization

Yes,no,maybe

SC,obj_id,op

Page 149: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 149

Capabilities

YAdvantages– Easy and efficient to check access– Easily propagated

YDisadvantages– Hard to protect capabilities– Easily propagated– Hard to revoke

YHybrid approach–EACL’s/proxies

Page 150: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 150

Protecting capabilities

YStored in TCB– Only protected calls manipulate

YLimitations ?– Works in centralized systems

YDistributed Systems– Tokens with random or special coding– Possibly protect through encryption– How does Amoeba do it? (claimed)

Page 151: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 151

Basic Security Services

Y Authentication

Y Authorization

Y Accounting

Y Audit

Y Assurance

Y Payment

Y Protection

Y PolicyY Privacy

Y Confidentiality

Page 152: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 152

Network Threats

–Unauthorized release of data–Unauthorized modification of data–Spurious association initiation

XImpersonation

–Denial of use–Traffic analysis

YAttacks may be–Active or passive

Page 153: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 153

Likely points of attack (location)

Page 154: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 154

Likely points of attack (module)

Y Against the protocols– Sniffing for passwords and credit card numbers– Interception of data returned to user– Hijacking of connections

Y Against the server– The commerce protocol is not the only way in– Once an attacker is in, all bets are off

Y Against the client’s system– You have little control over the client’s system

Page 155: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 155

Network Attacks

EavesdroppingListening for passwords or credit card numbers

Message stream modificationChanging links and data returned by server

HijackingKilling client and taking over connection

C SAttacker

Page 156: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 156

Network Attack Countermeasures

Don’t send anything importantNot everything needs to be protected

EncryptionFor everything elseMechanism limited by client side software

C SAttacker

Page 157: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 157

Encryption for confidentiality and integrity

Y Encrypton used to scramble data

PLAINTEXT PLAINTEXTCIPHERTEXT

ENCRYPTION(KEY)

DECRYPTION(KEY)

++

Page 158: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 158

Authentication

Y Proving knowledge of encryption key– Nonce = Non repeating value

{Nonce or timestamp}Kc

C S

Page 159: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 159

CSci555: Advanced Operating SystemsLecture 6 - October 8, 1999

Dr. Clifford NeumanUniversity of Southern CaliforniaInformation Sciences Institute

Page 160: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 160

Key distribution

Y Conventional cryptography– Single key shared by both parties

Y Public Key cryptography– Public key published to world

– Private key known only by owner

Y Third party certifies or distributes keys– Certification infrastructure

– Authentication

Page 161: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 161

Authentication w/ Conventional Crypto

Y Kerberos

KDC1

2

SC3

or Needham Schroeder

,4,5

Page 162: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 162

Authentication w/ PK Crypto

Y Based on public key certificates

DS

1

2

SC

3

Page 163: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 163

KerberosThird-party authentication service

– Distributes session keys for authentication, confidentiality,and integrity

TGS

4. Ts+{Reply}Kt

3. TgsReq

KDC

1. Req2. T+{Reply}Kc

C S5. Ts + {ts}Kcs

Page 164: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 164

Public Key Cryptography (revisited)

Y Key Distribution– Confidentiality not needed for public key– Solves n2 problem

Y Performance– Slower than conventional cryptography– Implementations use for key distribution, then use

conventional crypto for data encryption

Y Trusted third party still needed– To certify public key– To manage revocation– In some cases, third party may be off-line

Page 165: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 165

Certificate-Based AuthenticationCertification authorities issue signed certificates

– Banks, companies, & organizations like Verisign act as CA’s– Certificates bind a public key to the name of a user– Public key of CA certified by higher-level CA’s– Root CA public keys configured in browsers & other software– Certificates provide key distribution

Authentication steps– Verifier provides nonce, or a timestamp is used instead.– Principal selects session key and sends it to verifier with nonce,

encrypted with principal’s private key and verifier’s public key, andpossibly with principal’s certificate

– Verifier checks signature on nonce, and validates certificate.

Page 166: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 166

Secure Sockets Layer (and TLS)

Encryption support provided betweenBrowser and web server - below HTTP layer

Client checks server certificateWorks as long as client starts with the correct URL

Key distribution supported through cert stepsAuthentication provided by verify steps

C S

Attacker

Hello

Hello + CertS

{PMKey}Ks [CertC + VerifyC ]

VerifyS

Page 167: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 167

Trust models for certification

Y X.509 Hierarchical– Single root (original plan)– Multi-root (better accepted)– SET has banks as CA’s and common SET root

Y PGP Model– “Friends and Family approach” - S. Kent

Y Other representations for certifications

Y No certificates at all– Out of band key distribution– SSH

Page 168: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 168

Global Authentication ServiceY Pair-wise trust in hierarchy

– Name is derived from path followed– Shortcuts allowed, but changes name– Exposure of path is important for security

Y Compared to Kerberos– Transited field in Kerberos - doesn’t change name

Y Compared with X.509– X.509 has single path from root– X.509 is for public key systems

Y Compared with PGP– PGP evaluates path at end, but may have name

conflicts

Page 169: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 169

Capability Based Systems - Amoeba

“Authentication not an end in itself”Y Theft of capabilities an issue

– Claims about no direct access to network– Replay an issue

Y Modification of capabilities a problem– One way functions provide a good solution

Y Where to store capabilities for convenience– In the user-level naming system/directory– 3 columns

Y Where is authentication in Amoeba– To obtain initial capability

Page 170: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 170

Capability Directories in Amoeba

BCN User Group OtherFile1File2users

Katia User Group OtherFile3bcnusers

users User Group OtherBCNKatiatyao

Login CapBCNKatiatyao

Page 171: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 171

Security Architectures

Y DSSA– Delegation is the important issue

X Workstation can act as userX Software can act as workstation - if given keyX Software can act as developer - if checksum validated

– Complete chain needed to assume authority– Roles provide limits on authority - new sub-principal

Y Proxies - Also based on delegation– Limits on authority explicitly embedded in proxy– Works well with access control lists

Page 172: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 172

Distributed AuthorizationY It must be possible to maintain authorization

information separate from the end servers– Less duplication of authorization database– Less need for specific prior arrangement– Simplified management

Y Based on restricted proxies which support– Authorization servers– Group Servers– Capabilities– Delegation

Page 173: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 173

Proxies

Y A proxy allows a second principal tooperate with the rights and privileges ofthe principal that issued the proxy– Existing authentication credentials– Too much privilege and too easily propagated

Y Restricted Proxies– By placing conditions on the use of

proxies, they form the basis of a flexibleauthorization mechanism

Page 174: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 174

Restricted Proxies

Y Two Kinds of proxies– Proxy key needed to exercise bearer proxy– Restrictions limit use of a delegate proxy

Y Restrictions limit authorized operations– Individual objects– Additional conditions

+ ProxyProxyConditions:Use between 9AM and 5PMGrantee is user X, Netmaskis 128.9.x.x, must be able toread this fine print, can you

PROXY CERTIFICATE

Grantor

Page 175: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 175

Authorization and Group Services

1. Authenticated authorization request (operation X)

2. [operation X only]R, {Kproxy} Ksession

3. [operation X only]R, authentication using Kproxy

R1

2

SC3

Page 176: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 176

Central AuthorizationY Authorization server uses

extended ACLs– Conditions are not evaluated, but instead

attached to credentials

Y Groups implemented by auth server– Server grants right to assert group membership

Y Application servers configuredto use authorization server– Minimal local ACL– Can use multiple Authorization servers

Page 177: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 177

Applied Security

YElectronic commerce– SSL Applies authentication and encryption– NetCheque applies proxies– SET applies certification– End system security a major issue

Y What we have today– Firewalls– Web passwords, encryption, certificates– Windows 2000 uses Kerberos

Page 178: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 178

CSci555: Advanced Operating SystemsLecture 7 - October 15, 1999File Systems

Dr. Katia ObraczkaUniversity of Southern CaliforniaInformation Sciences Institute

Page 179: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 179

Outline

Y Time in distributed systems.

Y File systems.

Page 180: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 180

File Systems

Y Provide set of primitives that abstractusers from details of storage accessand management.

Page 181: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 181

Distributed File Systems

Y Promote sharing across machineboundaries.

Y Transparent access to files.

Y Make diskless machines viable.

Y Increase disk space availability byavoiding duplication.

Y Balance load among multiple servers.

Page 182: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 182

Sun Network File System 1Y De facto standard:

– Mid 80’s.– Widely adopted in academia and industry.

Y Provides transparent access to remotefiles.

Y Uses Sun RPC and XDR.– NFS protocol defined as set of procedures

and corresponding arguments.– Synchronous RPC:

X Client blocks until it gets results from server.

Page 183: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 183

Sun NFS 2

Y Stateless server:– Remote procedure calls are self-contained.

– Servers don’t need to keep state aboutprevious requests.X Flush all modified data to disk before returning

from RPC call.

– Robustness.X No state to recover.X Clients retry.

Page 184: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 184

Location TransparencyY Client’s file name space includes

remote files.– Shared remote files are exported by

server.– They need to be remote-mounted by client.

Client/root

vmunix usr

staffstudents

Server 1/root

export

users

joe bob

Server 2/root

nfs

users

ann eve

Page 185: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 185

Achieving Transparency 1

Y Mount service.– Mount remote file systems in the client’s

local file name space.– Mount service process runs on each node

to provide RPC interface for mounting andunmounting file systems at client.

– Runs at system boot time or user logintime.

Page 186: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 186

Achieving Transparency 2Y Automounter.

– Dynamically mounts file systems.– Runs as user-level process on clients

(daemon).– Resolves references to unmounted pathnames

within files managed by the automounter bymounting them on demand.

– Maintains a table of mount points and thecorresponding server(s); sends probes toserver(s).

– Primitive form of replication.

Page 187: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 187

Transparency?

Y Early binding.– Mount system call attaches remote file

system to local mount point.– Client deals with host name once.– But, mount needs to happen before remote

files become accessible.

Page 188: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 188

Other Functions

Y NFS file and directory operations:– read, write, create, delete, getattr, etc.

Y Access control:– File and directory access permissions.

Y Path name translation:– Lookup for each path component.– Caching.

Page 189: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 189

Implementation

UnixFS

NFSclient

VFS

Client

Unix Kernel

NFSserver

UnixFS

VFS

Server

Unix Kernel

Clientprocess

RPC

Page 190: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 190

Virtual File System 1Y VFS added to UNIX kernel.

– Location-transparent file access.– Distinguishes between local and remote access.

Y @ client:– Processes file system system calls to determine

whether access is local (passes it to UNIX FS) orremote (passes it to NFS client).

Y @ server:– NFS server receives request and passes it to local

FS through VFS.

Page 191: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 191

VFS 2

Y If local, translates file handle to internalfile id’s (in UNIX i-nodes).

Y V-node:X If file local, reference to file’s i-node.X If file remote, reference to file handle.

Y File handle: uniquely distinguishes file.

File system id I-node # I-node generation #

Page 192: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 192

NFS Caching

Y File contents and attributes.

Y Client versus server caching.

Client Server

$ $

Page 193: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 193

Server CachingY Read:

– Same as UNIX FS.

– Caching of file pages and attributes.– Cache replacement uses LRU.

Y Write:– Write through (as opposed to delayed

writes of conventional UNIX FS). Why?

– [Delayed writes: modified written to diskwhen buffer space needed, sync operation(every 30 sec), file close].

Page 194: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 194

Client Caching 1

Y Timestamp-based cache invalidation.

Y Read:– Cached entries have TS with last-modified

time.– Blocks assumed to be valid for TTL.

X TTL specified at mount time.X Typically 3 sec for files.

Page 195: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 195

Client Caching 2

Y Write:– Modified pages marked and flushed to

server at file close or sync (every 30 sec).

Y Consistency?– Not always guaranteed!

– E.g., client modifies file; delay formodification to reach servers + 3-secwindow for cache validation from clientssharing file.

Page 196: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 196

Cache Validation

Y Validation check performed when:– First reference to file after TTL expires.

– File open or new block fetched from server.

Y Done for all files (even if not being shared).

Y Expensive!– Potentially, every 3 sec get file attributes.– If needed invalidate all blocks.

– Fetch fresh copy when file is next accessed.

Page 197: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 197

The Sprite File System 1

Y Main memory caching on both clientand server.

Y Write-sharing consistency guarantees.

Y Variable size caches.– VM and FS negotiate amount of memory

needed.– According to caching needs, cache size

changes.

Page 198: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 198

Sprite 2

Y Sprite supports concurrent writes bydisabling caching of write-shared files.– If file shared, server notifies client that has

file open for writing to write modified blocksback to server.

– Server notifies all client that have file openfor read that file is no longer cacheable;clients discard all cached blocks, soaccess goes throu server.

Page 199: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 199

Sprite 3

Y Sprite servers are stateful.– Need to keep state about current

accesses.– Centralized points for cache consistency.

X Bottleneck?X Single point of failure?

Y Tradeoff: consistency versusperformance.

Page 200: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 200

Andrew

Y Distributed computing environmentdeveloped at CMU.

Y Campus wide computing system.– Between 5 and 10K workstations.

– 1991: |~ 800 ws’s, 40 servers.

Page 201: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 201

Andrew FS

Y Goals:– Information sharing.

– Scalability.X Key strategy: caching of whole files at client.X Whole file serving

– Entire file transferred to client.

X Whole file caching– Local copy of file cached on client’s local disk.– Survive client’s reboots and server unavailability.

Page 202: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 202

Whole File Caching

Y Local cache contains several mostrecently used files.

S(1)open<file>?

(2) open<file>

C

(5) file

(3)

(4)(6)

file

- Subsequent operations on file applied to local copy.- On close, if file modified, sent back to server.

Page 203: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 203

Implementation 1

Y Network of ws’s running Unix BSD 4.3and Mach.

Y Implemented as 2 user-level processes:– Vice: runs at each Andrew server.

– Venus: runs at each Andrew client.

Page 204: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 204

Implementation 2Y Modified BSD 4.3 Unix

kernel.– At client, intercept file

system calls (open,close, etc.) and passthem to Venus whenreferring to shared files.

Y File partition on localdisk used as cache.

Y Venus manages cache.– LRU replacement policy.– Cache large enough to

hold 100’s of average-sized files.

Unix kernel

Unix kernel

Vice

User program

Venus

Network

Client

Server

Page 205: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 205

File Sharing

Y Files are shared or local.– Shared files

X Utilities (/bin, /lib): infrequently updated or filesaccessed by single user (user’s home directory).

X Stored on servers and cached on clients.X Local copies remain valid for long time.

– Local filesX Temporary files (/tmp) and files used for start-up.X Stored on local machine’s disk.

Page 206: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 206

File Name Space

Y Regular UNIX directory hierarchy.Y “cmu” subtree contains shared files.Y Local files stored on local machine.Y Shared files: symbolic links to shared files.

/

tmp bin vmunixcmu

bin

Local Shared

Page 207: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 207

AFS Caching 1Y AFS-1 uses timestamp-based cache

invalidation.

Y AFS-2 and + use callbacks.– When serving file, Vice server promises to

notify Venus client when file is modified.– Stateless servers?

– Callback stored with cached file.X Valid.X Canceled: when client is notified by server that

file has been modified.

Page 208: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 208

AFS Caching 2Y Calbacks implemented using RPC.Y When accessing file, Venus checks if file

exists and if callback valid; if canceled,fetches fresh copy from server.

Y Failure recovery:– When restarting after failure, Venus checks

each cached file by sending validation requestto server.

– Also periodic checks in case of communicationfailures.

Page 209: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 209

AFS Caching 3

Y @ file close time, Venus on clientmodifying file sends update to Viceserver.

Y Server updates its own copy and sendscallback cancellation to all clientscaching file.

Y Consistency?

Y Concurrent updates?

Page 210: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 210

AFS Replication

Y Read-only replication.– Only read-only files allowed to be

replicated at several servers.

Page 211: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 211

Coda

Y Evolved from AFS.

Y Goal: constant data availability.– Improved replication.

X Replication of read-write volumes.

– Disconnected operation: mobility.X Extension of AFS’s whole file caching mechanism.

Y Access to shared file repository (servers)versus relying on local resources whenserver not available.

Page 212: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 212

Replication in CodaY Replication unit: file volume (set of files).

Y |Set of replicas of file volume: volumestorage group (VSG).

Y Subset of replicas available to client:AVSG.– Different clients, different AVSGs.– AVSG membership changes as server availability

changes.– On write: when file is closed, copies of modified file

broadcast to AVSG.

Page 213: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 213

Optimistic Replication

Y Goal is availability!

Y Replicated files are allowed to bemodified even in the presence ofpartitions or during disconnectedoperation.

Page 214: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 214

Disconnected OperationY AVSG = { }.

Y Network/server failures or host on the move.

Y Rely on local cache to serve all needed files.

Y Loading the cache:– User intervention: list of files to be cached.

– Learning usage patterns over time.

Y Upon reconnection, cached copies validatedagainst server’s files.

Page 215: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 215

Normal and Disconnected Operation

Y During normal operation:– Coda behaves like AFS.– Cache miss transparent to user; only performance

penalty.– Load balancing across replicas.– Cost: replica consistency + cache consistency.

Y Disconnected operation:– No replicas are accessible; cache miss prevents

further progress; need to load cache beforedisconnection.

Page 216: Advanced Operating Systems Lecture notesgost.isi.edu/courses/csci555/fall99/lectures-bm/99premidx1.pdf · Title: Microsoft PowerPoint - premweb.ppt Created Date: 10/18/1999 9:57:25

Copyright © 1995-1998 Clifford Neuman, Katia Obraczka - UNIVERSITY OF SOUTHERN CALIFORNIA - INFORMATION SCIENCES INSTITUTE Fall 1999 216

Replication and Caching

Y Coda integrates server replication and clientcaching.– On cache hit and valid data: Venus does not

need to contact server.

– On cache miss: Venus gets data from an AVSGserver, i.e., the preferred server (PS).X PS chosen at random or based on proximity, load.

– Venus also contacts other AVSG servers andcollect their versions; if conflict, abort operation;if replicas stale, update them off-line.