distributed computing

Distributed Operating System and

File SystemDeepak John

Department Of I.T,CE-Poonjar

Introduction

� Distributed OS

I. Presents users (and applications) with an integrated

computing platform that hides the individual computers.

II. Has control over all of the nodes (computers) in the network

and allocates their resources to tasks without user

involvement.

III. the user doesn't know (or care) where his programs are

running.

� Ex:Cluster computer systems,V system, Sprite

Deepak John,CE-Poonjar

Operating System layer

Applications, services

Computer &

Platform

Middleware

OS: kernel,libraries & servers

network hardware

OS1

Computer & network hardware

Node 1 Node 2

Processes, threads,communication, ...

OS2Processes, threads,communication, ...


�Middleware implements abstractions that support network-wide

programming. Examples:

� RPC and RMI (Sun RPC, Corba, Java RMI)

� event distribution and filtering (Corba Event Notification, Elvin)

� resource discovery for mobile and ubiquitous computing

� support for multimedia streaming


Functions that OS should provide for middleware

1. Encapsulation

provide a set of operations that meet their clients’ needs

2. Protection

protect resource from illegitimate access

3. Concurrent processing

support clients access resource concurrently

4. Invocation mechanism: a means of accessing an encapsulated

resource

�Communication : Pass operation parameters and results between

resource managers

�Scheduling: Schedule the processing of the invoked operation


Core OS functionality

Communication

manager

Thread manager Memory manager

Supervisor

Process manager


�Process manager

Handles the creation of and operations upon processes.

�Thread manager

Thread creation, synchronization and scheduling

�Communication manager

Communication between threads attached to different

processes on the same computer

�Memory manager

Management of physical and virtual memory

�Supervisor

�Dispatching of interrupts, system call traps and other exceptions

�control of memory management unit and hardware caches

�processor and floating point unit register manipulations


Kernel and Protection

�Kernel

�always runs

�complete access privileges for the physical resources

�Different execution mode

�supervisor mode (kernel process) / user mode (user process)

�system call trap: invocation mechanism for resources managed by

kernel

�An address space: a collection of ranges of virtual memory locations,

in each of which a specified combination of memory access rights

applies, e.g.: read only or read-write

�The cost for protection

�switching between different processes take many processor cycles

�a system call trap is a more expensive operation than a simple method

callDeepak John,CE-Poonjar

Processes and Threads

� Process� A program in execution

� Problem: sharing between related activities are awkward and

expensive

� a process consists of an execution environment together with

one or more threads.

� An execution environment is a collection of local kernel managed

resources to which its threads have access. An execution

environment primarily consists of:

� an address space;

� thread synchronization and communication resources such as

semaphores and communication interfaces (for example, sockets);

� higher-level resources such as open files and windows.


Process address space

Stack

Text

Heap

Auxiliary

regions

0

2N

�Address space

� a unit of management of a process’s

virtual memory

� Up to 232 bytes and sometimes up to

264 bytes.

� consists of one or more regions.

�Region

� an area of continuous virtual memory

that is accessible by the threads of the

owning process.

� can be shared

1. kernel code

2. libraries

3. shared data & communication

4. Copy on writeDeepak John,CE-Poonjar

Creation of new process in distributed system

� Creating process by the operation system

� Fork, in UNIX

� Process creation in distributed system

� The choice of a target host

� Choice of process host

� running new processes at their originator’s computer

� sharing processing load between a set of computers

� Load sharing system

� Centralized, Decentralized, Hierarchical

� sender-initiated,receiver-initiated


� Load sharing policy

� Transfer policy: situate a new process locally or remotely

� Location policy: which node should host the new process

1. Static policy without regard to the current state of the system

2.Adaptive policy applies heuristics to make their allocation

decision

� Creation of a new execution environment

1. Initializing the address space

� Statically defined format

� With respect to an existing execution environment,

e.g. fork()

2. Copy-on-write scheme

A technique that is used to reduce amount of data copy.Deepak John,CE-Poonjar

Copy-on-write – a convenient optimization

a) Before write b) After write

Sharedframe

A's pagetable

B's pagetable

Process A’s address space Process B’s address space

Kernel

RA RB

RB copiedfrom RA


Threads concept and implementation

�Thread

� threads were introduced as a lightweight – operating system

provided .

�a process consists of its address-space, and a set of threads attached

to that process.

�The operating system can perform less expensive context switches

between threads attached to the same process and threads attached to

the same process can access the same memory etc, such that

communication/ synchronisation can be much cheaper and less

awkward


A server application generally

consists of:

�A single thread, the receiver-

thread which receives all the

requests, places them in a queue

and dispatches those requests to

be dealt with by the worker-

threads

�The worker-thread which deals

with the request may be a thread

in the same process or it may be

a thread in another

process


Client and server with threads �Worker pool

�Server creates a fixed pool of

“worker” threads to process the

requests when it starts up.

�The module marked ‘receipt

and queuing is typically

implemented by an ‘I/O’ thread,

�Pro: simple

�Cons

1. Inflexibility: worker threads

number unequal to current

request number

2. High level of switching

between the I/O and worker

thread�For single thread, server can handle 100 client

requests per second


Alternative server threading architectures

�Thread-per-request

�Server spawn a new worker thread for each new request, destroy it

when the request processing finish.

� Pro: throughput is potentially maximized.

� Con: overhead of the thread creation and destruction


� Thread-per-connection

� Server creates a new worker thread when client creates a connection,

destroys the thread when the client closes the connection.

� Pro: lower thread management overheads compared with the thread-

per-request.

� Con: client may be delayed while a worker thread has several

outstanding requests but another thread has no work to perform

�Thread-per-object

�Associate a thread with each remote object

� Pro&Con are similar to thread-per-connection


Threads versus multiple processes

� Creating a thread is (much) cheaper than a process (~10-20times)

� Switching to a different thread in same process is (much) cheaper

(5-50 times)

� Threads within same process can share data and other resources

more conveniently and efficiently (without copying or messages)

� Threads within a process are not protected from each other

Threads implementation

can be implemented:

�in the OS kernel (Win NT, Solaris, Mach)

�at user level e.g. by a thread library, or in the language (Ada, Java).


Java thread constructor and management methods

• Thread(ThreadGroup group, Runnable target, String name)

Creates a new thread in the SUSPENDED state, which will

belong to group and be identified as name; the thread will

execute the run() method of target.

• setPriority(int newPriority), getPriority()

Set and return the thread’s priority.

• run()

A thread executes the run() method of its target object, if it has

one, and otherwise its own run() method (Thread implements

Runnable).

Methods of objects that inherit from class Thread


• start()

Change the state of the thread from SUSPENDED to

RUNNABLE.

• sleep(int millisecs)

Cause the thread to enter the SUSPENDED state for the

specified time.

• yield()

Enter the READY state and invoke the scheduler.

• destroy()

Destroy the thread.


Java thread synchronization calls• thread.join(int millisecs)

Blocks the calling thread for up to the specified time until thread

has terminated.

• thread.interrupt()

Interrupts thread: causes it to return from a blocking method call

such as sleep().

• object.wait(long millisecs, int nanosecs)

Blocks the calling thread until a call made to notify() or

notifyAll() on object wakes the thread, or the thread is interrupted,

or the specified time has elapsed.

• object.notify(), object.notifyAll()

Wakes, respectively, one or all of any threads that have called

wait() on object.Deepak John,CE-Poonjar

Operating system architecture

Monolithic Kernel Microkernel

Server: Dynamically loaded server program:Kernel code and data:

.......

.......

Key:

S4

S1 .......

S1 S2 S3

S2 S3 S4

Monolithic kernel and microkernel


Monolithic vs Microkernel�A monolithic kernel provides all of the services via a single

image, that is a single program initialized when the computer boots

� A microkernel instead implements only the absolute minimum:

Basic virtual memory, Basic scheduling and Inter-process

communication

�All other services such as device drivers, the file system,

networking etc are implemented as user-level server processes that

communicate with each other and the kernel via IPC


The Microkernel ApproachThe major advantages of the microkernel approach include:

�Extensibility - major functionality can be added without modifying

the core kernel of the operating system

�Modularity - the different functions of the operating system can be

forced into modularity behind memory protection barriers. A

monolithic kernel must use programming language features or code

conventions to attempt to ensure this

�Robustness -relatively small kernel might be likely to contain fewer

bugs than a larger program, however, this point is rather contentious

�Portability - since only a small portion of the operating system, its

smaller kernel, relies on the particulars of a given machine it is easier

to port to a new machine architecture


The role of the microkernel

Middleware

Language

support

subsystem

Language

support

subsystem

OS emulation

subsystem....

Microkernel

Hardware

The microkernel supports middleware via subsystems


The Monolithic Approach�The major advantage of the monolithic approach is the relative

efficiency with which operations may be invoked. Since services

share an address space with the core of the kernel they need not make

system calls to access core-kernel functionality

�Most operating systems in use today are a kind of hybrid solution

�Linux is a monolithic kernel, but modules may be dynamically

loaded and unloaded at run time.


Distributed File Systems


Purposes of a Distributed File System

� Sharing of storage and information across a network

� Convenience (and efficiency) of a conventional file system

� Persistent storage that most other services (e.g., Web servers) need

Files

�Files are an abstraction of permanent storage.

�A file is typically defined as a sequence of similar-sized data items

along with a set of attributes.

� A directory is a file that provides a mapping from text names to

internal file identifiers.


File attributes

updated

by system:

File length

Creation timestamp

Read timestamp

Write timestamp

Attribute timestamp

Reference count

Owner

File type

Access control list

E.g. for UNIX: rw-rw-r--

updated

by owner:


File Systems�Responsible for the (a) organization, (b)storage, (c) retrieval, (d)

naming, (e)sharing, and (f) protection of files.

�Provide a set of programming operations that characterize the file

abstraction,particularly operations to read and write subsequences of

data items beginning at any point of a file.

Directory module: relates file names to file IDs

File module: relates file IDs to particular files

Access control module: checks permission for operation requested

File access module: reads or writes file data or attributes

Block module: accesses and allocates disk blocks

Device module: disk I/O and buffering

File

system

modules


UNIX file system operations


Distributed File System Requirements


File Service Architecture

� An architecture that offers a clear separation of the main concerns

in providing access to files is obtained by structuring the file

service as three components:

� A flat file service

� A directory service

� A client module.

� The Client module implements exported interfaces by flat file and

directory services on server side.


Client computer Server computer

Applicationprogram

Applicationprogram

Client module

Flat file service

Directory service

Lookup

AddName

UnName

GetNames

Read

Write

Create

Delete

GetAttributes

SetAttributesDeepak John,CE-Poonjar

� Flat file service:� Concerned with the implementation of operations on the contents of file.

Unique File Identifiers (UFIDs) are used to refer to files in all requestsfor flat file service operations.

� Directory Service:� Provides mapping between text names for the files and their UFIDs.

Clients may obtain the UFID of a file by quoting its text name todirectory service. Directory service supports functions needed generatedirectories, to add new files to directories.

� Client Module:� It runs on each computer and provides integrated service (flat file and

directory) as a single API to application programs.

� It holds information about the network locations of flat-file and directoryserver processes; and achieve better performance throughimplementation of a cache of recently used file blocks at the client.


Flat file service operations

Read(FileId, i, n) -> Data

— throws BadPosition

If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items

from a file starting at item i and returns it in Data.

Write(FileId, i, Data)

— throws BadPosition

If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a

file, starting at item i, extending the file if necessary.

Create() -> FileId Creates a new file of length 0 and delivers a UFID for it.

Delete(FileId) Removes the file from the file store.

GetAttributes(FileId) -> Attr Returns the file attributes for the file.

SetAttributes(FileId, Attr) Sets the file attributes


Access control� UNIX checks access rights when a file is opened ,subsequent

checks during read/write are not necessary

� distributed environment

a. server has to check.

b. stateless approaches

1. access check once when UFID is issued

• client gets an encoded "capability" (who can access and how)

• capability is submitted with each subsequent request

2. access check for each request.


File Group

A collection of files that can be located

on any server or moved between

servers while maintaining the same

names.

� Similar to a UNIX filesystem

� Helps with distributing the load of

file serving between several servers.

� File groups have identifiers which

are unique throughout the system

(and hence for an open system, they

must be globally unique).

� Used to refer to file groups and

files

To construct a globally

unique ID we use some

unique attribute of the

machine on which it is

created, e.g. IP number,

even though the file group

may move subsequently.

IP address date

32 bits 16 bits

File Group ID:


Case Study: Sun NFS

� An industry standard for file sharing on local networks since the

1980s

� An open standard with clear and simple interfaces.

� OS independent

� unix implementation

� rpc

� udp or tcp

� Supports many of the design requirements already mentioned:

� Transparency,heterogeneity,efficiency,fault tolerance

� Limited achievement of:

� Concurrency,replication,consistency,security


NFS architecture

Client computer Server computer

UNIXfile

system

NFSclient

NFSserver

UNIXfile

system

Applicationprogram

Applicationprogram

Virtual file systemVirtual file system

Oth

er

file

syste

m

UNIX kernel

system calls

NFSprotocol

(remote operations)

UNIX

Operations on local files

Operationson

remote files

Applicationprogram

NFSClient

KernelApplicationprogram

NFSClient

Client computer


Virtual file system� access transparency

� part of unix kernel

� NFS file handle, 3 components:

� filesystem identifier

� different groups of files

� i-node (index node)

� structure for finding the file

� i-node generation number

� i-nodes are reused

� incremented when reused

� VFS

� struct for each file system

� v-node for each open file

� file handle for remote file

� i-node number for local fileDeepak John,CE-Poonjar

NFS access control and authentication

� Stateless server, so the user's identity and access rights must be

checked by the server on each request.

� In the local file system they are checked only on open()

� Every client request is accompanied by the userID and groupID

� Server is exposed to imposter attacks unless the userID and groupID

are protected by encryption

� Kerberos has been integrated with NFS to provide a stronger and

more comprehensive security solution


• read(fh, offset, count) -> attr, data

• write(fh, offset, count, data) -> attr

• create(dirfh, name, attr) -> newfh, attr

• remove(dirfh, name) status

• getattr(fh) -> attr

• setattr(fh, attr) -> attr

• lookup(dirfh, name) -> fh, attr

• rename(dirfh, name, todirfh, toname)

• link(newdirfh, newname, dirfh, name)

• readdir(dirfh, cookie, count) ->

entries

NFS server operations (simplified)

fh = file handle:

Filesystem identifier i-node number i-node generation

Read(FileId, i, n) -> Data

Write(FileId, i, Data)

Create() -> FileId

Delete(FileId)

GetAttributes(FileId) -> Attr

SetAttributes(FileId, Attr)

Lookup(Dir, Name) -> FileId

AddName(Dir, Name, File)

UnName(Dir, Name)

GetNames(Dir, Pattern)

->NameSeq


Mount service� the process of including a new file system is called mounting.

� communicates with the mount process on the server in a mountprotocol.

� Server maintains a table of clients who have mounted filesystems at

that server

� Each client maintains a table of mounted file systems holding.

� hard-mounted

� user process is suspended until request is successful

� when server is not responding

� request is retried until it's satisfied

� soft-mounted

� if server fails, client returns failure after a small number of retries

� user process handles the failureDeepak John,CE-Poonjar

THE MOUNT PROTOCOL:

The following operations occur:

1. The client's request is sent via RPC to the mount server ( onserver machine.)

2. Mount server checks export list containing

a) file systems that can be exported,

b) legal requesting clients.

c) It's legitimate to mount any directory within the legalfilesystem.

3. Server returns "file handle" to client.

4. Server maintains list of clients and mounted directories -- this isstate information! But this data is only a "hint" and isn't treated asessential.

5. Mounting often occurs automatically when client or server boots.Deepak John,CE-Poonjar

Local and remote file systems

jim jane joeann

usersstudents

usrvmunix

Client Server 2

. . . nfs

Remote

mountstaff

big bobjon

people

Server 1

export

(root)

Remote

mount

. . .

x

(root) (root)

Note: The file system mounted at /usr/students in the client is actually the sub-tree

located at /export/people in Server 1; the file system mounted at /usr/staff in the client is

actually the sub-tree located at /nfs/users in Server 2.


Pathname translation

� pathname: /users/students/dc/abc

� server doesn't receive the entire pathname for translation,

� client breaks down the pathnames into parts

� iteratively translate each part

� translation is cached


Automounter

NFS client catches attempts to access 'empty' mount points and routes

them to the Automounter

� Automounter has a table of mount points and multiple candidate

serves for each

� it sends a probe message to each candidate server and then uses the

mount service to mount the filesystem at the first server to respond

� Keeps the mount table small


Server caching� caching file pages, directory/file attributes

� read-ahead: prefetch pages following the most-recently read file

pages

� delayed-write: write to disk when the page in memory is needed

for other purposes

� "sync" flushes "dirty" pages to disk every 30 seconds

� two write option

1. write-through: write to disk before replying to the client

2. cache and commit:

� stored in memory cache

� write to disk before replying to a "commit" request from the

clientDeepak John,CE-Poonjar

Client caching� caches results of read, write, getattr, lookup, readdir

� clients responsibility to poll the server for consistency

� timestamp-based methods for consistency validation

� Tc: time when the cache entry was last validated

� Tm: time when the block was last modified at the server

� cache entry is valid if:

1. T - Tc < t, where t is the freshness interval

� t is adaptively adjusted:

� files: 3 to 30 seconds depending on freq of updates

� directories: 30 to 60 seconds

2. Tmclient = Tmserver

� need validation for all cache accesses

Reading


writing� dirty: modified page in cache

� flush to disk: file is closed or sync from client

� bio-daemon (block input-output)

� read-ahead: after each read request, request the next file block from

the server as well

� delayed write: after a block is filled, it's sent to the server

� reduce the time to wait for read/write


Summary for NFS� access transparency: same system calls for local or remote files

� location transparency: could have a single name space for all files(depending on all the clients to agree the same name space)

� mobility transparency: mount table need to be updated on each client(not transparent)

� scalability: can usually support large loads, add processors, disks,servers...

� file replication: read-only replication, no support for replication offiles with updates

� hardware and OS: many ports

� fault tolerance: stateless and idempotent

� consistency: not quite one-copy for efficiency

� security: added encryption--Kerberos

� efficiency: pretty efficient, wide-spread useDeepak John,CE-Poonjar

Naming Services

� Definition

� Key benefits

� Resource localization

� Uniform naming

� Device independent address (e.g., you can move domain

name/web site from one server to another server seamlessly).

In a Distributed System, a Naming Service is a specific service

whose aim is to provide a consistent and uniform naming of

resources, this allowing other programs or services to localize

them and obtain the required metadata for interacting with

them.


Requirements for name spaces

� Allow simple but meaningful names to be used

� Potentially infinite number of names

� Structured

� to allow similar subnames without clashes

� to group related names

� Management of trust


Iterative navigation

Client1

2

3

A client iteratively contacts name servers NS1–NS3 in order to resolve a name

NS2

NS1

NS3

Nameservers

Iterative Navigation is the act of chaining multiple Naming Services in

order to resolve a single name to the corresponding resource.

Iterative Navigation is the act of chaining multiple Naming Services in

order to resolve a single name to the corresponding resource.


Recursive Navigation

� The Iterative Navigation can be…

� Recursive:

� it is performed by the naming server

� the server becomes like a client for the next server

� Non recursive:

� it is performed by the client or the first server

� if it is performed by the server it is “server controlled”

� the server bounces back the next hop to its client


Non-recursive and recursive server-controlled navigation

A name server NS1 communicates with other name servers on behalf of a client

Recursiveserver-controlled

1

2

3

5

4

client

NS2

NS1

NS3

1

2

34

client

NS2

NS1

NS3

Non-recursiveserver-controlled

DNS offers recursive navigation as an option, but iterative is the

standard technique. Recursive navigation must be used in domains

that limit client access to their DNS information for security reasons.Deepak John,CE-Poonjar

The SNS - a Simple Name Service model

� Stores attributes of named objects such as users, computers and

services and group names.

Users

Computers

Services

Email server, login info, encoded passwords,

home directory

Network addresses, architecture, OS, owner

Named object Value

Service address, version no.

Group Mailing lists, group1, group2,...


SNS basic design requirements

� Specify the Types of named objects:

� users, services, computers and group names and directories.

� Other types of objects may be integrated;

� The names are used only within the organization;

� Efficient name lookup;

� Access control:

� everyone can read but Authorized write;


DNS - The Internet Domain Name System

� A distributed naming database (specified in RFC 1034/1305)

� Name structure reflects administrative structure of the Internet

� Rapidly resolves domain names to IP addresses

� exploits caching heavily

� typical query time ~100 milliseconds

� Scales to millions of computers

� partitioned database

� caching

� replication


Basic DNS algorithm for name resolution (domain name -> IP number)

• Look for the name in the local cache

• Try a superior DNS server, which responds with:

– another recommended DNS server

– the IP address (which may not be entirely up to date)


DNS server functions and configuration

� Main function is to resolve domain names for computers, i.e. to get

their IP addresses.

� Other functions:

� reverse resolution - get domain name from IP address

� Host information - type of hardware and OS

� Well-known services - a list of well-known services offered by a

host

� Other attributes can be included (optional)


DNS issues

� Name tables change infrequently, but when they do, caching can

result in the delivery of stale data.

� Clients are responsible for detecting this and recovering

� Its design makes changes to the structure of the name space

difficult. For example:

� merging previously separate domain trees under a new root

� moving subtrees to a different part of the structure (e.g. if

Scotland became a separate country, its domains should all be

moved to a new country-level domain.)

.


distributed computing

Education