distributed computing
DESCRIPTION
Distributed Computing,dispributed Operating System,Distributed File SystemTRANSCRIPT
Distributed Operating System and
File SystemDeepak John
Department Of I.T,CE-Poonjar
Introduction
� Distributed OS
I. Presents users (and applications) with an integrated
computing platform that hides the individual computers.
II. Has control over all of the nodes (computers) in the network
and allocates their resources to tasks without user
involvement.
III. the user doesn't know (or care) where his programs are
running.
� Ex:Cluster computer systems,V system, Sprite
Deepak John,CE-Poonjar
Operating System layer
Applications, services
Computer &
Platform
Middleware
OS: kernel,libraries & servers
network hardware
OS1
Computer & network hardware
Node 1 Node 2
Processes, threads,communication, ...
OS2Processes, threads,communication, ...
Deepak John,CE-Poonjar
�Middleware implements abstractions that support network-wide
programming. Examples:
� RPC and RMI (Sun RPC, Corba, Java RMI)
� event distribution and filtering (Corba Event Notification, Elvin)
� resource discovery for mobile and ubiquitous computing
� support for multimedia streaming
Deepak John,CE-Poonjar
Functions that OS should provide for middleware
1. Encapsulation
provide a set of operations that meet their clients’ needs
2. Protection
protect resource from illegitimate access
3. Concurrent processing
support clients access resource concurrently
4. Invocation mechanism: a means of accessing an encapsulated
resource
�Communication : Pass operation parameters and results between
resource managers
�Scheduling: Schedule the processing of the invoked operation
Deepak John,CE-Poonjar
Core OS functionality
Communication
manager
Thread manager Memory manager
Supervisor
Process manager
Deepak John,CE-Poonjar
�Process manager
Handles the creation of and operations upon processes.
�Thread manager
Thread creation, synchronization and scheduling
�Communication manager
Communication between threads attached to different
processes on the same computer
�Memory manager
Management of physical and virtual memory
�Supervisor
�Dispatching of interrupts, system call traps and other exceptions
�control of memory management unit and hardware caches
�processor and floating point unit register manipulations
Deepak John,CE-Poonjar
Kernel and Protection
�Kernel
�always runs
�complete access privileges for the physical resources
�Different execution mode
�supervisor mode (kernel process) / user mode (user process)
�system call trap: invocation mechanism for resources managed by
kernel
�An address space: a collection of ranges of virtual memory locations,
in each of which a specified combination of memory access rights
applies, e.g.: read only or read-write
�The cost for protection
�switching between different processes take many processor cycles
�a system call trap is a more expensive operation than a simple method
callDeepak John,CE-Poonjar
Processes and Threads
� Process� A program in execution
� Problem: sharing between related activities are awkward and
expensive
� a process consists of an execution environment together with
one or more threads.
� An execution environment is a collection of local kernel managed
resources to which its threads have access. An execution
environment primarily consists of:
� an address space;
� thread synchronization and communication resources such as
semaphores and communication interfaces (for example, sockets);
� higher-level resources such as open files and windows.
Deepak John,CE-Poonjar
Process address space
Stack
Text
Heap
Auxiliary
regions
0
2N
�Address space
� a unit of management of a process’s
virtual memory
� Up to 232 bytes and sometimes up to
264 bytes.
� consists of one or more regions.
�Region
� an area of continuous virtual memory
that is accessible by the threads of the
owning process.
� can be shared
1. kernel code
2. libraries
3. shared data & communication
4. Copy on writeDeepak John,CE-Poonjar
Creation of new process in distributed system
� Creating process by the operation system
� Fork, in UNIX
� Process creation in distributed system
� The choice of a target host
� Choice of process host
� running new processes at their originator’s computer
� sharing processing load between a set of computers
� Load sharing system
� Centralized, Decentralized, Hierarchical
� sender-initiated,receiver-initiated
Deepak John,CE-Poonjar
� Load sharing policy
� Transfer policy: situate a new process locally or remotely
� Location policy: which node should host the new process
1. Static policy without regard to the current state of the system
2.Adaptive policy applies heuristics to make their allocation
decision
� Creation of a new execution environment
1. Initializing the address space
� Statically defined format
� With respect to an existing execution environment,
e.g. fork()
2. Copy-on-write scheme
A technique that is used to reduce amount of data copy.Deepak John,CE-Poonjar
Copy-on-write – a convenient optimization
a) Before write b) After write
Sharedframe
A's pagetable
B's pagetable
Process A’s address space Process B’s address space
Kernel
RA RB
RB copiedfrom RA
Deepak John,CE-Poonjar
Threads concept and implementation
�Thread
� threads were introduced as a lightweight – operating system
provided .
�a process consists of its address-space, and a set of threads attached
to that process.
�The operating system can perform less expensive context switches
between threads attached to the same process and threads attached to
the same process can access the same memory etc, such that
communication/ synchronisation can be much cheaper and less
awkward
Deepak John,CE-Poonjar
A server application generally
consists of:
�A single thread, the receiver-
thread which receives all the
requests, places them in a queue
and dispatches those requests to
be dealt with by the worker-
threads
�The worker-thread which deals
with the request may be a thread
in the same process or it may be
a thread in another
process
Deepak John,CE-Poonjar
Client and server with threads �Worker pool
�Server creates a fixed pool of
“worker” threads to process the
requests when it starts up.
�The module marked ‘receipt
and queuing is typically
implemented by an ‘I/O’ thread,
�Pro: simple
�Cons
1. Inflexibility: worker threads
number unequal to current
request number
2. High level of switching
between the I/O and worker
thread�For single thread, server can handle 100 client
requests per second
Deepak John,CE-Poonjar
Alternative server threading architectures
�Thread-per-request
�Server spawn a new worker thread for each new request, destroy it
when the request processing finish.
� Pro: throughput is potentially maximized.
� Con: overhead of the thread creation and destruction
Deepak John,CE-Poonjar
� Thread-per-connection
� Server creates a new worker thread when client creates a connection,
destroys the thread when the client closes the connection.
� Pro: lower thread management overheads compared with the thread-
per-request.
� Con: client may be delayed while a worker thread has several
outstanding requests but another thread has no work to perform
�Thread-per-object
�Associate a thread with each remote object
� Pro&Con are similar to thread-per-connection
Deepak John,CE-Poonjar
Threads versus multiple processes
� Creating a thread is (much) cheaper than a process (~10-20times)
� Switching to a different thread in same process is (much) cheaper
(5-50 times)
� Threads within same process can share data and other resources
more conveniently and efficiently (without copying or messages)
� Threads within a process are not protected from each other
Threads implementation
can be implemented:
�in the OS kernel (Win NT, Solaris, Mach)
�at user level e.g. by a thread library, or in the language (Ada, Java).
Deepak John,CE-Poonjar
Java thread constructor and management methods
• Thread(ThreadGroup group, Runnable target, String name)
Creates a new thread in the SUSPENDED state, which will
belong to group and be identified as name; the thread will
execute the run() method of target.
• setPriority(int newPriority), getPriority()
Set and return the thread’s priority.
• run()
A thread executes the run() method of its target object, if it has
one, and otherwise its own run() method (Thread implements
Runnable).
Methods of objects that inherit from class Thread
Deepak John,CE-Poonjar
• start()
Change the state of the thread from SUSPENDED to
RUNNABLE.
• sleep(int millisecs)
Cause the thread to enter the SUSPENDED state for the
specified time.
• yield()
Enter the READY state and invoke the scheduler.
• destroy()
Destroy the thread.
Deepak John,CE-Poonjar
Java thread synchronization calls• thread.join(int millisecs)
Blocks the calling thread for up to the specified time until thread
has terminated.
• thread.interrupt()
Interrupts thread: causes it to return from a blocking method call
such as sleep().
• object.wait(long millisecs, int nanosecs)
Blocks the calling thread until a call made to notify() or
notifyAll() on object wakes the thread, or the thread is interrupted,
or the specified time has elapsed.
• object.notify(), object.notifyAll()
Wakes, respectively, one or all of any threads that have called
wait() on object.Deepak John,CE-Poonjar
Operating system architecture
Monolithic Kernel Microkernel
Server: Dynamically loaded server program:Kernel code and data:
.......
.......
Key:
S4
S1 .......
S1 S2 S3
S2 S3 S4
Monolithic kernel and microkernel
Deepak John,CE-Poonjar
Monolithic vs Microkernel�A monolithic kernel provides all of the services via a single
image, that is a single program initialized when the computer boots
� A microkernel instead implements only the absolute minimum:
Basic virtual memory, Basic scheduling and Inter-process
communication
�All other services such as device drivers, the file system,
networking etc are implemented as user-level server processes that
communicate with each other and the kernel via IPC
Deepak John,CE-Poonjar
The Microkernel ApproachThe major advantages of the microkernel approach include:
�Extensibility - major functionality can be added without modifying
the core kernel of the operating system
�Modularity - the different functions of the operating system can be
forced into modularity behind memory protection barriers. A
monolithic kernel must use programming language features or code
conventions to attempt to ensure this
�Robustness -relatively small kernel might be likely to contain fewer
bugs than a larger program, however, this point is rather contentious
�Portability - since only a small portion of the operating system, its
smaller kernel, relies on the particulars of a given machine it is easier
to port to a new machine architecture
Deepak John,CE-Poonjar
The role of the microkernel
Middleware
Language
support
subsystem
Language
support
subsystem
OS emulation
subsystem....
Microkernel
Hardware
The microkernel supports middleware via subsystems
Deepak John,CE-Poonjar
The Monolithic Approach�The major advantage of the monolithic approach is the relative
efficiency with which operations may be invoked. Since services
share an address space with the core of the kernel they need not make
system calls to access core-kernel functionality
�Most operating systems in use today are a kind of hybrid solution
�Linux is a monolithic kernel, but modules may be dynamically
loaded and unloaded at run time.
Deepak John,CE-Poonjar
Distributed File Systems
Deepak John,CE-Poonjar
Purposes of a Distributed File System
� Sharing of storage and information across a network
� Convenience (and efficiency) of a conventional file system
� Persistent storage that most other services (e.g., Web servers) need
Files
�Files are an abstraction of permanent storage.
�A file is typically defined as a sequence of similar-sized data items
along with a set of attributes.
� A directory is a file that provides a mapping from text names to
internal file identifiers.
Deepak John,CE-Poonjar
File attributes
updated
by system:
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list
E.g. for UNIX: rw-rw-r--
updated
by owner:
Deepak John,CE-Poonjar
File Systems�Responsible for the (a) organization, (b)storage, (c) retrieval, (d)
naming, (e)sharing, and (f) protection of files.
�Provide a set of programming operations that characterize the file
abstraction,particularly operations to read and write subsequences of
data items beginning at any point of a file.
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
File
system
modules
Deepak John,CE-Poonjar
UNIX file system operations
Deepak John,CE-Poonjar
Distributed File System Requirements
Deepak John,CE-Poonjar
File Service Architecture
� An architecture that offers a clear separation of the main concerns
in providing access to files is obtained by structuring the file
service as three components:
� A flat file service
� A directory service
� A client module.
� The Client module implements exported interfaces by flat file and
directory services on server side.
Deepak John,CE-Poonjar
Client computer Server computer
Applicationprogram
Applicationprogram
Client module
Flat file service
Directory service
Lookup
AddName
UnName
GetNames
Read
Write
Create
Delete
GetAttributes
SetAttributesDeepak John,CE-Poonjar
� Flat file service:� Concerned with the implementation of operations on the contents of file.
Unique File Identifiers (UFIDs) are used to refer to files in all requestsfor flat file service operations.
� Directory Service:� Provides mapping between text names for the files and their UFIDs.
Clients may obtain the UFID of a file by quoting its text name todirectory service. Directory service supports functions needed generatedirectories, to add new files to directories.
� Client Module:� It runs on each computer and provides integrated service (flat file and
directory) as a single API to application programs.
� It holds information about the network locations of flat-file and directoryserver processes; and achieve better performance throughimplementation of a cache of recently used file blocks at the client.
Deepak John,CE-Poonjar
Flat file service operations
Read(FileId, i, n) -> Data
— throws BadPosition
If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items
from a file starting at item i and returns it in Data.
Write(FileId, i, Data)
— throws BadPosition
If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a
file, starting at item i, extending the file if necessary.
Create() -> FileId Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId) Removes the file from the file store.
GetAttributes(FileId) -> Attr Returns the file attributes for the file.
SetAttributes(FileId, Attr) Sets the file attributes
Deepak John,CE-Poonjar
Access control� UNIX checks access rights when a file is opened ,subsequent
checks during read/write are not necessary
� distributed environment
a. server has to check.
b. stateless approaches
1. access check once when UFID is issued
• client gets an encoded "capability" (who can access and how)
• capability is submitted with each subsequent request
2. access check for each request.
Deepak John,CE-Poonjar
File Group
A collection of files that can be located
on any server or moved between
servers while maintaining the same
names.
� Similar to a UNIX filesystem
� Helps with distributing the load of
file serving between several servers.
� File groups have identifiers which
are unique throughout the system
(and hence for an open system, they
must be globally unique).
� Used to refer to file groups and
files
To construct a globally
unique ID we use some
unique attribute of the
machine on which it is
created, e.g. IP number,
even though the file group
may move subsequently.
IP address date
32 bits 16 bits
File Group ID:
Deepak John,CE-Poonjar
Case Study: Sun NFS
� An industry standard for file sharing on local networks since the
1980s
� An open standard with clear and simple interfaces.
� OS independent
� unix implementation
� rpc
� udp or tcp
� Supports many of the design requirements already mentioned:
� Transparency,heterogeneity,efficiency,fault tolerance
� Limited achievement of:
� Concurrency,replication,consistency,security
Deepak John,CE-Poonjar
NFS architecture
Client computer Server computer
UNIXfile
system
NFSclient
NFSserver
UNIXfile
system
Applicationprogram
Applicationprogram
Virtual file systemVirtual file system
Oth
er
file
syste
m
UNIX kernel
system calls
NFSprotocol
(remote operations)
UNIX
Operations on local files
Operationson
remote files
Applicationprogram
NFSClient
KernelApplicationprogram
NFSClient
Client computer
Deepak John,CE-Poonjar
Virtual file system� access transparency
� part of unix kernel
� NFS file handle, 3 components:
� filesystem identifier
� different groups of files
� i-node (index node)
� structure for finding the file
� i-node generation number
� i-nodes are reused
� incremented when reused
� VFS
� struct for each file system
� v-node for each open file
� file handle for remote file
� i-node number for local fileDeepak John,CE-Poonjar
NFS access control and authentication
� Stateless server, so the user's identity and access rights must be
checked by the server on each request.
� In the local file system they are checked only on open()
� Every client request is accompanied by the userID and groupID
� Server is exposed to imposter attacks unless the userID and groupID
are protected by encryption
� Kerberos has been integrated with NFS to provide a stronger and
more comprehensive security solution
Deepak John,CE-Poonjar
• read(fh, offset, count) -> attr, data
• write(fh, offset, count, data) -> attr
• create(dirfh, name, attr) -> newfh, attr
• remove(dirfh, name) status
• getattr(fh) -> attr
• setattr(fh, attr) -> attr
• lookup(dirfh, name) -> fh, attr
• rename(dirfh, name, todirfh, toname)
• link(newdirfh, newname, dirfh, name)
• readdir(dirfh, cookie, count) ->
entries
NFS server operations (simplified)
fh = file handle:
Filesystem identifier i-node number i-node generation
Read(FileId, i, n) -> Data
Write(FileId, i, Data)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, File)
UnName(Dir, Name)
GetNames(Dir, Pattern)
->NameSeq
Deepak John,CE-Poonjar
Mount service� the process of including a new file system is called mounting.
� communicates with the mount process on the server in a mountprotocol.
� Server maintains a table of clients who have mounted filesystems at
that server
� Each client maintains a table of mounted file systems holding.
� hard-mounted
� user process is suspended until request is successful
� when server is not responding
� request is retried until it's satisfied
� soft-mounted
� if server fails, client returns failure after a small number of retries
� user process handles the failureDeepak John,CE-Poonjar
THE MOUNT PROTOCOL:
The following operations occur:
1. The client's request is sent via RPC to the mount server ( onserver machine.)
2. Mount server checks export list containing
a) file systems that can be exported,
b) legal requesting clients.
c) It's legitimate to mount any directory within the legalfilesystem.
3. Server returns "file handle" to client.
4. Server maintains list of clients and mounted directories -- this isstate information! But this data is only a "hint" and isn't treated asessential.
5. Mounting often occurs automatically when client or server boots.Deepak John,CE-Poonjar
Local and remote file systems
jim jane joeann
usersstudents
usrvmunix
Client Server 2
. . . nfs
Remote
mountstaff
big bobjon
people
Server 1
export
(root)
Remote
mount
. . .
x
(root) (root)
Note: The file system mounted at /usr/students in the client is actually the sub-tree
located at /export/people in Server 1; the file system mounted at /usr/staff in the client is
actually the sub-tree located at /nfs/users in Server 2.
Deepak John,CE-Poonjar
Pathname translation
� pathname: /users/students/dc/abc
� server doesn't receive the entire pathname for translation,
� client breaks down the pathnames into parts
� iteratively translate each part
� translation is cached
Deepak John,CE-Poonjar
Automounter
NFS client catches attempts to access 'empty' mount points and routes
them to the Automounter
� Automounter has a table of mount points and multiple candidate
serves for each
� it sends a probe message to each candidate server and then uses the
mount service to mount the filesystem at the first server to respond
� Keeps the mount table small
Deepak John,CE-Poonjar
Server caching� caching file pages, directory/file attributes
� read-ahead: prefetch pages following the most-recently read file
pages
� delayed-write: write to disk when the page in memory is needed
for other purposes
� "sync" flushes "dirty" pages to disk every 30 seconds
� two write option
1. write-through: write to disk before replying to the client
2. cache and commit:
� stored in memory cache
� write to disk before replying to a "commit" request from the
clientDeepak John,CE-Poonjar
Client caching� caches results of read, write, getattr, lookup, readdir
� clients responsibility to poll the server for consistency
� timestamp-based methods for consistency validation
� Tc: time when the cache entry was last validated
� Tm: time when the block was last modified at the server
� cache entry is valid if:
1. T - Tc < t, where t is the freshness interval
� t is adaptively adjusted:
� files: 3 to 30 seconds depending on freq of updates
� directories: 30 to 60 seconds
2. Tmclient = Tmserver
� need validation for all cache accesses
Reading
Deepak John,CE-Poonjar
writing� dirty: modified page in cache
� flush to disk: file is closed or sync from client
� bio-daemon (block input-output)
� read-ahead: after each read request, request the next file block from
the server as well
� delayed write: after a block is filled, it's sent to the server
� reduce the time to wait for read/write
Deepak John,CE-Poonjar
Summary for NFS� access transparency: same system calls for local or remote files
� location transparency: could have a single name space for all files(depending on all the clients to agree the same name space)
� mobility transparency: mount table need to be updated on each client(not transparent)
� scalability: can usually support large loads, add processors, disks,servers...
� file replication: read-only replication, no support for replication offiles with updates
� hardware and OS: many ports
� fault tolerance: stateless and idempotent
� consistency: not quite one-copy for efficiency
� security: added encryption--Kerberos
� efficiency: pretty efficient, wide-spread useDeepak John,CE-Poonjar
Naming Services
� Definition
� Key benefits
� Resource localization
� Uniform naming
� Device independent address (e.g., you can move domain
name/web site from one server to another server seamlessly).
In a Distributed System, a Naming Service is a specific service
whose aim is to provide a consistent and uniform naming of
resources, this allowing other programs or services to localize
them and obtain the required metadata for interacting with
them.
Deepak John,CE-Poonjar
Requirements for name spaces
� Allow simple but meaningful names to be used
� Potentially infinite number of names
� Structured
� to allow similar subnames without clashes
� to group related names
� Management of trust
Deepak John,CE-Poonjar
Iterative navigation
Client1
2
3
A client iteratively contacts name servers NS1–NS3 in order to resolve a name
NS2
NS1
NS3
Nameservers
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
Deepak John,CE-Poonjar
Recursive Navigation
� The Iterative Navigation can be…
� Recursive:
� it is performed by the naming server
� the server becomes like a client for the next server
� Non recursive:
� it is performed by the client or the first server
� if it is performed by the server it is “server controlled”
� the server bounces back the next hop to its client
Deepak John,CE-Poonjar
Non-recursive and recursive server-controlled navigation
A name server NS1 communicates with other name servers on behalf of a client
Recursiveserver-controlled
1
2
3
5
4
client
NS2
NS1
NS3
1
2
34
client
NS2
NS1
NS3
Non-recursiveserver-controlled
DNS offers recursive navigation as an option, but iterative is the
standard technique. Recursive navigation must be used in domains
that limit client access to their DNS information for security reasons.Deepak John,CE-Poonjar
The SNS - a Simple Name Service model
� Stores attributes of named objects such as users, computers and
services and group names.
Users
Computers
Services
Email server, login info, encoded passwords,
home directory
Network addresses, architecture, OS, owner
Named object Value
Service address, version no.
Group Mailing lists, group1, group2,...
Deepak John,CE-Poonjar
SNS basic design requirements
� Specify the Types of named objects:
� users, services, computers and group names and directories.
� Other types of objects may be integrated;
� The names are used only within the organization;
� Efficient name lookup;
� Access control:
� everyone can read but Authorized write;
Deepak John,CE-Poonjar
DNS - The Internet Domain Name System
� A distributed naming database (specified in RFC 1034/1305)
� Name structure reflects administrative structure of the Internet
� Rapidly resolves domain names to IP addresses
� exploits caching heavily
� typical query time ~100 milliseconds
� Scales to millions of computers
� partitioned database
� caching
� replication
Deepak John,CE-Poonjar
Basic DNS algorithm for name resolution (domain name -> IP number)
• Look for the name in the local cache
• Try a superior DNS server, which responds with:
– another recommended DNS server
– the IP address (which may not be entirely up to date)
Deepak John,CE-Poonjar
DNS server functions and configuration
� Main function is to resolve domain names for computers, i.e. to get
their IP addresses.
� Other functions:
� reverse resolution - get domain name from IP address
� Host information - type of hardware and OS
� Well-known services - a list of well-known services offered by a
host
� Other attributes can be included (optional)
Deepak John,CE-Poonjar
DNS issues
� Name tables change infrequently, but when they do, caching can
result in the delivery of stale data.
� Clients are responsible for detecting this and recovering
� Its design makes changes to the structure of the name space
difficult. For example:
� merging previously separate domain trees under a new root
� moving subtrees to a different part of the structure (e.g. if
Scotland became a separate country, its domains should all be
moved to a new country-level domain.)
.
Deepak John,CE-Poonjar