dm rasanjalee himali csc8320 – advanced operating systems (section 2.6) fall 2009

Distributed Systems: Major Design Issues

DM Rasanjalee Himali

CSc8320 – Advanced Operating Systems (SECTION 2.6)

FALL 2009

Section I

The Basics

Introduction

A distributed system consists of concurrent processes accessing distributed resources.

Resources are shared through message passing in a network environment that may be unreliable and contain untrusted components.

Major Design Issues

1. Object Models and Naming Schemes
2. Distributed Coordination
3. Interprocess Communication
4. Distributed Resources
5. Fault Tolerance and Security

1. Object Models and Naming Schemes

Objects in a Computer System:
◦ Ex: processes, data files, memory, devices, processors, networks
◦ Are represented by the set of operations allowed on the object
◦ Physical details of the object are transparent to other objects

Object Servers:
◦ An object server is the process that manages the object
◦ Objects are encapsulated in servers
◦ The only visible entities in the system are servers
◦ Ex: process servers, file servers, memory servers, etc.
◦ A client is a null server that accesses the object server


Identifying Servers:
◦ To contact a server, the server must be identifiable.
◦ Three identification methods:
1. Identification by name
2. Identification by physical or logical address
3. Identification by the service that servers provide

1. Identification by Name:
◦ Names are generally assumed to be unique
◦ But a server may have multiple addresses, and its address must change if the server moves
◦ Names are more intuitive than addresses

2. Identification by Physical or Logical Address:
◦ Name-to-logical-address mapping is done by a name server in the OS
◦ Logical-to-physical address mapping is a network service
◦ The port used by many systems is a logical address
◦ Associating more than one port with a server provides multiple entry points to the server
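A minimal sketch of the two mapping steps, assuming that a name server and a network service can each be modelled as a lookup table. The server names, host names, ports and IP addresses are hypothetical placeholders.

```python
# Hypothetical tables: every name, port and address below is made up.
NAME_SERVER = {              # step 1 (OS name server): name -> logical address
    "file-server": ("fs-host", 2049),
    "print-server": ("ps-host", 515),
}
NETWORK_SERVICE = {          # step 2 (network service): logical host -> physical address
    "fs-host": "10.0.0.5",
    "ps-host": "10.0.0.9",
}

def resolve(name):
    """Map a server name to a (physical address, port) pair in two steps."""
    host, port = NAME_SERVER[name]      # name -> logical address (host, port)
    physical = NETWORK_SERVICE[host]    # logical host -> physical address
    return physical, port

print(resolve("file-server"))  # ('10.0.0.5', 2049)
```

If the server moves, only the second table has to change. Clients that use the name are unaffected.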

3. Identification by the Service that Servers Provide:
◦ Multiple servers can share the same port
◦ This can be used for service identification in a distributed system
◦ The client is interested only in the requested service
◦ Who provides the service is irrelevant
◦ Multiple servers can provide the same service
◦ This approach is critical to implementing an autonomous system
◦ A resolution protocol is needed to translate a service to a server
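A toy resolution protocol for service-based identification. It assumes a registry that maps each service name to the servers offering it. `ServiceRegistry` and the service and server names are hypothetical. Picking a provider at random is only one possible policy.

```python
import random

class ServiceRegistry:
    """Toy resolution protocol: translates a requested service to one of its servers."""

    def __init__(self):
        self._providers = {}  # service name -> list of server ids

    def register(self, service, server):
        self._providers.setdefault(service, []).append(server)

    def resolve(self, service):
        servers = self._providers.get(service)
        if not servers:
            raise LookupError(f"no server provides {service!r}")
        # The client does not care who provides the service, so any provider will do.
        return random.choice(servers)

registry = ServiceRegistry()
registry.register("time", "server-A")
registry.register("time", "server-B")
print(registry.resolve("time"))  # either 'server-A' or 'server-B'
```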



Object models and naming:
◦ Must be addressed early in the system design, as many things depend on the naming scheme
◦ Ex: structure of the system, management of the namespace, name resolution, access methods

2. Distributed Coordination

Interacting concurrent processes require coordination to achieve synchronization.

Types of Synchronization Requirements:
◦ In general, there are three types of synchronization requirements:

1. Barrier Synchronization
◦ A set of processes or events must reach a common synchronization point before they can continue

2. Condition Coordination
◦ A process or event must wait for a condition that other interacting processes set asynchronously, in order to maintain some ordering of execution

3. Mutual Exclusion
◦ Concurrent processes must have mutual exclusion when accessing a critical shared resource
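The three requirements can be illustrated with Python's `threading` primitives. This is a shared-memory, single-machine analogue, used here as an assumption for illustration. In a distributed system, the same barrier, condition and lock semantics would have to be built on message passing.

```python
import threading

N = 3
barrier = threading.Barrier(N)      # 1. barrier: all N workers must arrive
cond = threading.Condition()        # 2. condition coordination
data_ready = False
lock = threading.Lock()             # 3. mutual exclusion on shared state
counter = 0
seen_after_barrier = []

def producer():
    global data_ready
    with cond:
        data_ready = True           # set the condition asynchronously...
        cond.notify_all()           # ...and wake every waiting worker

def worker():
    global counter
    with cond:
        cond.wait_for(lambda: data_ready)   # block until the condition holds
    with lock:                              # critical section
        counter += 1
    barrier.wait()                          # wait until all N workers get here
    with lock:
        seen_after_barrier.append(counter)  # every worker sees counter == N

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
producer()
for t in threads:
    t.join()
print(counter, seen_after_barrier)  # 3 [3, 3, 3]
```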

Synchronization implies the need for knowledge of state information about other processes.

Problems with Synchronization:
1. Complete state information is difficult to obtain
◦ Ex: no shared-memory environment
◦ Solution: use message passing to convey state information

2. Inaccurate or incomplete information
◦ Ex: message transfer delays
◦ Solution: use a centralized coordinator role that moves from one process to another (so there is no single point of failure)

3. Deadlock of Processes
◦ Interacting processes can lead to deadlock
◦ Deadlock: circular waiting of processes

Problem: it is sometimes not practical to implement deadlock prevention or avoidance strategies in a distributed system
Solution: detect and recover from deadlocks

Problem: detecting deadlocks in a distributed system is non-trivial, because the global state of the system is not available
◦ Who should initiate the detection algorithm?
◦ How can the algorithm be implemented in a distributed fashion by message passing?
◦ Which process should be the victim, aborted to resolve the deadlock?
◦ How can the victim be recovered?
◦ The efficiency of deadlock resolution and recovery may matter more than that of detection
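One well-known answer to the message-passing question is edge chasing, in the style of the Chandy-Misra-Haas algorithm for the AND request model. A blocked process sends a probe along its wait-for edges, and a deadlock is declared if the probe returns to the initiator. The sketch below simulates the probe messages with an in-memory queue. The wait-for graphs and process names are hypothetical, and victim selection and recovery are not shown.

```python
from collections import deque

def detect_deadlock(initiator, waits_for):
    """Simulate edge-chasing probes (Chandy-Misra-Haas style, AND model).

    waits_for maps each process to the set of processes it is blocked on.
    Returns True if a probe started by `initiator` travels back to it.
    """
    # Each message is a probe (initiator, sender, receiver).
    queue = deque((initiator, initiator, p) for p in waits_for.get(initiator, ()))
    forwarded = set()  # processes that have already forwarded this probe
    while queue:
        init, _sender, receiver = queue.popleft()
        if receiver == init:
            return True  # the probe went around a cycle: deadlock
        if receiver in forwarded:
            continue     # each process forwards the probe at most once
        forwarded.add(receiver)
        # A blocked process passes the probe along its own wait-for edges.
        for nxt in waits_for.get(receiver, ()):
            queue.append((init, receiver, nxt))
    return False

cycle = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
chain = {"P1": {"P2"}, "P2": {"P3"}}
print(detect_deadlock("P1", cycle))  # True
print(detect_deadlock("P1", chain))  # False
```

Only local wait-for edges are consulted at each step, so no process ever needs the full global state.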

Distributed solutions to synchronization and deadlock problems:
◦ Use partial global state for decision making: many applications do not need absolute global knowledge of the system
◦ Exchange of local knowledge among cooperating sites

3. Interprocess Communication

Communication:
◦ The most important issue in any distributed system
◦ In OSs, interaction between processes and information flow between objects depend on communication
◦ Message passing is the only means of communication in a distributed system
◦ Goal: transparency in communication, by providing higher-level communication methods that hide the physical details of message passing
◦ Two concepts are used to achieve this goal: the client/server model and remote procedure calls (RPC)

Client/Server Model:
◦ A programming paradigm for structuring processing in distributed systems
◦ All system interactions are viewed as a pair of message exchanges: the client process sends a request to the server, and the server responds with a reply message

Remote Procedure Calls:
◦ The client/server request/reply message exchange is represented as a procedure call in programming languages
◦ RPC: a procedure call to a remote server
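A minimal RPC sketch. It assumes JSON-encoded messages and uses a plain function call to stand in for the network. The client-side stub turns a procedure call into a request message. The server unpacks the message, runs the procedure and returns a reply message. `RemoteProxy`, `server_handle` and the message fields are hypothetical.

```python
import json

# Server side: procedures exported by the (hypothetical) remote server.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def server_handle(request_bytes):
    """Unpack a request message, run the procedure and build a reply message."""
    request = json.loads(request_bytes)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"id": request["id"], "result": result}).encode()

# Client side: a stub that makes the request/reply exchange look like a local call.
class RemoteProxy:
    def __init__(self, transport):
        self._transport = transport  # stands in for the network
        self._next_id = 0

    def __getattr__(self, proc):
        def stub(*args):
            self._next_id += 1
            request = {"id": self._next_id, "proc": proc, "args": list(args)}
            reply = json.loads(self._transport(json.dumps(request).encode()))
            return reply["result"]
        return stub

remote = RemoteProxy(server_handle)
print(remote.add(2, 3))  # 5
```

Swapping `server_handle` for a function that really sends bytes over a socket would not change the client code. Hiding that difference is the point of RPC.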

Multicast and Broadcast:
◦ Client/server and RPC are unicast (point-to-point)
◦ The notion of “groups” is inherent to distributed systems
◦ Processes cooperate in group activities
◦ Group communication in distributed systems is a logical multicast (perhaps without broadcasting hardware)
◦ Communication must go through several layers of protocols and be propagated to a number of physically distributed nodes
◦ Thus it is more susceptible to failures in the system
◦ Reliable and atomic group broadcast remains an open issue in distributed systems
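The sketch below shows why logical multicast is fragile. It assumes group communication is implemented as one unicast send per member. If one member is unreachable, the others still receive the message, so delivery is not atomic. The node names and the failure of `n2` are hypothetical.

```python
def multicast(group, message, send):
    """Logical multicast built from one unicast send per group member.

    Returns the members that did NOT receive the message, which shows that
    delivery is not atomic: some members may get it while others do not.
    """
    failed = []
    for member in group:
        try:
            send(member, message)
        except ConnectionError:
            failed.append(member)
    return failed

inboxes = {"n1": [], "n2": [], "n3": []}

def unreliable_send(member, message):
    # Hypothetical failure: node n2 is unreachable.
    if member == "n2":
        raise ConnectionError(member)
    inboxes[member].append(message)

print(multicast(["n1", "n2", "n3"], "update", unreliable_send))  # ['n2']
```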

4. Distributed Resources

The only resources needed for computation are data and processing.
◦ Data: may reside physically in distributed memory or secondary storage
◦ Processing capacity: the aggregate processing power of all processors
◦ Goal: achieve transparency in allocating processing capacity to processes (distributing processes/load to the processors)

Static Load Distribution:
◦ Also called multiprocessor scheduling
◦ Goal: minimize the completion time of a set of related processes
◦ Issue: communication overhead influences the design of scheduling strategies
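As one concrete static strategy, here is a minimal sketch of greedy longest-processing-time-first (LPT) list scheduling. It places each task, largest first, on the currently least-loaded processor to keep the completion time (makespan) low. It assumes task times are known in advance and ignores the communication overhead mentioned above. The task names and times are hypothetical.

```python
import heapq

def lpt_schedule(task_times, num_procs):
    """Greedy LPT list scheduling: largest task to the least-loaded processor."""
    loads = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(loads)
    assignment = {p: [] for p in range(num_procs)}
    for task, t in sorted(task_times.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(loads)          # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(loads, (load + t, p))
    makespan = max(load for load, _ in loads)   # completion time of the whole set
    return assignment, makespan

assignment, makespan = lpt_schedule({"a": 7, "b": 5, "c": 4, "d": 3, "e": 2}, 2)
print(assignment, makespan)  # {0: ['a', 'd'], 1: ['b', 'c', 'e']} 11
```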

Dynamic Load Distribution:
◦ Also called load sharing
◦ Goal: maximize the utilization of a set of processors
◦ Issue: process migration
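A minimal sketch of sender-initiated load sharing. It assumes each node's load is its queue length. A node above a threshold migrates processes to the least-loaded node. The threshold, node names and process names are hypothetical, and moving a list entry stands in for real process migration.

```python
THRESHOLD = 3  # hypothetical queue-length threshold

def balance(queues, threshold=THRESHOLD):
    """Sender-initiated load sharing: overloaded nodes push work to the least-loaded node.

    queues maps node -> list of waiting processes and is modified in place.
    Returns the migrations performed as (process, source, destination).
    """
    migrations = []
    for node in list(queues):
        while len(queues[node]) > threshold:
            target = min(queues, key=lambda n: len(queues[n]))
            if len(queues[target]) + 1 >= len(queues[node]):
                break  # moving a process would not improve the balance
            proc = queues[node].pop()       # stand-in for migrating the process
            queues[target].append(proc)
            migrations.append((proc, node, target))
    return migrations

queues = {"A": ["p1", "p2", "p3", "p4", "p5"], "B": ["p6"], "C": []}
print(balance(queues))  # [('p5', 'A', 'C'), ('p4', 'A', 'B')]
```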

Distributed Shared Memory:
◦ A transparent memory system
◦ Assumes data resides in distributed memory modules
◦ Presents a single shared-memory view of physically distributed memories
◦ Goal: maximize transparency

Other issues (for distributed file systems & distributed shared memory):
◦ Sharing & replication of data
◦ Protocols are needed to maintain consistency & coherency of data
◦ The existence of replicas should be transparent to the user
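One common coherence approach is write-invalidate. The sketch below assumes a single home copy per data item and per-node caches. A write updates the home copy and invalidates all other cached replicas, so later reads fetch the fresh value. The class names are hypothetical, and the sketch ignores concurrency and message loss.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # local replicas of shared data

class WriteInvalidateDSM:
    """Toy write-invalidate protocol with one home copy per data item."""

    def __init__(self, nodes):
        self.home = {}      # authoritative values
        self.nodes = nodes

    def read(self, node, key):
        if key not in node.cache:          # miss: fetch from the home copy
            node.cache[key] = self.home[key]
        return node.cache[key]

    def write(self, node, key, value):
        self.home[key] = value
        for other in self.nodes:           # invalidate every other replica
            if other is not node:
                other.cache.pop(key, None)
        node.cache[key] = value

n1, n2 = Node("n1"), Node("n2")
dsm = WriteInvalidateDSM([n1, n2])
dsm.write(n1, "x", 1)
print(dsm.read(n2, "x"))  # 1 (cache miss, fetched from the home copy)
dsm.write(n1, "x", 2)     # invalidates n2's cached copy of x
print(dsm.read(n2, "x"))  # 2, not the stale 1
```

Without the invalidation loop, the second read on `n2` would return the stale cached 1. That is the coherency problem the protocol exists to prevent.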

5. Fault Tolerance & Security

Distributed systems are vulnerable to failures and security threats.
◦ Failures: faults due to unintentional intrusion
◦ Security violations: faults due to intentional intrusion

Dependable Distributed System:
◦ A fault-tolerant system
◦ System faults are transparent to the user

Solutions for Failures:
◦ Redundancy in the system: an inherent property of distributed systems, since data and resources can be replicated
◦ Rollback: recovery from failures requires rolling back the execution of the failed process and of other affected processes; the execution state must be kept for rollback recovery (a difficult task in distributed systems)
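A minimal checkpoint/rollback sketch for a single process. It assumes the process state fits in a copyable Python object. It does not address the hard distributed part: choosing mutually consistent checkpoints across processes, which is what avoids cascading rollbacks (the domino effect).

```python
import copy

class CheckpointedProcess:
    """Keeps saved copies of its state so it can roll back after a failure."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def checkpoint(self):
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recent checkpoint, discarding work done since then.
        self.state = copy.deepcopy(self._checkpoints[-1])

p = CheckpointedProcess({"balance": 100})
p.checkpoint()
p.state["balance"] -= 30   # work done after the checkpoint
p.rollback()               # simulated failure: undo that work
print(p.state)             # {'balance': 100}
```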

Solutions for Security:

Issues:
◦ Trustworthiness of the communicating processes
◦ Confidentiality and integrity of messages & data
◦ Authentication & authorization

Solutions:
◦ Authentication: clients, servers & messages must be authenticated
◦ Authorization: access control across a physical network with heterogeneous components under different administrative units, using different security models
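Message authentication can be sketched with an HMAC. The sketch assumes the client and server already share a secret key; `SHARED_KEY` is a hypothetical placeholder. It uses Python's standard `hmac` and `hashlib` modules. It provides integrity and authenticity, not confidentiality.

```python
import hashlib
import hmac

SHARED_KEY = b"example-key"  # hypothetical key shared by client and server

def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender attaches to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(sign(message, key), tag)

msg = b"transfer 10 to bob"
tag = sign(msg)
print(verify(msg, tag))                    # True
print(verify(b"transfer 99 to bob", tag))  # False
```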

Section II

Related Work

Peer-to-Peer Networks:
◦ A distributed network architecture composed of participants that make a portion of their resources available directly to their peers, without intermediary network hosts or servers
◦ Peers are both suppliers and consumers of resources
◦ Research: security and privacy in P2P systems; resource discovery/management in P2P systems

Peer-to-Peer Search

BFS – Breadth First Search
(+) guarantees high hit rates at the expense of a large number of messages
(-) sacrifices performance and network utilization for simplicity

Random BFS (RBFS)
(+) does not require global knowledge
(-) RBFS is probabilistic, and the query might not reach some large network segments
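A small simulation contrasts the two strategies. It assumes the overlay is given as an adjacency list and that each query carries a hop limit (TTL). Flooding BFS forwards the query to every neighbor. Random BFS is modelled here as forwarding to a random subset of `fanout` neighbors, which is an assumption about the variant. The graph and parameters are hypothetical.

```python
import random
from collections import deque

def search(graph, start, has_item, ttl, fanout=None, rng=random):
    """Flooding BFS (fanout=None) or random BFS (forward to `fanout` random neighbors).

    Returns (set of nodes holding the item that were reached, messages sent).
    """
    hits, messages = set(), 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if has_item(node):
            hits.add(node)
        if hops_left == 0:
            continue
        neighbors = list(graph[node])
        if fanout is not None and len(neighbors) > fanout:
            neighbors = rng.sample(neighbors, fanout)  # RBFS: random subset only
        for nb in neighbors:
            messages += 1                              # each forwarded query costs a message
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, hops_left - 1))
    return hits, messages

graph = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A", "E"], "D": ["A"], "E": ["B", "C"]}

def holds_item(node):
    return node == "E"

print(search(graph, "A", holds_item, ttl=2))  # ({'E'}, 8)
print(search(graph, "A", holds_item, ttl=2, fanout=1, rng=random.Random(0)))  # may miss 'E'; 2 messages
```

On this graph, flooding finds `E` with 8 messages. Random BFS always sends only 2 messages, but whether it reaches `E` depends on the random choices.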

Section III

Future Work

Develop a model for P2P search
◦ Bayesian inferencing
◦ Value of information

Extend P2P search to P2P Web search
◦ Most centralized Web search engines currently find it hard to keep up with the growth in information needs
◦ Local & decentralized global directory

Semantic P2P overlay networks
◦ Node connections are influenced by content / existence of multiple overlay networks based on content
◦ Dynamic restructuring of the overlay

References

Randy Chow, Theodore Johnson, “Distributed Operating Systems & Algorithms”, Addison Wesley, 1997

Arturo Crespo, Hector Garcia-Molina, “Semantic Overlay Networks for P2P Systems”, 2002

Christos Gkantsidis, Milena Mihail, Amin Saberi, “Random Walks in Peer-to-Peer Networks: Algorithms and Evaluation”, 2006

www.en.wikipedia.com
