cs704 d distributedmutualexcclusion&memory

CS 704D Advanced OSDebasis Das

CS 704D Advanced OS 2

Distributed

Mutual Exclusion


ComplexitiesIn distributed systems

Absence of shared memory

Inter-node communication delays can be

considerable

Global system state cannot be observed by

constituent machines due to communication

delays, component failures, absence of shared

memory

Many more modes of failures yet fail soft is a

goal


Some ConsiderationsPolicies/strategies developed for a distributed

system can be made applicable in a uniprocessor case

However, policies/strategies developed for uniprocessor case cannot be extended to distributed case

Same can be simulated by adding a central resources allocator

Increase traffic to central allocator, the system will fail when the allocator fails

Election of a successor would be needed


Required AssumptionsMessages exchanged by a pair of communicating

processes need to be received in the same order as

they were generated (pipelining property)

Every message is received without errors, no

duplicates

The underlying network ensures all nodes are fully

connected. Any node can communicate with every

other node


Desirable Properties of Algorithms

All nodes should have equal amount of information

Each node makes decisions on the basis of local information. The algorithm should ensure that nodes make consistent & coherent decisions

All nodes reach decisions through about equal effort

Failure of a node should not cause complete break down. The ability of reaching a decision and accessing the resources should not be affected


Time & Ordering of EventsHappened Before Relationship

Logical clock needs to ensureIf a and b are events in the same process and

a comes before b then a->bIf event a is a representation of sending of a

message and b is that of receiving of message in another process the a->b

It is a transitive relationship; that is if a->b and b->c then a->c

If a and b has no happened before relationship the a and b are said to be concurrent


Time & Ordering of EventsLogical Clock Properties

If a->b the C(a) < C(b)Clock condition is satisfied if

If a and b are events in a process Pi and if a comes before b the Ci(a)< Ci(b)

If a is an event sending message m by process Pi and b is the receipt of message by process Pj then Ci(a) <Cj(b)


Time & Ordering of EventsLogical Clock Implementation

Process Pi increments the clock Ci between

successive events

Message m needs to be time stamped so that

T(m)=ci(a)

Receiving process adjusts clock such that it is

max of (Cj+1, Tm)


Total Orderingab only when

Ci(a) “less than” Cj(b) orCi(a) = Cj(b) and Pi “less than” Pj

Simple way to implement “less than” relation would be to assign a unique number to each process and define the “less than” such that i < j.


Lamport’s Algorithm Initiator i: Process Pi requires an exclusive access to a

resource. Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.

Other processes(j, j not= i): When Pj receives the request, places the request on its own queue, send a reply with time stamp (Tj, j) to Process Pi

Pi is allowed access only whenPi request is in front of the queue andAll replies are time stamped later that the Pi time stamp

Pi sends a release message by sending a release message, time stamped suitably

Pj removes Pi request from it request queueCost: 3 (N-1) messages, works best on bus based system where

broadcast costs are minimal


Ricart-Agarwala Algorithm Initiator i: Process Pi requires an exclusive access to a resource.

Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.

Other processes(j, j not= i): When Pj receives the request reacts as follows, If Pj is not requesting the resource, it sends a time stamped reply If Pj needs the resource and the time stamp precedes the Pi’s time

stamp Pi’s request is retained, else a time stamped reply is returned.Pi is allowed access only when

Pi request is in front of the queue andAll replies are time stamped later that the Pi time stamp

Pi sends a releases resource by sending a release message, for each pending resources

Cost: 2(N-1) messages


Distributed Shared MemoryA software abstraction over the loosely

coupled systemsProvides a shared memory kind of operation

over the underlying IPC/RPC mechanismsCan be implemented in OS kernel or runtime

systemAlso known as Distributed Shared Virtual

Memory System (DSVM)The shared space exists only virtually


DSM Architecture

Distributed Shared Memory Layer

Memory

Mapping

CPU(s)

Memory

Mapping

CPU(s)

Memory

Mapping

CPU(s)

Communication Network


DSM ArchitectureUnlike tightly coupled systems, this shared memory is

entirely virtualPartitioned into blocksLocal memory is treated as large local cachesIf the data requested is not available locally a network

fault is generatedOS, through a message, requests the node holding the

block and gets it migrated to the node where fault occurred

Data may be replicated locallyConfiguration varies depending on what kind of

replication, migration policies are used


Design issuesGranularity (block size): Smaller size, higher

faults, traffic; larger blocks mean jobs with higher locality

Structure: Layout of data, depends on applicationCoherence & access synchronization: Like the

cache situation in a uniprocessor systemData Location & access: what data to be

replicated, locatedReplacement strategyThrashingHeterogeneity


Granularity


Block Size Selection FactorsLarge block sizes favored as overheads to transfer

smaller blocks and larger one not too differentPaging overhead- paging overheads also favors

larger block sizes, application should thus have larger locality of reference

Directory size-smaller block larger directory, larger management overhead

Thrashing- thrashing is likely to increase with larger block size

False sharing-larger block sizes increases probability. Consequence, higher thrashing


Page Size as Block SizePage size is preferred as the DSM block sizeAdvantages are

Existing page fault hardware can be used as block fault mechanism. Memory coherence can be handled in page fault handlers

Access control can be managed with existing memory mapping systems

If page size is less than packet size, no extra overhead

Page size proved to be, over time, the right unit as far as memory contention


Structure


Structure of Shared Memory SpaceApproaches to structuring

No structure: a linear array of memory, easy to

design

By data type: granularity per variable,

complex to handle

As database: as tuple space, associative

memory, primitives need to be added to

languages, non transparent access to shared

data


Consistency Models


Consistency ModelsStrict consistency

Sequential consistency

Causal consistency

Pipelined random access memory consistency

Processor consistency

Weak consistency

Release consistency


Strict Consistency ModelValue read of a memory address is the same

as the latest write at that address

Writes become visible to all nodes

Needs absolute ordering of memory

read/write operations, a global time required

(to define most recent)

Nearly impossible to implement


Sequential Consistency ModelAll processes should see the same ordering of

read, writes

Exact interleaving does not matter

No memory operation is started unless

earlier operations have completed

Acceptable in most applications


Causal Consistency ModelOperations are seen in same order (correct

order)when they are causally relatedW2 follows w1 and causally related, then w1,

w2 is the order every process should seeThey may not be seen in same order when

not related causally


Pipelined RAM Consistency ModelAll writes of a single process are seen in the

same order by other processes (as in a pipeline)

However, writes by other processes may appear in different order.(W11,w12) and (w21, w22) can be seen as

(wi1,wi2) followed by (w21, w22) or (w21, w22) followed by (w11,w12)

Simple to implement


Processor Consistency ModelAdds memory coherence to the PRAM model

That is if the writes are for a particular

memory location then all processes should

see the writes in the same order that

maintains memory coherence


Weak Consistency Model Changes in memory can be made after a set of changes has happened (example

critical section)

Isolated access to variable is usually rare, usually there will be several accesses

and then none at all

Difficulty is the system would not know when to show the changes

Application programmers can take care of this through a synchronization

variable

Necessarily

All accesses to sync variable must follow strongest consistency9sequential)

All pending writes must be completed before access to sync variable is allowed

All previous access to sync must be completed before another access is allowed


Release Consistency ModelWeak consistency model requires that

All changes made by a process are propagated to all nodes

All changes at other nodes are propagated to the processor node

Acquire and release variable used for sync so that only one of the operations above need to be done


Discussion of ModelsStrict sequential model s difficult to

implement, almost never implementedSequential consistency model is most

commonly usedCausal, PRAM, processor, weak and release

consistency are the ones implemented in many DSM systems, programmers need to intervene

Weak and release consistency provides explicit sync variables to help with the consistency


Implementing Sequential Concurrency Model

Implementing sequential consistency would depend on what replication/ migration are allowed

Migration/Replication strategiesNon replicated, non migrating blocks

(NRNMBs)Non replicated, migrating blocks (NRMBs)Replicated, migrating blocks (RMBs)Replicated, non migrating blocks (RNMBs)


NRNMBAll requests to a block are routed through the

OS and MMU to this one block that is not replicate and does not move anywhere

Can causeBottleneck because of serializing of memory

accessesParallelism is not possible


NRMBNo copies, if required entire block may be

moved to the node that requires itAdvantages

No communication costs, all accesses are localApplications can take advantage of locality,

applications with high locality will perform better

DisadvantagesProne to thrashingNo advantage of parallelism


Data Locating in NRMBBroadcast

Fault happens, a request is broadcast, current owner sends the block

Broadcast cause communication overheadsCentralized server

Request sent to the server, servers asks the node holding the block to send it to the requesting node, updates location information

Fixed distributed server Fault handler finds mapping of block to the specific server, send

request and gets the blockDynamic distributed server

Fault causes a local search for probable owner, goes to that node, finds another probable owner or the block, gets block updates info


RMBReplication is required to increase

parallelismReads can be done locally, writes has

overheadsHigh read/write ratio systems can apportion

the write overhead over many readsMaintaining coherence throughout replicated

block is an issueTwo basic protocols used are

Write-invalidateWrite update


Coherence ProtocolsWrite-invalidate

On write fault, the fault handler copies the block from one of the nodes to its own

Invalidates all the copies, writes dataIf another node needs it now, the updated block

is replicatedWrite update

On write fault, copy block to local node, update data

Send address & new data to all the replicasOperation resumes after all the writes are done


ComparisonWrite update typically needs a global

sequencer to makes sure all nodes see writes in the same sequence

Also the operations are full writesTogether there is a significant communication

overheadWrite invalidate does not need all that, just a

invalidation signalWrite invalidate is thus more often used

method


Data Locating in RMB StrategyOwner of a block needs to be located, the

most recent node which had write accessNode that has a valid copy will need to be

trackedUse on of the following

BroadcastingCentralized server algorithmFixed distributed server algorithmDynamic distributed server algorithm


RNMBReplicas are maintained but blocks do not

migrate

Consistency is maintained by updating all the

replicas by a write update like process


Data Locating in RNMB Strategy

Replica locations do not change

Replicas are kept consistent

Read requests can go to the nodes that has

the data block

Writes through global sequencer


Munin: A Release Consistent DSM SystemStructure: a collection of shared variablesEach shared variable goes to a separate

memory pageacquireLock and releaselock are usedDifferent consistency protocol is applied for

different types of shared variable used in the systemRead-only, migratory, write-shared, producer-

consumer, result, reduction and conventional


Replacement Strategy


Replacement StrategyShared memory blocks are replicated and/or

migrated so two strategies need to be

decided

Block to be replaced

Where should the replaced block go


Blocks to ReplaceUsage based vs. non-usage basedFixed space vs. variable space

UnusedNilRead onlyRead-owned Writable


Place for Replacement BlockUsing secondary store locally

Using memory space of other nodes- store at

free memory space in some other node. Free

memory space status need to be exchanged,

piggybacking on normal communication

messages


Thrashing


Thrashing SituationsDSM allows migration, so migration back and

forth leads to thrashing

Bata blocks keep migrating between nodes due

to interleaved accesses by processes

Read only blocks are repeatedly invalidated so

after replication


Thrashing Reduction StrategiesApplication controlled locks

Locking an application to a node for a time,

deciding t could be a very difficult issue

Tune coherence strategy to the usage pattern,

transparency of the memory system is

compromised


Other Approaches to DSM


Approaches Data caching managed by the OS

Data Caching managed by MMUs

Data Caching managed by the language run

time system


Heterogeneous DSM


Features of Heterogeneous DSMData Conversion

Structuring DSM as a source of source language objects

Allowing one type of data in a block only (has complications) Memory fragmentation Compilation issues Entire page is converted but a small part may be

used before transfer Not transparent, user provided conversion may be

required


Advantages of DSM


AdvantagesSimpler abstraction

Better portability of distributed applications

Better performance of some Systems

Flexible communications environment

Ease of process migration

cs704 d distributedmutualexcclusion&memory

Technology

pi time stamppi

time stamped suitablypj

time stamped replyif

process pj

release message

dadvanced os11

dadvanced os8

dadvanced os2