cs704 d distributedmutualexcclusion&memory
DESCRIPTION
Distributed mutual exclusion, lamprort algorithm, Ricart Agarwala algorithm, Distributed memory systemTRANSCRIPT
CS 704D Advanced OSDebasis Das
CS 704D Advanced OS 2
Distributed
Mutual Exclusion
CS 704D Advanced OS 3
ComplexitiesIn distributed systems
Absence of shared memory
Inter-node communication delays can be
considerable
Global system state cannot be observed by
constituent machines due to communication
delays, component failures, absence of shared
memory
Many more modes of failures yet fail soft is a
goal
CS 704D Advanced OS 4
Some ConsiderationsPolicies/strategies developed for a distributed
system can be made applicable in a uniprocessor case
However, policies/strategies developed for uniprocessor case cannot be extended to distributed case
Same can be simulated by adding a central resources allocator
Increase traffic to central allocator, the system will fail when the allocator fails
Election of a successor would be needed
CS 704D Advanced OS 5
Required AssumptionsMessages exchanged by a pair of communicating
processes need to be received in the same order as
they were generated (pipelining property)
Every message is received without errors, no
duplicates
The underlying network ensures all nodes are fully
connected. Any node can communicate with every
other node
CS 704D Advanced OS 6
Desirable Properties of Algorithms
All nodes should have equal amount of information
Each node makes decisions on the basis of local information. The algorithm should ensure that nodes make consistent & coherent decisions
All nodes reach decisions through about equal effort
Failure of a node should not cause complete break down. The ability of reaching a decision and accessing the resources should not be affected
CS 704D Advanced OS 7
Time & Ordering of EventsHappened Before Relationship
Logical clock needs to ensureIf a and b are events in the same process and
a comes before b then a->bIf event a is a representation of sending of a
message and b is that of receiving of message in another process the a->b
It is a transitive relationship; that is if a->b and b->c then a->c
If a and b has no happened before relationship the a and b are said to be concurrent
CS 704D Advanced OS 8
Time & Ordering of EventsLogical Clock Properties
If a->b the C(a) < C(b)Clock condition is satisfied if
If a and b are events in a process Pi and if a comes before b the Ci(a)< Ci(b)
If a is an event sending message m by process Pi and b is the receipt of message by process Pj then Ci(a) <Cj(b)
CS 704D Advanced OS 9
Time & Ordering of EventsLogical Clock Implementation
Process Pi increments the clock Ci between
successive events
Message m needs to be time stamped so that
T(m)=ci(a)
Receiving process adjusts clock such that it is
max of (Cj+1, Tm)
CS 704D Advanced OS 10
Total Orderingab only when
Ci(a) “less than” Cj(b) orCi(a) = Cj(b) and Pi “less than” Pj
Simple way to implement “less than” relation would be to assign a unique number to each process and define the “less than” such that i < j.
CS 704D Advanced OS 11
Lamport’s Algorithm Initiator i: Process Pi requires an exclusive access to a
resource. Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.
Other processes(j, j not= i): When Pj receives the request, places the request on its own queue, send a reply with time stamp (Tj, j) to Process Pi
Pi is allowed access only whenPi request is in front of the queue andAll replies are time stamped later that the Pi time stamp
Pi sends a release message by sending a release message, time stamped suitably
Pj removes Pi request from it request queueCost: 3 (N-1) messages, works best on bus based system where
broadcast costs are minimal
CS 704D Advanced OS 12
Ricart-Agarwala Algorithm Initiator i: Process Pi requires an exclusive access to a resource.
Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.
Other processes(j, j not= i): When Pj receives the request reacts as follows, If Pj is not requesting the resource, it sends a time stamped reply If Pj needs the resource and the time stamp precedes the Pi’s time
stamp Pi’s request is retained, else a time stamped reply is returned.Pi is allowed access only when
Pi request is in front of the queue andAll replies are time stamped later that the Pi time stamp
Pi sends a releases resource by sending a release message, for each pending resources
Cost: 2(N-1) messages
CS 704D Advanced OS 13
Distributed Shared MemoryA software abstraction over the loosely
coupled systemsProvides a shared memory kind of operation
over the underlying IPC/RPC mechanismsCan be implemented in OS kernel or runtime
systemAlso known as Distributed Shared Virtual
Memory System (DSVM)The shared space exists only virtually
CS 704D Advanced OS 14
DSM Architecture
Distributed Shared Memory Layer
Memory
Mapping
CPU(s)
Memory
Mapping
CPU(s)
Memory
Mapping
CPU(s)
Communication Network
CS 704D Advanced OS 15
DSM ArchitectureUnlike tightly coupled systems, this shared memory is
entirely virtualPartitioned into blocksLocal memory is treated as large local cachesIf the data requested is not available locally a network
fault is generatedOS, through a message, requests the node holding the
block and gets it migrated to the node where fault occurred
Data may be replicated locallyConfiguration varies depending on what kind of
replication, migration policies are used
CS 704D Advanced OS 16
Design issuesGranularity (block size): Smaller size, higher
faults, traffic; larger blocks mean jobs with higher locality
Structure: Layout of data, depends on applicationCoherence & access synchronization: Like the
cache situation in a uniprocessor systemData Location & access: what data to be
replicated, locatedReplacement strategyThrashingHeterogeneity
CS 704D Advanced OS 17
Granularity
CS 704D Advanced OS 18
Block Size Selection FactorsLarge block sizes favored as overheads to transfer
smaller blocks and larger one not too differentPaging overhead- paging overheads also favors
larger block sizes, application should thus have larger locality of reference
Directory size-smaller block larger directory, larger management overhead
Thrashing- thrashing is likely to increase with larger block size
False sharing-larger block sizes increases probability. Consequence, higher thrashing
CS 704D Advanced OS 19
Page Size as Block SizePage size is preferred as the DSM block sizeAdvantages are
Existing page fault hardware can be used as block fault mechanism. Memory coherence can be handled in page fault handlers
Access control can be managed with existing memory mapping systems
If page size is less than packet size, no extra overhead
Page size proved to be, over time, the right unit as far as memory contention
CS 704D Advanced OS 20
Structure
CS 704D Advanced OS 21
Structure of Shared Memory SpaceApproaches to structuring
No structure: a linear array of memory, easy to
design
By data type: granularity per variable,
complex to handle
As database: as tuple space, associative
memory, primitives need to be added to
languages, non transparent access to shared
data
CS 704D Advanced OS 22
Consistency Models
CS 704D Advanced OS 23
Consistency ModelsStrict consistency
Sequential consistency
Causal consistency
Pipelined random access memory consistency
Processor consistency
Weak consistency
Release consistency
CS 704D Advanced OS 24
Strict Consistency ModelValue read of a memory address is the same
as the latest write at that address
Writes become visible to all nodes
Needs absolute ordering of memory
read/write operations, a global time required
(to define most recent)
Nearly impossible to implement
CS 704D Advanced OS 25
Sequential Consistency ModelAll processes should see the same ordering of
read, writes
Exact interleaving does not matter
No memory operation is started unless
earlier operations have completed
Acceptable in most applications
CS 704D Advanced OS 26
Causal Consistency ModelOperations are seen in same order (correct
order)when they are causally relatedW2 follows w1 and causally related, then w1,
w2 is the order every process should seeThey may not be seen in same order when
not related causally
CS 704D Advanced OS 27
Pipelined RAM Consistency ModelAll writes of a single process are seen in the
same order by other processes (as in a pipeline)
However, writes by other processes may appear in different order.(W11,w12) and (w21, w22) can be seen as
(wi1,wi2) followed by (w21, w22) or (w21, w22) followed by (w11,w12)
Simple to implement
CS 704D Advanced OS 28
Processor Consistency ModelAdds memory coherence to the PRAM model
That is if the writes are for a particular
memory location then all processes should
see the writes in the same order that
maintains memory coherence
CS 704D Advanced OS 29
Weak Consistency Model Changes in memory can be made after a set of changes has happened (example
critical section)
Isolated access to variable is usually rare, usually there will be several accesses
and then none at all
Difficulty is the system would not know when to show the changes
Application programmers can take care of this through a synchronization
variable
Necessarily
All accesses to sync variable must follow strongest consistency9sequential)
All pending writes must be completed before access to sync variable is allowed
All previous access to sync must be completed before another access is allowed
CS 704D Advanced OS 30
Release Consistency ModelWeak consistency model requires that
All changes made by a process are propagated to all nodes
All changes at other nodes are propagated to the processor node
Acquire and release variable used for sync so that only one of the operations above need to be done
CS 704D Advanced OS 31
Discussion of ModelsStrict sequential model s difficult to
implement, almost never implementedSequential consistency model is most
commonly usedCausal, PRAM, processor, weak and release
consistency are the ones implemented in many DSM systems, programmers need to intervene
Weak and release consistency provides explicit sync variables to help with the consistency
CS 704D Advanced OS 32
Implementing Sequential Concurrency Model
Implementing sequential consistency would depend on what replication/ migration are allowed
Migration/Replication strategiesNon replicated, non migrating blocks
(NRNMBs)Non replicated, migrating blocks (NRMBs)Replicated, migrating blocks (RMBs)Replicated, non migrating blocks (RNMBs)
CS 704D Advanced OS 33
NRNMBAll requests to a block are routed through the
OS and MMU to this one block that is not replicate and does not move anywhere
Can causeBottleneck because of serializing of memory
accessesParallelism is not possible
CS 704D Advanced OS 34
NRMBNo copies, if required entire block may be
moved to the node that requires itAdvantages
No communication costs, all accesses are localApplications can take advantage of locality,
applications with high locality will perform better
DisadvantagesProne to thrashingNo advantage of parallelism
CS 704D Advanced OS 35
Data Locating in NRMBBroadcast
Fault happens, a request is broadcast, current owner sends the block
Broadcast cause communication overheadsCentralized server
Request sent to the server, servers asks the node holding the block to send it to the requesting node, updates location information
Fixed distributed server Fault handler finds mapping of block to the specific server, send
request and gets the blockDynamic distributed server
Fault causes a local search for probable owner, goes to that node, finds another probable owner or the block, gets block updates info
CS 704D Advanced OS 36
RMBReplication is required to increase
parallelismReads can be done locally, writes has
overheadsHigh read/write ratio systems can apportion
the write overhead over many readsMaintaining coherence throughout replicated
block is an issueTwo basic protocols used are
Write-invalidateWrite update
CS 704D Advanced OS 37
Coherence ProtocolsWrite-invalidate
On write fault, the fault handler copies the block from one of the nodes to its own
Invalidates all the copies, writes dataIf another node needs it now, the updated block
is replicatedWrite update
On write fault, copy block to local node, update data
Send address & new data to all the replicasOperation resumes after all the writes are done
CS 704D Advanced OS 38
ComparisonWrite update typically needs a global
sequencer to makes sure all nodes see writes in the same sequence
Also the operations are full writesTogether there is a significant communication
overheadWrite invalidate does not need all that, just a
invalidation signalWrite invalidate is thus more often used
method
CS 704D Advanced OS 39
Data Locating in RMB StrategyOwner of a block needs to be located, the
most recent node which had write accessNode that has a valid copy will need to be
trackedUse on of the following
BroadcastingCentralized server algorithmFixed distributed server algorithmDynamic distributed server algorithm
CS 704D Advanced OS 40
RNMBReplicas are maintained but blocks do not
migrate
Consistency is maintained by updating all the
replicas by a write update like process
CS 704D Advanced OS 41
Data Locating in RNMB Strategy
Replica locations do not change
Replicas are kept consistent
Read requests can go to the nodes that has
the data block
Writes through global sequencer
CS 704D Advanced OS 42
Munin: A Release Consistent DSM SystemStructure: a collection of shared variablesEach shared variable goes to a separate
memory pageacquireLock and releaselock are usedDifferent consistency protocol is applied for
different types of shared variable used in the systemRead-only, migratory, write-shared, producer-
consumer, result, reduction and conventional
CS 704D Advanced OS 43
Replacement Strategy
CS 704D Advanced OS 44
Replacement StrategyShared memory blocks are replicated and/or
migrated so two strategies need to be
decided
Block to be replaced
Where should the replaced block go
CS 704D Advanced OS 45
Blocks to ReplaceUsage based vs. non-usage basedFixed space vs. variable space
UnusedNilRead onlyRead-owned Writable
CS 704D Advanced OS 46
Place for Replacement BlockUsing secondary store locally
Using memory space of other nodes- store at
free memory space in some other node. Free
memory space status need to be exchanged,
piggybacking on normal communication
messages
CS 704D Advanced OS 47
Thrashing
CS 704D Advanced OS 48
Thrashing SituationsDSM allows migration, so migration back and
forth leads to thrashing
Bata blocks keep migrating between nodes due
to interleaved accesses by processes
Read only blocks are repeatedly invalidated so
after replication
CS 704D Advanced OS 49
Thrashing Reduction StrategiesApplication controlled locks
Locking an application to a node for a time,
deciding t could be a very difficult issue
Tune coherence strategy to the usage pattern,
transparency of the memory system is
compromised
CS 704D Advanced OS 50
Other Approaches to DSM
CS 704D Advanced OS 51
Approaches Data caching managed by the OS
Data Caching managed by MMUs
Data Caching managed by the language run
time system
CS 704D Advanced OS 52
Heterogeneous DSM
CS 704D Advanced OS 53
Features of Heterogeneous DSMData Conversion
Structuring DSM as a source of source language objects
Allowing one type of data in a block only (has complications) Memory fragmentation Compilation issues Entire page is converted but a small part may be
used before transfer Not transparent, user provided conversion may be
required
CS 704D Advanced OS 54
Advantages of DSM
CS 704D Advanced OS 55
AdvantagesSimpler abstraction
Better portability of distributed applications
Better performance of some Systems
Flexible communications environment
Ease of process migration