techniques for reducing consistency-related communication in distributed shared-memory systems
TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS
J. B. Carter, University of Utah
J. K. Bennett and W. Zwaenepoel, Rice University
INTRODUCTION
• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
• A key issue in building a software DSM is minimizing the amount of data communicated among the workstations' memories
Why bother with DSM?
• The key idea is to build fast parallel computers that
  – are cheaper than conventional architectures
  – are convenient to use
• The conventional parallel computer architecture was the shared-memory multiprocessor
[Figure: Conventional parallel architecture: four CPUs, each with its own cache, sharing one main memory]
Today’s architecture
• Clusters of workstations are much more cost-effective
  – No need to develop complex bus and cache structures
  – Can use off-the-shelf networking hardware
    • Gigabit Ethernet
    • Myrinet (1.5 Gb/s)
  – Can quickly integrate the newest microprocessors
Limitations of cluster approach
• Communication within a cluster of workstations is through message passing
  – Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared-memory architectures
  – Converting them to a message-passing architecture is a nightmare
Distributed shared memory
DSM = one shared global address space spanning the workstations' main memories
Distributed shared memory
• DSM makes a cluster of workstations look like a shared-memory parallel computer
  – Easier to write new programs
  – Easier to port existing programs
• The key problem is that DSM only provides the illusion of a shared-memory architecture
  – Data must still move back and forth among the workstations
Characterizing a DSM (I)
• Four important issues:
1. Size of transfer units (level of granularity)
  • Big units, such as virtual memory pages, are more efficient
  • Big units can cause false sharing whenever a page contains different variables that are accessed at the same time by different processors, with at least one of them writing
False Sharing
[Figure: one workstation accesses x while another accesses y; x and y lie on the same page]
The page containing x and y will move back and forth between the main memories of the workstations
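The following C sketch makes the cost concrete by simulating the ping-pong. It assumes 4 KB pages and a single-writer protocol in which every write by a node that does not own the page moves the page to that node. The names `page_state`, `write_access` and `simulate` are hypothetical and do not come from Munin or any other DSM.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumed page size in bytes */
#define NPAGES 2         /* the simulated region spans two pages */

/* Hypothetical single-writer model: each page has one owner at a time. */
typedef struct {
    int owner;           /* node currently holding the writable copy */
    unsigned transfers;  /* how many times the page has moved */
} page_state;

/* A write by a node that does not own the page pulls the page over. */
static void write_access(page_state pages[NPAGES], int node, size_t addr) {
    assert(addr < NPAGES * PAGE_SIZE);
    page_state *p = &pages[addr / PAGE_SIZE];
    if (p->owner != node) {
        p->owner = node;
        p->transfers++;
    }
}

/* Node 0 writes the variable at x_addr and node 1 writes the one at y_addr,
 * alternating for `rounds` rounds. Returns the total number of page moves. */
static unsigned simulate(size_t x_addr, size_t y_addr, int rounds) {
    page_state pages[NPAGES] = {{0, 0}, {0, 0}};  /* node 0 starts as owner */
    for (int i = 0; i < rounds; i++) {
        write_access(pages, 0, x_addr);
        write_access(pages, 1, y_addr);
    }
    return pages[0].transfers + pages[1].transfers;
}
```

With x and y eight bytes apart, `simulate(0, 8, 100)` returns 199 page moves. Placing y on the next page (`simulate(0, PAGE_SIZE, 100)`) returns 1, because each page then moves at most once.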
Characterizing a DSM (II)
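2. Consistency model
  • Strict consistency is not achievable in practice: every read would have to return the value of the most recent write in absolute global time
  • Various authors have proposed weak consistency models
    – Cheaper to implement
    – Harder to use correctly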
Characterizing a DSM (III)
3. Portability of programs
  • Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications ("dusty decks")
  • More efficient DSMs require more changes
4. Portability of the DSM
  • Some DSMs require specific OS features
MUNIN
• Developed at Rice University
• Based on software objects (variables)
• Uses the processor's virtual memory hardware to detect accesses to the shared objects
• Includes several techniques for reducing consistency-related communication
• Only runs on top of the V kernel
Key features
• Software release consistency: memory only needs to be consistent at specific synchronization points
• Multiple consistency protocols: the user can select the best consistency protocol for each data item
• Write-shared protocols: reduce false sharing
• An update-with-timeout mechanism: stops updates from being sent to stale replicas
SW RELEASE CONSISTENCY (I)
• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
  – P(&mutex) and V(&mutex)
  – lock(&csect) and unlock(&csect)
  – request( ) and release( )
• Unprotected accesses can produce unpredictable results
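For comparison, here is the same discipline on an ordinary shared-memory machine, written with POSIX threads. `pthread_mutex_lock` and `pthread_mutex_unlock` play the role of the lock/unlock pairs listed above. This is plain pthreads code rather than DSM code, and `worker` and `run_two_workers` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared data and the mutex that guards it (the slide's csect). */
static long counter = 0;
static pthread_mutex_t csect = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter *arg times. */
static void *worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&csect);    /* lock(&csect): enter critical section */
        counter++;                     /* protected access to shared data */
        pthread_mutex_unlock(&csect);  /* unlock(&csect): leave it */
    }
    return NULL;
}

/* Runs two workers concurrently and returns the final counter value. */
static long run_two_workers(long iters) {
    pthread_t a, b;
    counter = 0;
    int rc = pthread_create(&a, NULL, worker, &iters);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &iters);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is set */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

`run_two_workers(100000)` returns 200000. Without the mutex, the two threads' increments of `counter` could interleave and lose updates.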
SW RELEASE CONSISTENCY (II)
• SW release consistency only guarantees the correctness of operations performed within a request/release pair
• There is no need to propagate new values of shared variables until the release
• It must guarantee that a workstation has received the most recent values of all shared variables when it completes a request
SW RELEASE CONSISTENCY (III)
First workstation:
  shared int x;
  request( );
  x = 1;
  release( );  // propagate x = 1
Second workstation:
  shared int x;
  request( );  // wait for new value of x
  x++;
  release( );  // propagate x = 2
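The two-workstation example can be replayed with a minimal C sketch of eager release consistency. It assumes two nodes, integer variables, and no real network. Writes inside a critical section change only the local replica and are buffered, and the release pushes the whole batch to every other replica. Lock acquisition is not modeled, and `dsm_write`/`dsm_release` are hypothetical names, not Munin's interface.

```c
#include <assert.h>

#define NODES 2
#define NVARS 4
#define MAX_PENDING 16

typedef struct { int var; int value; } update;

/* Each node keeps its own replica of the shared variables. */
typedef struct {
    int copy[NVARS];
    update pending[MAX_PENDING];  /* writes buffered until release */
    int npending;
} node;

static node nodes[NODES];

/* Write inside a critical section: change the local replica only and
 * remember the update; nothing is sent yet. */
static void dsm_write(int n, int var, int value) {
    assert(nodes[n].npending < MAX_PENDING);
    nodes[n].copy[var] = value;
    nodes[n].pending[nodes[n].npending++] = (update){var, value};
}

/* Eager release: push all buffered updates to every other replica,
 * one batched message per destination. Returns the number of messages. */
static int dsm_release(int n) {
    int messages = 0;
    for (int m = 0; m < NODES; m++) {
        if (m == n) continue;
        for (int i = 0; i < nodes[n].npending; i++)
            nodes[m].copy[nodes[n].pending[i].var] = nodes[n].pending[i].value;
        messages++;
    }
    nodes[n].npending = 0;
    return messages;
}
```

After `dsm_write(0, 0, 1)`, node 1's copy of x stays 0 until `dsm_release(0)` runs. Several writes before one release still cost one message per destination, which is the batching the protocol relies on.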
SW RELEASE CONSISTENCY (IV)
• Munin uses eager release: new values of shared variables are propagated at release time
  – Lazy release delays propagation until a request is issued (TreadMarks)
• A workstation issuing a request gets the current values of all shared variables
  – Shared variables are not associated with a particular critical section (as they are in Midway)
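The difference in message counts between eager and lazy release can be sketched with a deliberately simplified cost model. This model is an assumption, not a measurement. Every node caches the shared data, one message carries all pending updates, and a lazy acquirer pulls everything it needs from the last releaser. Real lazy protocols such as TreadMarks' are more involved, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* holders[i] = node that acquires the lock for the i-th critical section,
 * writes shared data, then releases it. Simplified cost model (assumption):
 * every node caches the data and one message carries all pending updates. */

/* Eager release (Munin): each release pushes updates to all other nodes. */
static size_t eager_messages(size_t sections, int nodes) {
    assert(nodes >= 1);
    return sections * (size_t)(nodes - 1);
}

/* Lazy release (TreadMarks-style): updates move only when a different node
 * acquires the lock; the acquirer pulls them from the last releaser. */
static size_t lazy_messages(const int holders[], size_t sections) {
    size_t msgs = 0;
    for (size_t i = 1; i < sections; i++)
        if (holders[i] != holders[i - 1])
            msgs++;
    return msgs;
}
```

Take the lock sequence `{0, 1, 1, 2, 0}` on four nodes. The eager count is 15 messages and the lazy count is 3. Lazy release sends nothing when the same node reacquires the lock, and it never updates a node (here node 3) that does not ask.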
Munin Implementation (I)
• Three kinds of variables:
1. Ordinary variables: can only be accessed by the process that created them
2. Shared data variables: should always be accessed from within critical regions
3. Synchronization variables:
  • locks, barriers, or condition variables
  • must be accessed through special library procedures
Munin Implementation (II)
• When a processor modifies shared data inside a critical region, all update messages are buffered and delayed until the processor leaves the critical region
• Processes accessing shared data variables outside critical regions do so at their own risk
  – Same as with the shared memory model
  – The risk is higher, because updates are delayed until the writer's release
FOUR CONSISTENCY PROTOCOLS
1. Conventional shared variables:
  – Replicated on demand
  – Single writer/multiple readers policy using an invalidation-based protocol
2. Read-only variables:
  – Replicated on demand
  – Any attempt to modify them results in a runtime error
FOUR CONSISTENCY PROTOCOLS
3. Migratory variables:
  – Migrated among the processes accessing them
  – Every process accessing them always gets full read and write access
4. Write-shared variables:
  – Can be updated concurrently because different processes access different portions of the page
Implementation
• The programmer uses annotations to specify any of the last three consistency protocols
  – Read-only variables
  – Migratory variables
  – Write-shared variables
• Incorrect annotations may result in poor performance or in runtime errors, but not in incorrect results
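One way to picture the annotations is as a per-variable tag that the runtime consults when a page fault occurs. The sketch below encodes the four protocols and the action each takes on a write fault, as summarized on the preceding slides. The enum and function names are hypothetical, not Munin's actual annotation syntax.

```c
#include <assert.h>

/* Hypothetical encoding of the per-variable sharing annotations. */
typedef enum {
    CONVENTIONAL,  /* default: single writer, multiple readers */
    READ_ONLY,     /* replicated on demand, never written */
    MIGRATORY,     /* moves to whichever process accesses it */
    WRITE_SHARED   /* concurrent writers on different parts of a page */
} sharing_protocol;

typedef enum {
    INVALIDATE_OTHER_COPIES,  /* then write to the single writable copy */
    RAISE_RUNTIME_ERROR,      /* writes to read-only data are illegal */
    MIGRATE_WITH_RW_ACCESS,   /* fetch the object with read+write rights */
    MAKE_TWIN_AND_WRITE       /* copy-on-write; diff against twin at release */
} fault_action;

/* What the runtime would do on a write fault, per protocol (illustrative). */
static fault_action on_write_fault(sharing_protocol p) {
    switch (p) {
    case CONVENTIONAL: return INVALIDATE_OTHER_COPIES;
    case READ_ONLY:    return RAISE_RUNTIME_ERROR;
    case MIGRATORY:    return MIGRATE_WITH_RW_ACCESS;
    case WRITE_SHARED: return MAKE_TWIN_AND_WRITE;
    }
    assert(0 && "unknown sharing protocol");
    return RAISE_RUNTIME_ERROR;
}
```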
WRITE-SHARED PROTOCOL (I)
• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• The first attempt to modify the contents of the page creates a copy of the page as it was before the modification (the twin)
Example
• Before: the page holds x = 1 and y = 2
• First write access: a twin of the page is created (x = 1, y = 2)
• After: the page holds x = 3 and y = 2
• Compare with twin: the new value of x is 3
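The twin-and-diff steps can be sketched in C, assuming a page of 1024 `int` words (an assumed size). For simplicity, the diff is written to a separate array instead of reusing the twin's storage. `make_twin`, `make_diff` and `apply_diff` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 1024  /* assumed page size in words */

typedef struct { size_t index; int value; } diff_entry;

/* On the first write fault: save an unmodified copy of the page (the twin). */
static void make_twin(const int page[PAGE_WORDS], int twin[PAGE_WORDS]) {
    memcpy(twin, page, PAGE_WORDS * sizeof page[0]);
}

/* At release: word-by-word comparison, recording every word that changed.
 * Returns the number of diff entries written to out. */
static size_t make_diff(const int page[PAGE_WORDS], const int twin[PAGE_WORDS],
                        diff_entry out[], size_t max_out) {
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            assert(n < max_out);
            out[n++] = (diff_entry){i, page[i]};
        }
    }
    return n;
}

/* On another processor's copy: overwrite only the changed words. */
static void apply_diff(int page[PAGE_WORDS], const diff_entry d[], size_t n) {
    for (size_t i = 0; i < n; i++)
        page[d[i].index] = d[i].value;
}
```

Replaying the example: a page holding x = 1, y = 2 is twinned and then changed to x = 3. This yields a one-entry diff (word 0, value 3). Applying that diff to another copy updates x and leaves y untouched.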
WRITE-SHARED PROTOCOL (II)
• At release time, the DSM performs a word-by-word comparison of the page and its twin, stores the diff in the space used by the twin page, and sends the update to all processors holding a copy of the shared data
• A runtime switch can be set to check for conflicting updates to write-shared data
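Each diff lists only the words its writer changed. As a result, diffs from writers that touched different words of the same page can be applied in any order, which is what lets write-shared pages tolerate false sharing. The sketch below shows that merge, plus one plausible form of the conflict check the runtime switch enables: flagging two diffs that touch the same word. How Munin actually performs the check is not described here, so the function names and logic are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t index; int value; } diff_entry;

/* Hypothetical conflict check: two diffs conflict if both writers
 * modified the same word of the page. */
static bool diffs_conflict(const diff_entry a[], size_t na,
                           const diff_entry b[], size_t nb) {
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i].index == b[j].index)
                return true;
    return false;
}

/* Apply a diff to a page of page_words words. */
static void apply_diff(int page[], size_t page_words,
                       const diff_entry d[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(d[i].index < page_words);
        page[d[i].index] = d[i].value;
    }
}
```

If one writer changed x (word 0) and another changed y (word 1), applying the two diffs in either order yields the same page. A third diff that also changes word 0 would be reported as a conflict.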
UPDATE TIME-OUT MECHANISM
• Munin avoids sending updates to processors holding stale replicas, i.e., copies they are no longer using
• Whenever a processor receives an update for a page for which it does not have a twin, the page is marked supervisor-only and the time of receipt of the update is recorded
• The first local access to the page causes a trap that removes the restriction
UPDATE TIME-OUT MECHANISM
• When a processor receives an update for a page that is still marked supervisor-only, it checks the timestamp of the last update
• If more than 50 ms have elapsed, the processor tells the originator of the update to stop sending updates and invalidates its copy of the page
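The time-out rule can be modeled as a small state machine per replica, using the 50 ms threshold from the slide. Page protection is represented by a `supervisor_only` flag rather than real memory-protection calls. Applying the diff and sending the stop message to the originator are left out, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_TIMEOUT_MS 50  /* threshold given on the slide */

typedef enum { UPDATE_APPLIED, INVALIDATED } update_result;

typedef struct {
    bool valid;              /* this node still holds a replica */
    bool has_twin;           /* the page is also being written locally */
    bool supervisor_only;    /* no local access since the last update */
    uint64_t last_update_ms; /* receipt time of the last update */
} replica;

/* An update for this page arrives at time now_ms. */
static update_result receive_update(replica *r, uint64_t now_ms) {
    assert(r->valid);  /* senders are told to stop once a copy is dropped */
    if (r->supervisor_only && now_ms - r->last_update_ms > UPDATE_TIMEOUT_MS) {
        /* Untouched locally since the last update: drop the copy.
         * A real system would also ask the originator to stop sending. */
        r->valid = false;
        r->supervisor_only = false;
        return INVALIDATED;
    }
    /* (applying the diff itself is omitted) */
    if (!r->has_twin) {
        r->supervisor_only = true;  /* next local access will trap */
        r->last_update_ms = now_ms;
    }
    return UPDATE_APPLIED;
}

/* Trap handler for the first local access after an update. */
static void local_access(replica *r) {
    r->supervisor_only = false;
}
```

Suppose a local access occurs after an update at time 0. Updates arriving at 100 ms and 130 ms are then both applied, since they are 30 ms apart. The next update, at 200 ms, arrives 70 ms after the previous one with no local access in between, so the copy is invalidated.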
CONCLUSIONS (I)
• The strongest point of Munin is its excellent performance
  – typically within 5 to 33% of the performance of hand-coded message-passing versions of the same programs
• Its major limitation is its dependence on some features of the V kernel
CONCLUSIONS (II)
• Munin requires programs to access shared data from within critical regions or after barriers
  – Appears to be a reasonable requirement
• Munin allows users to tune the performance of their programs by selecting the best consistency protocol for each shared variable
  – Can quickly become a tedious process
FURTHER DEVELOPMENTS
• The same team later developed a successor to Munin named TreadMarks
• Key differences:
  – TreadMarks uses a more complex lazy release protocol
  – TreadMarks is UNIX-based
    • More portable