Synchronization Transformations for Parallel Computing
Synchronization Transformations for Parallel Computing

Pedro Diniz and Martin Rinard

Department of Computer Science, University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}
Motivation
Parallel Computing Becomes Dominant Form of Computation
Parallel Machines Require Parallel Software
Parallel Constructs Require New Analysis and Optimization Techniques
Our Goal: Eliminate Synchronization Overhead
Talk Outline
• Motivation
• Model of Computation
• Synchronization Optimization Algorithm
• Applications Experience
• Dynamic Feedback
• Related Work
• Conclusions
Model of Computation
• Parallel Programs
  • Serial Phases
  • Parallel Phases
• Single Address Space
• Atomic Operations on Shared Data
  • Mutual Exclusion Locks
  • Acquire Constructs
  • Release Constructs

[Figure: a mutual exclusion region consisting of Acq, statement S1, Rel]
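As a minimal sketch of this model in C++ (the slides are language-neutral; `std::mutex` here stands in for a mutual exclusion lock, and the names are illustrative), the region between the Acquire and Release constructs is the mutual exclusion region containing S1:

```cpp
#include <mutex>

std::mutex lock_l;    // mutual exclusion lock protecting the shared data
int shared_data = 0;  // data shared across the parallel phase

void atomic_update() {
    lock_l.lock();     // Acquire construct
    shared_data += 1;  // S1: operation on shared data, executed atomically
    lock_l.unlock();   // Release construct
}
```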
Reducing Synchronization Overhead
[Figure: a mutual exclusion region containing S1 and S2 (Acq, S1, S2, Rel), followed by statement S3 outside the region]

[Figure: an adjacent Rel/Acq pair on the same lock]
Synchronization Optimization
Idea: Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only Once

Result: Reduction in the Number of Executed Acquire and Release Constructs

Mechanism: Lock Movement Transformations and Lock Cancellation Transformations
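The effect of the optimization can be sketched as a before/after pair (illustrative C++, not the compiler's actual output): a loop that repeatedly acquires and releases the same lock is rewritten into one that acquires and releases it only once.

```cpp
#include <mutex>

// Before: each iteration executes one Acquire and one Release (n of each).
void sum_before(std::mutex& m, long& total, const int* a, int n) {
    for (int i = 0; i < n; ++i) {
        m.lock();
        total += a[i];
        m.unlock();
    }
}

// After: the lock is acquired and released only once for the whole loop.
void sum_after(std::mutex& m, long& total, const int* a, int n) {
    m.lock();
    for (int i = 0; i < n; ++i)
        total += a[i];
    m.unlock();
}
```

Both versions compute the same result; only the number of executed acquire and release constructs changes.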
Lock Cancellation
Acquire Lock Movement
Release Lock Movement
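The three basic transformations above can be sketched in straight-line code (illustrative only; the slides formulate them on the interprocedural control flow graph, not on source code):

```cpp
#include <mutex>

// Lock cancellation: an adjacent Release/Acquire pair on the same lock is
// deleted, fusing two mutual exclusion regions into one.
void after_cancellation(std::mutex& m, int& s1, int& s2) {
    m.lock();
    s1++;           // body of the first region
    // m.unlock();  // cancelled Release
    // m.lock();    // cancelled Acquire
    s2++;           // body of the second region
    m.unlock();
}

// Acquire movement: the Acquire moves backward over statement S0, so S0 now
// executes inside the region. Release movement is symmetric: the Release
// moves forward over a trailing statement.
void after_acquire_movement(std::mutex& m, int& s0, int& s1) {
    m.lock();  // moved up, above S0
    s0++;      // S0: originally outside the region
    s1++;      // S1: the original region body
    m.unlock();
}
```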
Synchronization Optimization Algorithm
Overview:
• Find Two Mutual Exclusion Regions With the Same Lock
• Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They are Adjacent
• Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion Region
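The coalescing step can be sketched on a straight-line sequence of operations (a toy encoding, not the ICFG-based algorithm): once lock movement has made two regions on the same lock adjacent, the inner Release/Acquire pair cancels.

```cpp
#include <string>
#include <vector>

// Ops are encoded as "Acq(l)", "Rel(l)", or a statement name. An adjacent
// Rel/Acq pair on the same lock is deleted, merging the two regions.
std::vector<std::string> cancel_adjacent(const std::vector<std::string>& ops) {
    std::vector<std::string> out;
    for (const std::string& op : ops) {
        if (!out.empty() && out.back().rfind("Rel(", 0) == 0 &&
            op.rfind("Acq(", 0) == 0 &&
            out.back().substr(4) == op.substr(4)) {  // same lock name
            out.pop_back();  // delete the Release...
            continue;        // ...and skip the matching Acquire
        }
        out.push_back(op);
    }
    return out;
}
```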
Interprocedural Control Flow Graph
Acquire Movement Paths
Release Movement Paths
Migration Paths and Meeting Edge
Intersection of Paths
Compensation Nodes
Final Result
Synchronization Optimization Trade-Off
• Advantage:
  • Reduces Number of Executed Acquires and Releases
  • Reduces Acquire and Release Overhead
• Disadvantage: May Introduce False Exclusion
  • Multiple Processors Attempt to Acquire Same Lock
  • Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region
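False exclusion can be sketched as follows (illustrative C++): after coalescing, a statement that was originally in no mutual exclusion region executes while the lock is held, so other processors contending for the lock must also wait for it.

```cpp
#include <mutex>

// After coalescing two regions on lock m, statement S2 (originally outside
// any region) is now serialized under m, potentially delaying other
// processors that attempt to acquire m.
void coalesced_region(std::mutex& m, int& shared, int& local) {
    m.lock();
    shared++;    // originally region 1
    local *= 2;  // S2: originally outside any region, now inside
    shared++;    // originally region 2
    m.unlock();
}
```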
False Exclusion Policy
Goal: Limit Potential Severity of False Exclusion
Mechanism: Constrain the Application of Basic Transformations

• Original: Never Apply Transformations
• Bounded: Apply Transformations Only on Cycle-Free Subgraphs of the ICFG
• Aggressive: Always Apply Transformations
Experimental Results
• Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
• Set of Complete Scientific Applications (C++ subset)
  • Barnes-Hut N-Body Solver (1500 lines of code)
  • Liquid Water Simulation Code (1850 lines of code)
  • Seismic Modeling String Code (2050 lines of code)
• Different False Exclusion Policies
• Performance of Generated Parallel Code on Stanford DASH Shared-Memory Multiprocessor
Lock Overhead
[Charts: percentage lock overhead (0 to 60%) under the Original, Bounded, and Aggressive policies for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
Contention Overhead

[Charts: contention percentage (0 to 100%) versus number of processors (0 to 16) under the Original, Bounded, and Aggressive policies for Barnes-Hut (16K Bodies), Water (512 Molecules), and String (Big Well Model)]

Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
Performance Results: Barnes-Hut

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Aggressive, Bounded, and Original versions of Barnes-Hut (16384 bodies)]
Performance Results: Water

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Aggressive, Bounded, and Original versions of Water (512 Molecules)]
Performance Results: String

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Original, and Aggressive versions of String (Big Well Model)]
Choosing Best Policy
• Best False Exclusion Policy May Depend On
  • Topology of Data Structures
  • Dynamic Schedule of Computation
• Information Required to Choose Best Policy Unavailable at Compile Time
• Complications
  • Different Phases May Have Different Best Policy
  • In Same Phase, Best Policy May Change Over Time
Solution: Dynamic Feedback
• Generated Code Consists of
  • Sampling Phases: Measure Performance of Different Policies
  • Production Phases: Use Best Policy From Sampling Phase
• Periodically Resample to Discover Changes in Best Policy
• Guaranteed Performance Bounds
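The policy-selection step of this scheme can be sketched as follows (illustrative C++; the enum names mirror the slides' policies, and the timing mechanism is assumed): each policy's code version is timed during the sampling phase, and the production phase runs the fastest one until the next resample.

```cpp
#include <array>

enum Policy { Original = 0, Bounded = 1, Aggressive = 2, NumPolicies = 3 };

// Given the execution time measured for each policy during the sampling
// phase, pick the policy the production phase should run.
Policy choose_production_policy(
    const std::array<double, NumPolicies>& sampled_time) {
    Policy best = Original;
    for (int p = 1; p < NumPolicies; ++p)
        if (sampled_time[p] < sampled_time[best])
            best = static_cast<Policy>(p);
    return best;  // re-evaluated at every sampling phase
}
```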
Dynamic Feedback
[Figure: overhead over time; during a sampling phase the Aggressive, Original, and Bounded code versions are each measured in turn, then the production phase runs the best version until the next sampling phase]
Dynamic Feedback: Barnes-Hut

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Aggressive, Dynamic Feedback, Bounded, and Original versions of Barnes-Hut (16384 bodies)]
Dynamic Feedback: Water

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Aggressive, Dynamic Feedback, Bounded, and Original versions of Water (512 Molecules)]
Dynamic Feedback: String

[Chart: speedup (0 to 16) versus number of processors (0 to 16) for the Ideal, Original, Aggressive, and Dynamic Feedback versions of String (Big Well Model)]
Related Work
• Parallel Loop Optimizations (e.g. [Tseng:PPoPP95])
  • Array-Based Scientific Computations
  • Barriers vs. Cheaper Mechanisms
• Concurrent Object-Oriented Programs (e.g. [PZC:POPL95])
  • Merge Access Regions for Invocations of Exclusive Methods
• Concurrent Constraint Programming
  • Bring Together Ask and Tell Constructs
• Efficient Synchronization Algorithms
  • Efficient Implementations of Synchronization Primitives
Conclusions
• Synchronization Optimizations
  • Basic Synchronization Transformations for Locks
  • Synchronization Optimization Algorithm
• Integrated into Prototype Parallelizing Compiler
  • Object-Based Programs with Dynamic Data Structures
  • Commutativity Analysis
• Experimental Results
  • Optimizations Have a Significant Performance Impact
  • With Optimizations, Applications Perform Well
• Dynamic Feedback