parallel programming - paginas.fe.up.pt
Parallel Programming
Parallel algorithms
Sorting
2010@FEUP Parallel Algorithms - Sorting 2
Sequential Quicksort
17 14 65 4 22 63 11
Unordered list of values
Sequential Quicksort
17 14 65 4 22 63 11
Choose pivot value
Sequential Quicksort
Pivot: 17
Low list (≤ 17): 14 4 11
High list (> 17): 65 22 63
Sequential Quicksort
4 11 14 17 65 22 63
Recursively apply quicksort to low list
Sequential Quicksort
4 11 14 17 22 63 65
Recursively apply quicksort to high list
Sequential Quicksort
4 11 14 17 22 63 65
Sorted list of values
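The walkthrough above can be sketched in a few lines of Python. This is a minimal, not-in-place sketch that assumes the first element is the pivot (as 17 is in the example) and that values equal to the pivot go to the low list; practical quicksorts partition in place.

```python
def quicksort(values):
    """Sequential quicksort: choose a pivot, split into low/high lists, recurse."""
    if len(values) <= 1:
        return list(values)
    pivot, rest = values[0], values[1:]      # assumption: first element is the pivot
    low = [v for v in rest if v <= pivot]    # low list (<= pivot)
    high = [v for v in rest if v > pivot]    # high list (> pivot)
    # Sort the low and high lists recursively and place the pivot between them
    return quicksort(low) + [pivot] + quicksort(high)


print(quicksort([17, 14, 65, 4, 22, 63, 11]))  # [4, 11, 14, 17, 22, 63, 65]
```

The two recursive calls touch disjoint lists, which is the "natural concurrency" that the parallel versions below exploit.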
Attributes of Sequential Quicksort
• Average-case time complexity: $\Theta(n \log n)$
• Worst-case time complexity: $\Theta(n^2)$
• Occurs when the low and high lists are maximally unbalanced at every partitioning step
• Can make the worst case less probable by using sampling to choose the pivot value
• Example: “Median of 3” technique
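A minimal sketch of the “median of 3” idea, assuming the three candidates are the first, middle, and last elements (one common variant; `median_of_three` is a hypothetical helper name):

```python
def median_of_three(values):
    """Return the median of the first, middle, and last elements of a non-empty list."""
    first, middle, last = values[0], values[len(values) // 2], values[-1]
    return sorted([first, middle, last])[1]


# On already-sorted input, a first-element pivot (0) gives a maximally unbalanced
# split, whereas the median of 3 picks a value from the middle of the list.
print(median_of_three(list(range(10))))  # 5
```

Sampling three values adds only constant work per partitioning step, but it avoids the maximally unbalanced split on sorted or nearly sorted input.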
Quicksort: Good Starting Point for a Parallel Algorithm
• Speed
• Generally recognized as the fastest sort in the average case
• Preferable to base a parallel algorithm on the fastest sequential algorithm
• Natural concurrency
• Recursive sorts of the low and high lists can be done in parallel
Definitions of “Sorted”
• Definition 1: The sorted list is held in the memory of a single processor
• Definition 2:
• The portion of the list in every processor’s memory is sorted
• The value of the last element on $P_i$’s list is less than or equal to the value of the first element on $P_{i+1}$’s list
• We adopt Definition 2: it allows the problem size to scale with the number of processors
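Definition 2 can be written as a short check over the per-process lists. This is a sketch: the definition does not say how to treat processes that hold no elements, so this version assumes empty lists are simply skipped (`is_sorted_def2` is a hypothetical name).

```python
def is_sorted_def2(lists):
    """Check Definition 2 of "sorted" for a list of per-process lists.

    1. Every process's own list is sorted.
    2. The last element on P_i's list is <= the first element on P_{i+1}'s list.
    Empty per-process lists are skipped (an assumption).
    """
    for local in lists:
        if any(a > b for a, b in zip(local, local[1:])):
            return False
    non_empty = [local for local in lists if local]
    return all(prev[-1] <= nxt[0] for prev, nxt in zip(non_empty, non_empty[1:]))


print(is_sorted_def2([[0, 8, 12], [15, 19], [], [20, 21]]))  # True
print(is_sorted_def2([[8, 15, 21], [0, 66, 67]]))            # False: 21 > 0
```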
Parallel Quicksort
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85
Parallel Quicksort
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85
Process P0 chooses and broadcasts a randomly chosen pivot value
Parallel Quicksort
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85
Exchange “lower half” and “upper half” values
Parallel Quicksort
P0: 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1: 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2: 83, 98, 99, 82, 91, 88
P3: 89, 86, 85
After exchange step
Lower “half”: P0, P1
Upper “half”: P2, P3
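The slides do not say which pivot P0 drew, but the lists after the exchange are consistent with a pivot of 75 and with P0 pairing with P2 and P1 with P3. A minimal sketch of one exchange step under those assumptions (`split_exchange` is a hypothetical helper name):

```python
def split_exchange(lists, pivot):
    """One parallel quicksort exchange step for a group of processes.

    The lower half of the group keeps values <= pivot; the upper half keeps
    values > pivot. Process i pairs with process i + half (assumed pairing,
    consistent with the example).
    """
    half = len(lists) // 2
    result = [None] * len(lists)
    for i in range(half):
        lo_proc, hi_proc = lists[i], lists[i + half]
        # Lower partner: its own small values, then the small values it receives
        result[i] = [v for v in lo_proc if v <= pivot] + [v for v in hi_proc if v <= pivot]
        # Upper partner: its own large values, then the large values it receives
        result[i + half] = [v for v in hi_proc if v > pivot] + [v for v in lo_proc if v > pivot]
    return result


before = [
    [75, 91, 15, 64, 21, 8, 88, 54],
    [50, 12, 47, 72, 65, 54, 66, 22],
    [83, 66, 67, 0, 70, 98, 99, 82],
    [20, 40, 89, 47, 19, 61, 86, 85],
]
for rank, values in enumerate(split_exchange(before, 75)):
    print(f"P{rank}:", values)
```

Running it prints the four lists shown after the exchange step; note that the values are not sorted yet, only separated by the pivot.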
Parallel Quicksort
P0: 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1: 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2: 83, 98, 99, 82, 91, 88
P3: 89, 86, 85
Processes P0 and P2 choose and broadcast randomly chosen pivots
Lower “half”: P0, P1
Upper “half”: P2, P3
Parallel Quicksort
P0: 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1: 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2: 83, 98, 99, 82, 91, 88
P3: 89, 86, 85
Exchange values
Lower “half”: P0, P1
Upper “half”: P2, P3
Parallel Quicksort
P0: 15, 21, 8, 0, 12, 20, 19
P1: 50, 47, 72, 65, 54, 66, 22, 40, 47, 61, 75, 64, 54, 66, 67, 70
P2: 83, 82, 91, 88, 89, 86, 85
P3: 98, 99
After exchanging values
P0: lower “half” of lower “half”
P1: upper “half” of lower “half”
P2: lower “half” of upper “half”
P3: upper “half” of upper “half”
Parallel Quicksort
P0: 0, 8, 12, 15, 19, 20, 21
P1: 22, 40, 47, 47, 50, 54, 54, 61, 64, 65, 66, 66, 67, 70, 72, 75
P2: 82, 83, 85, 86, 88, 89, 91
P3: 98, 99
Each processor sorts the values it controls
P0: lower “half” of lower “half”
P1: upper “half” of lower “half”
P2: lower “half” of upper “half”
P3: upper “half” of upper “half”
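The whole parallel quicksort example can be replayed sequentially. The slides only say the pivots are "randomly chosen"; the values shown are consistent with 75 in the first round and 21 (from P0) and 91 (from P2) in the second, so this sketch takes the pivots as an input. It assumes the number of processes is a power of 2 and that process i pairs with process i + half inside each group; `split_exchange` and `parallel_quicksort` are hypothetical names.

```python
def split_exchange(group, pivot):
    """Lower half of the group keeps values <= pivot, upper half keeps values > pivot.
    Process i pairs with process i + half (pairing assumed from the example)."""
    half = len(group) // 2
    result = [None] * len(group)
    for i in range(half):
        lo_proc, hi_proc = group[i], group[i + half]
        result[i] = [v for v in lo_proc if v <= pivot] + [v for v in hi_proc if v <= pivot]
        result[i + half] = [v for v in hi_proc if v > pivot] + [v for v in lo_proc if v > pivot]
    return result


def parallel_quicksort(lists, pivots):
    """Simulate parallel quicksort on len(lists) processes.

    pivots[r] holds one pivot per process group in round r.
    """
    groups = [lists]
    for round_pivots in pivots:
        next_groups = []
        for group, pivot in zip(groups, round_pivots):
            exchanged = split_exchange(group, pivot)
            half = len(exchanged) // 2
            next_groups += [exchanged[:half], exchanged[half:]]  # halves continue independently
        groups = next_groups
    # Finally, each process sorts the values it controls
    return [sorted(local) for group in groups for local in group]


start = [
    [75, 91, 15, 64, 21, 8, 88, 54],
    [50, 12, 47, 72, 65, 54, 66, 22],
    [83, 66, 67, 0, 70, 98, 99, 82],
    [20, 40, 89, 47, 19, 61, 86, 85],
]
result = parallel_quicksort(start, [[75], [21, 91]])
print([len(local) for local in result])  # [7, 16, 7, 2]
```

The list sizes make the imbalance concrete: P1 ends up sorting 16 values while P3 sorts only 2.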
Analysis of Parallel Quicksort
• Execution time is dictated by when the last process completes
• The algorithm is likely to do a poor job of balancing the number of elements sorted by each process
• Cannot expect the pivot value to be the true median
• Can choose a better pivot value
Hyperquicksort
• Start where parallel quicksort ends: each process sorts its sublist
• The first “sortedness” condition is met
• To meet the second, processes must still exchange values
• A process can use the median of its sorted list as the pivot value
• This is much more likely to be close to the true median
Hyperquicksort
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85
Number of processors is a power of 2
Hyperquicksort
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89
Each process sorts the values it controls
Hyperquicksort
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89
Process P0 broadcasts its median value
Hyperquicksort
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89
Processes will exchange “low” and “high” lists (P0 with P2, P1 with P3)
Hyperquicksort
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89
Processes merge kept and received values.
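Because both partners already hold sorted lists, the split is a binary search and combining kept and received values is a linear-time merge rather than a fresh sort. A sketch of one pairwise exchange, assuming the pivot is P0's lower median (54, the fourth of its eight sorted values) and that values equal to the pivot stay low; `exchange_pair` is a hypothetical name, while `bisect_right` and `heapq.merge` are standard-library functions.

```python
from bisect import bisect_right
from heapq import merge


def exchange_pair(low_proc, high_proc, pivot):
    """Hyperquicksort exchange between two partners holding sorted lists.

    The low partner ends with every value <= pivot, the high partner with
    every value > pivot; each result is a merge of two sorted runs.
    """
    cut_lo = bisect_right(low_proc, pivot)    # split point in the low partner's list
    cut_hi = bisect_right(high_proc, pivot)   # split point in the high partner's list
    new_low = list(merge(low_proc[:cut_lo], high_proc[:cut_hi]))
    new_high = list(merge(low_proc[cut_lo:], high_proc[cut_hi:]))
    return new_low, new_high


p0, p2 = exchange_pair([8, 15, 21, 54, 64, 75, 88, 91], [0, 66, 67, 70, 82, 83, 98, 99], 54)
print(p0)  # [0, 8, 15, 21, 54]
print(p2)  # [64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99]
```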
Hyperquicksort
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89
Processes P0 and P2 broadcast their median values.
Hyperquicksort
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89
Communication pattern for second exchange: P0 with P1, P2 with P3
Hyperquicksort
P0: 0, 8, 12, 15
P1: 19, 20, 21, 22, 40, 47, 47, 50, 54, 54
P2: 61, 64, 65, 66, 66, 67, 70, 72, 75, 82
P3: 83, 85, 86, 88, 89, 91, 98, 99
After exchange-and-merge step
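Putting the steps together, hyperquicksort can be replayed on the example with a short simulation. Assumptions, all consistent with the values shown: the pivot is the lower median of the first process's sorted list in each subcube (54, then 15 and 82), values equal to the pivot stay low, and process i pairs with process i + half. If the first process's list is empty, this sketch falls back to the first non-empty list in the subcube, a choice the slides do not cover. `hyperquicksort` is a hypothetical name.

```python
from bisect import bisect_right
from heapq import merge


def hyperquicksort(lists):
    """Simulate hyperquicksort on p = len(lists) processes (p a power of 2)."""
    p = len(lists)
    lists = [sorted(local) for local in lists]   # each process sorts its own list
    group = p                                    # size of the current subcube
    while group > 1:
        half = group // 2
        for start in range(0, p, group):
            members = [lists[j] for j in range(start, start + group) if lists[j]]
            if not members:
                continue                         # nothing to exchange in this subcube
            leader = members[0]                  # normally the subcube's first process
            pivot = leader[(len(leader) - 1) // 2]   # lower median of a sorted list
            for i in range(start, start + half):
                lo, hi = lists[i], lists[i + half]
                cut_lo, cut_hi = bisect_right(lo, pivot), bisect_right(hi, pivot)
                # Low partner keeps values <= pivot, high partner the rest;
                # kept and received runs are merged, not re-sorted.
                lists[i] = list(merge(lo[:cut_lo], hi[:cut_hi]))
                lists[i + half] = list(merge(lo[cut_lo:], hi[cut_hi:]))
        group = half
    return lists


data = [
    [75, 91, 15, 64, 21, 8, 88, 54],
    [50, 12, 47, 72, 65, 54, 66, 22],
    [83, 66, 67, 0, 70, 98, 99, 82],
    [20, 40, 89, 47, 19, 61, 86, 85],
]
for rank, values in enumerate(hyperquicksort(data)):
    print(f"P{rank}:", values)
```

The final sizes (4, 10, 10, 8) are more even than parallel quicksort's (7, 16, 7, 2) on the same input, though still not equal.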
Complexity Analysis Assumptions
• Average-case analysis
• Lists stay reasonably balanced
• Communication time is dominated by message transmission time rather than message latency
Complexity Analysis
• Initial quicksort step has time complexity $\Theta((n/p) \log (n/p))$
• Total comparisons needed for $\log p$ merge steps: $\Theta((n/p) \log p)$
• Total communication time for $\log p$ exchange steps: $\Theta((n/p) \log p)$
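A worked step, ignoring constant factors, shows how these three terms combine. Writing $T(n,p)$ for the parallel time (notation assumed here) and using $\log(n/p) = \log n - \log p$:

$$T(n,p) = \Theta\left(\frac{n}{p}\log\frac{n}{p} + 2\,\frac{n}{p}\log p\right) = \Theta\left(\frac{n}{p}(\log n + \log p)\right)$$

Multiplying by $p$ and subtracting the sequential time gives the parallel overhead used in the isoefficiency analysis:

$$p\,T(n,p) - T(n,1) \approx n(\log n + \log p) - n \log n = n \log p$$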
Isoefficiency Analysis
• Sequential time complexity: $\Theta(n \log n)$
• Parallel overhead: $\Theta(n \log p)$
• Isoefficiency relation:
$n \log n \ge C\,n \log p \;\Rightarrow\; \log n \ge C \log p \;\Rightarrow\; n \ge p^{C}$
• The value of C determines the scalability. Scalability depends on the ratio of communication speed to computation speed.
• Scalability function: $M(p^{C})/p = p^{C}/p = p^{C-1}$
Another Scalability Concern
• Our analysis assumes the lists remain balanced
• As p increases, each processor’s share of the list decreases
• Hence, as p increases, the likelihood of lists becoming unbalanced increases
• Unbalanced lists lower efficiency
• It would be better to get sample values from all processes before choosing the median
Parallel Sorting by Regular Sampling
(PSRS Algorithm)
• Each process sorts its share of elements
• Each process selects a regular sample of its sorted list
• One process gathers and sorts the samples, chooses pivot values from the sorted sample list, and broadcasts these pivot values
• Each process partitions its list into p pieces, using the pivot values
• Each process sends partitions to the other processes
• Each process merges its partitions
PSRS Algorithm
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
The number of processors does not have to be a power of 2.
PSRS Algorithm
Each process sorts its list using quicksort.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
PSRS Algorithm
Each process chooses p regular samples.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
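The slides do not spell out which positions count as "regular". The samples collected in the example (15, 54, 75 from P0; 22, 50, 65 from P1; 66, 70, 83 from P2) are consistent with taking the k-th sample at 1-based position ⌊k·m/(p+1)⌋ of a local list of length m, so this sketch assumes that spacing; other spacings are also used in practice. `regular_samples` is a hypothetical name.

```python
def regular_samples(sorted_local, p):
    """Pick p regular samples from one process's sorted list.

    Assumed spacing: the k-th sample (k = 1..p) sits at 1-based position
    floor(k * m / (p + 1)), where m = len(sorted_local) >= 1.
    """
    m = len(sorted_local)
    return [sorted_local[max((k * m) // (p + 1) - 1, 0)] for k in range(1, p + 1)]


print(regular_samples([8, 15, 21, 54, 64, 75, 88, 91], 3))  # [15, 54, 75]
```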
PSRS Algorithm
One process collects the $p^2$ regular samples.
Collected samples: 15, 54, 75, 22, 50, 65, 66, 70, 83
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
PSRS Algorithm
The same process sorts the $p^2$ regular samples.
Sorted samples: 15, 22, 50, 54, 65, 66, 70, 75, 83
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
PSRS Algorithm
The process then chooses p-1 pivot values (50 and 66 in this example).
Sorted samples: 15, 22, 50, 54, 65, 66, 70, 75, 83
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
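The partitions shown two steps later imply the pivots 50 and 66, which are the 3rd and 6th of the nine sorted samples. This sketch therefore assumes the rule "take the samples at 1-based positions p, 2p, ..., (p-1)p"; the source does not name a rule, and `choose_pivots` is a hypothetical name.

```python
def choose_pivots(samples, p):
    """Sort the p*p gathered samples and return p-1 pivot values.

    Assumed rule: take the samples at 1-based positions p, 2p, ..., (p-1)p.
    """
    ordered = sorted(samples)
    return [ordered[k * p - 1] for k in range(1, p)]


gathered = [15, 54, 75, 22, 50, 65, 66, 70, 83]   # samples from P0, P1, P2
print(choose_pivots(gathered, 3))                 # [50, 66]
```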
PSRS Algorithm
The process broadcasts the p-1 pivot values.
Sorted samples: 15, 22, 50, 54, 65, 66, 70, 75, 83
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
PSRS Algorithm
Each process divides its list, based on the pivot values.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
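Since each local list is already sorted, its p pieces are contiguous slices, and the p-1 cut points can be found by binary search instead of a linear scan. A sketch assuming values equal to a pivot go to the lower piece, which matches the example (50 and 66 stay in the lower partitions); `partition_by_pivots` is a hypothetical name.

```python
from bisect import bisect_right


def partition_by_pivots(sorted_local, pivots):
    """Split a sorted list into len(pivots) + 1 contiguous pieces.

    Piece j receives the values v with pivots[j-1] < v <= pivots[j];
    a value equal to a pivot goes to the lower piece.
    """
    cuts = [0] + [bisect_right(sorted_local, pv) for pv in pivots] + [len(sorted_local)]
    return [sorted_local[a:b] for a, b in zip(cuts, cuts[1:])]


print(partition_by_pivots([8, 15, 21, 54, 64, 75, 88, 91], [50, 66]))
# [[8, 15, 21], [54, 64], [75, 88, 91]]
```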
PSRS Algorithm
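Each process sends its partitions to the correct destination process.
P0 receives: 8, 15, 21 (from P0) | 12, 22, 47, 50 (from P1) | 0 (from P2)
P1 receives: 54, 64 (from P0) | 54, 65, 66 (from P1) | 66 (from P2)
P2 receives: 75, 88, 91 (from P0) | 72 (from P1) | 67, 70, 82, 83, 98, 99 (from P2)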
PSRS Algorithm
Each process merges its p partitions.
P0: 0, 8, 12, 15, 21, 22, 47, 50
P1: 54, 54, 64, 65, 66, 66
P2: 67, 70, 72, 75, 82, 83, 88, 91, 98, 99
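The six PSRS steps can be replayed end to end on the example. This sequential sketch reuses the assumptions stated for the individual steps (sample spacing ⌊k·m/(p+1)⌋, pivots at sorted-sample positions p, 2p, ..., values equal to a pivot go low) and additionally assumes every process starts with at least one value. In a real MPI program the gather, broadcast, and exchange would be collective operations; here they are plain list manipulations. `psrs` is a hypothetical name.

```python
from bisect import bisect_right
from heapq import merge


def psrs(lists):
    """Simulate PSRS on p = len(lists) processes (p need not be a power of 2)."""
    p = len(lists)
    local = [sorted(values) for values in lists]                 # 1. local sort
    samples = []
    for values in local:                                          # 2. regular samples
        m = len(values)
        samples += [values[max((k * m) // (p + 1) - 1, 0)] for k in range(1, p + 1)]
    ordered = sorted(samples)                                     # 3. gather and sort samples
    pivots = [ordered[k * p - 1] for k in range(1, p)]            #    p-1 pivots (assumed rule)
    pieces = []
    for values in local:                                          # 4. partition by pivots
        cuts = [0] + [bisect_right(values, pv) for pv in pivots] + [len(values)]
        pieces.append([values[a:b] for a, b in zip(cuts, cuts[1:])])
    # 5. All-to-all exchange: process dst receives piece dst from every process.
    received = [[pieces[src][dst] for src in range(p)] for dst in range(p)]
    # 6. Each process merges its p sorted pieces.
    return [list(merge(*parts)) for parts in received]


data = [
    [75, 91, 15, 64, 21, 8, 88, 54],
    [50, 12, 47, 72, 65, 54, 66, 22],
    [83, 66, 67, 0, 70, 98, 99, 82],
]
for rank, values in enumerate(psrs(data)):
    print(f"P{rank}:", values)
```

Each key moves at most once (during step 5), which is where the low data-movement figure in the summary comes from.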
Assumptions
• Each process ends up merging close to n/p elements
• Experimental results show this is a valid assumption
• The processor interconnection network supports p simultaneous message transmissions at full speed (full-duplex full switch)
Time Complexity Analysis
• Computations
• Initial quicksort: $\Theta((n/p)\log(n/p))$
• Sorting regular samples: $\Theta(p^2 \log p)$
• Merging sorted sublists: $\Theta((n/p)\log p)$
• Overall: $\Theta((n/p)(\log n + \log p) + p^2 \log p)$
• Communications
• Gather samples, broadcast pivots: $\Theta(\log p)$
• All-to-all exchange: $\Theta(n/p)$
• Overall: $\Theta(n/p)$
Isoefficiency Analysis
• Sequential time complexity: $\Theta(n \log n)$
• Parallel overhead: $\Theta(n \log p)$
• Isoefficiency relation:
$n \log n \ge C\,n \log p \;\Rightarrow\; \log n \ge C \log p$
• Scalability function same as for hyperquicksort
• Scalability depends on the ratio of communication to computation speeds
Summary
• Three parallel algorithms based on
quicksort
• Keeping list sizes balanced
• Parallel quicksort: poor
• Hyperquicksort: better
• PSRS algorithm: excellent
• Average number of times each key is moved:
• Parallel quicksort and hyperquicksort: $(\log p)/2$
• PSRS algorithm: $(p-1)/p$