Parallelising Dynamic Programming
Raphael Reitzig
University of Kaiserslautern
Department of Computer Science
Algorithms and Complexity Group
September 27th, 2012
Vision: Compile dynamic programming recurrences into efficient parallel code.
Goal 1: Understand what efficiency means in parallel algorithms.
Goal 2: Characterise dynamic programming recurrences in a suitable way.
Goal 3: Find and implement efficient parallel algorithms for DP.
Analysing Parallelism
Complexity theory
Classifies problems
Focuses on inherent parallelism
Answers: How many processors do you need to be really fast on inputs of a given size?
But... p grows with n – no statement about constant p and growing n!
Amdahl’s law
Parallel speedup $\le \frac{1}{(1-\gamma) + \gamma/p}$.
Answers: How many processors can you utilise on given inputs?
But... does not capture growth of n!
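As a quick sanity check of the bound, a minimal sketch (the value γ = 0.95 and the processor counts are illustration values, not from the talk):

```python
# Amdahl's law: with parallelisable fraction gamma of the work,
# speedup on p processors is at most 1 / ((1 - gamma) + gamma / p).
def amdahl_bound(gamma: float, p: int) -> float:
    return 1.0 / ((1.0 - gamma) + gamma / p)

# Even at 95% parallelisable work the bound saturates quickly:
for p in (2, 4, 16, 1024):
    print(p, round(amdahl_bound(0.95, p), 2))
# 2 -> 1.9, 4 -> 3.48, 16 -> 9.14, 1024 -> 19.64;
# the limit as p grows is 1 / (1 - gamma) = 20.
```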
Work and depth
Work $W = T^A_1$ and depth $D = T^A_\infty$.
Brent's Law: an $A$ with $\frac{W}{p} \le T^A_p < \frac{W}{p} + D$ is possible in a certain setting.
But... has limited applicability and $D$ can be slippery!
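For concreteness, a worked instance (my example, not from the talk): summing n numbers by a balanced binary tree has work $W = n - 1$ and depth $D = \lceil \log_2 n \rceil$, so Brent's law allows a schedule close to $W/p + D$:

```python
import math

# Brent-style upper bound W/p + D for pairwise tree-summation:
# W = n - 1 additions, D = ceil(log2 n) levels of the tree.
def brent_upper(n: int, p: int) -> float:
    W, D = n - 1, math.ceil(math.log2(n))
    return W / p + D

# n = 10**6, p = 8: at most ~125020 steps vs. 999999 sequentially.
print(brent_upper(10**6, 8))  # 125019.875
```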
Relative runtimes
Speedup $S^A_p := \frac{T^A_1}{T^A_p}$
Efficiency $E^A_p := \frac{T^B}{p \cdot T^A_p}$ (with $T^B$ the runtime of a reference sequential algorithm $B$)
But... what are good values?
Clear: $S^A_p \in [0, p]$ and $E^A_p \in [0, 1]$ – but we can certainly not always hit the optima!
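In code, both measures are one-liners; the timings below are hypothetical (as is taking $T^B$ from a separate best sequential implementation):

```python
# Speedup S_p = T1 / Tp; efficiency E_p = T_B / (p * Tp), where
# T_B is the runtime of the reference sequential algorithm B.
def speedup(T1: float, Tp: float) -> float:
    return T1 / Tp

def efficiency(TB: float, p: int, Tp: float) -> float:
    return TB / (p * Tp)

T1, Tp, TB, p = 10.0, 2.8, 9.0, 4       # made-up measurements
print(round(speedup(T1, Tp), 2))        # 3.57 (out of at most 4)
print(round(efficiency(TB, p, Tp), 2))  # 0.8  (out of at most 1)
```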
Proposal: Asymptotic relative runtimes
Definition
$S^A_p(\infty) := \liminf_{n\to\infty} S^A_p(n) \stackrel{?}{=} p$
$E^A_p(\infty) := \liminf_{n\to\infty} E^A_p(n) \stackrel{?}{=} 1$
Goal: Find parallel algorithms that are asymptotically as scalable and efficient as possible for all p.
Disclaimer
This means: A good parallel algorithm can utilise any number of processors if the inputs are large enough.
Not: More processors are always better.
Just as in sequential algorithmics.
Afterthoughts
Machine model: Keep it simple, a (P)RAM with p processors and spawn/join.
Which quantities to analyse?
Elementary operations, memory accesses, inter-thread communication, ...
Implicit interaction – blocking, communication via memory, ... – is invisible in code!
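A minimal sketch of the spawn/join primitive, rendered with Python threads (illustrative only: CPython's GIL serialises CPU-bound Python code, so this shows the control structure, not real speedup):

```python
import threading

# spawn/join in the style of the (P)RAM model: start one task
# per (virtual) processor, then barrier before continuing.
def spawn_join(tasks):
    threads = [threading.Thread(target=t) for t in tasks]
    for t in threads:
        t.start()  # spawn
    for t in threads:
        t.join()   # join

results = [0] * 4
spawn_join([lambda i=i: results.__setitem__(i, i * i) for i in range(4)])
print(results)  # [0, 1, 4, 9]
```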
Attacking Dynamic Programming
Disclaimer
Only two dimensions
Only finite domains
Only rectangular domains
Memoisation-table point-of-view
Reducing to dependencies
$$e(i,j) :=
\begin{cases}
0 & i = j = 0 \\
j & i = 0 \wedge j > 0 \\
i & i > 0 \wedge j = 0 \\
\min \begin{cases} e(i-1,j) + 1 \\ e(i,j-1) + 1 \\ e(i-1,j-1) + [\, v_i \neq w_j \,] \end{cases} & \text{else}
\end{cases}$$
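Spelled out as the standard sequential table-filling (a plain textbook rendering, not the talk's generated code; note that $v_i$ is `v[i-1]` in 0-based Python):

```python
# Edit distance following the recurrence above, filled row by row.
# (v[i-1] != w[j-1]) plays the Iverson bracket [v_i != w_j].
def edit_distance(v: str, w: str) -> int:
    m, n = len(v), len(w)
    e = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        e[0][j] = j                      # case i = 0
    for i in range(m + 1):
        e[i][0] = i                      # case j = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            e[i][j] = min(
                e[i - 1][j] + 1,                           # from above
                e[i][j - 1] + 1,                           # from the left
                e[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # diagonal
            )
    return e[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

Every cell depends only on its left, upper, and upper-left neighbours; that dependency structure, not the concrete min/plus algebra, is what matters for parallelisation.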
Gold standard
[Figure-only slides.]
Simplification
The eight possible dependency directions of a cell: UL, U, UR, L, R, DL, D, DR.
Three cases
Assuming dependencies are area-complete and uniform, there are only three cases up to symmetry.
[Diagrams: impossible vs. possible dependency patterns]
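For the possible case with left/up/up-left dependencies, one classic schedule (my sketch; the talk's generated code may differ) sweeps anti-diagonals: all cells with i + j = d depend only on diagonals before d, so each diagonal is one parallel step:

```python
from concurrent.futures import ThreadPoolExecutor

# Wavefront schedule for the edit-distance dependency pattern:
# cells on an anti-diagonal i + j = d are mutually independent.
# (CPython's GIL limits real speedup here; the point is the order.)
def wavefront_edit_distance(v: str, w: str, workers: int = 4) -> int:
    m, n = len(v), len(w)
    e = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        e[0][j] = j
    for i in range(m + 1):
        e[i][0] = i

    def cell(ij):
        i, j = ij
        e[i][j] = min(e[i - 1][j] + 1, e[i][j - 1] + 1,
                      e[i - 1][j - 1] + (v[i - 1] != w[j - 1]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(2, m + n + 1):            # diagonals in order
            diag = [(i, d - i)
                    for i in range(max(1, d - n), min(m, d - 1) + 1)]
            list(pool.map(cell, diag))           # join before d + 1
    return e[m][n]

print(wavefront_edit_distance("kitten", "sitting"))  # 3
```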
Facing Reality
Challenges
Contention
Method of synchronisation
Metal issues (moving threads, cache sync)
Performance Examples
Edit distance on two-core shared memory machine:
[Two plots: x-axis 0 to 1.4·10^5, y-axis 0 to 2.5 in both panels]
Edit distance on four-core NUMA machine:
[Two plots: x-axis 0 to 4·10^5, y-axis 0 to 4 in both panels]
Pseudo-Bellman-Ford on two-core shared memory machine:
[Two plots: x-axis 0 to 1.4·10^5, y-axes 0 to 2.5 and 0 to 4]
Pseudo-Bellman-Ford on four-core NUMA machine:
[Two plots: x-axis 0 to 4·10^5, y-axes 0 to 4 and 0 to 8]
Future Work
Fill gaps in theory (caching and communication).
Generalise theory to more dimensions and interleaved DPs.
Improve and extend implementations.
More experiments (different problems, more diverse machines).
Improve compiler integration (detection, backtracing, result functions).
Integrate with other tools.