effectively mapping linguistic abstractions for message...
TRANSCRIPT
![Page 1: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/1.jpg)
Effectively Mapping Linguistic Abstractions for Message-passing Concurrency to Threads on
the Java Virtual Machine
Supported in part by the NSF grants CCF- 08-46059, CCF-11-17937, and CCF-14-23370.
![Page 2: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/2.jpg)
Effectively Mapping Linguistic Abstractions for Message-passing Concurrency to Threads on
the Java Virtual Machine
Supported in part by the NSF grants CCF- 08-46059, CCF-11-17937, and CCF-14-23370.
1) Local computation and communication behavior of a concurrent
entity can help predict the performance. 2) A modular and static technique can solve the problem.
![Page 3: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/3.jpg)
Problem
a1 a2
a3
a4 a5
Mapping
core0 core1
core2 core3
MPC System Architecture
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
2
![Page 4: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/4.jpg)
Problem
a1 a2
a3
a4 a5
core0 core1
core2 core3
MPC System Architecture
2
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
JVM threads
Mapping OS Scheduler
![Page 5: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/5.jpg)
Problem
a1 a2
a3
a4 a5
core0 core1
core2 core3
MPC System Architecture
2
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
JVM threads
Mapping OS Scheduler
In this work : Mapping Abstractions to JVM threads
![Page 6: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/6.jpg)
Problem
a1 a2
a3
a4 a5
core0 core1
core2 core3
MPC System Architecture
2
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
JVM threads
Mapping OS Scheduler
PROBLEM: Mapping Abstractions to JVM threads
GOAL: Concurrency
![Page 7: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/7.jpg)
Problem
a1 a2
a3
a4 a5
core0 core1
core2 core3
MPC System Architecture
2
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
JVM threads
Mapping OS Scheduler
PROBLEM: Mapping Abstractions to JVM threads
GOAL: Concurrency + Performance (Fast and Efficient)
![Page 8: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/8.jpg)
• State of the art: Abstractions to Threads mapping is performed by programmers.
Motivation
3
![Page 9: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/9.jpg)
• Abstractions to Threads mapping is performed by programmers
• MPC frameworks (Akka, Scala Actors, SALSA) provide Schedulers and Dispatchers to programmers for mapping Abstractions to Threads
Motivation
3
![Page 10: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/10.jpg)
Dispatchers in Akka • default • pinned • balancing • calling-thread
Akka
Kilim
Scala Actors SALSA
Jetlang Actors Guild ActorFoundry
JVM-based MPC Frameworks
4
Schedulers in Scala Actors • thread-based • event-based
Availability of wide-variety of Schedulers and Dispatchers suggests that. - Programmers can chose the ones that works best for their applications - And, perform the mapping carefully
![Page 11: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/11.jpg)
• Abstractions to Threads mapping is performed by programmers
• MPC frameworks (Akka, Scala Actors, SALSA) provide Schedulers and Dispatchers to programmers for mapping Abstractions to Threads
• Programmers find it hard to manually perform the mapping.
– Start with an initial mapping and incrementally improve the mapping. – This process can be tedious and time consuming.
Motivation
3
![Page 12: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/12.jpg)
• Abstractions to Threads mapping is performed by programmers • MPC frameworks (Akka, Scala Actors, SALSA) provide Schedulers and
Dispatchers to programmers for mapping Abstractions to Threads • SO discussions about configuring and fine tuning the mapping suggests that,
– Randomly tweaking the mapping without finding the root cause of performance problem doesn’t help and
– Without knowing the nature of the task performed by the abstractions, the mapping task becomes hard.
Motivation
3
![Page 13: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/13.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers)
Motivation
5
![Page 14: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/14.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
![Page 15: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/15.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
LogisticMap (computes logistic map using a recurrence relation)
ScratchPad (counts lines for all files in a directory)
0
1
2
3
4
5
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
0
1
2
3
4
5
6
7
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 16: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/16.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
LogisticMap (computes logistic map using a recurrence relation)
0
1
2
3
4
5
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 17: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/17.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
ScratchPad (counts lines for all files in a directory)
0
1
2
3
4
5
6
7
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 18: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/18.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
ScratchPad (counts lines for all files in a directory)
0
1
2
3
4
5
6
7
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 19: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/19.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
ScratchPad (counts lines for all files in a directory)
0
1
2
3
4
5
6
7
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 20: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/20.jpg)
When manual tuning is hard, • Programmers use default mappings (default schedulers/dispatchers) • Problem: a single default mapping may not work across programs
Motivation
5
LogisticMap (computes logistic map using a recurrence relation)
ScratchPad (counts lines for all files in a directory)
0
1
2
3
4
5
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
0
1
2
3
4
5
6
7
2 4 8 12
Exec
utio
n tim
e (s
)
Core setting
thread round-robin random work-stealing
![Page 21: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/21.jpg)
When manual tuning is hard and default mappings may not produce the desired performance,
• Brute force technique that tries all possible combinations of Abstractions to Threads mapping could be used
Motivation
6
![Page 22: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/22.jpg)
When manual tuning is hard and default mappings may not produce the desired performance,
• brute force technique that tries all possible combinations of Abstractions to Threads mapping could be used
• Problem: Combinatorial explosion
Motivation
6
![Page 23: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/23.jpg)
When manual tuning is hard and default mappings may not produce the desired performance,
• brute force technique that tries all possible combinations of Abstractions to Threads mapping could be used
• Problem: Combinatorial explosion • For example: an MPC program with 8 kinds of abstractions, trying all possible
combinations of 4 kinds of schedulers/dispatchers requires exploring 65536 (4^8) different combinations (some may even violate the concurrency correctness property)
Motivation
6
![Page 24: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/24.jpg)
When manual tuning is hard and default mappings may not produce the desired performance,
• brute force technique that tries all possible combinations of Abstractions to Threads mapping could be used
• Problem: Combinatorial explosion • For example: an MPC program with 8 kinds of abstractions, trying all possible
combinations of 4 kinds of schedulers/dispatchers requires exploring 65536 (4^8) different combinations (some may even violate the concurrency correctness property)
• Also, a small change to the program may require re-doing the mapping.
Motivation
6
![Page 25: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/25.jpg)
When manual tuning is hard and default mappings may not produce the desired performance,
• brute force technique that tries all possible combinations of Abstractions to Threads mapping could be used
• Problem: Combinatorial explosion • For example: an MPC program with 8 kinds of abstractions, trying all possible
combinations of 4 kinds of schedulers/dispatchers requires exploring 65536 (4^8) different combinations
• Also, a small change to the program may require redoing the mapping.
A mapping solution that yields significant performance improvement over default mappings is desirable
Motivation
6
![Page 26: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/26.jpg)
1) Local computation and communication behavior of concurrent entity is predictive for determining globally beneficial mapping
Key Ideas
7
![Page 27: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/27.jpg)
1) Local computation and communication behavior of concurrent entity is predictive for determining globally beneficial mapping
• Computation and Communication behaviors, – externally blocking behavior – local state – computational workload – message send/receive pattern – inherent parallelism
Key Ideas
7
![Page 28: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/28.jpg)
1) Local computation and communication behavior of concurrent entity is predictive for determining globally beneficial mapping
• Computation and Communication behaviors, – externally blocking behavior – local state – computational workload – message send/receive pattern – inherent parallelism
2) Determining these behavior at a coarse/abstract level is sufficient to solve the mapping problem
Key Ideas
7
![Page 29: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/29.jpg)
• Represent Computation and Communication behavior of MPC abstractions (language-agnostic manner),
Solution Outline
8
Input Program
![Page 30: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/30.jpg)
• Represent Computation and Communication behavior of MPC abstractions (language-agnostic manner),
• Perform local program analyses statically to determine behaviors (proposed solution is both modular and static)
Solution Outline
8
cVector Analysis
Input Program
cVector
![Page 31: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/31.jpg)
• Represent Computation and Communication behavior of MPC abstractions (language-agnostic manner)
• Perform local program analyses statically to determine behaviors (proposed solution is both modular and static)
• A Mapping function that takes represented behaviors as input and produces an execution policy for each abstraction Execution policy: describes how messages of MPC abstraction are processed (in detail later)
Solution Outline
8
Mapping Function
cVector Analysis
Input Program
Execution Policy
cVector
![Page 32: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/32.jpg)
Panini Capsules
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
General MPC framework MPC Abstraction Message Handlers
Panini Capsules Capsule Procedures
![Page 33: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/33.jpg)
Panini Capsules
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
State
p0
p1
pn
...
Capsule
capsule Ship { short state = 0; int x = 5; void die() { state = 2; } void fire() { state = 1; } void moveLeft() { if (x>0) x--; } void moveRight() { if (x<10) x++; } }
General MPC framework MPC Abstraction Message Handlers
Panini Capsules Capsule Procedures
![Page 34: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/34.jpg)
Solution Outline
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
May-Block Analysis
State Analysis
Call-claim Analysis
Communication Summary Analysis
Computational Workload Analysis
State
p0
p1
pn
...
for each procedure pi
Procedure Behavior
Composition
cVector
Capsule
Static Analyses
![Page 35: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/35.jpg)
Solution Outline
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
May-Block Analysis
State Analysis
Call-claim Analysis
Communication Summary Analysis
Computational Workload Analysis
State
p0
p1
pn
...
for each procedure pi
Procedure Behavior
Composition
cVector
Capsule
Static Analyses
![Page 36: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/36.jpg)
Solution Outline
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
May-Block Analysis
State Analysis
Call-claim Analysis
Communication Summary Analysis
Computational Workload Analysis
State
p0
p1
pn
...
for each procedure pi
Procedure Behavior
Composition
cVector Mapping Function
Capsule
EP
![Page 37: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/37.jpg)
• Blocking behavior (β) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (σ) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (π) – inherent parallelism exposed by capsule when it communicates with other
capsules – dom(pi): {sync, async, future}
• Computational workload (ω) – represents computations performed by capsule – dom(omega): {math, io}
• Communication Pattern ( ρ, ρ )
Representing Behaviors
10
Characteristics Vector (cVector)
<β, σ, π, ρ, ρ, ω>
![Page 38: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/38.jpg)
• Blocking behavior (β) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (σ) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (π) – inherent parallelism exposed by capsule when it communicates with other
capsules – dom(pi): {sync, async, future}
• Computational workload (ω) – represents computations performed by capsule – dom(omega): {math, io}
• Communication Pattern ( ρ, ρ )
Representing Behaviors
10
Characteristics Vector (cVector)
<β, σ, π, ρ, ρ, ω>
All behaviors <β, π, ρ, ρ, ω> except σ is defined for capsule procedures and combined to form behaviors for capsule using behavior composition (described later)
![Page 39: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/39.jpg)
• Blocking behavior (β) – represents externally blocking behavior due to I/O, socket or db primitives – dom (β) : {true, false}
• Local state (sigma) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (pi) – inherent parallelism exposed by capsule when it communicates with other
capsules – dom(pi): {sync, async, future}
• Communication Pattern (rho) • Computational workload (omega)
– represents computations performed by capsule – dom(omega): {math, io}
Representing Behaviors
Characteristics Vector (cVector)
10
BufferedReader br = new BufferedReader(new InputStreamReader(System.in)); try { userName = br.readLine(); // blocking } catch (IOException ioe) { }
<β, σ, π, ρ, ρ, ω>
May Block Analysis • Input : Manually created dictionary of blocking library calls • Analysis : Flow analysis with message receives as sources and blocking library calls as
sinks
![Page 40: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/40.jpg)
• Blocking behavior (beta) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (σ) – local state variables – dom (σ): {nil, fixed, variable}
• Inherent parallelism (pi) – inherent parallelism exposed by capsule when it communicates with other
capsules – dom(pi): {sync, async, future}
• Communication Pattern (rho) • Computational workload (omega)
– represents computations performed by capsule – dom(omega): {math, io}
Representing Behaviors
10
capsule Receiver { void receive() { } }
capsule Receiver { int state1; int state2; void receive() { } }
capsule Receiver { Map<String,String> dataMap = new HashMap<>(); void receive() { } }
Characteristics Vector (cVector)
<β, σ, π, ρ, ρ, ω>
State Analysis • Input : State variables • Analysis : Checks the type of state variables that composed capsule state for primitive or
collection types.
![Page 41: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/41.jpg)
• Blocking behavior (beta) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (sigma) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (π) – kind of parallelism exposed by capsule while communicating – dom (π) : {sync, async, future}
• Communication Pattern (rho) • Computational workload (omega)
– represents computations performed by capsule – dom(omega): {math, io}
Representing Behaviors
Characteristics Vector (cVector)
10
capsule Receiver (Sender sender) { void receive() { // sync int i = sender.get(); // print i; } }
capsule Receiver (Sender sender) { void receive() { sender.done(); // async } }
capsule Receiver (Sender sender) { void receive() { // future int i = sender.get(); // some computation // print i; }}
<β, σ, π, ρ, ρ, ω>
![Page 42: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/42.jpg)
• Blocking behavior (beta) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (sigma) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (π) – dom (π) : {sync, async, future}
• Computational workload – represents computations performed by capsule – dom(omega): {math, io}
• Communication Pattern
Representing Behaviors
Characteristics Vector (cVector)
10
<β, σ, π, ρ, ρ, ω>
Inherent Parallelism Analysis
![Page 43: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/43.jpg)
• Blocking behavior (beta) – represents externally blocking behavior due to I/O, socket or db primitives – dom(beta): {true, false}
• Local state (sigma) – local state variables – dom(sigma): {nil, primitive, large}
• Inherent parallelism (pi) – dom(pi): {sync, async, future}
• Computational workload (ω) – represents computations performed by capsule – dom (ω) : {math, io} – math : computation-to-wait > 1 – io : computation-to-wait <= 1
Representing Behaviors
Characteristics Vector (cVector)
10
<β, σ, π, ρ, ρ, ω>
Computational Workload Analysis • Computation summary : recursive calls, high
cost library calls, unbounded loops, complete read/write to state that is variable size.
![Page 44: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/44.jpg)
• Communication pattern – Message send pattern
dom ( ρ ): {leaf, router, scatter}
– Message receive pattern • dom ( ρ ): {gather, request-reply}
Representing Behaviors
11
leaf : no outgoing communication router : one-to-one communication scatter : batch communication
gather : recv-to-send > 1 request-reply : recv-to-send <= 1
![Page 45: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/45.jpg)
• Communication pattern – Message send pattern
dom ( ρ ): {leaf, router, scatter}
– Message receive pattern • dom ( ρ ): {gather, request-reply}
Representing Behaviors
11
leaf : no outgoing communication router : one-to-one communication scatter : batch communication
gather : recv-to-send > 1 request-reply : recv-to-send <= 1
– Builds Communication Summary which abstracts away expressions except message send/receive, state read/writes.
– Analysis is a function that takes communication summary of a procedure and produces message send/receive pattern tuple as output.
Communication Pattern Analysis
![Page 46: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/46.jpg)
Procedure Behavior Composition
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
May-Block Analysis
State Analysis
Call-claim Analysis
Communication Summary Analysis
Computational Workload Analysis
State
p0
p1
pn
...
for each procedure pi
Procedure Behavior
Composition
cVector Mapping Function
Capsule
EP
![Page 47: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/47.jpg)
Procedure Behavior Composition ( )
12
• Capsule may have multiple procedures
• Behavior of the capsule is determined by combining the behaviors of its procedures
• For instance, a capsule has blocking behavior is any of its procedures are blocking
• Key idea : Capsule behavior is predominantly defined by the procedure that executes often
![Page 48: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/48.jpg)
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
![Page 49: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/49.jpg)
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
![Page 50: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/50.jpg)
• THREAD, capsule is assigned a dedicated thread,
• TASK, capsule is assigned to a task-pool and the shared thread of the task-pool will process the messages,
• SEQ/MONITOR, calling capsule’s thread itself will execute the behavior at callee capsule.
A
Q Thread
B A B
Thread Thread
A B Thread
A B Thread
A: Th, B:Th A: Ta, B:Ta
A: Th, B:S A: Th, B:M
Th: Thread Ta: Task S: Sequential M: Monitor
Execution Policies
13
![Page 51: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/51.jpg)
Mapping Function
cVector Analysis
9
Execution Policy
cVector Input Program
![Page 52: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/52.jpg)
• Encodes several intuitions about MPC abstractions
Mapping Heuristics
Heuristic Execution Policy
Blocking Th
Heavy Th
HighCPU Ta
LowCPU M
Hub Ta
Affinity M/S
Master Ta
Worker Ta
Th - Thread, Ta - Task, M - Monitor and S - Sequential
14
![Page 53: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/53.jpg)
• Encodes several intuitions about MPC abstractions
• Examples – Blocking Heuristics
• capsules with externally blocking behaviors • should be assigned Th execution policy • rationale : other policies may lead to blocking of
the executing thread, starvation and system deadlocks.
Mapping Heuristics
Heuristic Execution Policy
Blocking Th
Heavy Th
HighCPU Ta
LowCPU M
Hub Ta
Affinity M/S
Master Ta
Worker Ta
Th - Thread, Ta - Task, M - Monitor and S - Sequential
14
![Page 54: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/54.jpg)
• Encodes several intuitions about MPC abstractions
• Examples – Blocking Heuristics
• capsules with externally blocking behaviors • should be assigned Th execution policy • rationale : other policies may lead to blocking of
the executing thread, starvation and system deadlocks.
Mapping Heuristics
Heuristic Execution Policy
Blocking Th
Heavy Th
HighCPU Ta
LowCPU M
Hub Ta
Affinity M/S
Master Ta
Worker Ta
Th - Thread, Ta - Task, M - Monitor and S - Sequential
14
The goal of heuristics, Ø Reduce mailbox contentions, message
passing and processing overheads and cache-misses
![Page 55: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/55.jpg)
Mapping Function: Flow Diagram
15
• Mapping function takes cVector as input and assigns an execution policy • Encodes several intuitions (heuristics) as shown in the figure • It is complete and assigns a single policy
![Page 56: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/56.jpg)
Evaluation
• Benchmark programs (15 total) – that exhibits data, task, and pipeline parallelism at coarse and fine
granularities. • Comparing cVector mapping against thread-all and round-robin-task-
all, random-task-all, and work-stealing-task-all • Measured reduction in program execution time and CPU
consumption over default mappings on different core settings.
16
![Page 57: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/57.jpg)
0"
20"
40"
60"
80"
100"
120"
2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12"
bang" mbrot" serialmsg" FileSearch" ScratchPad" Polynomial" fasta"
%"ru
n&me"im
provem
ent"
!250%
!200%
!150%
!100%
!50%
0%
50%
100%
2% 4% 8% 12%
DCT%
%"ru
n&me"im
provem
ent"
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
Knucleo0de$ Fannkuchred$ RayTracer$ Breamformer$ logmap$ concdict$ concsll$
%"ru
n&me"im
provem
ent"
% runtime improvement over default mappings, for fifteen benchmarks. For each benchmark there are four core settings (2, 4, 8, 12-cores) and for each core setting there are four bars (Ith, Irr, Ir, Iws) showing improvement over four default mappings (thread, round-robin, random, work-stealing). Higher bars are better.
Results: Improvement In Program Runtime
17
![Page 58: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/58.jpg)
0"
20"
40"
60"
80"
100"
120"
2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12"
bang" mbrot" serialmsg" FileSearch" ScratchPad" Polynomial" fasta"
%"ru
n&me"im
provem
ent"
!250%
!200%
!150%
!100%
!50%
0%
50%
100%
2% 4% 8% 12%
DCT%
%"ru
n&me"im
provem
ent"
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
Knucleo0de$ Fannkuchred$ RayTracer$ Breamformer$ logmap$ concdict$ concsll$
%"ru
n&me"im
provem
ent"
% runtime improvement over default mappings, for fifteen benchmarks. For each benchmark there are four core settings (2, 4, 8, 12-cores) and for each core setting there are four bars (Ith, Irr, Ir, Iws) showing improvement over four default mappings (thread, round-robin, random, work-stealing). Higher bars are better.
Results: Improvement In Program Runtime
17
X-axis : core settings (2,4,8,12) Y-axis : % runtime improvement
![Page 59: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/59.jpg)
0"
20"
40"
60"
80"
100"
120"
2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12"
bang" mbrot" serialmsg" FileSearch" ScratchPad" Polynomial" fasta"
%"ru
n&me"im
provem
ent"
!250%
!200%
!150%
!100%
!50%
0%
50%
100%
2% 4% 8% 12%
DCT%
%"ru
n&me"im
provem
ent"
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
Knucleo0de$ Fannkuchred$ RayTracer$ Breamformer$ logmap$ concdict$ concsll$
%"ru
n&me"im
provem
ent"
% runtime improvement over default mappings, for fifteen benchmarks. For each benchmark there are four core settings (2, 4, 8, 12-cores) and for each core setting there are four bars (Ith, Irr, Ir, Iws) showing improvement over four default mappings (thread, round-robin, random, work-stealing). Higher bars are better.
Results: Improvement In Program Runtime
17
4 bars indicating % runtime improvements over thread-all, round-robin, random, and work-stealing default strategies
![Page 60: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/60.jpg)
0"
20"
40"
60"
80"
100"
120"
2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12"
bang" mbrot" serialmsg" FileSearch" ScratchPad" Polynomial" fasta"
%"ru
n&me"im
provem
ent"
!250%
!200%
!150%
!100%
!50%
0%
50%
100%
2% 4% 8% 12%
DCT%
%"ru
n&me"im
provem
ent"
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
Knucleo0de$ Fannkuchred$ RayTracer$ Breamformer$ logmap$ concdict$ concsll$
%"ru
n&me"im
provem
ent"
% runtime improvement over default mappings, for fifteen benchmarks. For each benchmark there are four core settings (2, 4, 8, 12-cores) and for each core setting there are four bars (Ith, Irr, Ir, Iws) showing improvement over four default mappings (thread, round-robin, random, work-stealing). Higher bars are better.
Results: Improvement In Program Runtime
o 12 of 15 programs showed improvements
o On average 40.56%, 30.71%, 59.50%, and 40.03% improvements (execution time) over thread, round-robin, random and work-stealing mappings respectively
o 3 programs showed no improvements (data parallel programs)
17
![Page 61: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/61.jpg)
0"
20"
40"
60"
80"
100"
120"
2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12" 2" 4" 8" 12"
bang" mbrot" serialmsg" FileSearch" ScratchPad" Polynomial" fasta"
%"ru
n&me"im
provem
ent"
!250%
!200%
!150%
!100%
!50%
0%
50%
100%
2% 4% 8% 12%
DCT%
%"ru
n&me"im
provem
ent"
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
Knucleo0de$ Fannkuchred$ RayTracer$ Breamformer$ logmap$ concdict$ concsll$
%"ru
n&me"im
provem
ent"
% runtime improvement over default mappings, for fifteen benchmarks. For each benchmark there are four core settings (2, 4, 8, 12-cores) and for each core setting there are four bars (Ith, Irr, Ir, Iws) showing improvement over four default mappings (thread, round-robin, random, work-stealing). Higher bars are better.
Analysis: Improvement In Program Runtime
1) Reduced mailbox contentions 2) Reduced message-passing and processing overheads 3) Reduced mailbox contentions and cache-misses
17
![Page 62: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/62.jpg)
• We presented cVector mapping technique for capsules, however the technique should be applicable to other MPC frameworks
• Proof of concept : evaluation on Akka – Similar results for most benchmarks – Data parallel applications show no improvements (as in Panini)
Applicability
18
!60$
!40$
!20$
0$
20$
40$
60$
80$
100$
2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$ 2$ 4$ 8$ 12$
bang$ mbrot$ serialmsg$ fasta$ scratchpad$ logmap$ concdict$ concsll$
%"execu'o
n"'m
e"im
provem
ents"
Benchmarks"and"core"se6ngs"
Ith$ Ita$ I;$
![Page 63: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/63.jpg)
• Placing Erlang actors on multicore efficiently, Francesquini et al. [Erlang’13 Workshop]
– considers only hub-affinity behavior that is annotated by programmer
– out technique takes care of many other behaviors
• Mapping task graphs to cores, Survey [DAC’13]
– not directly applicable to JVM- based MPC frameworks, because threads to cores mapping is left to OS scheduler
• Efficient strategies for mapping threads to cores for
OpenMP multi-threaded programs, Tousimojarad and Vanderbauwhede [Journal of Parallel Computing’14]
– our technique maps capsules to threads and not threads to cores
Related Works
19
![Page 64: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/64.jpg)
Placing Erlang actors on multicore efficiently, Francesquini et al. [Erlang’13 Workshop]
considers only hub-affinity behavior that is annotated by programmer out technique takes care of many other behaviors
Mapping task graphs to cores, Survey [DAC’13]
not directly applicable to JVM- based MPC frameworks, because threads to cores mapping is left to OS scheduler
Efficient strategies for mapping threads to cores for OpenMP multi-threaded programs, Tousimojarad and Vanderbauwhede [Journal of Parallel Computing’14]
our technique maps capsules to threads and not threads to cores
Related Works
19
o Not directly applicable to JVM-based MPC frameworks o Non-automatic
![Page 65: Effectively Mapping Linguistic Abstractions for Message ...web.cs.iastate.edu/~ganeshau/mapping/oopsla2015.pdf · Effectively Mapping Linguistic Abstractions for Message-passing Concurrency](https://reader033.vdocuments.us/reader033/viewer/2022060220/5f0727207e708231d41b91a9/html5/thumbnails/65.jpg)
Conclusion
Ganesha Upadhyaya and Hridesh Rajan {ganeshau,hridesh}@iastate.edu
Supported in part by the NSF grants CCF- 08-46059, CCF-11-17937, and CCF-14-23370.
Questions?
a1 a2
a3
a4 a5
core0 core1
core2 core3
MPC System Architecture
MPC: Message-passing Concurrency, a1, a2, a3, a4 are MPC abstractions
JVM threads
Mapping OS Scheduler
Mapping Abstractions to JVM threads Evaluated on Panini and Akka