ParalleX - Louisiana State University · stellar.cct.lsu.edu/pubs/Cetraro.pdf · 2011. 10. 9.
Tianhe-1A 2.566 Petaflops Rmax
Heterogeneous Architecture:
• 14,336 Intel Xeon CPUs
• 7,168 Nvidia Tesla M2050 GPUs
• More than 100 racks
• 4.04 megawatts
Cetraro Workshop 2011
2
6/28/2011
Technology Demands new Response
Amdahl’s Law
• P: Proportion of parallel code
• N: Number of processors
$$\text{Speedup} = \frac{1}{(1 - P) + \frac{P}{N}}$$
Figure courtesy of Wikipedia (http://en.wikipedia.org/wiki/Amdahl's_law)
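As a worked illustration of Amdahl's law (speedup = 1 / ((1 − P) + P/N), with P the parallel proportion and N the number of processors), the following minimal Python sketch evaluates the formula and its limit for large N. The function names and the sample value P = 0.95 are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's law for parallel proportion p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as n grows without bound: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical example: 95% of the code is parallel.
    for n in (1, 16, 1024, 14336):
        print(n, round(amdahl_speedup(0.95, n), 2))
    print("limit", round(amdahl_limit(0.95), 2))
```

Even with 95% parallel code, the serial 5% caps the speedup near 20 in this model, regardless of how many processors are added.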
The 4 Horsemen of the Apocalypse: SLOW
• Starvation
• Latencies
• Overheads
• Waiting for Contention resolution
Efficiency Factors
• Starvation
▫ Insufficient concurrent work to maintain high utilization of resources
  – Inadequate global or local parallelism due to poor load balancing
• Latency
▫ Time-distance delay of remote resource access and services
  – E.g., memory access and system-wide message passing
• Overhead
▫ Critical path work for management of parallel actions and resources
▫ Work not necessary for sequential variant
• Waiting for contention resolution
▫ Delay due to lack of availability of oversubscribed shared resource
  – Bottlenecks in the system, e.g., memory bank access and network bandwidth
A Game Changer
Adaptive Mesh Refinement (AMR)
Why Adaptive Mesh Refinement (AMR)
• From 31 Mar 2010 to 31 Mar 2011, at least 68,394,791 SUs (service units) were dedicated on TeraGrid to finite-difference-based AMR applications (out of ~1.407 billion SUs allocated), about 5% of runs
• Nearly all of the publicly available AMR toolkits use MPI
• Strong scaling of AMR applications is typically very poor
• ParalleX functionality fits nicely with the AMR algorithm: global address space, “work stealing”, parallelism discovery, dynamic threads, implicit load balancing
Constraint based Synchronization for AMR
• Compute dependencies at task instantiation time
• No global barriers, uses constraint based synchronization
• Computation flows at its own pace
• Message driven
• Symmetry between local and remote task creation/execution
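The idea of replacing global barriers with per-task constraints can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not HPX code: the `Task` class, `run_dataflow`, and the task names in the usage note are hypothetical. Each task records its dependencies when it is created and becomes runnable as soon as its own inputs are ready.

```python
from collections import deque


class Task:
    """A unit of work whose dependencies are fixed at instantiation time."""

    def __init__(self, name, deps, fn):
        self.name = name
        self.deps = list(deps)         # names of tasks whose results are needed
        self.fn = fn                   # called with the dependency results, in order
        self.missing = len(self.deps)  # inputs not yet available


def run_dataflow(tasks):
    """Run every task as soon as its own inputs are ready; no global barrier."""
    by_name = {t.name: t for t in tasks}
    dependents = {t.name: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            dependents[d].append(t.name)

    ready = deque(t.name for t in tasks if t.missing == 0)
    results, order = {}, []
    while ready:
        name = ready.popleft()
        task = by_name[name]
        results[name] = task.fn(*(results[d] for d in task.deps))
        order.append(name)
        # Completing a task releases only the tasks that were waiting on it.
        for child in dependents[name]:
            by_name[child].missing -= 1
            if by_name[child].missing == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("cyclic or unsatisfiable dependencies")
    return results, order
```

For example, with tasks `a0`, `b0`, `a1` (needs `a0` and `b0`) and `b1` (needs only `b0`), `b1` becomes runnable the moment `b0` finishes; it does not wait for unrelated work at the same step to complete.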
What’s ParalleX?
• Active global address space (AGAS) instead of PGAS
• Message driven instead of message passing
• Lightweight control objects instead of global barriers
• Latency hiding instead of latency avoidance
• Adaptive locality control instead of static data distribution
• Fine grained parallelism of lightweight threads instead of Communicating Sequential Processes (CSP/MPI)
• Moving work to data instead of moving data to work
The Runtime System – A Game Changer
• Runtime system
▫ is: ephemeral, dedicated to and exists only with an application
▫ is not: the OS, persistent and dedicated to the hardware system
• Moves us from static to dynamic operational regime
▫ Exploits situational awareness for causality-driven adaptation
▫ Guided missile with continuous course correction rather than a fired projectile with fixed trajectory
• Based on foundational assumption
▫ Untapped system resources to be harvested
▫ More computational work will yield reduced time and lower power
▫ Opportunities for enhanced efficiencies discovered only in flight
▫ New methods of control to deliver superior scalability
• “Undiscovered Country” – adding a dimension of systematics
▫ Adding a new component to the system stack
▫ Path-finding through the new trade-off space
HPX Runtime System Design
• Current version of HPX provides the following infrastructure on conventional systems as defined by the ParalleX execution model
▫ Active Global Address Space (AGAS)
▫ ParalleX Threads and ParalleX Thread Management
▫ Parcel Transport and Parcel Management
▫ Local Control Objects (LCOs)
Main Runtime System Tasks
• Manage parallel execution for application (addresses Starvation)
▫ Delineating parallelism, runtime adaptive management of parallelism
▫ Synchronizing parallel tasks
▫ Thread scheduling, static and dynamic load balancing
• Mitigate latencies for application (addresses Latencies)
▫ Latency hiding through overlap of computation and communication
▫ Latency avoidance through locality management
▫ Dynamic copy semantic support
• Reduce overhead for application (addresses Overheads)
▫ Synchronization, scheduling, load balancing, communication, context switching, memory management, address translation
• Resolve contention for application (addresses Contention)
▫ Adaptive routing, resource scheduling, load balancing
▫ Localized request buffering for logical resources
Active Global Address Space
• Global Address Space throughout the system
▫ Removes dependency on static data distribution
▫ Enables dynamic load balancing of application and system data
• AGAS assigns global names (identifiers, unstructured 128-bit integers) to all entities managed by HPX.
• Unlike PGAS, AGAS provides mechanisms for resolving global identifiers into corresponding local virtual addresses (LVAs)
▫ An LVA comprises the locality ID, the type of the entity being referred to, and its local memory address
▫ Moving an entity to a different locality updates this mapping.
▫ Current implementation is based on a centralized database storing the mappings, which are accessible over the local area network.
▫ Local caching policies have been implemented to prevent bottlenecks and minimize the number of required round-trips.
• Current implementation allows autonomous creation of globally unique IDs in the locality where the entity is initially located and supports memory pooling of similar objects to minimize overhead
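The resolution path described above (global identifier, central mapping, local cache) can be sketched in Python as follows. This is a conceptual model only, not the HPX implementation: `Directory`, `LocalityResolver`, `new_gid`, and `invalidate` are hypothetical names, random 128-bit integers stand in for however HPX actually creates unique IDs, and explicit cache invalidation after a move is an assumption the slides do not spell out.

```python
import secrets
from typing import Dict, NamedTuple


class LVA(NamedTuple):
    """Local virtual address: where an entity currently lives."""
    locality_id: int
    entity_type: str
    address: int


def new_gid() -> int:
    """Create an unstructured 128-bit identifier locally (random, as an assumption)."""
    return secrets.randbits(128)


class Directory:
    """Stand-in for the centralized database holding GID -> LVA mappings."""

    def __init__(self) -> None:
        self._table: Dict[int, LVA] = {}
        self.lookups = 0  # counts simulated network round-trips

    def bind(self, gid: int, lva: LVA) -> None:
        # Binding again after a move replaces the old mapping.
        self._table[gid] = lva

    def resolve(self, gid: int) -> LVA:
        self.lookups += 1
        return self._table[gid]


class LocalityResolver:
    """Per-locality resolver that caches answers to avoid repeated round-trips."""

    def __init__(self, directory: Directory) -> None:
        self._directory = directory
        self._cache: Dict[int, LVA] = {}

    def resolve(self, gid: int) -> LVA:
        if gid not in self._cache:
            self._cache[gid] = self._directory.resolve(gid)
        return self._cache[gid]

    def invalidate(self, gid: int) -> None:
        """Drop a cached entry, e.g. after the entity has moved (assumed policy)."""
        self._cache.pop(gid, None)
```

Because callers hold only the global identifier, an entity can be rebound to a different locality in the directory without changing any code that refers to it.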
Thread Management
• Thread manager is modular and implements work-queue based management as specified by the ParalleX (PX) execution model
• Threads are cooperatively scheduled at user level without requiring a kernel transition
• Specially designed synchronization primitives such as semaphores, mutexes, etc. allow synchronization of HPX threads in the same way as conventional threads
• Thread management currently supports several key modes
▫ Global Thread Queue
▫ Local Queue (work stealing)
▫ Local Priority Queue (work stealing)
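The "Local Queue (work stealing)" mode can be illustrated with a small, deterministic Python simulation. It is a sketch under assumptions, not the HPX scheduler: the class and function names are hypothetical, workers take turns in a single OS thread, and the choice of taking the newest task from one's own queue while stealing the oldest from a victim is a common convention that the slides do not confirm for HPX.

```python
from collections import deque


class LocalQueueScheduler:
    """One task queue per worker; idle workers steal from other workers."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]

    def submit(self, worker, task):
        self.queues[worker].append(task)

    def next_task(self, worker):
        """Return (task, stolen) for this worker, or None if no work is left."""
        own = self.queues[worker]
        if own:
            return own.pop(), False           # newest task from the own queue
        for victim, queue in enumerate(self.queues):
            if victim != worker and queue:
                return queue.popleft(), True  # steal the victim's oldest task
        return None


def run_round_robin(scheduler):
    """Let each worker run one task per round until all queues are empty."""
    executed = [[] for _ in scheduler.queues]
    steals = 0
    progress = True
    while progress:
        progress = False
        for worker in range(len(scheduler.queues)):
            item = scheduler.next_task(worker)
            if item is None:
                continue
            task, stolen = item
            executed[worker].append(task())
            steals += int(stolen)
            progress = True
    return executed, steals
```

If all tasks are submitted to worker 0, worker 1 still ends up doing part of the work by stealing, which is the implicit load balancing the ParalleX model relies on.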
Parcel Management
• Any inter-locality messaging is based on Parcels
▫ In the HPX implementation, parcels are represented as polymorphic objects
▫ An HPX entity creates a parcel object and sends it to the parcel handler.
• The parcel handler serializes the parcel, bundling all dependent data along with it.
• At the receiving locality, the parcel is received using the standard TCP/IP protocols.
• The action manager de-serializes the parcel and creates HPX threads out of the specification
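The parcel path described above (create, serialize, transport, de-serialize, spawn work) can be mimicked in Python. This is a sketch under assumptions rather than the HPX mechanism: the `Parcel` fields, the JSON wire format, and the `register` and `receive` methods are hypothetical, the bytes are handed over in-process instead of over TCP/IP, and a plain callable stands in for an HPX thread.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Parcel:
    destination: int       # global id of the target entity (hypothetical field)
    action: str            # name of the action to run at the destination
    args: Tuple[Any, ...]  # dependent data bundled with the parcel


class ParcelHandler:
    """Sending side: turns a parcel object into bytes for the wire."""

    def put(self, parcel: Parcel) -> bytes:
        return json.dumps(asdict(parcel)).encode("utf-8")


class ActionManager:
    """Receiving side: rebuilds the parcel and turns it into runnable work."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._actions[name] = fn

    def receive(self, wire: bytes) -> Callable[[], Any]:
        fields = json.loads(wire.decode("utf-8"))
        parcel = Parcel(fields["destination"], fields["action"], tuple(fields["args"]))
        fn = self._actions[parcel.action]
        # The returned callable plays the role of the newly created thread.
        return lambda: fn(*parcel.args)
```

The sender names an action and ships its arguments; the receiver decides how and when to run it, which is what makes the model message driven rather than message passing.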
[Figure: parcel flow between Locality 1 and Locality 2. Labels: parcel object, put(), Parcel Handler, Serialized Parcel, De-serialized Parcel, Action Manager, HPX Threads]
Exemplar LCO: Futures
• In HPX, a Future LCO is an object that acts as a proxy for a result that is initially not known.
• When user code invokes a future (using future.get()), the thread can do one of two things
▫ If the remote data/arguments are available, then the future.get() operation fetches the data and the execution of the thread continues
▫ If the remote data is NOT available, the thread may continue until it requires the actual value; then the thread suspends, allowing other threads to continue execution. The original thread re-activates as soon as the data dependency is resolved
[Figure: timeline across Locality 1 and Locality 2. Labels: future.get(), suspend thread 1, execute thread 2, reactivate thread 1]
Note: Thread 1 is suspended only if the results from locality 2 are not readily available. If results are available, Thread 1 continues to complete execution.
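The future pattern can be tried out with Python's standard `concurrent.futures` module, where `result()` plays the role that `future.get()` plays on the slide. This is only an analogy: a standard-library future blocks an OS thread while waiting, whereas the slides describe a lightweight HPX thread being suspended so that other threads can run. The functions `remote_compute` and `overlap_with_future` and the 50 ms delay are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_compute(x):
    """Stands in for work done on another locality; the sleep models latency."""
    time.sleep(0.05)
    return x * x


def overlap_with_future(x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(remote_compute, x)  # proxy for a result not yet known
        local = sum(range(1000))              # useful work while the result is pending
        remote = fut.result()                 # wait only at the point the value is needed
    return local + remote
```

The caller pays the waiting cost only when it actually needs the value, so the local computation overlaps with the simulated remote latency.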
Based on HPX – An exemplar implementation of ParalleX for conventional systems
Starvation: Non-uniform Workload
[Figure: AMR Example Mesh Structure. Wave Amplitude (0.000 to 0.010) vs. Computational Domain (Radius, 4 to 12); series: 0 LoR, 1 LoR, 2 LoR]
Grain Size: The New Freedom
Overhead: Load Balancing
Competing effects for optimal grain size: overheads vs. load balancing (starvation)
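One way to see why an optimal grain size exists is a toy cost model, sketched below in Python. The model is an assumption made for illustration and does not come from the slides: per-task overhead is paid once per task and spread over the cores, while load imbalance is approximated by one extra grain on the critical path. All names and parameters are hypothetical, and time units are arbitrary.

```python
import math


def modeled_time(total_work, grain, overhead, cores):
    """Toy model: useful work plus per-task overhead, spread over the cores,
    plus one grain of imbalance on the critical path."""
    tasks = total_work / grain
    return (total_work + tasks * overhead) / cores + grain


def best_grain(total_work, overhead, cores):
    """Grain size minimizing modeled_time: sqrt(total_work * overhead / cores).

    Follows from setting the derivative of
    total_work * overhead / (grain * cores) + grain to zero.
    """
    return math.sqrt(total_work * overhead / cores)
```

In this model, smaller grains inflate the overhead term and larger grains inflate the imbalance term, so the minimum lies between the two extremes.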
Overhead: Threads
[Figure: Execution Time [s] (1,000,000 PX Threads), 0 to 120, vs. Number of OS Threads (Cores), 0 to 48; legend: 0μs, 3.5μs, 7μs, 14.5μs, 29μs, 58μs, 115μs]
Scaling: AMR using MPI and HPX
[Figure: Scaling of MPI AMR application. Scaling (normalized to 1 core), 0 to 14, vs. Levels of AMR refinement, 0 to 8; series: 1 core, 2 cores, 4 cores, 10 cores, 20 cores]
[Figure: Scaling of HPX AMR application. Scaling (normalized to 1 core), 0 to 14, vs. Levels of AMR refinement, 0 to 8; series: 1 core, 2 cores, 4 cores, 10 cores, 20 cores]
Performance: AMR using MPI and HPX
[Figure: Wallclock time ratio MPI/HPX (HPX = 1), 0.5 to 4, vs. Number of cores (1, 2, 5, 10, 20, 30), depending on levels of refinement (LoR); pollux.cct.lsu.edu, 32 cores; series: 0 LoR, 1 LoR, 2 LoR, 3 LoR]
A Cure for Scaling Impaired Parallel Applications?
ParalleX – Is it a Cure?
• Not completely sure yet
▫ Half way through
▫ Promising results on SMP systems
▫ First (promising) results on distributed systems
• No code changes required!
• Current projects
▫ Custom hardware (FPGAs) accelerating systems functionality
▫ Improving performance of AGAS, Parcel transport, …
▫ Redefining I/O
• The ParalleX execution model can be implemented without adding significantly more overhead than MPI does
• Implicit load balancing for AMR simulations based on finer grained parallelism is highly beneficial
• There are regimes and applications that can benefit from this highly parallel model
• Runtime granularity control is crucial for optimal scaling