synergy.cs.vt.edu
GLAF: A Visual Programming and Auto-Tuning Framework for Parallel Computing
Student: Konstantinos Krommydas
Collaborator: Dr. Ruchira Sasanka (Intel)
Advisor: Dr. Wu-chun Feng
Motivation
High-performance computing is crucial in a broad range of scientific domains:
• Engineering, math, physics, biology, …
The parallel programming revolution has made high-performance computing accessible to the broader masses
Challenges
Many different programming languages
Many different computing platforms
Many different architectures
Many different optimization strategies
Domain experts should not (need to) know all these details
• but rather focus on their science
Challenges
Domain experts need to collaborate with computer scientists
[Diagram: pseudocode, unoptimized/naïve serial code, highly optimized parallel code]
Innovation slow-down
Limited access to parallel computing
• Communication overhead & errors
• Need to exchange domain-specific/programming knowledge
Contributions
• Realize a programming abstraction & development
framework for domain experts to provide a balance
between performance and programmability
– i.e., obtain fast performance from algorithms that have been
programmed easily
GLAF*
Desired features:
• auto-parallelizable, optimizable, tunable
• intuitive, familiar, minimalistic syntax
• data-visual and interactive
• able to integrate with existing legacy code
*Grid-based Language and Auto-tuning Framework
Programming Using GLAF: a Simple Example
Graphical User Interface
[Screenshot callouts: comments (in step, statements, grids, …); module name; function name; step number (within function); click-based interface]
Grid-Based Data Structures
• GLAF variables are based on the concept of grids:
• familiar abstraction (e.g., images, matrices,
spreadsheets)
• regular format that facilitates code generation,
optimizations, and parallelism detection
• Grid-based programming puts the focus on the
relation, rather than the implementation details
• Example:
• Scalar variable: 0D grid
• 1D array: one-dimensional grid
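The grid idea can be made concrete with a small sketch. This is not GLAF's internal representation, which the slides do not show; the `Grid` class and its fields (`name`, `shape`, `dtype`, `dim_titles`) are hypothetical names chosen to mirror the terms used on the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Tuple


@dataclass
class Grid:
    """Hypothetical stand-in for a GLAF grid: a named, typed, N-dimensional variable."""
    name: str
    shape: Tuple[int, ...] = ()        # () means a 0D grid, i.e. a scalar
    dtype: type = float
    dim_titles: Tuple[str, ...] = ()   # one title per dimension (assumed field)
    data: Any = field(init=False)

    def __post_init__(self) -> None:
        self.data = self._allocate(self.shape)

    def _allocate(self, shape: Tuple[int, ...]) -> Any:
        # Build nested lists filled with the dtype's zero value.
        if not shape:
            return self.dtype()
        return [self._allocate(shape[1:]) for _ in range(shape[0])]

    @property
    def ndim(self) -> int:
        return len(self.shape)


total = Grid("total")                                       # scalar: 0D grid
prices = Grid("prices", shape=(4,), dim_titles=("item",))   # 1D array: 1D grid
image = Grid("image", shape=(2, 3), dtype=int, dim_titles=("row", "col"))
```

Because every variable has the same regular shape description, a tool can reason about loops over it uniformly, which is the property the slide credits for easier code generation and parallelism detection.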
Grid-Based Data Structures
[Screenshot callouts: data type; dimension title; grid name]
Grid-Based Data Structures
Different data types across a dimension
GLAF Infrastructure
Browser:
• GLAF Programming GUI
• Data visualization
• Fortran/C/JS/OpenCL code generation
• Auto-parallelization
• Compilation/auto-tuning script generation
Web/Cloud:
• Data storage
• Compilation
• Auto-tuning
• Execution
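To illustrate what generating code for several languages from one description can look like, here is a minimal sketch. The `step` dictionary and the `emit_c` / `emit_fortran` helpers are hypothetical; the slides do not show GLAF's actual intermediate form or generator, so this only demonstrates the general idea of one description feeding two back ends.

```python
def emit_c(step):
    """Emit a C loop for the step; C arrays are indexed from 0."""
    return (
        f"for (int i = 0; i < {step['n']}; i++) {{\n"
        f"    {step['target']}[i] = {step['scale']} * {step['source']}[i];\n"
        f"}}\n"
    )


def emit_fortran(step):
    """Emit the equivalent Fortran loop; Fortran arrays are indexed from 1 by default."""
    return (
        f"do i = 1, {step['n']}\n"
        f"    {step['target']}(i) = {step['scale']} * {step['source']}(i)\n"
        f"end do\n"
    )


# Hypothetical step description: a(i) = 2.0 * b(i) over 8 elements.
step = {"target": "a", "source": "b", "scale": 2.0, "n": 8}
c_code = emit_c(step)
fortran_code = emit_fortran(step)
```

Keeping the description language-neutral is what lets the index base and syntax differ per back end without the user writing either version by hand.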
Code Generation
Auto-Tuning
• Generates platform-specific binaries & optimizations
• Selects the languages in which to auto-generate code
• Selects one or more code “starting points” for each language
• Selects optimizations for each combination of language and code “starting point”
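The selection steps above amount to enumerating a search space of code variants. The sketch below shows one way to do that; the labels come from the slides, but the list layout, the `enumerate_variants` and `pick_fastest` helpers, and the assumption that every optimization can be toggled independently for every language are illustrative only.

```python
from itertools import product

# Labels are taken from the slides; how GLAF stores them is not shown there.
LANGUAGES = ["Fortran", "C", "OpenCL"]
STARTING_POINTS = ["serial", "GLAF-parallel", "compiler-parallel"]
OPTIMIZATIONS = ["data-layout", "loop-collapse", "loop-interchange"]


def enumerate_variants(languages, starting_points, optimizations):
    """Yield one (language, starting point, optimization subset) tuple per variant."""
    # Each optimization is either on or off: 2**len(optimizations) subsets.
    switches = product([False, True], repeat=len(optimizations))
    subsets = [tuple(o for o, on in zip(optimizations, s) if on) for s in switches]
    for lang, start, opts in product(languages, starting_points, subsets):
        yield lang, start, opts


def pick_fastest(variants, measure):
    """Return the variant with the smallest measured run time."""
    return min(variants, key=measure)


variants = list(enumerate_variants(LANGUAGES, STARTING_POINTS, OPTIMIZATIONS))
```

In a real tuner, `measure` would compile and time each variant on the target platform; here it is left as a caller-supplied function so the enumeration stays self-contained.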
Multi-level Auto-Tuning Approach
[Diagram: a GLAF program branches by language (Fortran, C, OpenCL), then by code starting point (Serial, GLAF-Parallel, Compiler Parallel), then by optimization (data-layout transforms, loop collapse transforms, loop interchange transforms, …)]
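Two of the transforms named above, loop interchange and loop collapse, can be shown on a toy loop nest. This is a minimal sketch of the transforms themselves, not GLAF output; the function names and the scaling kernel are made up for illustration, and plain Python will not show the performance effect that motivates these transforms in Fortran or C.

```python
def scale_nested(grid, factor):
    """Baseline: a two-level loop nest over a 2D grid stored as a list of rows."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = grid[i][j] * factor
    return out


def scale_interchanged(grid, factor):
    """Loop interchange: same work, with the j loop moved outside the i loop."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        for i in range(rows):
            out[i][j] = grid[i][j] * factor
    return out


def scale_collapsed(grid, factor):
    """Loop collapse: both loops fused into a single loop of rows * cols iterations."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for k in range(rows * cols):
        i, j = divmod(k, cols)  # recover the 2D index from the flat counter
        out[i][j] = grid[i][j] * factor
    return out
```

All three versions compute the same result because the iterations are independent; that independence is also what makes such a loop nest a candidate for parallelization.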
Visualization
• Data visualization facilitates:
– understanding the algorithm being developed
– revealing bugs at an early stage
“Show Data”, “Colorize”, “Image Map”
Results: 3D finite difference algorithm
[Bar chart. Y-axis: speed-up over serial fortran_CPU (0 to 5). X-axis: implementation (serial, compiler-parallelized, GLAF-parallelized (col(3)), GLAF-parallelized (col(1))). Series: fortran_CPU, C_CPU_norestr, C_CPU_restr, fortran_XP, C_XP_norestr, C_XP_restr]
Results: N-body algorithm
[Bar chart. Y-axis: speed-up over serial SoA fortran_CPU (0 to 25). X-axis: implementation (serial SoA, serial AoS, compiler-parallelized SoA, compiler-parallelized AoS, GLAF-parallelized SoA, GLAF-parallelized AoS). Series: fortran_CPU, fortran_CPU_tmp, C_CPU_norestr, C_CPU_restr_tmp, C_CPU_restr_tmp_powf, fortran_XP, fortran_XP_tmp, C_XP_norestr, C_XP_restr_tmp, C_XP_restr_tmp_powf. Two inset panels with axis ranges 0 to 1.4 and 0 to 0.7]
Related Work
Parallelization
• Implicit: compilers (gcc, Intel, Cray, PGI, …)
• Explicit: extensions/libraries (Pthreads, OpenMP, OpenACC, Chapel, …)
+ Includes loop-level parallelism & data-level parallelism (vectorization), and potential optimizations
- Requires certain programming knowledge
- (Often) conservative nature
Problem Solving Environments and Domain-Specific Languages
• Domain-specific: computational biology, physics, dense linear algebra, fast Fourier transforms, …
+ High-performance auto-tuning
- Restrictive in nature
- Restrictive in terms of target language and/or platform
Future Work
• Improve the tool’s robustness
• Enable more languages/extensions:
– OpenCL, OpenACC
– Support for distributed programming (MPI)
• Dynamic feedback/advice on parallelism issues
• Extend auto-tuning and auto-parallelization/auto-vectorization capabilities
• Implement more dwarfs and provide back-end support for avoiding common programming pitfalls in code generated for supported languages
Conclusion
GLAF targets domain experts and provides a fine balance between
performance and programmability
• auto-parallelization, optimization and auto-tuning
• helps avoid common programming pitfalls
“GLAF: A Visual Programming and Auto-Tuning Framework for Parallel Computing” (Krommydas, Sasanka, Feng)
In summary: GLAF allows systematic generation of multiple starting points for different languages/platforms/optimizations
• Leads to overall better performance
Analogous to: different seeds in a state-space search algorithm
• Global vs. local minimum
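The search analogy can be demonstrated with a toy example: a greedy local search that gets stuck in a local minimum from one seed reaches the global minimum when several seeds are tried. The cost table and function names below are invented for illustration and do not model any GLAF measurement.

```python
def hill_descend(cost, start, lo, hi):
    """Greedy local search on integers: move to the better neighbour until stuck."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbours, key=cost)
        if cost(best) >= cost(x):
            return x  # no neighbour improves on x: a (possibly local) minimum
        x = best


def multi_start(cost, seeds, lo, hi):
    """Run the local search from every seed and keep the best end point."""
    return min((hill_descend(cost, s, lo, hi) for s in seeds), key=cost)


# Toy cost with a local minimum at x = 2 and the global minimum at x = 8.
COSTS = [5, 3, 2, 4, 6, 5, 3, 1, 0, 2]


def cost(x):
    return COSTS[x]
```

Starting only from seed 0, the search stops at x = 2; adding seed 5 lets it reach x = 8. In the analogy, each seed plays the role of one language/starting-point combination that the tuner then optimizes further.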