tmpa-2017: using functional directives to analyze code complexity and communication
TRANSCRIPT
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: Daniel Rubio BonillaTMPA 2017
Using Functional Directives to Analyze code Complexity and Communication
Daniel Rubio BonillaHLRS – University of Stuttgart
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 2
CPU Evolution
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 3
Hazel Hen
CPU
E5-2680 v312 Cores30MiB Cache2.5 GhZ
Node 2 CPUs – 24C128 GB
Comp. Nodes 7712
Total Cores 185,088
Performance 7420 TFlops
Storage ~10 PB
Weight 61.5 T
Power 3200 KW
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 4
Amdahl Law
# processing units
speedup100% parallelizable
98% parallelizable
90% parallelizable
50% parallelizable
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 5
Real Amdahl Law
# processing units
speedup100% parallelizable
98% parallelizable
90% parallelizable
50% parallelizable
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 6
The Problem
Daniel Rubio Bonilla
In High Performance Computing…
• Performance is increased by• Integrating more cores (millions!?)• Using heterogeneous accelerators (GPU, FPGA, ...)
• Issues• Programmability• Portability
TMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 7
New Approaches
Different Programming Model
• Focused on mathematical problems• Engineering • Science
• To enable:• Parallelization and concurrency• Portability across different hardware and accelerators
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 8
Our Approach
To obtain the structural information of the application by annotating the imperative code with a functional-like
directives (mathematical / algorithmic structure)
• The main difficulty in this approach are:• “deriving” the structure of the application• matching the structure to the source code
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 9
Higher Order Functions
• Higher Order functions are mathematical functions • Takes one or more function as an argument• Can return a function as a result
• Clear repetitive execution structure
• These structures can be transformed to equivalent ones• But with different non-functional properties
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 10
map :: (a -> b) -> [a] -> [b]
map (*2) [1,2,3,4] = [2,4,6,8]
Higher Order Functions
• Apply to all:
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 11
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl (+) 0 [1,2,3,4] = 10
map :: (a -> b) -> [a] -> [b]
map (*2) [1,2,3,4] = [2,4,6,8]
Higher Order Functions
• Apply to all:
• Reduction:
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 12
Other Higher Order Functions
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 13
Higher Order Functions Transformations
total = foldl (+) 0 vs
One possible transformation
Only if the operation is associative and we knowits neutral element
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 14
Transformations
• Changes in the mathematical formulation• Or the algorithm execution
• Produce equivalent code• Change computing load• Change memory distribution• Modify communication
• Allow adaptation to different architectures
• While maintaining correctness!
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 15
Hierarchical Structures
• Functional annotations allow the construction of multiple structural levels:• Emerging complexity of the structural information
• We distinguish between:• Output of one Higher Order Function is input of another
• This can be achieved by analyzing the data dependencies between the functions
• The operator of one (Higher Order) Function is composed of other functions
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 16
Flat Structure
Graph of a Complex Structure of two same level Higher Order Functions (HOFs)• The output of one HOF is the input for another HOF
foldl (+) 0 (map (*2) [0..n-1])
foldl :: (a -> b -> a) -> a -> [b] -> amap :: (a -> b) -> [a] -> [b]
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 17
Hierarchical Structure
Graph of a Complex Hierarchical Structure of two different level Higher Order Functions (HOF)• The operator of one HOF is another HOF
map (foldl (+) 0) [[..]..[..]]
foldl :: (a -> b -> a) -> a -> [b] -> amap :: (a -> b) -> [a] -> [b]
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 18
Other requirements
• Strong binding between directives and code
• Description of memory organization
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 19
Example - Heat
t
Daniel Rubio Bonilla
1-D heat dissipation function
Discretization
TMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 20
Complexity
Daniel Rubio BonillaTMPA 2017
O(N_ELEM)
O(N_ITER)
O(N_ELEM)
O(1)
O(1)
O(1)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 21
Complexity
Daniel Rubio BonillaTMPA 2017
O(N_ELEM)
O(N_ITER)
O(N_ELEM)
O(1)
O(1)
O(1)
O(1) + O(1) + O(N_ITER) * (O(N_ELEM)*O(1) + O(N_ELEM))
O(N_ITER * N_ELEM)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 22
Transformations – Partitioning 1
let heatDiffusion = itn HEATTIMESTEP hm_array N_ITERPAR1 v = stencil1D TKernel 1 v where TKernel x y z = y + K * (x - 2*y + z)HEATTIMESTEP vs = map PAR1 vs
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 23
Transformations – Partitioning 2
let heatDiffusion = itn HEATTIMESTEP hm_array N_ITERPAR2 v = stencil1D TKernel 1 v where TKernel x y z = y + K * (x - 2*y + z)PAR1 vs = map PAR2 vsHEATTIMESTEP vss = map PAR1 vss
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 24
Platform Specific Transformations
• OpenMP:• Relatively straightforward
• MPI:• Communication• Halos
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 25
Transformed Code
if (rank < size - 1) MPI_Send(&hm[LOCAL_N_ELEM],1, MPI_FLOAT, rank + 1, 0, MPI_COMM_WORLD);if (rank > 0) MPI_Recv(&hm[0], 1, MPI_FLOAT, rank-1, 0, MPI_COMM_WORLD, &status);if (rank > 0) MPI_Send(&hm[1], 1, MPI_FLOAT, rank-1, 1, MPI_COMM_WORLD );if (rank < size - 1) MPI_Recv(&hm[LOCAL_N_ELEM+1],1,MPI_FLOAT, rank+1, 1, MPI_COMM_WORLD, \ &status);
#pragma polca stencil1D 1 G hm hm_tmp#pragma omp parallel forfor(i=1; i<LOCAL_N_ELEM+1; i++){#pragma polca G#pragma polca input (hm[i-1] hm[i] hm[i+1]) output(hm_tmp[i]) hm_tmp[i] = hm[i] + K * (hm[i-1] + hm[i+1] - 2 * hm[i]);}
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 26
Example - NBody
t
Daniel Rubio Bonilla
N-Body Problem
TMPA 2017
Three steps1) Calculate Forces2) Update Velocities3) Calculate Position
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 27
Structure
Daniel Rubio BonillaTMPA 2017
O(1)
O(1)
O(1)
O(nIters)
O(nBodies)
O(nBodies)
O(nBodies)
O(nBodies)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 28
Structure
Daniel Rubio BonillaTMPA 2017
O(nIters)
O(nBodies2)
O(nBodies)
O(nBodies)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 29
Structure
Daniel Rubio BonillaTMPA 2017
O(nIters) * (O(nBodies2) + 2*O(nBodies))
O(nIters * nBodies2)
O(nBodies2)
O(nBodies)
O(nBodies)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 30
Communication
Daniel Rubio BonillaTMPA 2017
Parallel
Parallel
Sequential
Parallel (with caution)
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 31
Conclusion
• Functional semantics can enable code:• Transformation• Adaptation
• But also...• Algorithmic complexity analysis• Communication patterns
• This information helps to predict application’s behavior
Daniel Rubio BonillaTMPA 2017
:: ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
:: 32
Questions
Thank you!
Contact:[email protected]
Projects:POLCA www.polca-project.euSmart-Dash www.dash-project.orgCλaSH www.clash-lang.org
Daniel Rubio BonillaTMPA 2017