university of maryland towards automated tuning of parallel programs jeffrey k. hollingsworth...
TRANSCRIPT
University of Maryland
Towards Automated Tuning of Parallel Programs
Jeffrey K. Hollingsworth
[email protected] of Computer Science
University of Maryland, College Park, MD 20742
University of Maryland
Why Automated Performance Tuning?
Software commonly have parameters that impact their performance.
Optimal parameter values are usually variable and un-predictable.
Automated Parameter tuning can be used for adaptive parameter tuning
in complex software.
University of Maryland
Automated Performance Tuning
Goal: Maximize achieved performance
Problems:– Large number of parameters to tune– Shape of objective function unknown– Multiple libraries and coupled applications– Analytical model may not be available
Requirements:– Runtime tuning for long running programs– Don’t try too many configurations– Avoid gradients
University of Maryland
Active Harmony
Runtime performance optimization– Can also support training runs
Automatic library selection (code)– Monitor library performance– Switch library if necessary
Automatic performance tuning (parameter)– Monitor system performance– Adjust runtime parameters
Hooks for Compiler Frameworks– Working to integrate USC/ISI Chill– Looking at others too
University of Maryland
Example: Cluster Based Web Server
3-tier system Harmony Provides
– Parameter updates for DB, and App Severs
TPC-W Benchmark– Transactional web benchmark – Mimic operations of an e-commerce site– Uses Java implementation from Univ. of
Wisconsin– Performance metrics
• Web Interaction Per Second (WIPS)
University of Maryland
Cluster-Based Web Service Tuning
Best configuration after 200 iterations
Browsing
Shopping
Ordering
Improvements compared to the default
configuration
15%
16%
5%
0102030405060
Browsing Shopping Ordering
Workload Applied
Perf
orm
ance
(W
IPS)
Original configuration Best configuration for Browsing
Best configuration for Shopping Best configuration for Ordering
University of Maryland
Importance of various parameters
0
50
100
150
200
250
300
PROXY C
ache
Mem
AJP A
ccep
t Cou
nt
MYSQL
Net B
uffe
r Len
gth
AJP M
ax P
roce
ssor
s
PROXY M
ax O
bject
in M
emor
y
HTTP B
uffe
r Size
HTTP A
ccep
t Cou
nt
MYSQL
Delaye
d Que
ue
PROXY M
in Obj
ect
MYSQL
Max
Con
nect
ions
Parameter
Se
nsi
tivity
Shopping Ordering
University of Maryland
A Bit More About Harmony Search
Pre-execution– Sensitivity Discovery Phase– Used to Order not Eliminate search
dimensions
Online– Use Parallel Rank Order Search
• Different configurations on different nodes
University of Maryland
Benefits of Searching in ParallelPRO vs. Nelder-Mead (HPL)
40
45
50
55
60
65
70
75
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
Iterations
% P
eak
PRO-minNelder-Mead-min
PRO vs. Nelder Mead for POP
30
35
40
45
50
55
60
65
70
75
80
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
Iterations
Tim
e (s
ec)
Nelder-Mead
PRO-min
PRO-max
Performance for Two Programs– High Performance Linpack– POP Ocean Code
University of Maryland11
Performance of Matrix Multiply ActiveHarmony + CHiLL
vs ATLAS and MKL
0500
100015002000250030003500400045005000
0 1000 2000 3000
Matrix Size
MF
lop
s
Native
Simple Search
Harmony + Chill
Atlas
MKL
University of Maryland
Must Coordinate Auto Tuners
Problem: Warring auto tuning systems
• Multiple components “auto tuning” at once
• Tuning based on multiple changes at once
Solution:– Need some level of coordination– Possible Answer:
• Exposing different tuning systems– Part of PERI Auto-tuning Effort
University of Maryland
Conclusion
Active Harmony– An infrastructure for runtime tuning– Automatic tuning and environment
adaptation– It works !
Auto Tuning Integration– Need to integrate multiple tools
• Compilers, runtime, application