OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
A Three Step Blind Approach for Improving HPCSystems’ Energy Performance
Ghislain Landry Tsafack, Laurent Lefevre, Patricia [email protected]
EE-LSDS –April, 2013
1
1This work is supported through HemeraG.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
1 MotivationsHigh Performance Computing (HPC) systems’ design
2 Runtime energy performance optimizationOn-the-fly system adaptation
3 Evaluation and preliminary resultsExperimental platform description
4 Summary
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
High Performance Computing (HPC) systems’ design
High Performance Computing (HPC) systems design
place great emphasis on a few components: processorarchitecture, memory subsystems, storage subsystems andcommunication subsystems, management framework, platformarchitecture
performance of cpu-intensive workloads depends on processorarchitecture
guarantee good performance on average over a wide verity ofworkloads.
can result in power dissipation for some workloads or executionphases of a specific workload
most of them allow system reconfiguration (at least theprocessor)
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 3 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
On-line system reconfiguration
Overview of the methodology
phase detection
phase characterization
phase identification and system reconfiguration
reuse of configuration information for recurring phases
def. phase is define as a region of execution of theprogram/system relatively stable with respect to a given metric
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 4 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
On-line system reconfiguration (cont.)
Example:
Phase Detection
Phase Characterization
Compute & memory
intensive
Compute
intensiveMemory
intensive
Identification and decision making
Match any new EV with a known phase if any and make the appropriate
decisionmemory intensive: slow down the
Processor...
Characterization of detected phases
get the characteristics of the new vectorIf identification successful
Detect new phases
Phase detection
Figure: Summary of the methodology on a system which successivelyruns five di↵erent workloads.
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 5 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
Phase changes detection
Execution Vectors (EV) based approach
column vector whose entries are sensors – including hardwareperformance counters, network bytes sent/received and diskread/write counts
example
0
BBB@
cache refbranch ins
...byteSent
1
CCCA
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 6 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
Phase changes detection (cont.)
Similarity/resemblance between EVs is used for phase detection
the manhattan distance between consecutive EVs is theresemblance metric
phase changes occur when the distance between consecutiveEVs exceeds a threshold (varies throughout the system’slife-cycle)
sensor-1
sensor-2
≤ TH↕
EVs represented as points in the 2-dimensional space generated by sensor-1 and
sensor-2
EVs belong to the same phase
TH ≥
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 7 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
Phase characterization or labelling
Represented by reference vector
closest EV to the centroid of the group of EVs belonging tothe phase
Characterization lies on last level cache references per instructionratio (LLCRIR)
Table: Order of magnitude of LLC references per instruction ratio andassociated labels.
Workload label order of magnitude of LLCRIR
compute intensive 10�4
memory bound � 10�2
mixed (both memory compute intensive) 10�3
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 8 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
On-the-fly system adaptation
Key idea: reuse of configuration information for reoccurringphases/workloads
instead of identifying complete phasesmatch each newly sampled EV with known phases and makethe reconfiguration decision for the next sampling interval
basic principle:if at time t the system is in a phase labelled label , it is likelyto be running the same type of phase at time t + 1
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 9 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
On-the-fly system adaptation
On-the-fly system adaptation (cont.)
Table: Phase labels and associated energy reduction schemes.
Phase label Possible reconfiguration decisions
compute intensive switch o↵ memory banks; send disks to sleep;scale the processor up; put NICs into LPI mode
memory intensive scale the processor down; decrease disksor send them to sleep; switch on memory banks
mixed switch on memory banks; scale the processor upsend disks to sleep; put NICs into LPI mode
communication switch o↵ memory banks; scale the processor downintensive switch on disks
I/O intensive switch on memory banks; scale the processor down;increase disks (if needed)
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 10 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
Experimental platform description
Experimental platform description
25 node cluster of Intel Xeon X3440 set up on Grid5000
Linux kernel 2.6.35 runs on each node, where sensors arecollected on a per second basis
Association between labels and processors frequencies:CPU intensive: 2.53Ghzmixed: 2Ghzmemory intensive: 1.87Ghz
consider Benchmarks including MG, EP, BT, SP, FT, CG, and ISfrom NPB-3.2 benchmark suite
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 11 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
Experimental platform description
Methodology and results
randomly run each workload and let the system react accordingly
Table: summary of system’s decisions for 5 instances of above workloads
InstancesWorkload categories Workloads 1st 2nd 3rd 4th 5th
compute intensive (CI) MG CI CI CI CI CIEP CI MI MI CI CI
mixed (MIX) BT CI MIX MIX MIX MIXSP CI MIX MIX MIX MIXFT CI MIX MIX MIX MIX
memory intensive (MI) CG CI MI MI MI MIIS CI CI MI MI MI
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 12 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
Experimental platform description
Impact on performance
0
0.2
0.4
0.6
0.8
1
1.2
1.4
EP MG BT SP FT CG IS
Energ
y co
nsu
mptio
n (
j * 1
06)
12345
(a) Energy consumption
0
50
100
150
200
250
300
350
400
450
1 2 3 4 5
Exe
cutio
n t
ime
(s)
Workloads instance (1 for the 1rst instance, 2 for the 2nd and so on)
BTCGEPFT
MGIS
SP
(b) Execution time
Figure: Variation of energy consumption and execution time of recurringworkloads with respect to system reconfiguration decisions.
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 13 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
Summary
introduce an on-line general purpose methodology forimproving energy performance of HPC systems
processor, disk, and network interconnect (experiments in thepaper concentrate on the processor)
the approach can easily be extended to a large number ofenergy-aware clusters
does not require any specific knowledge of the application orany user intervention
future directions: investigating more power saving schemes;extending the number of instances of workloads in theexperiments.
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 14 / 15
OutlineMotivations
Runtime energy performance optimizationEvaluation and preliminary results
Summary
Summary
Thank you for your attention!
Questions?
G.L. Tsafack, L. Lefevre, P. Stolf EE-LSDS 2013 15 / 15