Introduction Slurm Energy and Power Measurement Upcoming Summary Literature
Resource management with Slurm
Tobias Weßeler
Scientific Computing Group (Arbeitsbereich Wissenschaftliches Rechnen)
Department of Informatics
Faculty of Mathematics, Informatics and Natural Sciences
Universität Hamburg
2016-01-18
Tobias Weßeler Resource management with Slurm 1 / 31
Agenda
1 Introduction
2 Slurm
3 Energy and Power Measurement
4 Upcoming
5 Summary
6 Literature
What is job scheduling / resource management?
Introduction
Resource manager
    Monitors resource utilization (CPU, RAM, etc.) and allocates them to the users’ jobs
    Monitors power consumption
    Switches off unused resources
    Communicates with job scheduler
Job scheduler
    Uses information from resource manager to prioritize jobs
    Schedules jobs to efficiently use resources
    Informs resource manager about hardware needs
Job scheduling examples
Introduction
Figure: Naive queue, figure based on: [Ada14]
Job scheduling examples
Introduction
Figure: Backfill example, figure based on: [Ada14]
Job scheduling examples
Introduction
Figure: No backfill possible, figure based on: [Ada14]
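The three figures contrast a naive FIFO queue with backfilling. The idea can be sketched in Python as follows. This is a simplified illustration, not Slurm's actual scheduler: the names Job, reserved_start and schedule_now are hypothetical, and a waiting job is backfilled only if it fits into the currently free nodes and finishes before the reserved start of the blocked job at the head of the queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # user-supplied time limit


def reserved_start(head, free_nodes, running):
    """Earliest time at which enough nodes are free for the blocked head job.

    running is a list of (end_time, nodes) tuples for jobs already running.
    """
    available = free_nodes
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            return end_time
    raise ValueError("job requests more nodes than the cluster has")


def schedule_now(queue, free_nodes, running, now=0, backfill=True):
    """Return the names of the jobs that can be started at time now."""
    started = []
    waiting = list(queue)
    # Naive FIFO part: start jobs strictly in queue order while they fit.
    while waiting and waiting[0].nodes <= free_nodes:
        job = waiting.pop(0)
        free_nodes -= job.nodes
        running = running + [(now + job.runtime, job.nodes)]
        started.append(job.name)
    if not backfill or not waiting:
        return started
    # Backfill part: the head job is blocked, so reserve its start time and
    # let later jobs run only if they end before that reservation.
    shadow = reserved_start(waiting[0], free_nodes, running)
    for job in waiting[1:]:
        if job.nodes <= free_nodes and now + job.runtime <= shadow:
            free_nodes -= job.nodes
            started.append(job.name)
    return started


# 4-node cluster: 3 nodes are busy until t=10, 1 node is free.
queue = [Job("A", 4, 5), Job("B", 1, 8), Job("C", 1, 20)]
print(schedule_now(queue, 1, [(10, 3)], backfill=False))  # []
print(schedule_now(queue, 1, [(10, 3)]))                  # ['B']
```

Job C corresponds to the "no backfill possible" case: it fits into the free node, but it would still be running when job A is due to start.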
Software
SLURM
Simple Linux Utility for Resource Management
Combined resource manager and job scheduler
Open source, fault-tolerant, highly scalable
Supports plugins (dynamically linked objects at runtime)
Active development by community as well as SchedMD
Widespread use in HPC
“As of the June 2015 Top 500 computer list, Slurm was performing workload management on six of the ten most powerful computers in the world including the number 1 system, Tianhe-2 with 3,120,000 computing cores.” - SchedMD website
Architecture
Daemons of slurm
slurmctld – controller daemon
    Monitors and allocates resources
    Manages job queues
    Has optional backup with automatic fail-over
slurmdbd – database daemon
    Stores accounting and configuration information
    Also has an optional automatic fail-over
    Attached database can be MySQL, PostgreSQL or text format
slurmd – compute node daemon
    Launches and manages tasks
    Very light-weight
    Quiet (except for optional accounting)
slurmstepd
    Manages job steps and I/O
    Spawned for each job step and terminated when the step ends
Architecture
Daemons of slurm
Figure: Multi-cluster environment
Figure based on Introduction to Slurm: [Sch16]
Architecture
What is a plugin?
Also called add-ins, add-ons or extensions
A plugin is an optional software module that can extend or change the functionality of an existing program
Usually plugins are very specific and only work with a certain program - just like a puzzle piece only fits into its own puzzle
Software gets more customizable and becomes extensible
Often represented as a puzzle piece
Described via interface or API
Architecture
Plugins
Over 80 plugins (as of Dec 2012)
Objects that are dynamically linked during runtime
Currently 26 well-defined APIs / programmer’s guides
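The plugin idea (a fixed API, with implementations selected by name at runtime) can be illustrated with a small Python sketch. Slurm itself loads plugins as shared objects; the registry below only mirrors the concept. The interface EnergyPlugin, the functions register and load_plugin, and the plugin name acct_gather_energy/fake are hypothetical and exist only for this illustration.

```python
from abc import ABC, abstractmethod


class EnergyPlugin(ABC):
    """The API every energy plugin has to implement (hypothetical interface)."""

    @abstractmethod
    def read_joules(self) -> float:
        """Return the energy consumed so far, in joules."""


_REGISTRY = {}  # maps a "type/name" string to a plugin class


def register(name):
    """Class decorator that makes a plugin selectable by its name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("acct_gather_energy/none")
class NoEnergy(EnergyPlugin):
    def read_joules(self):
        return 0.0


@register("acct_gather_energy/fake")  # made-up plugin for illustration
class FakeEnergy(EnergyPlugin):
    def read_joules(self):
        return 42.0


def load_plugin(type_string):
    """Instantiate the plugin selected in the configuration at runtime."""
    try:
        return _REGISTRY[type_string]()
    except KeyError:
        raise ValueError(f"no plugin registered for {type_string!r}") from None


plugin = load_plugin("acct_gather_energy/fake")
print(plugin.read_joules())  # 42.0
```

The calling program only depends on the interface, so a new plugin can be added without touching the code that uses it.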
The How and What...
Energy accounting
Measure the power and energy consumed by nodes or jobs
Power profiling:
    Analyse power demands of cluster and utilization of resources
Improve energy efficiency
Power:
    P = I · V (product of current and voltage)
    SI unit: watt (1 joule per second)
Energy consumption:
    E = P · t (product of power and time)
    SI unit: joule; commonly stated in watt-hours (1 Wh = 3600 J)
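As a short worked example of the two formulas, the following Python sketch computes power from current and voltage and converts the resulting energy between joules and watt-hours. The input values (10 A, 12 V, 30 minutes) are arbitrary and chosen only for illustration.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh = 1 W * 3600 s


def power_watts(current_amps, voltage_volts):
    """P = I * V"""
    return current_amps * voltage_volts


def energy_joules(power_w, seconds):
    """E = P * t, valid for constant power over the interval"""
    return power_w * seconds


def joules_to_watt_hours(joules):
    return joules / JOULES_PER_WATT_HOUR


p = power_watts(10.0, 12.0)   # 120.0 W
e = energy_joules(p, 1800)    # 216000.0 J after 30 minutes
wh = joules_to_watt_hours(e)  # 60.0 Wh
print(p, e, wh)
```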
The How and What...
Motivation
Money
    DKRZ consumes over 17 GWh per year
    That costs over 1,850,000 € per year
    That is roughly 11 cents per kWh
    For comparison:
        Average energy consumption per person - 2,000 kWh per year
        Factor: 8,500 (17,000,000 kWh / 2,000 kWh)
Environmental awareness
The How and What...
How to measure energy
PM = power meter
    Different possible locations
Library / API
Program (or plugin) to use library or API
Samples are collected by client and then processed
Figure: Concept drawing [Wes]
Example Energy Plugins
Slurm Energy Plugin - RAPL
Running Average Power Limit
Samples are estimated from a power consumption model based on hardware counters
Estimates seem to be very accurate
Only 2 readings necessary -> low overhead
Figure: Concept drawing [Wes]
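Two readings are enough because RAPL exposes a cumulative energy counter: the energy of an interval is the difference between the reading at its end and the reading at its start. A minimal Python sketch of that calculation follows. The function names are hypothetical, the counter values are passed in rather than read from hardware, and the code assumes that the counter wraps to zero at most once between the two readings, after reaching max_range_uj. How the Slurm plugin reads the counter is not shown here.

```python
def energy_consumed_uj(start_uj, end_uj, max_range_uj):
    """Energy between two cumulative counter readings, in microjoules.

    Assumes at most one counter wraparound between the two readings.
    """
    if end_uj >= start_uj:
        return end_uj - start_uj
    # Counter wrapped: remainder up to the maximum plus the new reading.
    return (max_range_uj - start_uj) + end_uj


def average_power_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power over the interval: energy (J) divided by time (s)."""
    joules = energy_consumed_uj(start_uj, end_uj, max_range_uj) / 1e6
    return joules / seconds


COUNTER_RANGE = 2**32  # assumed counter range for this example

print(energy_consumed_uj(1_000_000, 4_000_000, COUNTER_RANGE))      # 3000000
print(energy_consumed_uj(COUNTER_RANGE - 500, 1500, COUNTER_RANGE))  # 2000
print(average_power_watts(0, 30_000_000, 2.0, COUNTER_RANGE))        # 15.0
```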
Example Energy Plugins
Slurm Energy Plugin - IPMI
Intelligent Platform Management Interface
    Protocol to read from sensors
Lights Out Management:
    Enables remote control and management of machine
Baseboard Management Controller (BMC)
    Special microcontroller connected to sensors on hardware
Physical interfaces:
    SMBus, serial port, IPMB
BMC communicates with BMU (Baseboard Management Controller Management Utility)
Example Energy Plugins
Slurm Energy Plugin - IPMI
Figure: Concept drawing [Wes]
Example Energy Plugins
Slurm Energy Plugin - Config
slurm.conf (main config file)
    AcctGatherEnergyType
        Specifies which plugin should be used.
    AcctGatherNodeFreq
        Time interval between pollings in seconds.
acct_gather.conf (same dir as slurm.conf)
    Contains configuration for acct_gather related plugins
    E.g. EnergyIPMIFrequency: number of seconds between BMC access samples
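To show how the three parameters fit together, the sketch below embeds illustrative excerpts of both files and reads them with a very small key=value parser. The numeric values (30 and 10 seconds) are assumptions chosen for the example, not recommendations, and parse_conf is a hypothetical helper that ignores features of the real file format such as several settings on one line.

```python
SLURM_CONF = """\
# slurm.conf (excerpt, illustrative values)
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherNodeFreq=30
"""

ACCT_GATHER_CONF = """\
# acct_gather.conf (excerpt, illustrative values)
EnergyIPMIFrequency=10
"""


def parse_conf(text):
    """Parse simple 'Key=Value' lines, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # Split at the first '=' only, so values such as 'task=30' survive.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


slurm = parse_conf(SLURM_CONF)
acct = parse_conf(ACCT_GATHER_CONF)
print(slurm["AcctGatherEnergyType"])     # acct_gather_energy/ipmi
print(int(slurm["AcctGatherNodeFreq"]))  # 30
print(int(acct["EnergyIPMIFrequency"]))  # 10
```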
Example Energy Plugins
Slurm Energy Plugin - ext_sensors
New infrastructure -> HDEEM
    High Definition Energy Efficiency Monitoring
    High resolution: ~1000 samples per second
Plugin needs to be written to utilize functionality
ext_sensors plugin works independently from acct_gather plugins
Standard config files only allow up to 1 call per second
Example Energy Plugins
HDEEM
Figure: Concept drawing [unk]
Example Energy Plugins
Challenges
CPU current and voltage are highly dynamic
    Frequency is much higher than sampling rate
Power meter returns instantaneous values - need to be converted
Many conversion steps required - sources of inaccuracy:
    Voltage and current sensors
    Analog-to-digital converter
    Low-pass filters
    Data formats
    Average calculations
Calculating energy requires correct average power values over the time period
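The effect of the sampling rate can be made concrete with a small Python sketch that integrates instantaneous power samples into energy using the trapezoidal rule. The function names and the sample values are made up for illustration; real measurement chains add the sensor, converter and filter errors listed above.

```python
def energy_from_samples(samples):
    """Integrate (time_s, power_w) samples into joules (trapezoidal rule).

    samples must be sorted by time.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power(samples):
    """Average power in watts over the sampled time period."""
    duration = samples[-1][0] - samples[0][0]
    return energy_from_samples(samples) / duration


# The same 3-second interval, sampled once per second and once overall.
fine = [(0, 100.0), (1, 100.0), (2, 200.0), (3, 100.0)]
coarse = [(0, 100.0), (3, 100.0)]

print(energy_from_samples(fine))    # 400.0
print(energy_from_samples(coarse))  # 300.0
```

The coarse series misses the short 200 W peak entirely and underestimates the energy by a quarter, which is the problem when the load changes much faster than the sampling rate.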
Example Energy Plugins
Configuration cases
Node energy monitoring
    AcctGatherEnergyType=acct_gather_energy/ipmi or rapl
    AcctGatherNodeFreq=<seconds>
  or
    ExtSensorsType=ext_sensors/rrd
    ExtSensorsFreq=<seconds>
Job/step energy accounting
    JobAcctGatherType=jobacct_gather/linux or cgroup
    AcctGatherEnergyType=acct_gather_energy/ipmi or rapl
    JobAcctGatherFrequency=task=<seconds>
  or
    JobAcctGatherType=jobacct_gather/linux or cgroup
    ExtSensorsType=ext_sensors/rrd
Job/step power profiling
    AcctGatherEnergyType=acct_gather_energy/ipmi or rapl
    AcctGatherProfileType=acct_gather_profile/hdf5
    JobAcctGatherFrequency=energy=<seconds>
Storing Accounting and Profiling Data
Slurm Energy Plugin - File Format
Efficient file format needed to store collected data
HDF5 (Hierarchical Data Format version 5)
    Represents a wide variety of data structures within a single file
    Supports very complex data
    High-level interfaces for C, C++, Fortran 90 and Java
Storing Accounting and Profiling Data
HDF5
Figure: Features [The16]
Storing Accounting and Profiling Data
HDF5 - Requirements
Shared filesystem for compute nodes
slurm.conf:
    Uses HDF5 profile plugin (AcctGatherProfileType=acct_gather_profile/hdf5)
acct_gather.conf:
    Root directory of profiling data (must be in shared filesystem)
Each slurmstepd keeps its own file
Files are merged after job completion
Storing Accounting and Profiling Data
HDF5 Slurm integration
Figure: Workflow [unk]
New features in Slurm 16.05
What’s new?
Supports asymmetric resource allocation
    Different amount of resources for each process / rank
Enables MPMD (multiple program, multiple data) approach
    Classical approach: SPMD (single program, multiple data)
    Example call: mpirun -np 2 a.out : -np 2 b.out
Figure: Concept drawing [Wes]
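To make the example call concrete, the Python sketch below maps MPI ranks to executables for a colon-separated MPMD command line. It assumes that ranks are numbered in the order in which the segments appear, and it only understands segments of the form "-np <count> <program>"; rank_layout is a hypothetical helper, not part of any MPI or Slurm tool.

```python
def rank_layout(command):
    """Return a list whose index is the MPI rank and whose value is the program."""
    tokens = command.split()
    if tokens and tokens[0] == "mpirun":
        tokens = tokens[1:]
    layout = []
    segment = []
    # A trailing ':' is appended so the last segment is flushed as well.
    for token in tokens + [":"]:
        if token != ":":
            segment.append(token)
            continue
        # Each segment is expected to look like ["-np", "<count>", "<program>"].
        if len(segment) != 3 or segment[0] != "-np":
            raise ValueError(f"unsupported segment: {segment}")
        layout.extend([segment[2]] * int(segment[1]))
        segment = []
    return layout


print(rank_layout("mpirun -np 2 a.out : -np 2 b.out"))
# ['a.out', 'a.out', 'b.out', 'b.out']
```

In the SPMD case the same list would contain a single program name repeated for every rank. With asymmetric resource allocation, each group of ranks could additionally receive a different amount of resources.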
Summary
HPC software stack
    Slurm is a good choice
    More possibilities for resource management in near future
Energy accounting
    Range of available plugins is growing
    Energy consumption and power profiles become increasingly important due to high costs in HPC
    Accurate power profiling is difficult
Literature
[Ada14] Adaptive Computing Enterprises. Maui Scheduler Administrator’s Guide, 1999-2014. http://docs.adaptivecomputing.com/maui/8.2backfill.php.
[ea14] Daniel Hackenberg et al. HDEEM: High Definition Energy Efficiency Monitoring. 2014.
[Mar13] Martin Perry (Bull). Energy Accounting and External Sensors Plugins, 2013. http://www.schedmd.com/.
[Sch16] SchedMD LLC. Slurm Commercial Support and Development, 2011-2016. http://www.schedmd.com/.
[The16] The HDF Group. Hierarchical Data Format, version 5, 1997-2016. http://www.hdfgroup.org/HDF5/.
[unk] Unknown. Materials provided by R. Heidari.
[Wes] Tobias Wesseler. Based on the author's understanding of the topic.
[Yia12] Yiannis Georgiou (Bull). Enhancing SLURM with Energy Consumption Control and Monitoring Features, 2012. http://www.schedmd.com/.
Thank you!
Questions?
Further reading:
[ea14], [Mar13], [Yia12]