Zellescher Weg 12
Raum WIL A113
Tel. +49 351 - 463 - 39835
Matthias Müller ([email protected])
Center for Information Services and High Performance Computing (ZIH)
Performance Analysis of Computer Systems (Leistungsanalyse von Rechnersystemen)
Capacity Planning
Holger Brunst, Matthias Müller: Leistungsanalyse
Two quotes
Do not plan a bridge capacity by counting the number of people who swim across the river today
Heard at a presentation, according to Raj Jain
Prediction is very difficult, especially about the future
Niels Bohr
Terms
Capacity planning:
– Ensuring that adequate computer resources will be available to meet the future workload demands
– Alternative: just buy tons of equipment
Capacity management:
– Ensuring that the currently available computer resources are used to provide the highest performance
– Alternatives:
• Adjust usage
• Rearrange configuration
• Change system parameters (performance tuning)
Steps in capacity planning – one-man show
1. Instrument the system
2. Monitor usage
3. Characterize workload
4. Forecast workload
5. System model
6. Cost and performance ok? Yes: done. No: change system parameters and return to the system model.
Steps in capacity planning – procurement process
1. Instrument the system
2. Monitor usage
3. Characterize workload
4. Forecast workload
5. System model
6. Cost and performance ok? No: change system parameters and return to the system model. Yes: make offer.
7. Evaluate offer(s)
Roles in the diagram: vendor, customer
Problems in capacity planning
1. Different capacity planning tools use different terminology
2. There is no standard definition of capacity
   – Maximum throughput (jobs per second, transactions per second)
   – Maximum number of users meeting specified performance
3. Different capacities (nominal, usable, knee)
4. No standard workload unit
5. Forecasting future applications is difficult
6. No uniformity among systems from different vendors: the same workload takes different amounts of resources on different systems
7. Model input parameters cannot always be measured (e.g. think time)
8. Validating model projections is difficult
   – Baseline validation (reproduce the measurement)
   – Projection validation (verify that your model is predictive)
9. Distributed environments are too complex to model
10. Performance is only a small part of the price/performance game (TCO is complex)
Contributions to TCO (total cost of ownership)
Cost of hardware
Cost of software
Installation
Maintenance
Personnel (sys admins, support staff)
Floor space (building infrastructure)
Power
Climate (temperature, humidity)
Insurance
Figure: Cost distribution for HRSK-I (2006) and HRSK-II (2013) (Matthias S. Müller)
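The contributions listed above can be added up in a few lines. The following is only a sketch: the field names, the split into one-time and yearly costs, and the treatment of climate control as a fractional overhead on the electrical power are assumptions made for illustration, not the cost model used for HRSK-I or HRSK-II.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    """Hypothetical cost inputs; all field names are illustrative assumptions."""
    hardware: float              # one-time cost
    software: float              # one-time cost
    installation: float          # one-time cost
    maintenance_per_year: float
    personnel_per_year: float
    floor_space_per_year: float
    insurance_per_year: float
    avg_power_kw: float          # average electrical power of the system
    cooling_overhead: float      # extra power for climate control, as a fraction of avg_power_kw
    price_per_kwh: float
    years: float                 # planned lifetime of the system


def total_cost_of_ownership(c: TcoInputs) -> float:
    """Sum one-time costs, yearly costs over the lifetime, and energy cost."""
    one_time = c.hardware + c.software + c.installation
    yearly = (c.maintenance_per_year + c.personnel_per_year
              + c.floor_space_per_year + c.insurance_per_year)
    hours = c.years * 365 * 24
    energy = c.avg_power_kw * (1.0 + c.cooling_overhead) * hours * c.price_per_kwh
    return one_time + yearly * c.years + energy
```

Even this crude model shows why performance is only one part of the price/performance game: over a lifetime of several years the energy and personnel terms can become comparable to the hardware price.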
Benchmarking games
Different configurations may be used to run a workload
Compilers may be wired to optimize the workload
Test specifications might be biased towards one machine
A synchronized job sequence might be used (e.g. a smart mixture of I/O-bound and CPU-bound jobs)
Workload might be random
Benchmarks might be too small
Benchmarks might measure the benchmarker rather than the machine
Vendor problems in procurement process of HPC systems
Competition is unknown
Time difference between offer and delivery:
– New clock frequencies of CPUs
– New generation of CPUs
– New system generation
Capability prediction often requires a difficult scaling analysis
– Scaling of applications is more difficult than Amdahl's law suggests
– Large size difference between benchmarking and real system
– Caching effects difficult to understand, especially in combination with new CPU generations
If the performance prediction is too conservative you do not win the RFP
If the performance prediction is too aggressive you might have to pay penalties
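Amdahl's law is the simplest scaling model a vendor could start from, and it helps to see what it does and does not capture. The sketch below assumes a fixed problem size and a single parallel fraction; the value 0.99 and the node counts in the example are hypothetical, and real applications add communication and caching effects that this model ignores.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Hypothetical: benchmark measured on a small system, offer made for a larger one.
    for nodes in (16, 72):
        print(nodes, round(amdahl_speedup(0.99, nodes), 2))
```

The gap between the benchmarked size and the offered size is where the prediction risk sits: a small error in the assumed parallel fraction changes the predicted speedup on the large system far more than on the small one.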
Customer problems in procurement process of HPC systems
It is difficult to predict future workloads
It is difficult to create unbiased workloads that are representative of your user base, since your users are using your current system
It is difficult to stay unbiased without making it possible to measure the benchmarking team rather than the computer (e.g. by allowing source code modifications)
If the procurement is too small you have to do the one-man-show procurement style
Poor man's capacity planning technology
At many sites the future workload is so uncertain that more sophisticated prediction techniques may not be of great help
Simple rule of thumb: factor x every y years
Often your budget is fixed and what you get is determined by the market
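The rule of thumb "factor x every y years" translates directly into an annual growth rate and a projection. A minimal sketch, with function names chosen here for illustration:

```python
def annual_growth_rate(factor: float, years: float) -> float:
    """Annual growth rate equivalent to 'factor x every y years'."""
    return factor ** (1.0 / years) - 1.0


def project(value_now: float, factor: float, years: float, horizon: float) -> float:
    """Value after `horizon` years if it grows by `factor` every `years` years."""
    return value_now * factor ** (horizon / years)


if __name__ == "__main__":
    # A factor of 2 every 1.5 years, projected three years ahead.
    print(project(1.0, 2.0, 1.5, 3.0))
    print(round(annual_growth_rate(2.0, 1.5), 3))
```

With a fixed budget, the same function can be read the other way round: it estimates what the market will deliver for that budget at the time of purchase.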
Moore's Law: CPU versus DRAM
– Proc (CPU): 60%/yr. (2X/1.5 yr)
– DRAM: 9%/yr. (2X/10 yrs)
Improvement by a factor x every y years: factors x_i
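Two growth rates that differ this much open a gap that itself grows exponentially. The sketch below converts an annual rate into a doubling time and computes how far apart the two curves drift; the rates are the ones quoted on the slide, the function names are illustrative. Note that the doubling times in parentheses on the slide are round figures, so the computed values will not match them exactly.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


def gap_factor(fast_rate: float, slow_rate: float, years: float) -> float:
    """Ratio between two quantities after `years`, starting from equal values."""
    return ((1.0 + fast_rate) / (1.0 + slow_rate)) ** years


if __name__ == "__main__":
    cpu_rate, dram_rate = 0.60, 0.09   # per-year rates from the slide
    print(round(doubling_time(cpu_rate), 2), round(doubling_time(dram_rate), 2))
    print(round(gap_factor(cpu_rate, dram_rate, 10.0), 1))
```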
Examples from real life
Backup Volume at ZIH
Prediction of Backup Volume
Real growth of backup usage
Figure: Capacity utilization of the TSM systems in TB (gross), Jan-06 to Jan-10, linear axis from 0 to 1600 TB
Real growth of backup usage (log scale)
Figure: Capacity utilization of the TSM systems in TB (gross), Jan-06 to Jan-10, logarithmic axis from 10 to 10000 TB
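Plotting on a log scale is useful because exponential growth becomes a straight line, which can then be fitted and extended. The sketch below fits such a line by ordinary least squares on the logarithm of the values. It is an illustration of the technique only: the measured ZIH values are not contained in this transcript, so the numbers in the example are synthetic.

```python
import math


def fit_exponential(times, values):
    """Least-squares fit of values ~ a * exp(b * t), done as a line fit in log space."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs of equal length")
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (lv - log_mean) for t, lv in zip(times, logs))
    b = sxy / sxx
    a = math.exp(log_mean - b * t_mean)
    return a, b


def predict(a: float, b: float, t: float) -> float:
    """Evaluate the fitted model at time t."""
    return a * math.exp(b * t)


if __name__ == "__main__":
    # Synthetic example data (years since start, volume in TB), NOT the ZIH measurements.
    years = [0.0, 1.0, 2.0, 3.0]
    volume_tb = [100.0, 200.0, 400.0, 800.0]
    a, b = fit_exponential(years, volume_tb)
    print(round(predict(a, b, 4.0)), "TB predicted for year 4")
```

The fit is only as good as the assumption that the growth rate stays constant; a change in backup policy or user behaviour shows up as a kink in the log plot and invalidates the extrapolation.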
Performance Prediction of Future NEC System
Comparison SX-6 versus SX-8
                              SX-6          SX-8
CPU
  Clock cycle                 1 GHz         2 GHz
  Peak Vector Performance     8 GF          16 GF
  Peak Scalar Performance     1 GF          2 GF
  Memory                      8 GB          16 GB
  Memory Bandwidth            32 GB/s       64 GB/s
  LSI Process                 0.15 um       90 nm
NODE
  No. of CPUs                 8             8
  Peak Vector Performance     64 GF         128 GF
  Memory                      64 GB         128 GB
  Memory Bandwidth (aggr.)    256 GB/s      512 GB/s
  Inter-node Bandwidth        8 GB/s x 2    16 GB/s x 2
  I/O Bandwidth               8 GB/s        12.8 GB/s
Some performance expectations
                 SX6+          SX8            Factor
Frequency        563 MHz       1 GHz          1.78
Memory BW        36 GB/s       64 GB/s        1.78
Memory Lat       ?             ?              ~1
IXS BW           8 GB/s        16 GB/s        2
IXS Latency      6.9 µs        5.9 µs         1.17
SQRT             300 MFlops    1500 MFlops    5
Performance values from Phase I
                           Promised     Delivered
Memory Band/CPU            50 GB/s      63 GB/s
Memory Band/Node           320 GB/s     360 GB/s
Bisection                  230 GB/s     560 GB/s
InterNode MPI Latency      8 µs         4.61 µs
InterNode MPI Bandwidth    12 GB/s      14.2 GB/s
Fenfloss                   45 GF/s      45.55 GF/s
Uranus                     46 GF/s      49.17 GF/s
N3D                        50 GF/s      52.6 GF/s
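To compare promised and delivered values at a glance, each pair can be reduced to a single ratio, taking care that latency is a lower-is-better metric. The values below are copied from the Phase I table; the microsecond unit for latency and the `higher_is_better` flags are assumptions added for this sketch.

```python
# (metric, promised, delivered, higher_is_better), values from the Phase I table.
PHASE_I = [
    ("Memory bandwidth per CPU [GB/s]", 50.0, 63.0, True),
    ("Memory bandwidth per node [GB/s]", 320.0, 360.0, True),
    ("Bisection bandwidth [GB/s]", 230.0, 560.0, True),
    ("Inter-node MPI latency [us]", 8.0, 4.61, False),  # unit assumed to be microseconds
    ("Inter-node MPI bandwidth [GB/s]", 12.0, 14.2, True),
    ("Fenfloss [GF/s]", 45.0, 45.55, True),
    ("Uranus [GF/s]", 46.0, 49.17, True),
    ("N3D [GF/s]", 50.0, 52.6, True),
]


def improvement(promised: float, delivered: float, higher_is_better: bool) -> float:
    """Ratio >= 1.0 means the commitment was met or exceeded."""
    return delivered / promised if higher_is_better else promised / delivered


def commitments_met(rows) -> bool:
    """True if every delivered value is at least as good as the promised one."""
    return all(improvement(p, d, hib) >= 1.0 for _, p, d, hib in rows)


if __name__ == "__main__":
    for name, promised, delivered, hib in PHASE_I:
        print(f"{name:34s} {improvement(promised, delivered, hib):.2f}")
```

The pattern is typical for a vendor who wants to avoid penalties: hardware metrics are exceeded comfortably, while the application benchmarks land only a few percent above the committed values.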
Prediction from 16 to 72 nodes on next generation system
SPEC OMP – single node results
HPCC 4 node results
Power Estimation
Management Question: What is the power consumption of a 15 Million Euro system in 2011/2012?
Top500 for extrapolation of performance
Daniel Hackenberg
Power consumption in kW of existing systems at rank 50 of the TOP500
Dates shown: Nov-06, Jun-07, Nov-07, Jun-08, Nov-08, Jun-09
Value from TOP500                     247.03    283.34    375.55    503.90
Value from TOP100                     136.11    183.45    237.04    365.04
Value from TOP50                      111.56    175.86    247.62    345.41
Value from TOP50 without Cell, BG     163.56    253.45    349.61    598.40
Deimos                                250
– Power consumption increases markedly over time
– Strong dependence on the systems considered
– Great efforts are being made worldwide to limit the growth in energy consumption, but only limited success is to be expected in the next three years
First Approach: TOP500 Based Extrapolation
Extrapolation from available (limited) data points
~2.5 MW for two rank 50 systems in 2012
Problems:
Only energy efficient data centers submit their power measurements?
Blue Gene or Cell Systems have significant influence on average
Summary of Estimation
Original question: Power consumption of a 15 Million Euro System in the future?
Assumptions:
– The money spent for the TOP500 systems is constant
– Exponential growth of performance will continue
– No major technology break-through in power efficiency in the next 3 years
Approach:
– Modified question: What is the power consumption of the system ranked at position 50 in the TOP500 list?
– Extrapolation of performance with exponential growth
– Estimation of power efficiency based on subset of TOP500 list
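The approach above boils down to two steps: extrapolate the performance of the rank-50 system with a constant growth rate, then divide by an assumed power efficiency. The sketch below shows only the arithmetic; the starting performance, growth rate and efficiency in the example are hypothetical placeholders, not the values used in the original estimate.

```python
def extrapolate_performance(perf_now: float, annual_growth: float, years: float) -> float:
    """Performance after `years` of exponential growth at `annual_growth` per year."""
    return perf_now * (1.0 + annual_growth) ** years


def power_kw(perf_gflops: float, efficiency_mflops_per_watt: float) -> float:
    """Electrical power in kW for a given performance and power efficiency."""
    watts = perf_gflops * 1000.0 / efficiency_mflops_per_watt
    return watts / 1000.0


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    perf_today_gflops = 100_000.0   # rank-50 system today
    growth_per_year = 0.9           # assumed performance growth at rank 50
    efficiency = 400.0              # assumed MFlops/W at delivery time
    future = extrapolate_performance(perf_today_gflops, growth_per_year, 3.0)
    print(round(power_kw(future, efficiency)), "kW")
```

Both inputs carry large uncertainty, and the efficiency in particular depends strongly on which subset of the list is used, which is exactly the problem noted for the Blue Gene and Cell systems.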
Capacity Planning for Next Generation Disc Room
Disc room planning
Original question: What is the required size of the disc room of the next infrastructure? It should be sufficient for the next 10 years of storage requirements.
Options:
– 50 m^2
– 100 m^2
– 200 m^2
What is needed?
– Growth rate of demand
– Capacity of discs (growth rate)
– "Packing density" of discs. How?
– What else?
Figure: Development of gross capacities in TB, 1995 to 2011, logarithmic axis from 0.1 to 10000 TB
Disc capacity ZIH: 75%
Hard disc capacity (100x in ten years, 58% per year)
From Wikipedia, the free encyclopedia
Density of discs in machine room
High density:
– 48 U
– 600 discs in 10 4RU drawers
Low density:
– 12 discs in 2RU
– Max 240 discs in 40 U rack
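Growth of demand, growth of disc capacity and packing density can be combined into a rough rack count. The sketch below assumes that all discs are replaced by current-generation discs at the end of the planning period, which is a simplification. The discs-per-rack values (600 and 240) and the 58% capacity growth come from the slides; reading the 75% figure as annual demand growth is an assumption, and the current disc count is hypothetical.

```python
import math


def discs_needed(discs_now: int, demand_growth: float,
                 capacity_growth: float, years: float) -> int:
    """Number of discs after `years` if demand and per-disc capacity both grow exponentially."""
    ratio = (1.0 + demand_growth) / (1.0 + capacity_growth)
    return math.ceil(discs_now * ratio ** years)


def racks_needed(discs: int, discs_per_rack: int) -> int:
    """Whole racks required to hold `discs` discs."""
    return math.ceil(discs / discs_per_rack)


if __name__ == "__main__":
    discs_today = 1000                      # hypothetical current disc count
    future_discs = discs_needed(discs_today, 0.75, 0.58, 10.0)
    print("high density:", racks_needed(future_discs, 600), "racks")
    print("low density: ", racks_needed(future_discs, 240), "racks")
```

Turning racks into square metres still needs the footprint per rack including service clearance, cooling and power distribution, which is part of the "What else?" on the slide.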
ZIH Disc Capacity Trends
Disc Performance
What about disc performance?
Proportional to density?
– 40-60% per year?
– No! Why not? Only 10-15%!
– See: IBM white paper: Richard Freitas, Joseph Slember, Wayne Sawdon, and Lawrence Chiu. GPFS Scans 10 Billion Files in 43 Minutes. 2011.
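One practical consequence of this mismatch is that the time to read a complete disc keeps growing. A small sketch, using growth rates in the ranges quoted above; the starting capacity and bandwidth in the example are hypothetical.

```python
def full_scan_hours(capacity_tb: float, bandwidth_mb_s: float) -> float:
    """Hours needed to read a whole disc sequentially (1 TB = 1e6 MB)."""
    return capacity_tb * 1e6 / bandwidth_mb_s / 3600.0


def full_scan_time_factor(capacity_growth: float, bandwidth_growth: float,
                          years: float) -> float:
    """Factor by which the full-scan time grows over `years`."""
    return ((1.0 + capacity_growth) / (1.0 + bandwidth_growth)) ** years


if __name__ == "__main__":
    # Hypothetical disc: 1 TB at 100 MB/s today.
    today = full_scan_hours(1.0, 100.0)
    factor = full_scan_time_factor(0.58, 0.15, 10.0)
    print(round(today, 1), "h today,", round(today * factor, 1), "h in ten years")
```

This affects capacity planning directly: backup windows, rebuild times and metadata scans scale with the full-scan time, not with the capacity alone.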
Linpack
Some properties needed for the exercise
The Linpack Benchmark is a measure of a computer's floating-point rate of execution.
It is determined by running a computer program that solves a dense system of linear equations.
Over the years the characteristics of the benchmark have changed a bit.
In fact, there are three benchmarks included in the Linpack Benchmark report.
LINPACK Benchmark
– Dense linear system solve with LU factorization using partial pivoting
– Operation count is: 2/3 n^3 + O(n^2)
– Benchmark measure: MFlop/s
– Original benchmark measures the execution rate for a Fortran program on a matrix of size 100x100.
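The operation count turns a measured run time into an execution rate. The sketch below uses only the leading term 2/3 n^3 and drops the O(n^2) part, so it slightly underestimates the rate for small matrices such as 100x100; the run times in the example are hypothetical.

```python
def linpack_flops(n: int) -> float:
    """Leading-order operation count of the dense solve: 2/3 * n^3 (O(n^2) term omitted)."""
    return (2.0 / 3.0) * float(n) ** 3


def mflops(n: int, seconds: float) -> float:
    """Execution rate in MFlop/s for an n x n solve that took `seconds`."""
    return linpack_flops(n) / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical timings for two problem sizes.
    print(round(mflops(100, 0.001), 1), "MFlop/s")
    print(round(mflops(3000, 18.0), 1), "MFlop/s")
```

Because the work grows with n^3 while the data grows with n^2, larger problems spend relatively less time on memory traffic, which is one reason efficiency rises with problem size.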
Linpack Efficiency vs. Problem Size
(E. Strohmeier, ISC’09)
Linpack Efficiency vs. Size
(E. Strohmeier, ISC’09)
Linpack Efficiency vs. Network
Remarks about Performance of AMD CPUs
AMD Athlon X2 240: 2.8 GHz x 2 x 4 DP/cycle = 22.4 GFlops
AMD Phenom X2 545: 3.0 GHz x 2 x 4 DP/cycle = 24 GFlops
AMD Phenom X4 925: 2.8 GHz x 4 x 4 DP/cycle = 44.8 GFlops
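The peak values above follow the pattern clock frequency x number of cores x double-precision operations per cycle. The sketch below reproduces that arithmetic and adds the usual definition of Linpack efficiency as measured rate divided by peak rate; the measured value in the example is hypothetical.

```python
def peak_gflops(clock_ghz: float, cores: int, dp_flops_per_cycle: int = 4) -> float:
    """Theoretical double-precision peak in GFlop/s."""
    return clock_ghz * cores * dp_flops_per_cycle


def linpack_efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak achieved by a Linpack run."""
    return measured_gflops / peak


if __name__ == "__main__":
    cpus = [("AMD Athlon X2 240", 2.8, 2),
            ("AMD Phenom X2 545", 3.0, 2),
            ("AMD Phenom X4 925", 2.8, 4)]
    for name, ghz, cores in cpus:
        print(f"{name}: {peak_gflops(ghz, cores):.1f} GFlops peak")
    # Hypothetical measurement: 18 GFlops on the Athlon X2 240.
    print(f"efficiency: {linpack_efficiency(18.0, peak_gflops(2.8, 2)):.0%}")
```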
Thank you !