The Search for Energy-Efficient Building Blocks for the Data Center
Laura Keys, Suzanne Rivoire, and John D. [email protected], Microsoft Research Silicon Valley
2
Data Center Energy Cost
Facility: ~$200M for 15 MW facility (15-year amort.)
Servers: ~$2k/each, roughly 50,000 (3-year amort.)
Average server power draw at 30% utilization: 80%
Commercial Power: ~$0.07/kWh
Observations:
  $2.3M/month from charges functionally related to power
  Power related costs trending flat or up while server costs trending down
Details at: http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
Courtesy: James Hamilton, ISCA 2009
[Figure: pie chart of monthly costs. Legend: Servers; Power & Cooling Infrastructure; Power. Values shown: $2,997,090; $1,296,902; $1,042,440; $284,682.]
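As a rough check on the $2.3M/month observation, the chart values can be summed in a short Python sketch. The pairing of values with legend labels is an assumption: the extracted chart lists four values but only three labels. The second and third values are assumed to be the two power-related slices because their sum matches the $2.3M figure.

```python
# Monthly cost values as they appear on the slide's pie chart.
# ASSUMPTION: the label-to-value pairing is inferred, not stated on the slide.
# The fourth value has no surviving label, so it is called "unlabeled" here.
monthly_costs = {
    "servers": 2_997_090,
    "power_cooling_infrastructure": 1_296_902,
    "power": 1_042_440,
    "unlabeled": 284_682,
}


def power_related_total(costs):
    """Sum the two slices assumed to be functionally related to power."""
    return costs["power_cooling_infrastructure"] + costs["power"]


total = sum(monthly_costs.values())
power_total = power_related_total(monthly_costs)
power_share = power_total / total

print(f"total monthly cost: ${total:,}")
print(f"power-related cost: ${power_total:,} ({power_share:.1%} of total)")
```

Under that pairing, the power-related slices come to roughly $2.3M per month, which is consistent with the observation on the slide.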
3
Energy Efficient Data Centers
Decreasing Power Usage Effectiveness (PUE)
  Non-IT equipment being handled more efficiently
  Energy-efficiency in DC now depends on HW and SW being run!
Reduce Waste
Data for pie chart from http://www.42u.com/green-data-center.htm
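PUE is conventionally defined as total facility power divided by the power delivered to IT equipment, so a value closer to 1.0 means less overhead. A minimal Python sketch of that ratio follows. The function names and the sample wattages are illustrative assumptions, not figures from the slides.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def non_it_fraction(pue_value: float) -> float:
    """Fraction of facility power that does not reach IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical facility: 25.5 MW drawn in total, 15 MW reaching IT equipment.
example_pue = pue(25_500.0, 15_000.0)
print(f"PUE = {example_pue:.2f}")
print(f"non-IT share of facility power = {non_it_fraction(example_pue):.1%}")
```

As PUE falls toward 1.0, the non-IT share shrinks, which is why the remaining efficiency gains depend on the hardware and software being run.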
4
Research Landscape
Trend: low-end processors + SSDs for energy efficiency
  FAWN (embedded, desktop, server)
  Amdahl Blades (embedded, server)
  CEMS (desktop)
No systematic comparison across all processor classes
Usually focused on a single benchmark
5
Paper Summary
Compare 4 system classes
  Embedded, mobile, desktop, and server
On single-machine and cluster workloads
  Different mixes of processor, memory, I/O
Goal: understand where each system class is best and where it falls short
6
Outline
Motivation
Hardware systems
Benchmarks
Results
  Single machine
  5-node clusters
Caveats
Conclusions
7
Hardware Systems
System Under Test | CPU | Memory | Disk(s) | System Information | Approx. cost
1A (embedded) | Intel Atom N230, 1-core, 1.6 GHz, 4W TDP | 4 GB DDR2-800 | 1 SSD | Acer AspireRevo | $600
1B (embedded) | Intel Atom N330, 2-core, 1.6 GHz, 8W TDP | 4 GB DDR2-800 | 1 SSD | Zotac IONITX-A-U | $600
1C (embedded) | Via Nano U2250, 1-core, 1.6 GHz | 2.37 GB DDR2-800* | 1 SSD | Via VX855 | sample
1D (embedded) | Via Nano L2200, 1-core, 1.6 GHz | 2.86 GB DDR2-800* | 1 SSD | Via CN896/VT8237S | sample
2 (mobile) | Intel Core2 Duo, 2-core, 2.26 GHz, 25W TDP | 4 GB DDR3-1066 | 1 SSD | Mac Mini | $1200
3 (desktop) | AMD Athlon, 2-core, 2.2 GHz, 65W TDP | 8 GB DDR2-800 | 1 SSD | MSI AA-780E | sample
4 (server) | AMD Opteron, 4-core, 2.0 GHz, 50W TDP | 32 GB DDR2-800 | 2 10K RPM | Supermicro AS-1021M-T2+B | $1900
8
Benchmarks
Single Machine
  CPUEater
  SPEC CPU2006 Integer
  SPEC Power 2008
  JouleSort
5-node Cluster (DryadLINQ)
  Sort
  StaticRank
  Prime
  WordCount
9
Results
10
System power
Chipset power dominates embedded system power
[Figure: bar chart of system power in watts (axis 0 to 300) at idle and at 100% CPU utilization. Series: Atom (1-core), SUT 1A; Atom (2-cores), SUT 1B; Via U2250, SUT 1C; Intel Core2 Duo, SUT 2; Via L2200, SUT 1D; AMD Athlon Dual core, SUT 3; AMD Opteron (2x4), SUT 4; AMD Opteron (2x2); AMD Opteron (2x1).]
11
SPEC CPU 2006 Integer
Normalized per core performance
Core 2 Duo on par or exceeds server cores
[Figure: bar chart of normalized SPEC CPU2006 INT performance (axis 0.0 to 4.0) per benchmark: perlbench, bzip2, gcc, mcf, gobmk, hmmer, sjeng, libquantum, h264ref, omnetpp, astar, xalancbmk. Series: Opteron (2x4), SUT 4; Opteron (2x2); Opteron (2x1); Athlon, SUT 3; Core2Duo, SUT 2; Atom N230, SUT 1A; Atom N330, SUT 1B; Nano U2250, SUT 1C; Nano L2200, SUT 1D.]
12
SPEC Power 2008
[Figure: chart of performance to power ratio (SSJ operations/W) versus CPU utilization (idle, then 10% to 100% in 10% steps). Series: Intel Core2Duo, SUT 2; AMD Opteron (2x4), SUT 4; Atom (2-core), SUT 1B; AMD Athlon, SUT 3; AMD Opteron (2x2); Atom (1-core), SUT 1A; AMD Opteron (2x1).]
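The metric on this chart is a ratio of throughput to power at each load level. The Python sketch below shows how such a ratio could be computed per level, along with an overall score in the sum-of-operations over sum-of-power form used by SPECpower_ssj2008 (idle included). The `LoadLevel` type and every measurement value are made-up placeholders, not data from the slides.

```python
from dataclasses import dataclass


@dataclass
class LoadLevel:
    target_load: float  # fraction of peak throughput; 0.0 means active idle
    ssj_ops: float      # operations per second achieved at this level
    avg_watts: float    # average system power at this level


def perf_per_watt(level: LoadLevel) -> float:
    """Performance-to-power ratio for one load level (ssj_ops per watt)."""
    return level.ssj_ops / level.avg_watts


def overall_ratio(levels: list[LoadLevel]) -> float:
    """Total operations over total power across all levels, idle included."""
    total_ops = sum(level.ssj_ops for level in levels)
    total_watts = sum(level.avg_watts for level in levels)
    return total_ops / total_watts


# Hypothetical measurements for a single machine.
levels = [
    LoadLevel(0.0, 0.0, 20.0),
    LoadLevel(0.5, 50_000.0, 30.0),
    LoadLevel(1.0, 100_000.0, 40.0),
]

for level in levels:
    print(f"{level.target_load:>4.0%}: {perf_per_watt(level):8.1f} ssj_ops/W")
print(f"overall: {overall_ratio(levels):.1f} ssj_ops/W")
```

Because idle power enters the denominator without adding operations, a machine with high idle draw is penalized in the overall score even if its ratio at full load is good.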
13
Single Machine Summary
Chipset power is the limiting factor for embedded systems
High-end mobile cores have the right mix of power and performance
Desktop cores not competitive from total system power perspective
Server system becoming more efficient
Cluster investigation → High-end mobile, Server & embedded
14
Cluster Energy Efficiency
[Figure: bar chart of normalized energy usage (axis 0 to 6) per benchmark. Benchmark labels as extracted: sort-5p, Sort..., Primes, Stati..., Word..., G. Series: Core 2 Duo, SUT2; Atom, SUT 1B; Opteron, SUT4. Data labels as extracted: 1.6, 1.8, 4.3, 1.8, 0.8, 1.8, 4.6, 4.9, 3.1, 4.8, 4.8, 4.4.]
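Normalized energy usage can be sketched in Python as energy (average power times run time) divided by the energy of a chosen baseline system. The slide does not state which system is the baseline, so the baseline choice, the system names, and all numbers below are hypothetical. The geometric mean is included as one common way to summarize normalized results; whether the chart's truncated "G." label refers to one is not confirmed by the source.

```python
import math


def energy_joules(avg_watts: float, seconds: float) -> float:
    """Energy consumed by a run: average power multiplied by run time."""
    return avg_watts * seconds


def normalize(energy_by_system: dict[str, float], baseline: str) -> dict[str, float]:
    """Express each system's energy relative to the baseline system."""
    base = energy_by_system[baseline]
    return {name: energy / base for name, energy in energy_by_system.items()}


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, suited to averaging ratios such as normalized energy."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical cluster runs of one benchmark: (average watts, seconds).
runs = {"system_a": (100.0, 60.0), "system_b": (50.0, 240.0)}
energy = {name: energy_joules(w, s) for name, (w, s) in runs.items()}
normalized = normalize(energy, baseline="system_a")
print(normalized)

# Hypothetical normalized results for one system across two benchmarks.
print(geometric_mean([2.0, 8.0]))
```

In this made-up case the lower-power system uses more energy overall because it runs four times as long, which is the kind of trade-off a normalized energy chart is meant to expose.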
15
Caveats
Limited by real mobile/embedded HW
  Memory: no ECC, limited capacity
  I/O: limited ports and bandwidth
  Chipset/other components: not energy-efficient, dominate system power
Cluster benchmarks scaled for small systems
  Increased task overhead on servers
  Main memory over provisioned on servers
16
Conclusions
Can improve energy-efficiency by 2-4X
Almost no performance degradation (QoS)
Ideal machine can do better
  High-end mobile processor
  Large capacity ECC-protected DRAM
  Low-power chipset
  More I/O ports and higher bandwidth
18
Processor vs. I/O Subsystem
19
JouleSort Results
[Figure: plot of records sorted per joule (axis 0 to 45,000) versus power in watts (axis 0 to 140). Series: CoolSort (appears twice); Atom, SUT 1B; Core2 Duo, SUT 2; Opteron 1-pass, 4C; Opteron 2-pass, 4E; Athlon 2-pass, SUT 3C; Core2 Extrap.; OzSort. Annotations: Future Performance Plateau?; SSD Performance Plateau; HDD-based Old JouleSort Performance.]
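The vertical axis of this plot, records sorted per joule, is the number of records sorted divided by the energy consumed during the sort. A minimal Python sketch of that calculation follows. The record count, power, and run time are illustrative placeholders, not results from the slides.

```python
def records_per_joule(records: int, avg_watts: float, seconds: float) -> float:
    """Records sorted per joule, where energy is average power times run time."""
    energy_joules = avg_watts * seconds
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return records / energy_joules


# Hypothetical run: 100 million records sorted in 200 s at an average of 50 W.
score = records_per_joule(100_000_000, avg_watts=50.0, seconds=200.0)
print(f"{score:,.0f} records/J")
```

Plotting this score against average power, as the slide does, separates systems that are efficient because they are fast from those that are efficient because they draw little power.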