TRANSCRIPT
1
Highlights of Early Facility Experience at the
New High Efficiency NCAR Wyoming Supercomputing Center
Aaron Andersen
Deputy Director, Operations & Services
May 2013
2
3
General Facility Information
• 153,000 gross sq. ft.
• Two 12,000 sq. ft. data halls
• Four 4 MW electrical modules (16 MW possible)
• Evaporative water-side economizer for ~98–99% of annual hours
• Critical systems are on UPS and generator backup
  – Network, cybersecurity, storage, archive
  – HPC nodes are not on UPS
• 10 ft. raised floor system
  – Primary reason: large liquid-cooled systems
  – Secondary reason: limit copper distances
• Heat pump system recovers heat from computing for building use
4
Facility Highlights
• All about energy efficiency
  – Operational cost savings
  – Emissions reduction
• Minimize risk
  – Not increasing inlet temperatures or using outside air
  – Utilize liquid cooling where cost effective
• Efficient use of capital
  – Efficiency efforts had to show a 5-year payback
5
Big Picture Focus on Efficiency
• Utilize the region's cool, dry climate to minimize energy use
  – Very low pressure drop
    • Minimize bends
    • Oversized pipe
  – Elevated chilled water temperature: 65 °F
• Utilize liquid-cooled computer solutions where practical
  – HPC systems
• Utilize hot aisle containment for commodity equipment
• Focus on the biggest losses
  – Compressor-based cooling
  – Transformer losses
[Pie charts: facility energy breakdown by category (Lights, Cooling, Fans, Electrical Losses, IT Load). Typical Modern Data Center: IT load 65.9%, cooling 15.0%, electrical losses 13.0%, fans 6.0%, lights 0.1%. NWSC Design: IT load 91.9% of total facility energy.]
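For context, PUE can be read straight off a breakdown like the one above, since PUE is total facility energy divided by IT energy. A minimal Python sketch of that arithmetic, using the slide's typical-data-center fractions as illustrative inputs rather than NWSC measurements:

```python
# Sketch: deriving PUE from a facility energy breakdown.
# The fractions below are the "typical modern data center" slices from the
# slide; treat them as illustrative, not NWSC measurements.
breakdown = {
    "it_load": 0.659,
    "cooling": 0.150,
    "electrical_losses": 0.130,
    "fans": 0.060,
    "lights": 0.001,
}

total = sum(breakdown.values())         # whole-facility energy (~1.0)
pue = total / breakdown["it_load"]      # PUE = total facility energy / IT energy

print(f"Typical data center PUE ~ {pue:.2f}")               # ~1.52
print(f"NWSC design (IT = 91.9%) PUE ~ {1 / 0.919:.2f}")     # ~1.09
```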
6
65 °F (18.3 °C) Chilled Water Evaporative Solution
7
Cooling Tower
• Very high efficiency tower
• Utilization of outside conditions to optimize performance
  – Water use reduction by not running the fans all year
  – Energy use reduction
Excerpt from the Integral Group NCAR Supercomputing LEED EAp2 & EAc1 Narrative (03.15.2011):
Figure 3: Cooling tower fan does not need to operate at low wetbulb temperatures

Unlike the baseline DX units, the proposed cooling system takes advantage of the particular climate of Cheyenne to deliver savings. With most of the year cool and dry, the cooling towers are able to operate without the need for a chiller for a much greater percentage of the year than in many other locations. Cooling towers operate based on wetbulb temperature, which is the lowest temperature to which air can be reduced through the evaporation of water at constant pressure. The lower the wetbulb temperature, the less fan energy necessary to cool water to a given temperature. Figure 4, opposite, outlines this point using 3 regions. Region 1 on the left is the number of hours per year the cooling tower is able to operate without the use of fan energy. This amounts to 4,000 hours, or 46% of the year. Region 2 in the middle is the number of hours per year the cooling towers need to use fans to cool the water but still do not need a chiller. This amounts to 4,200 hours, or 48% of the year. On the right is region 3, which represents the number of hours per year a chiller is necessary. The chiller runs 600 hours per year, and of those hours operates above 20% load for only 41 hours. This is 0.5% of the year.

Figure 4: Cheyenne weather conditions and cooling tower operation
Region 1: Cooling tower without fans, 46% of the year
Region 2: Cooling tower with fans, 48% of the year
Region 3: Chiller operates, 7% of the year

Lastly, the cooling tower is designed to operate in parallel with the chiller as an "integrated water-side economizer". Therefore, even when the chiller operates, the cooling tower is providing cooling directly to the cooling loop as well as to the chiller's condenser loop. Doing so minimizes the load on the chiller to the greatest extent possible. Figure 5 on the following page illustrates the operation of the water-side economizer.
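As a rough illustration of the three-region logic in the narrative above, the sketch below buckets hourly wetbulb readings into "tower without fans", "tower with fans", and "chiller assist". The threshold temperatures are placeholders, not the NWSC design setpoints:

```python
# Sketch: bucketing hours of the year into the three cooling regions
# described in the LEED narrative. Threshold values are hypothetical
# placeholders, not NWSC design setpoints.
FREE_COOLING_MAX_WB_C = 5.0   # below this, tower meets load without fans (assumed)
TOWER_ONLY_MAX_WB_C = 15.0    # below this, tower plus fans suffice (assumed)

def cooling_region(wetbulb_c: float) -> str:
    """Return which cooling mode a given wetbulb temperature falls into."""
    if wetbulb_c <= FREE_COOLING_MAX_WB_C:
        return "tower without fans"
    if wetbulb_c <= TOWER_ONLY_MAX_WB_C:
        return "tower with fans"
    return "chiller assist"

def region_hours(hourly_wetbulb_c):
    """Count hours per region for a year of hourly wetbulb readings."""
    counts = {"tower without fans": 0, "tower with fans": 0, "chiller assist": 0}
    for wb in hourly_wetbulb_c:
        counts[cooling_region(wb)] += 1
    return counts
```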
8
Hot Aisle Containment
9
Fan Wall and 10 ft. (3 m) Raised Floor
10
Yellowstone Water Cooling
11
IT Load and Facility Tuning: Increasing IT Load Improves PUE
[Chart: PUE vs. IT Load. Monthly PUE (with a PUE average line) and IT load in kW (with an IT load average line), 5/12 through 3/13.]
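The "more IT load, better PUE" behavior falls out of overhead that is partly fixed: spreading lights, controls, and base mechanical load over more IT kilowatts shrinks their share. A minimal model sketch; the overhead coefficients are illustrative assumptions, not measured NWSC values:

```python
# Sketch: why PUE improves as IT load rises when part of the facility
# overhead is fixed. The overhead coefficients are illustrative guesses,
# not measured NWSC values.
FIXED_OVERHEAD_KW = 120.0      # lights, controls, base mechanical (assumed)
PROPORTIONAL_OVERHEAD = 0.08   # cooling/electrical losses per IT kW (assumed)

def pue(it_load_kw: float) -> float:
    overhead_kw = FIXED_OVERHEAD_KW + PROPORTIONAL_OVERHEAD * it_load_kw
    return (it_load_kw + overhead_kw) / it_load_kw

for it_kw in (300, 700, 1400):   # roughly idle, typical, and near-peak loads from the talk
    print(f"IT load {it_kw:>5} kW -> PUE {pue(it_kw):.2f}")
```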
12
Actual Annual PUE Trending Toward Design PUE
[Chart: NWSC data center PUE by month, 5/12 through 3/13. Measured annual average: 1.22 PUE; Integral Group design average: 1.09 PUE.]
13
Two Main Sources of Variability
• Computer load variability
  – Power-saving computer system at scale
  – Peak demand ~1700 kW
  – Typical ~700 kW – 1500 kW
  – Idle ~300 kW
• Weather variability
  – Heating load increase in winter months affects PUE
  – Real cost savings in part dependent on natural gas prices
14
NCAR Data-Centric Architecture
[Diagram: resources connected by high-bandwidth, low-latency HPC and I/O networks (FDR InfiniBand and 10 Gb Ethernet), with 1 Gb/10 Gb Ethernet (40 Gb+ future) to partner sites, XSEDE sites, and remote visualization.]
• HPC resource "yellowstone": 1.50 PF peak, 29 bluefire-equivalents; login nodes, filesystem servers & I/O aggregators
• DAV resources "geyser" & "caldera"
• CFDS resource "glade": data collections, project spaces, scratch, user & home; >90 GB/sec aggregate; 2012: 11 PB, 2014: 16.4 PB
• NCAR HPSS archive: 100 PB capacity, ~15 PB/yr growth; archive interface
• Data transfer services and science gateways (RDA, ESG)
15
NCAR HPC Profile
[Chart: NCAR HPC capacity (peak petaflops, 0.0–1.5), Jan-04 through Jan-16, by system: IBM POWER4/Colony (bluesky), IBM BlueGene/L (frost, plus frost upgrade), IBM p575/8 (78) POWER5/HPS (bluevista), IBM p575/16 (112) POWER5+/HPS (blueice), IBM Power 575/32 (128) POWER6/DDR-IB (bluefire), Cray XT5m (lynx), IBM iDataPlex/FDR-IB (yellowstone).]
16
Historical Power Efficiency on NCAR Workload
[Chart: power efficiency (sustained MFLOPS per watt) on the NCAR workload, Jan-00 through Jan-13, across systems including the Cray J90, HP SPP-2000, SGI Origin2000 and Origin3800, Compaq ES40, IBM POWER3 WH-2, IBM POWER4 p690, IBM AMD/Opteron Linux, IBM BlueGene/L (frost), IBM POWER5/POWER5+/POWER6 p575 (bluevista, blueice, bluefire), Cray XT5m (lynx), and IBM iDataPlex/FDR-IB (yellowstone).]
• Cray J90: ~0.2 sustained MFLOPS/W
• Bluevista / Blueice: ~1.4 sustained MFLOPS/W
• Bluefire: ~6 sustained MFLOPS/W
• Frost: ~8 sustained MFLOPS/W
• Lynx: ~14 sustained MFLOPS/W
• Yellowstone: ~43 sustained MFLOPS/W
17
Beyond PUE: Highly Dynamic Compute System Loads
18
System Utilization
[Charts: availability and utilization (%) per day, 04/06 through 04/13, for each system, showing %Availability, %Avail-%AdvRes, and %Utilization.]
• Yellowstone (4,518 nodes, 16 CPUs/node): avg availability 98.02%, avg utilization 84.82%, avg AdvRes 0.97%
  – Chart annotations: "Racks H01 & H02 being tested with cgroups (special queue)", "Racks H01 & H02 returned to production", 04/10 15:00
• Caldera (GPU DAV; 16 nodes, 16 CPUs/node): avg availability 87.50%, avg utilization 17.59%, avg AdvRes 0.00%
• Geyser (Large Memory DAV; 16 nodes, 40 CPUs/node): avg availability 78.95%, avg utilization 11.63%, avg AdvRes 0.00%
  – Chart annotations: "'sacks' AdvRes geyser09", "Input queue emptied"
• Erebus (AMPS; 84 nodes, 16 CPUs/node): avg availability 95.57%, avg utilization 29.67%, avg AdvRes 0.00%
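For readers unfamiliar with the metrics, a minimal sketch of how availability and utilization averages like those above can be derived from node-hour accounting. The function name, arguments, and the convention of dividing utilization by total rather than available node-hours are assumptions, not CISL's exact definitions:

```python
# Sketch: availability and utilization from node-hour accounting over a
# reporting window. Argument names and the accounting convention are assumed.
def usage_metrics(total_node_hours, down_node_hours,
                  reserved_node_hours, used_node_hours):
    available = total_node_hours - down_node_hours
    availability = available / total_node_hours                              # %Availability
    avail_minus_advres = (available - reserved_node_hours) / total_node_hours  # %Avail-%AdvRes
    utilization = used_node_hours / total_node_hours                         # %Utilization
    return availability, avail_minus_advres, utilization

# Illustrative numbers only: one day of a hypothetical 100-node system.
print(usage_metrics(total_node_hours=2400, down_node_hours=48,
                    reserved_node_hours=24, used_node_hours=2000))
```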
19
Summer Mechanical Trend
[Chart: WB vs. Mech Load. Mechanical load (kW, 0–350) plotted against wetbulb temperature (0–20 °C).]
20
An Initial Fall Surprise
[Chart: mechanical load (kW, 50–350) vs. wetbulb temperature (10–80 °F).]
21
Looking Closer at October Data
[Chart: Mechanical Systems Power Use, October 2012. Mechanical load (kW, ~100–300) plotted against outside relative humidity (~40–48%) and outside temperature.]
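A sketch of the kind of summary behind this view: binning mechanical power by outside temperature and relative humidity. The DataFrame column names ("mech_kw", "outside_temp_f", "outside_rh") are hypothetical, not the names used by the facility's data acquisition system:

```python
import pandas as pd

# Sketch: summarizing mechanical power against outside conditions, in the
# spirit of the October 2012 view. Column names are hypothetical.
def mech_power_by_conditions(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["temp_bin"] = pd.cut(df["outside_temp_f"], bins=range(10, 85, 5))
    df["rh_bin"] = pd.cut(df["outside_rh"], bins=range(0, 105, 10))
    # Mean mechanical load (kW) for each temperature/humidity cell
    return df.pivot_table(values="mech_kw", index="temp_bin",
                          columns="rh_bin", aggfunc="mean")
```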
22
Comparison of Daily Cost: Heat Pumps vs. Boiler
[Chart: energy cost per day, Jan-13 through Mar-13.]
Cost per day, gas:      Jan-13 $14.14      Feb-13 $122.95     Mar-13 $90.76
Cost per day, electric: Jan-13 $2,870.19   Feb-13 $2,638.93   Mar-13 $2,665.76
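To make the comparison concrete, here is a sketch of the underlying arithmetic: the cost of meeting a heating load with heat pumps (electricity) versus a gas boiler. Every input below is an assumed, illustrative value, not NWSC's actual heating load, tariff, heat-pump COP, or boiler efficiency:

```python
# Sketch: daily cost of a heating load served by heat pumps vs. a gas boiler.
# All numbers are assumptions for illustration only.
HEATING_LOAD_KWH_PER_DAY = 5_000   # thermal demand (assumed)
ELEC_PRICE_PER_KWH = 0.06          # $/kWh electricity (assumed)
GAS_PRICE_PER_THERM = 0.80         # $/therm natural gas (assumed)
HEAT_PUMP_COP = 4.0                # kWh of heat per kWh of electricity (assumed)
BOILER_EFFICIENCY = 0.85           # fraction of gas energy delivered as heat (assumed)
KWH_PER_THERM = 29.3               # energy content of one therm

heat_pump_cost = HEATING_LOAD_KWH_PER_DAY / HEAT_PUMP_COP * ELEC_PRICE_PER_KWH
boiler_therms = HEATING_LOAD_KWH_PER_DAY / BOILER_EFFICIENCY / KWH_PER_THERM
boiler_cost = boiler_therms * GAS_PRICE_PER_THERM

print(f"Heat pumps: ${heat_pump_cost:,.2f}/day")
print(f"Gas boiler: ${boiler_cost:,.2f}/day")
```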
23
NWSC as a Teaching Laboratory
24
Emerson Hannon, Villanova University
Electrical Engineering
• Energy efficiency
• Modeling of NWSC during construction
• Developed an energy model of NWSC based on outside conditions (see the sketch after the chart below)
[Chart: Modeled PUE. Power usage effectiveness (1.05–1.30) by month (02–12).]
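A minimal sketch of the kind of outside-conditions model described on this slide: a least-squares fit of PUE against wetbulb temperature. The data arrays and the linear form are illustrative placeholders, not the student's actual model:

```python
import numpy as np

# Sketch: a simple least-squares model of PUE as a function of outside
# wetbulb temperature. The arrays below are placeholder example data.
wetbulb_c = np.array([-5.0, 0.0, 5.0, 10.0, 15.0, 20.0])       # example conditions
measured_pue = np.array([1.12, 1.14, 1.17, 1.20, 1.24, 1.29])   # example readings

slope, intercept = np.polyfit(wetbulb_c, measured_pue, deg=1)

def modeled_pue(wb_c: float) -> float:
    """Predict PUE for a given outside wetbulb temperature (linear fit)."""
    return slope * wb_c + intercept
```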
25
Jared Baker, University of Wyoming
Mechanical Engineering
• Computational fluid dynamics
• Validated TileFlow during construction
• Simulated multiple computer vendor configurations during the procurement selection process
26
Clair Christofersen, Miami University (Ohio)
Mechanical Engineering – Applied Math
• Compared Jared's model to actual conditions
• Further TileFlow analysis
• Comparison of hot aisle containment
• Temperature and pressure measurements
27
Jason Jones, Clemson University
Mechanical Engineering
• Energy efficiency analysis & measurement
• Comparison of actual to model
• Fan wall energy analysis
• Mechanical response to ambient conditions
28
Ademola Olrinde, Texas A&M University-Kingsville
Mechanical Engineering
• Energy efficiency optimization of NWSC
• Detailed analysis of 12 months of operating data
• Optimization efforts
  – UPS systems
  – Heat pumps
• Graduate work in pipe flow and pumping power
29
Now Focused on the Left Side of the PUE Decimal
30
"Stochas?cally Robust Resource Alloca?on for Thermal-‐Aware Heterogeneous Compu?ng”
• Colorado State University – Sudeep Pasricha – HJ Siegel – Mark Oxely
• NSF Funded Project • Looking at energy aware
resource scheduling
NSF Funded Proposal
31
"Op?mized Energy Aware Scien?fic Throughput at Scale"
NSF Proposal Stage • NCAR
– Aaron Andersen – Stephen Sain
• Colorado State University – Sudeep Pasricha – HJ Siegel
• Georgia Ins?tute of Technology – Yogendra Joshi
• Power Aware Scheduling • Power Aware Data Movement • Sta?s?cal Approaches to Scale
[Diagram: data inflow to and outflow from a central PI server (data collection and control system), connected to:
• Building management system: CRAC fan speed control and return air temperature control
• Data acquisition system: air inlet and exit temperatures, fan speed, humidity, server power
• PIV database: server fan speed and air flow characterization
• Server compute load allocation: data center virtualization; POD-based room-level air flow and compute load allocation]