
Page 1

Highlights of Early Facility Experience at the New High Efficiency NCAR Wyoming Supercomputing Center

Aaron Andersen
Deputy Director, Operations & Services
May 2013

Page 2

Page 3

General Facility Information

• 153,000 gross sq. ft.
• 2 × 12,000 sq. ft. data halls
• 4 × 4 MW electrical modules; 16 MW possible
• Evaporative water-side economizer covers 98–99% of annual hours
• Critical systems are on UPS and generator backup
  – Network, cybersecurity, storage, archive
  – HPC nodes are not on UPS
• 10 ft. raised-floor system
  – Primary reason: large liquid-cooled systems
  – Secondary reason: limiting copper distances
• Heat-pump system recovers heat from computing for building use

Page 4

Facility Highlights

• All about energy efficiency
  – Operational cost savings
  – Emissions reduction
• Minimize risk
  – Not increasing inlet temperatures or using outside air
  – Utilize liquid cooling where cost-effective
• Efficient use of capital
  – Efficiency efforts had to show a 5-year payback

Page 5

Big Picture Focus on Efficiency

• Utilize the region's cool, dry climate to minimize energy use
  – Very low pressure drop
    • Minimize bends
    • Oversized pipe
  – Elevated chilled-water temperature
    • 65 degrees
• Utilize liquid-cooled computer solutions where practical
  – HPC systems
• Utilize hot-aisle containment for commodity equipment
• Focus on the biggest losses
  – Compressor-based cooling
  – Transformer losses

[Pie charts. Typical modern data center: IT load 65.9%, cooling 15.0%, electrical losses 13.0%, fans 6.0%, lights 0.1%. NWSC design: IT load 91.9%, with the remainder split among cooling, fans, electrical losses, and lights.]
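The pie-chart fractions above imply the facility's PUE directly, since PUE is total facility energy divided by IT energy. A minimal sketch (the fractions are the slide's; the helper name is mine):

```python
# PUE = total facility energy / IT energy = 1 / (IT fraction of total load).

def pue_from_it_fraction(it_fraction: float) -> float:
    """PUE given the IT share of total facility load (0 < it_fraction <= 1)."""
    return 1.0 / it_fraction

# Load fractions read from the two pie charts on this slide.
typical = {"lights": 0.001, "cooling": 0.150, "fans": 0.060,
           "electrical_losses": 0.130, "it_load": 0.659}
nwsc_design = {"it_load": 0.919}  # remainder: lights, cooling, fans, losses

print(round(pue_from_it_fraction(typical["it_load"]), 2))      # ~1.52
print(round(pue_from_it_fraction(nwsc_design["it_load"]), 2))  # ~1.09
```

The 91.9% IT fraction works out to roughly the 1.09 design PUE shown later in the deck.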

Page 6

65 °F (18.3 °C) Chilled-Water Evaporative Solution

Page 7

Cooling Tower

• Very high efficiency tower
• Utilization of outside conditions to optimize performance
  – Water use reduction by not running the fans all year
  – Energy use reduction

(Source: Integral Group, "NCAR Supercomputing LEED EAp2 & EAc1 Narrative," 03.15.2011, p. 5)

Figure 3: Cooling tower fan does not need to operate at low wet-bulb temperatures

Unlike the baseline DX units, the proposed cooling system takes advantage of the particular climate of Cheyenne to deliver savings. With most of the year cool and dry, the cooling towers are able to operate without the need for a chiller for a much greater percentage of the year than in many other locations. Cooling towers operate based on wet-bulb temperature, which is the lowest temperature to which air can be reduced through the evaporation of water at constant pressure. The lower the wet-bulb temperature, the less fan energy is necessary to cool water to a given temperature. Figure 4, opposite, outlines this point using 3 regions. Region 1 on the left is the number of hours per year the cooling tower is able to operate without the use of fan energy. This amounts to 4,000 hours, or 46% of the year. Region 2 in the middle is the number of hours per year the cooling towers need to use fans to cool the water but still do not need a chiller. This amounts to 4,200 hours, or 48% of the year. On the right is region 3, which represents the number of hours per year a chiller is necessary. The chiller runs 600 hours per year, and of those hours operates above 20% load for only 41 hours. This is 0.5% of the year.

Figure 4: Cheyenne weather conditions and cooling tower operation

Region 1: Cooling tower without fans, 46% of the year
Region 2: Cooling tower with fans, 48% of the year
Region 3: Chiller operates, 7% of the year

Lastly, the cooling tower is designed to operate in parallel with the chiller as an "integrated water-side economizer." Therefore, even when the chiller operates, the cooling tower is providing cooling directly to the cooling loop as well as to the chiller's condenser loop. Doing so minimizes the load on the chiller to the greatest extent possible. Figure 5 on the following page illustrates the operation of the water-side economizer.
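The region percentages in the LEED narrative can be checked against an 8,760-hour year; note that the 7% in the Figure 4 legend counts all 600 chiller hours, while the narrative's 0.5% counts only the 41 hours above 20% chiller load. A quick arithmetic sketch:

```python
# Hours per operating region, from the LEED narrative above.
HOURS_PER_YEAR = 8760
regions = {"tower, no fans": 4000, "tower with fans": 4200, "chiller": 600}

for name, hours in regions.items():
    print(f"{name}: {hours / HOURS_PER_YEAR:.0%} of the year")
# Hours the chiller operates above 20% load:
print(f"chiller above 20% load: {41 / HOURS_PER_YEAR:.1%}")
```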

Page 8

Hot Aisle Containment


Page 9

Fan Wall and 10 ft. (3 m) Raised Floor

Page 10

Yellowstone Water Cooling

Page 11

IT Load and Facility Tuning: Increasing IT Load Improves PUE

[Chart: PUE vs. IT load, 5/12–3/13. Left axis: IT load (kW), 0–1,400; right axis: PUE, 0–3.2. Series: PUE, PUE average, IT load, IT load average.]
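The reason rising IT load improves PUE is that much of the facility overhead (fans at minimum speed, lights, transformer no-load losses) is roughly fixed, so it is amortized over more IT kilowatts as load grows. A toy model with illustrative numbers (not NWSC measurements):

```python
def pue(it_kw: float, fixed_overhead_kw: float, variable_frac: float) -> float:
    """PUE with a fixed overhead plus overhead proportional to IT load."""
    return (it_kw + fixed_overhead_kw + variable_frac * it_kw) / it_kw

# Idle, typical, and near-peak IT loads from the range quoted on slide 13;
# the 80 kW fixed overhead and 5% variable overhead are assumptions.
for it_kw in (300, 700, 1400):
    print(it_kw, "kW ->", round(pue(it_kw, fixed_overhead_kw=80,
                                    variable_frac=0.05), 2))
```

With a fixed 80 kW overhead, PUE falls from about 1.32 at idle to about 1.11 near peak, the same qualitative trend as the measured curve.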

Page 12

Actual Annual Data Center PUE Trending Toward Design PUE

[Chart: NWSC data center PUE, 5/12–3/13, y-axis 0–3.2. Series: measured PUE and the Integral-modeled average; annotations at 1.22 PUE and 1.09 PUE.]

Page 13

Two Main Sources of Variability

• Computer load variability
  – Power-saving computer system at scale
  – Peak demand ~1,700 kW
  – Typical ~700–1,500 kW
  – Idle ~300 kW
• Weather variability
  – Heating-load increase in winter months affects PUE
  – Real cost savings depend in part on natural gas prices

Page 14

NCAR Data-Centric Architecture

• HPC resource: "yellowstone" — 1.50 PF peak, 29 bluefire-equivalents
• DAV resources: "geyser" & "caldera"; remote visualization
• CFDS resource: "glade" — filesystem servers & I/O aggregators; >90 GB/sec aggregate; 2012: 11 PB, 2014: 16.4 PB; data collections, project spaces, scratch, user & home
• NCAR HPSS archive — 100 PB capacity, ~15 PB/yr growth; archive interface
• Data transfer services; science gateways (RDA, ESG); login nodes
• Partner sites and XSEDE sites over 1 Gb/10 Gb Ethernet (40 Gb+ future)
• High-bandwidth, low-latency HPC and I/O networks: FDR InfiniBand and 10 Gb Ethernet

Page 15

NCAR HPC Profile

[Chart: NCAR HPC capacity, Jan 2004 – Jan 2016, y-axis 0.0–1.5, by system: IBM POWER4/Colony (bluesky), IBM BlueGene/L (frost, with a later upgrade), IBM p575/8 (78) POWER5/HPS (bluevista), IBM p575/16 (112) POWER5+/HPS (blueice), IBM Power 575/32 (128) POWER6/DDR-IB (bluefire), Cray XT5m (lynx), IBM iDataPlex/FDR-IB (yellowstone).]

Page 16

Historical Power Efficiency on NCAR Workload

[Chart: power consumption (sustained MFLOPS per watt), Jan 2000 – Jan 2013, y-axis 0–70. Systems include Cray J90, HP SPP-2000, SGI Origin2000, IBM POWER3 WH-2, Compaq ES40, SGI Origin3800, IBM POWER4 p690, IBM AMD/Opteron Linux, IBM BlueGene/L (frost), IBM POWER5 p575 (bluevista), IBM POWER5+ p575 (blueice), IBM POWER6 Power575 (bluefire), Cray XT5m (lynx), IBM iDataPlex/FDR-IB (yellowstone).]

Cray J90: ~0.2 sustained MFLOPS/watt
Bluevista / Blueice: ~1.4 sustained MFLOPS/watt
Bluefire: ~6 sustained MFLOPS/watt
Frost: ~8 sustained MFLOPS/watt
Lynx: ~14 sustained MFLOPS/watt
Yellowstone: ~43 sustained MFLOPS/watt
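Those per-system figures make the improvement factors easy to compute; the values below are the slide's own approximations:

```python
# Sustained MFLOPS per watt, as annotated on the chart above.
systems = {"Cray J90": 0.2, "Bluevista/Blueice": 1.4, "Bluefire": 6,
           "Frost": 8, "Lynx": 14, "Yellowstone": 43}

j90 = systems["Cray J90"]
for name, mflops_per_watt in systems.items():
    # Improvement factor relative to the Cray J90 baseline.
    print(f"{name}: {mflops_per_watt / j90:.0f}x the J90's efficiency")
```

Yellowstone works out to roughly 215× the efficiency of the Cray J90 on the same workload.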

Page 17

Beyond PUE: Highly Dynamic Compute System Loads

Page 18

System Utilization

• Yellowstone (4,518 nodes, 16 CPUs/node): avg availability 98.02%, avg utilization 84.82%, avg AdvRes 0.97%
• Geyser (large-memory DAV; 16 nodes, 40 CPUs/node): avg availability 78.95%, avg utilization 11.63%, avg AdvRes 0.00%
• Caldera (GPU DAV; 16 nodes, 16 CPUs/node): avg availability 87.50%, avg utilization 17.59%, avg AdvRes 0.00%
• Erebus (AMPS; 84 nodes, 16 CPUs/node): avg availability 95.57%, avg utilization 29.67%, avg AdvRes 0.00%

[Charts: daily %availability, %availability − %AdvRes, and %utilization for each system, 04/06–04/13 (Sat–Sat). Annotations: racks H01 & H02 being tested with cgroups (special queue), then returned to production 04/10 15:00 (Yellowstone); 'sacks' AdvRes geyser09 and input queue emptied (Geyser).]
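Percentages like these are typically derived from scheduler accounting in node-hours. A sketch of that bookkeeping (the node-hour breakdown is an assumption chosen to land on Yellowstone's reported figures; the actual NCAR tooling is not described here):

```python
def system_metrics(total_node_hours: float, down_node_hours: float,
                   busy_node_hours: float, advres_node_hours: float):
    """Availability, utilization, and AdvRes as fractions of capacity."""
    avail = (total_node_hours - down_node_hours) / total_node_hours
    util = busy_node_hours / total_node_hours
    advres = advres_node_hours / total_node_hours
    return avail, util, advres

# One week of a 4,518-node system; illustrative down/busy/reserved hours.
total = 4518 * 24 * 7
a, u, r = system_metrics(total,
                         down_node_hours=0.0198 * total,
                         busy_node_hours=0.8482 * total,
                         advres_node_hours=0.0097 * total)
print(f"avail {a:.2%}  util {u:.2%}  advres {r:.2%}")
```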

Page 19

Summer Mechanical Trend

[Chart: wet-bulb temperature vs. mechanical load — x-axis: WB temp, 0–20 °C; y-axis: mechanical load, 0–350 kW.]

Page 20

An Initial Fall Surprise: Mechanical Load (kW) vs. Wet-Bulb Temperature (°F)

[Chart: x-axis 10–80 °F wet-bulb temperature; y-axis 50–350 kW mechanical load.]

Page 21

Looking Closer at October Data

[Chart: mechanical systems power use, October 2012 — outside temperature (40–48) vs. outside relative humidity (20–70), with mechanical power shown on a 100–300 kW color scale.]

Page 22

Comparison of Daily Cost: Heat Pumps vs. Boiler

Energy cost per day | Jan-13     | Feb-13     | Mar-13
Gas                 | $14.14     | $122.95    | $90.76
Electric            | $2,870.19  | $2,638.93  | $2,665.76

[Chart: energy cost per day, $0–$3,500, Jan–Mar 2013.]
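The table above invites a quick total and a gas-vs-electric split; simple arithmetic over the slide's figures:

```python
# Daily energy costs from the table above.
gas_per_day = {"Jan-13": 14.14, "Feb-13": 122.95, "Mar-13": 90.76}
elec_per_day = {"Jan-13": 2870.19, "Feb-13": 2638.93, "Mar-13": 2665.76}

for month in gas_per_day:
    total = gas_per_day[month] + elec_per_day[month]
    gas_share = gas_per_day[month] / total
    print(f"{month}: ${total:,.2f}/day total, gas is {gas_share:.1%} of it")
```

With heat pumps recovering computing heat, gas never exceeds about 4.5% of the daily energy spend even in winter.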

Page 23

NWSC as a Teaching Laboratory

Page 24

Emerson Hannon — Villanova, Electrical Engineering

• Energy efficiency
• Modeling of NWSC during construction
• Developed an energy model of NWSC based on outside conditions

[Chart: modeled PUE (power usage effectiveness) by month (02–12), ranging from roughly 1.05 to 1.3.]

Page 25

Jared Baker — University of Wyoming, Mechanical Engineering

• Computational fluid dynamics
• Validated TileFlow during construction
• Simulated multiple computer vendor configurations during the procurement selection process

Page 26

Clair Christofersen — University of Miami Ohio, Mechanical Engineering / Applied Math

• Compared Jared's model to actual conditions
• Further TileFlow analysis
• Comparison of hot-aisle containment
• Temperature and pressure measurements

Page 27

Jason Jones — Clemson University, Mechanical Engineering

• Energy efficiency analysis & measurement
• Comparison of actual to model
• Fan wall energy analysis
• Mechanical response to ambient conditions

Page 28

Ademola Olrinde — Texas A&M Kingsville, Mechanical Engineering

• Energy efficiency optimization of NWSC
• Detailed analysis of 12 months of operating data
• Optimization efforts
  – UPS systems
  – Heat pumps
• Graduate work in pipe flow and pumping power

Page 29

Now Focused on the Left Side of the PUE Decimal

Page 30

"Stochastically Robust Resource Allocation for Thermal-Aware Heterogeneous Computing"

• Colorado State University
  – Sudeep Pasricha
  – HJ Siegel
  – Mark Oxely
• NSF-funded project
• Looking at energy-aware resource scheduling

Page 31

"Optimized Energy Aware Scientific Throughput at Scale"

NSF proposal stage
• NCAR
  – Aaron Andersen
  – Stephen Sain
• Colorado State University
  – Sudeep Pasricha
  – HJ Siegel
• Georgia Institute of Technology
  – Yogendra Joshi
• Power-aware scheduling
• Power-aware data movement
• Statistical approaches to scale

[Diagram: a PI server handles data collection and control. Inflows to the PI server: building management system (CRAC fan-speed control and return-air temperature control); data acquisition system (air inlet and exit temperatures, fan speed, humidity, server power); PIV database (server fan speed and air-flow characterization). Outflows from the PI server: server compute-load allocation / data center virtualization; POD-based room-level air-flow and compute-load allocation.]