ERLANGEN REGIONAL COMPUTING CENTER [RRZE] · 2016-10-27
ERLANGEN REGIONAL COMPUTING CENTER [ RRZE ] HPC-Bedarf und HPC-Strategie 2020 RRZE-Campustreffen, 18.06.2015 Dr. Thomas Zeiser / Prof. Dr. Gerhard Wellein, RRZE


Page 1:

ERLANGEN REGIONAL COMPUTING CENTER [RRZE]

HPC-Bedarf und HPC-Strategie 2020 RRZE-Campustreffen, 18.06.2015 Dr. Thomas Zeiser / Prof. Dr. Gerhard Wellein, RRZE

Page 2:

PART 1: OPERATIONAL TOPICS

Page 3:

§  HPC accounts will soon be provisioned through IdM
à No HPC account without a valid IdM affiliation!
à The expiration date of an HPC account may be shorter than requested if there is no IdM affiliation of sufficient duration.

Long-term plans (details are not fixed yet):
§  A workflow process to request HPC access instead of paper forms.
§  Automatic prolongation of staff members' HPC accounts as their employment gets extended (up to the duration of the HPC project).

Coupling of HPC to IdM

18.05.2015 | HPC-Bedarf und HPC-Strategie 2020 | Dr. Thomas Zeiser

Page 4:

§  Do not expect RRZE to keep the data of expired HPC accounts forever. IdM will make it easier to detect orphaned HPC accounts.

§  Data may be purged 3 months after the HPC permission expires (see the back side of the HPC form). (ATTENTION if your HPC account is identical to your IdM account!)

§  If data is to be transferred to a different account, we need the permission of the original owner.

Expired / orphaned HPC accounts

Page 5:

§  cf. the HPC-Kolloquium from October 2012: http://www.rrze.fau.de/dienste/arbeiten-rechnen/hpc/kundenbereich/HPC-Koll_301012.pdf

§  cf. the letter of ZUV from 19.11.2014: http://www.zuv.fau.de/universitaet/organisation/verwaltung/zuv/verwaltungshandbuch/drittmittel/Exportkontolle_bei_Forschungsleistungen_-_BAFA_-_Au%C3%9Fenwirtschaftsgesetz.pdf

Official legal information is only available from the Bundesamt für Wirtschaft und Ausfuhrkontrolle (BAFA):
§  http://www.ausfuhrkontrolle.info/ausfuhrkontrolle/de

Some readable notes:
§  http://www.bmbf.de/pub/supercomputer_und_exportkontrolle.pdf
§  http://www.bafa.de/ausfuhrkontrolle/de/arbeitshilfen/merkblaetter/merkblatt_tt.pdf

Reminder: export control & non-proliferation. HPC systems and research are dual-use goods

Page 6:

§  Woody: single-node (throughput) jobs
§  The last old w0xxx nodes were switched off in 09/2014
§  w10xx (48) and w11xx (72) have modern CPUs but still 8 GB/node

§  LiMa: nodes are slowly dying due to the cooling failure in 06/2014
§  Currently approx. 440 of the original 500 nodes are available

§  Emmy: business as usual; sometimes quite long queue times

§  TinyGPU: the GPUs of the original nodes (tg0xx) are no longer supported by the latest NVidia drivers; use tg0xx for non-GPU load

§  TinyBlue, TinyFAT, Windows cluster, HPC storage: no news

HPC systems @ RRZE: hardware overview / news

Page 7:

§  TinyGPU/TinyBlue/TinyFAT:
§  OS upgraded from Ubuntu 12.04 to 14.04 (already in Feb./Apr.)
§  Use “woody3.rrze” as the Ubuntu front end

§  Woody:
§  Still running SuSE SLES 11 SP3
§  Might be reinstalled with Ubuntu 14.04 in the future

§  LiMa/Emmy:
§  OS upgrade from CentOS 6.x to 7.x planned for later this year

HPC systems @ RRZE: software news / plans

Page 8:

§  The inauguration of SuperMUC phase 2 will be in June/July
§  Access requires a scientific proposal: https://www.lrz.de/services/compute/supermuc/projectproposal/

§  LRZ’s new Linux cluster will soon be operational
§  It will consist of hardware similar to half a SuperMUC phase-2 island
§  Accounts can easily be requested through RRZE

New HPC systems beyond Erlangen

Page 9:

§  RRZE presented a Forschungsgroßgeräteantrag at FAU’s KORA.
§  There was a long discussion whether FAU can afford a new HPC cluster every 3 years, but the proposal finally passed.
§  It took the DFG three months to acknowledge receipt of the proposal. No idea how long the evaluation will take …

§  The new system will be installed in the same location as LiMa.
à LiMa must be disassembled before the new system can be brought in.
à Expected size: at most as large as Emmy, probably (slightly) smaller.

Next HPC system for Erlangen

Page 10:

§  Hardware: a new cluster (~2.5 million EUR) every 3 years è less than 1 million EUR/year

§  Running costs for electricity and cooling:
§  average power input of all current HPC systems: >300 kW * 365x24
§  cooling effort: PUE > 2.0 (could be reduced, but infrastructure work is pending)
è almost 1 million EUR/year (electricity and cold water simply come out of the wall socket, and nobody cares [yet])

§  HPC staff at RRZE (<2.5 FTE) è less than 200 kEUR/year
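The annual figures quoted above can be checked with a short back-of-the-envelope calculation. This is only a sketch: the electricity price of 0.15 EUR/kWh is an assumption not stated on the slide.

```python
# Back-of-the-envelope reproduction of the slide's annual cost figures.
# Assumption (not on the slide): electricity costs ~0.15 EUR/kWh.

it_power_kw = 300          # average power input of all HPC systems
pue = 2.0                  # power usage effectiveness (cooling overhead)
hours_per_year = 365 * 24
price_eur_per_kwh = 0.15   # assumed electricity price

energy_kwh = it_power_kw * pue * hours_per_year
electricity_eur = energy_kwh * price_eur_per_kwh

hardware_eur_per_year = 2_500_000 / 3   # one ~2.5 M EUR cluster every 3 years

print(f"electricity: {electricity_eur / 1e6:.2f} M EUR/year")
print(f"hardware:    {hardware_eur_per_year / 1e6:.2f} M EUR/year")
```

With these assumptions both items land just below 1 M EUR/year, matching the "less than 1 Mio EUR/year" and "almost 1 Mio EUR/year" statements above.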

How expensive are the current HPC systems @ FAU?

Page 11:

https://www.zuv.fau.de/universitaet/organisation/verwaltung/zuv/verwaltungshandbuch/haushalt/FAU_Haushaltsplan.pdf

Page 12:

PART 2: “WHAT NEXT?”

Page 13:

§  RRZE: install a new HPC cluster every 3 years à Art. 91b
§  At least two systems are operated concurrently
§  Between 20% and 50% of the investment cost is contributed directly by scientists (“Rezentralisierung”)

What next: Clusters at RRZE: 2003 – 2013

18.05.2015 | HPC-Bedarf und HPC-Strategie 2020 | Prof. Gerhard Wellein

Page 14:

§  Proposal for new system sent to DFG (2.5 M€)

What next: Clusters at RRZE: 2003 – 2013


Node specs                       #Nodes   #Cores   Price     Peak          TOP500   Year
2 x Intel Xeon 2.66 GHz; 2 GB    77       154      0.35 M€   0.8 TFlop/s   315      2003
2 x Intel Xeon 3.0 GHz; 8 GB     182      728      1.0 M€    8.7 TFlop/s   124      2006
2 x Intel Xeon 2.66 GHz; 24 GB   500      6,000    2.3 M€    64 TFlop/s    130      2010
2 x Intel Xeon 2.2 GHz; 64 GB    560      11,200   2.6 M€    234 TFlop/s   210      2013

Page 15:

TOP500 – still looking good?


Budget increase – RRZE x86 cluster:
2003 à 2010: 6.0x
2010 à 2013: 1.1x
2013 à 2016: 0.96x

Page 16:

Trends 2003 – 2013:
§  Price per node (including all infrastructure): ~4,500-5,000 €
§  Power consumption per node (CPU only): ~300-350 W

§  Power was not an issue for RRZE in the past (HPC systems including cooling contributed less than 5% of the “Technische Fakultät” campus load (“10 MW power line”))

§  Cluster at TOP120-130 (Nov. 2012): ~660 nodes with 8-core SNB (AVX vs. SSE)
à expect that a 1.3x-1.6x increase every 2 years is needed to achieve a TOP120-130 entry

What next?! Clusters at RRZE: 2003 – 2013
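The growth assumption above can be turned into a quick projection, using only the numbers on this slide (a sketch, not an official sizing):

```python
# Project the node count needed for a TOP120-130 entry in 2016,
# assuming (as the slide does) a 1.3x-1.6x increase every 2 years.

base_nodes = 660   # ~TOP120-130 system size in Nov. 2012

# two 2-year steps take us from late 2012 to late 2016
projections = {factor: base_nodes * factor ** 2 for factor in (1.3, 1.6)}

for factor, nodes in projections.items():
    print(f"growth {factor}x per 2 years -> ~{nodes:.0f} nodes in 2016")
```

The result spans roughly 1,100-1,700 nodes, consistent with the ~1,500 nodes quoted on the following slide.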

Page 17:

TOP120-130 installation in 2016: ~1,500 nodes
§  Space: 20-30 racks
§  Power consumption, system only (PUE=2): 0.5 MW (1 MW)
§  Operating such a machine is not feasible with the existing infrastructure & concept

à RRZE will not increase node counts in the near future
à Less competitive compute resources for science and research at FAU

What next?! Clusters at RRZE: 2003 – 2013

Page 18:

Construction of new data centers / computer rooms (since 2000):

§  RWTH Aachen
§  TU Darmstadt
§  TU Dresden
§  Uni. Köln
§  Paderborn, Siegen, …

§  Big 3: Stuttgart, Jülich, München

What’s up with the “competition”?

Page 19:

Jugglers’ tricks of computer architects: SIMD & FMA

Basic limitations:
§  SIMD à 512 bit max.
§  ILP à 4 FMAs useless?! (instruction throughput: 4 instr./cycle)
§  Cores*Clock à next slide
à Accelerator?! (a particularly good juggler)

What’s up with the hardware?


             SIMD      ILP          Cores*Clock (RRZE) [cores x GHz]
Woodcrest    128 bit   MULT + ADD    6
Westmere     128 bit   MULT + ADD   16
IvyBridge    256 bit   MULT + ADD   22
Haswell      256 bit   FMA + FMA    32
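The per-core arithmetic behind the SIMD/FMA columns can be sketched as follows. The unit counts are a simplified reading of the table above (two floating-point pipes per core), not vendor-confirmed microarchitecture details:

```python
# Double-precision flops per core and cycle, derived from SIMD width
# and the available MULT/ADD/FMA units (simplified model of the table).

def dp_flops_per_cycle(simd_bits, units, flops_per_unit):
    lanes = simd_bits // 64           # 64-bit doubles per SIMD register
    return lanes * units * flops_per_unit

archs = [
    # (name, SIMD bits, #FP units, flops per unit per cycle)
    ("Woodcrest", 128, 2, 1),   # separate MULT + ADD pipes
    ("IvyBridge", 256, 2, 1),   # AVX MULT + ADD
    ("Haswell",   256, 2, 2),   # two FMA units, 2 flops each
]
for name, bits, units, fpu in archs:
    print(name, dp_flops_per_cycle(bits, units, fpu), "DP flops/cycle")
```

This reproduces the 4x jump in per-core peak from Woodcrest to Haswell that SIMD widening plus FMA delivers, independent of the cores*clock column.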

Page 20:

Data: http://en.wikipedia.org/wiki/ à look up the specific microarchitecture (top bin for 2-way EP servers)
Cores*GHz growth slows down ?! Prices increase!
Accelerators: trade in code flexibility/quality for performance

What’s up with the hardware?


Socket         Config.              Socket speed     TDP     Price [USD]
Nehalem        4 cores * 3.33 GHz   13.3 core*GHz    130 W   1600
Westmere       6 cores * 3.46 GHz   20.8 core*GHz    130 W   1663
Sandy Bridge   8 cores * 3.1 GHz    24.8 core*GHz    150 W   1885
Ivy Bridge     12 cores * 2.7 GHz   32.4 core*GHz    130 W   2614
Haswell        18 cores * 2.3 GHz   41.4 core*GHz    145 W   > 2700
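As a reading aid for the table above, the generation-over-generation growth of aggregate socket speed (cores*GHz) versus list price can be computed directly; the Haswell price is taken as the quoted lower bound of 2700 USD:

```python
# Generation-over-generation growth of cores*GHz vs. list price,
# from the socket table above (Haswell price = quoted lower bound).

gens = [
    # (name, cores*GHz, price in USD)
    ("Nehalem",      13.3, 1600),
    ("Westmere",     20.8, 1663),
    ("Sandy Bridge", 24.8, 1885),
    ("Ivy Bridge",   32.4, 2614),
    ("Haswell",      41.4, 2700),
]
for (n0, s0, p0), (n1, s1, p1) in zip(gens, gens[1:]):
    print(f"{n0} -> {n1}: cores*GHz x{s1 / s0:.2f}, price x{p1 / p0:.2f}")
```

The speed factor shrinks from ~1.56x (Nehalem to Westmere) to ~1.2-1.3x per generation while the price factor grows, which is exactly the "Cores*GHz slows down, price increases" point made above.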

Page 21:

Current status: RRZE freezes the system size at the LiMa + Emmy level

à  Aiming at the procurement of a 500-node system every 3 years
à  2 systems are operated simultaneously (+ HPC storage)
à  Qualitative growth of computing power is no longer possible (beyond socket speed / accelerators)

There will be a detailed survey of the HPC needs by ZISC soon!

“Status Quo” RRZE

Page 22:

§  Computing time in Erlangen is sufficient

§  Apply for compute time in München/Stuttgart/Jülich/GCS/PRACE/…

§  ISER “acquisition” + update of the infrastructure (>1 million EUR) à potential: 2x the nodes, but also 2x the electricity and hardware!

§  Reconstruction of the HPC server room or of the complete “RRZE”: timeline unknown; RRZE initiatives since 2010 have been unsuccessful / without progress

Options for action

(Figure: floor plan of the RRZE computer room, current HPC placement. Marked are rack rows for the LiMa cluster, the HPC Cluster (2013; planned as 2 water-cooled rows), the HP compute cluster (Woody, w10xx), TinyGPU, TinyBlue, TinyWin, NEC (TinyFat), HPC storage (IBM), NEC clusters, RRZE/ZUV/DFN servers, network cabinets (GWIN/XWIN), UPS and power-distribution cabinets, and air-conditioning units. Grid: one floor tile = 0.6 m x 0.6 m.)

HPC currently

Page 23:

APPENDIX:

Selected slides from the KoRa meeting on 9 Feb. 2015, presenting the Forschungsgroßgeräte proposal “Hochleistungscomputecluster”

Page 24:

§  a Forschungsgroßgerät (up to 5 million €) under Art. 91b GG
§  “a new HPC cluster, in particular for numerical simulation in chemistry and biology (life sciences) as well as in materials science and engineering and in geography/climatology”
§  a total budget of 2.5 million €

The application is for …

09.02.2015 | Großgeräteantrag HPC-Cluster | [email protected]

(Figure: pie chart “Finanzierung (Hardware-Beschaffung)” with shares of 50%, 21%, 19%, and 10% for the Bundesanteil (federal share), Berufungs-/Projektzusagen, Verstärkungsmittel des Landes, and FAU (HPC-Erneuerung); the exact mapping of the smaller shares to these sources is not recoverable from the text.)

Page 25:

Utilization of the large HPC systems in 2014

(Figure: utilization plots for the Emmy cluster and the LiMa cluster.)

Page 26:

Job-size distribution on Emmy and LiMa

(Figure: job-size histograms for the Emmy cluster and the LiMa cluster in 2014, binned by node count: 1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128, >128.)

More than 50% of the compute time goes to parallel jobs with at least 9 nodes (i.e. at least 180 resp. 108 cores).

è A parallel computer with a good internal network is therefore mandatory
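The two core counts follow directly from the per-node configurations of the clusters (2-socket nodes with 10 resp. 6 cores per socket, as implied by the 180 and 108 cores quoted above):

```python
# 9-node jobs expressed in cores, for the two clusters' node types.

cores_per_node = {
    "Emmy": 2 * 10,   # 2 sockets x 10 cores = 20 cores/node
    "LiMa": 2 * 6,    # 2 sockets x  6 cores = 12 cores/node
}
for cluster, cpn in cores_per_node.items():
    print(f"{cluster}: 9 nodes = {9 * cpn} cores")
```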

Page 27:

Usage shares of the most important groups on the Emmy and LiMa clusters in 2014

(Figure: bar chart of usage shares, 0% to 25%, for the Emmy and LiMa clusters. Groups shown: Inst. Theoret. Physik; other Department Physik; Computer-Chemie-Centrum (CCC); LS Theoretische Chemie; other Dept. Chemie; Professur Computational Biology; Professur für Bioinformatik; LS Strömungsmechanik; LS Prozessmaschinen und Anlagentechnik; Department Werkstoffwissenschaften; RRZE/Professur für Hochleistungsrechnen; LS Systemsimulation; others.)

Weighted over all systems: Physik 10%, Chemie 42%, Bio 17%, Med 6%, CBI 11%, WW 3%, Inf 9%, others 2%. Chemistry and biology alone account for >50%.

Page 28:

Detailed compute-time consumption in 2014


Cluster                                   Emmy          LiMa         Woody       TinyBlue    TinyFAT   TinyGPU   Windows
Compute time delivered (SMT core hours)   156,242,374   87,129,678   4,220,121   7,177,648   520,680   172,092   381,557
Compute time delivered (node hours)       3,906,059     3,630,403    1,055,030   448,603     32,543    10,756    31,796
Utilization                               84%           88%          81%         63%         25%       14%       23%
Value per RRZE price list                 3,124,847 €   1,815,202 €  211,006 €   179,441 €   19,526 €  4,302 €   19,078 €

Weighted HPC usage per faculty (the per-cluster percentage columns of the original table are not reliably recoverable; the weighted totals over all systems are):

NatFak 69.0%: Erlangen Centre for Astroparticle Physics (ECAP) 0.8%; Inst. Theoret. Physik 6.5%; other Department Physik 2.3%; Computer-Chemie-Centrum (CCC) 21.6%; LS Theoretische Chemie 14.7%; other Dept. Chemie 6.1%; Professur Computational Biology 16.7%; other Dept. Biologie 0.0%; Dept. Geographie 0.0%; Dept. Mathematik 0.3%

MedFak 6.1%: Professur für Bioinformatik 6.0%; other Med. faculty 0.1%

TechFak 24.0%: LSTM 4.5%; IPAT 6.7%; other Dept. CBI 0.2%; Dept. EEI 0.0%; Dept. WW 3.3%; Dept. MB 0.0%; RRZE/Professur für Hochleistungsrechnen 1.6%; LS Systemsimulation 7.2%; other Dept. Informatik 0.4%

Wirtschaftswissenschaften 0.4%: LS Volkswirtschaftslehre 0.2%; LS Statistik und Oekonometrie 0.1%; LS Versicherungswirtschaft 0.0%

PhilFak 0.0%: Professur Computerlinguistik 0.0%

External projects and miscellaneous 0.5%: lecture recording / video encoding 0.0%; external project partners 0.5%; Uni Bamberg, HS Coburg+Nürnberg 0.0%

According to the RRZE price lists valid in 2014, the total delivered compute time corresponds to a value of over 5.3 million € (procurement costs spread over 3 years plus running operating costs).
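Dividing the "value" row of the summary table by the delivered node-hours recovers the implied price per node-hour. This is a sketch: the prices are inferred from the table, not quoted from the RRZE price list itself.

```python
# Implied price per node-hour, inferred from the 2014 usage table
# (value in EUR divided by delivered node-hours).

clusters = {
    # cluster: (node-hours, value in EUR)
    "Emmy":  (3_906_059, 3_124_847),
    "LiMa":  (3_630_403, 1_815_202),
    "Woody": (1_055_030,   211_006),
}
for name, (node_hours, value_eur) in clusters.items():
    print(f"{name}: {value_eur / node_hours:.2f} EUR per node-hour")
```

The round values that come out (about 0.80, 0.50, and 0.20 EUR per node-hour) suggest the table was indeed built from a simple per-node-hour tariff.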

Page 29:

Fairshare distribution of the last 10 days (Feb. 2014)

(Figure: fairshare plots for the LiMa and Emmy clusters.)

Page 30:

Floor plan (Stellplan) of the RRZE computer room

(Figure: floor plan of the RRZE computer room. Legend: servers (RRZE servers, hosting, housing); ZUV servers; HPC systems; DFN infrastructure; network components (RRZE); fuse cabinets / power distributors; room air-conditioning units. The plan marks the existing NEC and HP compute clusters, the planned “NEC Cluster II”, and two areas labeled “RRZE Server in Planung”. Grid: one floor tile = 0.6 m x 0.6 m.)

Page 31:

§  LiMa cluster (2010) and TinyFAT (2010) out; new system in
§  For at least 2 months, scientists will have at most 2/3 of today's compute power at their disposal

§  At the envisaged installation date, LiMa will be >5 years old
§  Depreciation over 4 years according to DFG guidelines

Installation site for the requested system


Page 32:

§  The failure of the chilled-water supply during the night of 24 June caused massive overheating of LiMa
à immediate hardware failures à massively accelerated aging à failures / instabilities
§  As of today: 53 of 500 compute nodes already unusable

Installation site for the requested system


Page 33:

Failure of the chilled-water supply during the night of 24 June 2014, and its consequences


Page 34:

§  Procurement/extension: 2006/2007 (approx. 1.5 million EUR)
§  225 compute nodes with a power draw of >75 kW, plus the energy needed for cooling è electricity costs of 1 million kWh/a = 150k €/a

Replacing a cluster early pays off financially for the university: the Woody cluster as an example


Page 35:

§  Woody was switched off in 2013/2014 even though the hardware was still running
§  Replaced by 6 modern, price-optimized enclosures with twelve 1-socket systems each, for serial throughput; purchase price: 60k EUR (30k EUR “sponsored” by FAU)

§  For throughput jobs, the 72 nodes deliver roughly the same aggregate compute power as the old 225 nodes

§  Power draw < 8 kW, plus the energy needed for cooling è electricity costs of 0.1 million kWh/a = 15k EUR/a

Replacing a cluster early pays off financially for the university: the Woody cluster as an example
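From the figures on these two slides, the payback time of the replacement follows directly (a sketch using only the stated electricity costs and purchase price):

```python
# Payback time of the Woody replacement, from the slide figures.

old_power_cost = 150_000   # EUR/year, old 225-node cluster
new_power_cost = 15_000    # EUR/year, 72 new single-socket nodes
purchase_price = 60_000    # EUR, full price of the replacement

savings_per_year = old_power_cost - new_power_cost
payback_years = purchase_price / savings_per_year

print(f"payback after ~{payback_years * 12:.1f} months")
```

At 135k EUR/a of saved electricity, even the full 60k EUR purchase pays for itself in well under a year, which is the point of the "replacing early pays off" argument.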


Page 36:

HPC systems at RRZE: yesterday, today, tomorrow

(Figure: timeline of RRZE HPC systems with year labels 2001-2013: throughput cluster (2006/2009), parallel cluster(s), HPC storage (60 TB disk), HPC storage with HSM to tape, “fat” nodes with up to 512 GB memory, a 10 Gbit HPC Ethernet backbone (2001/2003/2010), a small research cluster with GPUs, a small cluster running Windows HPC, and the high-end parallel clusters 130@Top500 11/2010 (2.0 M€) and 210@Top500 11/2013 (2.5 M€).)

Page 37:

§  A new HPC cluster every 3 years à Art. 91b §  At least two systems are operated concurrently §  20% - 50% of investment costs contributed by scientists

Clusters@RRZE: 2003 – 2014


Page 38:

Clusters@RRZE: 2003 – 2014


Node specs                       #Nodes     #Cores         Price            Peak               TOP500   Year
2 x Intel Xeon 2.66 GHz; 2 GB    77         154            0.35 M€          0.8 TFlop/s        315      2003
2 x Intel Xeon 3.0 GHz; 8 GB     182        728            1.0 M€           8.7 TFlop/s        124      2006
2 x Intel Xeon 2.66 GHz; 24 GB   500        6,000          2.0 M€ +0.3 M€   64 TFlop/s         130      2010
2 x Intel Xeon 2.2 GHz; 64 GB    560        11,200         2.6 M€           234 TFlop/s        210      2013
2 x Intel Xeon 2.5 GHz; 64 GB    ~300-400   ~6,000-8,000   2.5 M€           ~200-300 TFlop/s   n/a      2016

Page 39:

ERLANGEN REGIONAL COMPUTING CENTER [RRZE]

Thank you for your attention! Regionales RechenZentrum Erlangen [RRZE] Martensstraße 1, 91058 Erlangen http://www.rrze.fau.de