

Partnership Initiative Computational Science – Using HPC efficiently

Dieter Kranzlmüller

Munich Network Management Team, Ludwig-Maximilians-Universität München (LMU) & Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities

From Michael Resch's Talk

• Prof. Resch:
  – „Education and Training HPC and the challenges we see in HPC“
  – „It is a question of algorithms, of methods, of how you use the system“
  – „For a successful simulation, you need to know computer science, mathematics, and the respective field of application“
  – „Focus at HLRS on engineering“
  – „Training is necessary!“

D. Kranzlmüller SSIP 2017 Workshop 2


Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

D. Kranzlmüller SSIP 2017 Workshop 3

With approx. 230 employees, for more than 100,000 students and for more than 30,000 employees, including 8,500 scientists

• European Supercomputing Centre
• National Supercomputing Centre → GCS
• Regional Computer Centre for all Bavarian Universities
• Computer Centre for all Munich Universities

Photo: Ernst Graf

SuperMUC Phase 1 + 2

D. Kranzlmüller SSIP 2017 Workshop 4


LRZ Building Extension

D. Kranzlmüller SSIP 2017 Workshop 5

Picture: Ernst A. Graf

Picture: Horst-Dieter Steinhöfer

Figure: Herzog+Partner for StBAM2 (staatl. Hochbauamt München 2)

SuperMUC Architecture

[Architecture diagram – key figures:]
• Compute nodes: 18 thin node islands (each >8,000 cores), SB-EP nodes with 16 cores/node and 2 GB/core
• 1 fat node island (8,200 cores) → SuperMIG, WM-EX nodes with 40 cores/node and 6.4 GB/core
• Interconnect: non-blocking within each island, pruned tree (4:1) between islands
• I/O nodes and NAS (80 Gbit/s): $HOME 1.5 PB / 10 GB/s, snapshots/replica 1.5 PB (separate fire section)
• GPFS for $WORK and $SCRATCH: 10 PB, 200 GB/s
• Archive and backup ~30 PB, disaster recovery site

D. Kranzlmüller SSIP 2017 Workshop 6
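As a rough cross-check of the diagram above, a minimal sketch that totals cores and memory per partition. The node counts (512 nodes per thin island, 205 fat nodes) are not on the slide; they are assumptions inferred from the per-island core counts and the cores-per-node figures.

```c
/* Rough totals for the SuperMUC Phase 1 island configuration sketched above.
 * Node counts (512 thin nodes per island, 205 fat nodes) are inferred from the
 * per-island core counts and are assumptions, not figures from the slide. */
#include <stdio.h>

int main(void) {
    int thin_islands = 18, thin_nodes = 512, thin_cores = 16;  /* SB-EP, 2 GB/core */
    int fat_islands = 1, fat_nodes = 205, fat_cores = 40;      /* WM-EX, 6.4 GB/core */

    long thin_total = (long)thin_islands * thin_nodes * thin_cores;
    long fat_total = (long)fat_islands * fat_nodes * fat_cores;

    printf("Thin node cores: %ld (%.0f TB RAM at 2 GB/core)\n",
           thin_total, thin_total * 2.0 / 1024.0);
    printf("Fat node cores:  %ld (%.1f TB RAM at 6.4 GB/core)\n",
           fat_total, fat_total * 6.4 / 1024.0);
    printf("Total cores:     %ld\n", thin_total + fat_total);
    return 0;
}
```

The resulting total of 155,656 cores matches the SuperMUC entry in the table a few slides below.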


Power Consumption at LRZ

[Chart: electricity consumption (Stromverbrauch) in MWh, axis 0–35,000 MWh]

D. Kranzlmüller SSIP 2017 Workshop 7

Cooling SuperMUC

D. Kranzlmüller SSIP 2017 Workshop 8


SuperMUC Phase 2 @ LRZ

High Energy Efficiency
✓ Usage of Intel Xeon E5-2697 v3 processors
✓ Direct liquid cooling
  – 10% power advantage over air-cooled system
  – 25% power advantage due to chiller-less cooling
✓ Energy-aware scheduling
  – 6% power advantage
  – ~40% power advantage
  – Total annual savings of ~2 Mio. € for SuperMUC Phase 1 and 2

Photos: Torsten Bloth, Lenovo
Slide: Herbert Huber

D. Kranzlmüller SSIP 2017 Workshop 9

Increasing numbers

Date  System             Flop/s              Cores
2000  HLRB-I             2 Tflop/s           1,512
2006  HLRB-II            62 Tflop/s          9,728
2012  SuperMUC           3,200 Tflop/s       155,656
2015  SuperMUC Phase II  3.2 + 3.2 Pflop/s   229,960

D. Kranzlmüller SSIP 2017 Workshop 10
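A minimal sketch of the growth rate implied by the table, treating the 2015 system as 3.2 + 3.2 = 6.4 Pflop/s of combined peak performance:

```c
/* Growth rate implied by the table above: peak performance from
 * HLRB-I (2 Tflop/s, 2000) to SuperMUC Phase 1+2 (3.2+3.2 Pflop/s, 2015). */
#include <stdio.h>
#include <math.h>

int main(void) {
    double tflops_2000 = 2.0, tflops_2015 = 6400.0;  /* 3.2 + 3.2 Pflop/s */
    double years = 2015 - 2000;

    double factor = tflops_2015 / tflops_2000;  /* overall speed-up */
    double doublings = log2(factor);            /* how often performance doubled */
    printf("Factor %.0fx over %.0f years -> doubling roughly every %.1f months\n",
           factor, years, years * 12.0 / doublings);
    return 0;
}
```

This is only a back-of-the-envelope figure; it ignores the architectural differences between the systems.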


SuperMUC Job Size 2015 (in Cores)

D. Kranzlmüller SSIP 2017 Workshop 11

Challenges on Extreme Scale Systems

• Size: number of cores > 100,000
• Complexity / heterogeneity
• Reliability / resilience
• Energy consumption as part of Total Cost of Ownership (TCO)
  – Execute codes with optimal power consumption (or within a certain power band) → frequency scaling (see the sketch below)
  – Optimize for energy-to-solution → allows more codes within a given budget
  – Improved performance → (in most cases) improved energy-to-solution

D. Kranzlmüller SSIP 2017 Workshop 12
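The frequency-scaling bullet above can be illustrated with a toy model: runtime shrinks less than linearly with frequency while power grows faster than linearly, so there is a frequency that minimizes energy-to-solution (power × runtime). All numbers in this sketch are illustrative assumptions, not SuperMUC measurements.

```c
/* Toy model of energy-to-solution vs. CPU frequency: only part of the runtime
 * scales with frequency (memory-bound share), while power grows super-linearly.
 * All numbers are illustrative assumptions, not SuperMUC measurements. */
#include <stdio.h>

int main(void) {
    double base_freq = 2.7, base_runtime = 3600.0;  /* s at nominal frequency */
    double cpu_share = 0.6;            /* fraction of runtime that scales with frequency */
    double best_energy = -1.0, best_freq = 0.0;

    for (double f = 1.2; f <= 2.7001; f += 0.1) {
        double runtime = base_runtime * (cpu_share * base_freq / f + (1.0 - cpu_share));
        double power = 50.0 + 30.0 * (f / base_freq) * (f / base_freq);  /* W per node, toy */
        double energy = power * runtime;                                  /* J per node */
        if (best_energy < 0 || energy < best_energy) {
            best_energy = energy;
            best_freq = f;
        }
    }
    printf("Minimum energy-to-solution at %.1f GHz: %.0f kJ per node\n",
           best_freq, best_energy / 1000.0);
    return 0;
}
```

The same sweep can instead be run within a prescribed power band by skipping frequencies whose modelled power falls outside that band.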


1st LRZ Extreme Scale Workshop

• July 2013: 1st LRZ Extreme Scale Workshop
• Participants:
  – 15 international projects
• Prerequisites:
  – Successful run on 4 islands (32,768 cores)
• Participating groups (software packages):
  – LAMMPS, VERTEX, GADGET, waLBerla, BQCD, Gromacs, APES, SeisSol, CIAO
• Successful results (>64,000 cores):
  – Invited to participate in the ParCo Conference (Sept. 2013), including a publication of their approach

D. Kranzlmüller SSIP 2017 Workshop 13

1st LRZ Extreme Scale Workshop

• Regular SuperMUC operation:
  – 4 islands maximum
  – Batch scheduling system
• Entire SuperMUC reserved for 2.5 days for the challenge:
  – 0.5 days for testing
  – 2 days for executing
  – 16 (of 19) islands available
• Consumed computing time for all groups:
  – 1 hour of runtime = 130,000 CPU hours
  – 1 year in total

D. Kranzlmüller SSIP 2017 Workshop 14
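A quick cross-check of the core-hour figure above, assuming roughly 8,192 cores per island (the slides only state ">8,000 cores" per thin island):

```c
/* Quick check of the slide's core-hour figures: 16 islands reserved for 2.5 days,
 * assuming ~8,192 cores per island (the slides only say >8,000). */
#include <stdio.h>

int main(void) {
    int islands = 16, cores_per_island = 8192;
    double reserved_days = 2.5;

    long cores = (long)islands * cores_per_island;
    double core_hours_per_hour = (double)cores;              /* ~130,000, as on the slide */
    double total_core_hours = cores * reserved_days * 24.0;  /* upper bound if fully used */

    printf("%ld cores -> %.0f core hours per wall-clock hour\n", cores, core_hours_per_hour);
    printf("Upper bound for 2.5 days: %.1f million core hours\n", total_core_hours / 1e6);
    return 0;
}
```

With 16 islands this gives about 131,000 cores, consistent with the quoted 130,000 CPU hours per hour of runtime.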


Results (Sustained TFlop/s on 128,000 cores)

Name      MPI         #cores   Description          TFlop/s/island  TFlop/s max
Linpack   IBM         128,000  TOP500               161             2,560
Vertex    IBM         128,000  Plasma Physics       15              245
GROMACS   IBM, Intel  64,000   Molecular Modelling  40              110
SeisSol   IBM         64,000   Geophysics           31              95
waLBerla  IBM         128,000  Lattice Boltzmann    5.6             90
LAMMPS    IBM         128,000  Molecular Modelling  5.6             90
APES      IBM         64,000   CFD                  6               47
BQCD      Intel       128,000  Quantum Physics      10              27

D. Kranzlmüller SSIP 2017 Workshop 15
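A minimal sketch that turns the table into a rough scaling efficiency, assuming ~8,000 cores per island (16 islands at 128,000 cores, 8 at 64,000): ideal scaling across islands would give TFlop/s max = islands × TFlop/s per island.

```c
/* Rough scaling efficiency from the results table above:
 * efficiency = measured full-run TFlop/s / (islands * TFlop/s per island),
 * assuming ~8,000 cores per island. */
#include <stdio.h>

struct result { const char *name; int cores; double tflops_island, tflops_max; };

int main(void) {
    struct result r[] = {
        {"Linpack",  128000, 161.0, 2560.0},
        {"Vertex",   128000,  15.0,  245.0},
        {"GROMACS",   64000,  40.0,  110.0},
        {"SeisSol",   64000,  31.0,   95.0},
        {"waLBerla", 128000,   5.6,   90.0},
        {"LAMMPS",   128000,   5.6,   90.0},
        {"APES",      64000,   6.0,   47.0},
        {"BQCD",     128000,  10.0,   27.0},
    };
    for (unsigned i = 0; i < sizeof r / sizeof r[0]; i++) {
        int islands = r[i].cores / 8000;
        double efficiency = r[i].tflops_max / (islands * r[i].tflops_island);
        printf("%-8s %2d islands, efficiency ~%3.0f%%\n",
               r[i].name, islands, 100.0 * efficiency);
    }
    return 0;
}
```

Values near (or, because of rounding in the table, slightly above) 100% indicate nearly linear scaling across islands; lower values show codes that do not scale linearly to the full machine.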

Extreme Scaling Continued

• Lessons learned → stability and scalability
• The LRZ Extreme Scale Benchmark Suite (LESS) will be available in two versions: public and internal
• All teams will have the opportunity to run performance benchmarks after upcoming SuperMUC maintenances
• 2nd LRZ Extreme Scaling Workshop → 2-5 June 2014
  – Full system production runs on 18 islands with sustained Pflop/s (4 h SeisSol, 7 h Gadget)
  – 4 existing + 6 additional full system applications
  – High I/O bandwidth in user space possible (66 GB/s of 200 GB/s max)
  – Important goal: minimize energy × runtime (3-15 W/core, see the sketch below)
• 3rd Extreme Scale-Out with the new SuperMUC Phase 2

D. Kranzlmüller SSIP 2017 Workshop 16
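The "minimize energy × runtime" goal above, together with the quoted 3-15 W/core range, spans a wide energy window for a full-system run. A minimal sketch, assuming 18 islands × 8,192 cores and the 4 h SeisSol run mentioned above:

```c
/* Energy window for a full-system run at the quoted 3-15 W/core,
 * assuming 18 islands x 8,192 cores (~147,456 cores) and a 4 h run. */
#include <stdio.h>

int main(void) {
    long cores = 18L * 8192;   /* ~147,456 cores on 18 islands (assumed island size) */
    double hours = 4.0;        /* SeisSol full-system production run */

    for (double w_per_core = 3.0; w_per_core <= 15.0; w_per_core += 12.0) {
        double megawatt = cores * w_per_core / 1e6;
        double mwh = megawatt * hours;
        printf("%4.1f W/core -> %.2f MW system power, %.1f MWh for the 4 h run\n",
               w_per_core, megawatt, mwh);
    }
    return 0;
}
```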


Extreme Scale-Out SuperMUC Phase 2

• 12 May – 12 June 2015 (30 days)
• Selected group of early users
• Nightly operation: general queue, max. 3 islands
• Daytime operation: special queue, max. 6 islands (full system)
• Total available: 63,432,000 core hours
• Total used: 43,758,430 core hours (utilisation: 68.98%)

Lessons learned (2015):
• Preparation is everything
• Finding Heisenbugs is difficult
• MPI is at its limits
• Hybrid (MPI+OpenMP) is the way to go (see the sketch below)
• I/O libraries are getting even more important

D. Kranzlmüller SSIP 2017 Workshop 17
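The "Hybrid (MPI+OpenMP)" lesson above refers to the usual pattern of running fewer MPI ranks per node with OpenMP threads inside each rank, which reduces the number of ranks MPI has to manage. A minimal, generic skeleton of that structure (not taken from any of the application codes named in these workshops):

```c
/* Minimal hybrid MPI+OpenMP skeleton: one MPI rank per node (or socket),
 * OpenMP threads within the rank. Generic illustration only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* Request FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        /* Each thread works on its share of the node-local data. */
        local += 1.0;   /* placeholder for real per-thread work */
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks x %d threads -> global sum %.0f\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```

Built with e.g. mpicc -fopenmp, this keeps the per-node rank count small, which is exactly what relieves the "MPI is at its limits" pressure noted above.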

4th Extreme Scale Workshop 2016

• 4-day workshop (29 February – 3 March 2016)
• 13 projects
• 147,456 cores in 9,216 nodes
• 14.1 Mio CPUh
• Max. time per job: 6 h
• Daily and nightly operation mode

#   Application  Field              Institution            PI
1   INDEXA       CFD                TU München             M. Kronbichler
2   MPAS         Climate Science    KIT                    D. Heinzeller
3   Inhouse      Material Science   TU Dresden             F. Ortmann
4   HemeLB       Life Science       UC London              P. Coveney
5   KPM          Chemistry          FAU Erlangen           M. Kreutzer
6   SWIFT        Cosmology          U Durham               M. Schaller
7   LISO         CFD                TU Darmstadt           S. Kraheberger
8   ILDBC        Lattice Boltzmann  FAU Erlangen           M. Wittmann
9   Walberla     Lattice Boltzmann  FAU Erlangen           Ch. Godenschwager
10  GASPI        Framework          ITWM Kaiserslautern    M. Kühn
11  GADGET       Cosmology          LMU München            K. Dolag
12  VERTEX       Astrophysics       MPI for Astrophysics   T. Melson
13  PSC          Plasma             LMU München            K. Bamberg

D. Kranzlmüller SSIP 2017 Workshop 18


VERTEX: Simulation Code for Supernova Explosions (plasma + neutrino dynamics)

A. Marek and team (Max Planck Institute for Astrophysics, Garching)

D. Kranzlmüller SSIP 2017 Workshop 19

COMPAT → CompBioMed (Peter Coveney, UCL)

• Molecular modelling simulation running on all cores of SuperMUC Phase 1 + 2
• Docking simulation of potential drugs for breast cancer
• Goal: a demonstration of feasibility with the power of high performance computing
• 37 hours total run time
• 241,672 cores
• 8,900,000 CPU hours
• Tools developed in the EU projects MAPPER and COMPAT: http://www.compat-project.eu/
• Lessons learned → CompBioMed, a Centre of Excellence in Computational Biomedicine: http://www.compbiomed.eu

D. Kranzlmüller SSIP 2017 Workshop 20


Summary from the 4th LRZ Extreme Scale Workshop

Lessons learned:
1. Although Phase 1 has been a well-running system for some time now (since 2011), some quirks and problems in the system were still found and have been fixed.
2. The main focus was on the performance optimization of the user codes and led to great results.
3. One code used Phase 1 + Phase 2 for the first time (all available 241,672 cores for 37 h).
4. Applications from JUQUEEN and Piz Daint show how the general-purpose architecture of SuperMUC compares to specialized architectures like GPUs or BlueGene.
5. Application codes now reach the Pflop/s range.

D. Kranzlmüller SSIP 2017 Workshop 21

Extreme Scaling – Conclusions

• The number of compute cores and the complexity (and heterogeneity) are steadily increasing, introducing new challenges/issues
• Users need the possibility to reliably execute (and optimize) their codes on full-size machines with more than 100,000 cores
• The Extreme Scaling Workshop series @ LRZ offers a number of incentives for users → next workshop Spring 2017
• The lessons learned from the Extreme Scaling Workshops are very valuable for the operation of the centre:
  – Improve performance of applications and energy consumption during operations
  – Improve reliability/stability of the hard- and software environment under extreme conditions
  – Learn how to use the infrastructure and prepare processes for operations

D. Kranzlmüller SSIP 2017 Workshop 22


Partnership Initiative Computational Sciences πCS

• Individualized services for selected scientific groups – flagship role
  – Dedicated point-of-contact
  – Individual support, guidance and targeted training & education
  – Planning dependability for use-case-specific optimized IT infrastructures
  – Early access to latest IT infrastructure (hard- and software) developments and specification of future requirements
  – Access to the IT competence network and expertise at CS and Math departments
• Partner contribution
  – Embedding IT experts in user groups
  – Joint research projects (including funding)
  – Scientific partnership – equal footing – joint publications
• LRZ benefits
  – Understanding the (current and future) needs and requirements of the respective scientific domain
  – Developing future services for all user groups
  – Thematic focusing: Environmental Computing

D. Kranzlmüller SSIP 2017 Workshop 23

piCS Cookbook

1. Choose focus topics to serve as a lighthouse
   – National agreement within GCS: LRZ focuses on Environment (& Energy)
2. Choose user communities
   – Already active at LRZ?
   – Not active at LRZ?
3. Invite them for introductory piCS workshops
   – Show faces & tour
   – Discussion of joint topics, requirements, interests, ...
4. Establish links between communities and specific points-of-contact
   – Whom to talk to if there are questions?
   – When to talk to them? In general, as early as possible
   – Maybe place people into the research groups (weekly, for a certain period)
5. Run joint lectures (e.g. hydrometeorology and computer science)
6. Apply for joint projects
7. Use HPC efficiently

D. Kranzlmüller SSIP 2017 Workshop 24


Partnership Initiative Computational Science – Using HPC efficiently

Dieter Kranzlmüller
[email protected]

D. Kranzlmüller SSIP 2017 Workshop 25