

Partnership Initiative Computational Science – Using HPC efficiently

Dieter Kranzlmüller

Munich Network Management Team, Ludwig-Maximilians-Universität München (LMU) & Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities

From Michael Resch's Talk

• Prof. Resch:
  – „Education and Training HPC and the challenges we see in HPC“
  – „It is a question of algorithms, of methods, of how you use the system“
  – „For a successful simulation, you need to know computer science, mathematics, and the respective field of application“
  – „Focus at HLRS on engineering“
  – „Training is necessary!“

D. Kranzlmüller SSIP 2017 Workshop 2


Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

D. Kranzlmüller SSIP 2017 Workshop 3

With approx. 230 employees, for more than 100,000 students and for more than 30,000 employees, including 8,500 scientists

• European Supercomputing Centre
• National Supercomputing Centre → GCS
• Regional Computer Centre for all Bavarian Universities
• Computer Centre for all Munich Universities

Photo: Ernst Graf

SuperMUC Phase 1 + 2

D. Kranzlmüller SSIP 2017 Workshop 4


LRZ Building Extension

D. Kranzlmüller SSIP 2017 Workshop 5

Picture: Ernst A. Graf

Picture: Horst-Dieter Steinhöfer

Figure: Herzog+Partner for StBAM2 (staatl. Hochbauamt München 2)

SuperMUC Architecture

[Architecture diagram – key figures:]
• Compute nodes: 18 thin node islands (each >8,000 cores), SB-EP nodes with 16 cores/node and 2 GB/core
• 1 fat node island (8,200 cores) → SuperMIG, WM-EX nodes with 40 cores/node and 6.4 GB/core
• Interconnect: non-blocking within each island, pruned tree (4:1) between islands
• I/O nodes and NAS (80 Gbit/s): $HOME 1.5 PB / 10 GB/s, snapshots/replica 1.5 PB (separate fire section)
• GPFS for $WORK and $SCRATCH: 10 PB, 200 GB/s
• Archive and backup ~30 PB, disaster recovery site

D. Kranzlmüller SSIP 2017 Workshop 6
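As a rough cross-check of the diagram above, a minimal sketch that totals cores and memory per partition. The node counts (512 nodes per thin island, 205 fat nodes) are not on the slide; they are assumptions inferred from the per-island core counts and the cores-per-node figures.

```c
/* Rough totals for the SuperMUC Phase 1 island configuration sketched above.
 * Node counts (512 thin nodes per island, 205 fat nodes) are inferred from the
 * per-island core counts and are assumptions, not figures from the slide. */
#include <stdio.h>

int main(void) {
    int thin_islands = 18, thin_nodes = 512, thin_cores = 16;  /* SB-EP, 2 GB/core */
    int fat_islands = 1, fat_nodes = 205, fat_cores = 40;      /* WM-EX, 6.4 GB/core */

    long thin_total = (long)thin_islands * thin_nodes * thin_cores;
    long fat_total = (long)fat_islands * fat_nodes * fat_cores;

    printf("Thin node cores: %ld (%.0f TB RAM at 2 GB/core)\n",
           thin_total, thin_total * 2.0 / 1024.0);
    printf("Fat node cores:  %ld (%.1f TB RAM at 6.4 GB/core)\n",
           fat_total, fat_total * 6.4 / 1024.0);
    printf("Total cores:     %ld\n", thin_total + fat_total);
    return 0;
}
```

The resulting total of 155,656 cores matches the SuperMUC entry in the table a few slides below.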


Power Consumption at LRZ

[Chart: electricity consumption (Stromverbrauch) in MWh, axis 0–35,000 MWh]

D. Kranzlmüller SSIP 2017 Workshop 7

Cooling SuperMUC

D. Kranzlmüller SSIP 2017 Workshop 8


SuperMUC Phase 2 @ LRZ

High Energy Efficiency
✓ Usage of Intel Xeon E5-2697 v3 processors
✓ Direct liquid cooling
  – 10% power advantage over air-cooled system
  – 25% power advantage due to chiller-less cooling
✓ Energy-aware scheduling
  – 6% power advantage
  – ~40% power advantage
  – Total annual savings of ~2 Mio. € for SuperMUC Phase 1 and 2

Photos: Torsten Bloth, Lenovo
Slide: Herbert Huber

D. Kranzlmüller SSIP 2017 Workshop 9

Increasing numbers

Date  System             Flop/s              Cores
2000  HLRB-I             2 Tflop/s           1,512
2006  HLRB-II            62 Tflop/s          9,728
2012  SuperMUC           3,200 Tflop/s       155,656
2015  SuperMUC Phase II  3.2 + 3.2 Pflop/s   229,960

D. Kranzlmüller SSIP 2017 Workshop 10
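A minimal sketch of the growth rate implied by the table, treating the 2015 system as 3.2 + 3.2 = 6.4 Pflop/s of combined peak performance:

```c
/* Growth rate implied by the table above: peak performance from
 * HLRB-I (2 Tflop/s, 2000) to SuperMUC Phase 1+2 (3.2+3.2 Pflop/s, 2015). */
#include <stdio.h>
#include <math.h>

int main(void) {
    double tflops_2000 = 2.0, tflops_2015 = 6400.0;  /* 3.2 + 3.2 Pflop/s */
    double years = 2015 - 2000;

    double factor = tflops_2015 / tflops_2000;  /* overall speed-up */
    double doublings = log2(factor);            /* how often performance doubled */
    printf("Factor %.0fx over %.0f years -> doubling roughly every %.1f months\n",
           factor, years, years * 12.0 / doublings);
    return 0;
}
```

This is only a back-of-the-envelope figure; it ignores the architectural differences between the systems.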


SuperMUC Job Size 2015 (in Cores)

D. Kranzlmüller SSIP 2017 Workshop 11

Challenges on Extreme Scale Systems

• Size: number of cores > 100,000
• Complexity / heterogeneity
• Reliability / resilience
• Energy consumption as part of Total Cost of Ownership (TCO)
  – Execute codes with optimal power consumption (or within a certain power band) → frequency scaling (see the sketch below)
  – Optimize for energy-to-solution → allows more codes within a given budget
  – Improved performance → (in most cases) improved energy-to-solution

D. Kranzlmüller SSIP 2017 Workshop 12
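The frequency-scaling bullet above can be illustrated with a toy model: runtime shrinks less than linearly with frequency while power grows faster than linearly, so there is a frequency that minimizes energy-to-solution (power × runtime). All numbers in this sketch are illustrative assumptions, not SuperMUC measurements.

```c
/* Toy model of energy-to-solution vs. CPU frequency: only part of the runtime
 * scales with frequency (memory-bound share), while power grows super-linearly.
 * All numbers are illustrative assumptions, not SuperMUC measurements. */
#include <stdio.h>

int main(void) {
    double base_freq = 2.7, base_runtime = 3600.0;  /* s at nominal frequency */
    double cpu_share = 0.6;            /* fraction of runtime that scales with frequency */
    double best_energy = -1.0, best_freq = 0.0;

    for (double f = 1.2; f <= 2.7001; f += 0.1) {
        double runtime = base_runtime * (cpu_share * base_freq / f + (1.0 - cpu_share));
        double power = 50.0 + 30.0 * (f / base_freq) * (f / base_freq);  /* W per node, toy */
        double energy = power * runtime;                                  /* J per node */
        if (best_energy < 0 || energy < best_energy) {
            best_energy = energy;
            best_freq = f;
        }
    }
    printf("Minimum energy-to-solution at %.1f GHz: %.0f kJ per node\n",
           best_freq, best_energy / 1000.0);
    return 0;
}
```

The same sweep can instead be run within a prescribed power band by skipping frequencies whose modelled power falls outside that band.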


1st LRZ Extreme Scale Workshop

• July 2013: 1st LRZ Extreme Scale Workshop
• Participants:
  – 15 international projects
• Prerequisites:
  – Successful run on 4 islands (32,768 cores)
• Participating groups (software packages):
  – LAMMPS, VERTEX, GADGET, waLBerla, BQCD, Gromacs, APES, SeisSol, CIAO
• Successful results (>64,000 cores):
  – Invited to participate in the ParCo Conference (Sept. 2013), including a publication of their approach

D. Kranzlmüller SSIP 2017 Workshop 13

1st LRZ Extreme Scale Workshop

• Regular SuperMUC operation:
  – 4 islands maximum
  – Batch scheduling system
• Entire SuperMUC reserved for 2.5 days for the challenge:
  – 0.5 days for testing
  – 2 days for executing
  – 16 (of 19) islands available
• Consumed computing time for all groups:
  – 1 hour of runtime = 130,000 CPU hours
  – 1 year in total

D. Kranzlmüller SSIP 2017 Workshop 14
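A quick cross-check of the core-hour figure above, assuming roughly 8,192 cores per island (the slides only state ">8,000 cores" per thin island):

```c
/* Quick check of the slide's core-hour figures: 16 islands reserved for 2.5 days,
 * assuming ~8,192 cores per island (the slides only say >8,000). */
#include <stdio.h>

int main(void) {
    int islands = 16, cores_per_island = 8192;
    double reserved_days = 2.5;

    long cores = (long)islands * cores_per_island;
    double core_hours_per_hour = (double)cores;              /* ~130,000, as on the slide */
    double total_core_hours = cores * reserved_days * 24.0;  /* upper bound if fully used */

    printf("%ld cores -> %.0f core hours per wall-clock hour\n", cores, core_hours_per_hour);
    printf("Upper bound for 2.5 days: %.1f million core hours\n", total_core_hours / 1e6);
    return 0;
}
```

With 16 islands this gives about 131,000 cores, consistent with the quoted 130,000 CPU hours per hour of runtime.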


Results (Sustained TFlop/s on 128,000 cores)

Name      MPI         #cores   Description          TFlop/s/island  TFlop/s max
Linpack   IBM         128,000  TOP500               161             2,560
Vertex    IBM         128,000  Plasma Physics       15              245
GROMACS   IBM, Intel  64,000   Molecular Modelling  40              110
SeisSol   IBM         64,000   Geophysics           31              95
waLBerla  IBM         128,000  Lattice Boltzmann    5.6             90
LAMMPS    IBM         128,000  Molecular Modelling  5.6             90
APES      IBM         64,000   CFD                  6               47
BQCD      Intel       128,000  Quantum Physics      10              27

D. Kranzlmüller SSIP 2017 Workshop 15
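A minimal sketch that turns the table into a rough scaling efficiency, assuming ~8,000 cores per island (16 islands at 128,000 cores, 8 at 64,000): ideal scaling across islands would give TFlop/s max = islands × TFlop/s per island.

```c
/* Rough scaling efficiency from the results table above:
 * efficiency = measured full-run TFlop/s / (islands * TFlop/s per island),
 * assuming ~8,000 cores per island. */
#include <stdio.h>

struct result { const char *name; int cores; double tflops_island, tflops_max; };

int main(void) {
    struct result r[] = {
        {"Linpack",  128000, 161.0, 2560.0},
        {"Vertex",   128000,  15.0,  245.0},
        {"GROMACS",   64000,  40.0,  110.0},
        {"SeisSol",   64000,  31.0,   95.0},
        {"waLBerla", 128000,   5.6,   90.0},
        {"LAMMPS",   128000,   5.6,   90.0},
        {"APES",      64000,   6.0,   47.0},
        {"BQCD",     128000,  10.0,   27.0},
    };
    for (unsigned i = 0; i < sizeof r / sizeof r[0]; i++) {
        int islands = r[i].cores / 8000;
        double efficiency = r[i].tflops_max / (islands * r[i].tflops_island);
        printf("%-8s %2d islands, efficiency ~%3.0f%%\n",
               r[i].name, islands, 100.0 * efficiency);
    }
    return 0;
}
```

Values near (or, because of rounding in the table, slightly above) 100% indicate nearly linear scaling across islands; lower values show codes that do not scale linearly to the full machine.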

Extreme Scaling Continued

• Lessons learned → stability and scalability
• The LRZ Extreme Scale Benchmark Suite (LESS) will be available in two versions: public and internal
• All teams will have the opportunity to run performance benchmarks after upcoming SuperMUC maintenances
• 2nd LRZ Extreme Scaling Workshop → 2-5 June 2014
  – Full system production runs on 18 islands with sustained Pflop/s (4 h SeisSol, 7 h Gadget)
  – 4 existing + 6 additional full system applications
  – High I/O bandwidth in user space possible (66 GB/s of 200 GB/s max)
  – Important goal: minimize energy × runtime (3-15 W/core, see the sketch below)
• 3rd Extreme Scale-Out with the new SuperMUC Phase 2

D. Kranzlmüller SSIP 2017 Workshop 16
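The "minimize energy × runtime" goal above, together with the quoted 3-15 W/core range, spans a wide energy window for a full-system run. A minimal sketch, assuming 18 islands × 8,192 cores and the 4 h SeisSol run mentioned above:

```c
/* Energy window for a full-system run at the quoted 3-15 W/core,
 * assuming 18 islands x 8,192 cores (~147,456 cores) and a 4 h run. */
#include <stdio.h>

int main(void) {
    long cores = 18L * 8192;   /* ~147,456 cores on 18 islands (assumed island size) */
    double hours = 4.0;        /* SeisSol full-system production run */

    for (double w_per_core = 3.0; w_per_core <= 15.0; w_per_core += 12.0) {
        double megawatt = cores * w_per_core / 1e6;
        double mwh = megawatt * hours;
        printf("%4.1f W/core -> %.2f MW system power, %.1f MWh for the 4 h run\n",
               w_per_core, megawatt, mwh);
    }
    return 0;
}
```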


Extreme Scale-Out SuperMUC Phase 2

• 12 May – 12 June 2015 (30 days)
• Selected group of early users
• Nightly operation: general queue, max. 3 islands
• Daytime operation: special queue, max. 6 islands (full system)
• Total available: 63,432,000 core hours
• Total used: 43,758,430 core hours (utilisation: 68.98%)

Lessons learned (2015):
• Preparation is everything
• Finding Heisenbugs is difficult
• MPI is at its limits
• Hybrid (MPI+OpenMP) is the way to go (see the sketch below)
• I/O libraries are getting even more important

D. Kranzlmüller SSIP 2017 Workshop 17
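The "Hybrid (MPI+OpenMP)" lesson above refers to the usual pattern of running fewer MPI ranks per node with OpenMP threads inside each rank, which reduces the number of ranks MPI has to manage. A minimal, generic skeleton of that structure (not taken from any of the application codes named in these workshops):

```c
/* Minimal hybrid MPI+OpenMP skeleton: one MPI rank per node (or socket),
 * OpenMP threads within the rank. Generic illustration only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* Request FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        /* Each thread works on its share of the node-local data. */
        local += 1.0;   /* placeholder for real per-thread work */
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks x %d threads -> global sum %.0f\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```

Built with e.g. mpicc -fopenmp, this keeps the per-node rank count small, which is exactly what relieves the "MPI is at its limits" pressure noted above.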

4th Extreme Scale Workshop 2016

• 4-day workshop (29 February – 3 March 2016)
• 13 projects
• 147,456 cores in 9,216 nodes
• 14.1 Mio CPUh
• Max. time per job: 6 h
• Daily and nightly operation mode

#   Application  Field              Institution            PI
1   INDEXA       CFD                TU München             M. Kronbichler
2   MPAS         Climate Science    KIT                    D. Heinzeller
3   Inhouse      Material Science   TU Dresden             F. Ortmann
4   HemeLB       Life Science       UC London              P. Coveney
5   KPM          Chemistry          FAU Erlangen           M. Kreutzer
6   SWIFT        Cosmology          U Durham               M. Schaller
7   LISO         CFD                TU Darmstadt           S. Kraheberger
8   ILDBC        Lattice Boltzmann  FAU Erlangen           M. Wittmann
9   Walberla     Lattice Boltzmann  FAU Erlangen           Ch. Godenschwager
10  GASPI        Framework          ITWM Kaiserslautern    M. Kühn
11  GADGET       Cosmology          LMU München            K. Dolag
12  VERTEX       Astrophysics       MPI for Astrophysics   T. Melson
13  PSC          Plasma             LMU München            K. Bamberg

D. Kranzlmüller SSIP 2017 Workshop 18


VERTEX: Simulation Code for Supernova Explosions (plasma + neutrino dynamics)

A. Marek and team (Max Planck Institute for Astrophysics, Garching)

D. Kranzlmüller SSIP 2017 Workshop 19

COMPAT → CompBioMed (Peter Coveney, UCL)

• Molecular modelling simulation running on all cores of SuperMUC Phase 1 + 2
• Docking simulation of potential drugs for breast cancer
• Goal: a demonstration of feasibility with the power of high performance computing
• 37 hours total run time
• 241,672 cores
• 8,900,000 CPU hours
• Tools developed in the EU projects MAPPER and COMPAT: http://www.compat-project.eu/
• Lessons learned → CompBioMed, a Centre of Excellence in Computational Biomedicine: http://www.compbiomed.eu

D. Kranzlmüller SSIP 2017 Workshop 20


Summary from the 4th LRZ Extreme Scale Workshop

Lessons learned:
1. Although Phase 1 has been a well-running system for some time now (since 2011), some quirks and problems in the system were still found and have been fixed.
2. The main focus was on the performance optimization of the user codes and led to great results.
3. One code used Phase 1 + Phase 2 for the first time (all available 241,672 cores for 37 h).
4. Applications from JUQUEEN and Piz Daint show how the general-purpose architecture of SuperMUC compares to specialized architectures like GPUs or BlueGene.
5. Application codes now reach the Pflop/s range.

D. Kranzlmüller SSIP 2017 Workshop 21

Extreme Scaling – Conclusions

• The number of compute cores and the complexity (and heterogeneity) are steadily increasing, introducing new challenges/issues
• Users need the possibility to reliably execute (and optimize) their codes on full-size machines with more than 100,000 cores
• The Extreme Scaling Workshop series @ LRZ offers a number of incentives for users → next workshop Spring 2017
• The lessons learned from the Extreme Scaling Workshops are very valuable for the operation of the centre:
  – Improve performance of applications and energy consumption during operations
  – Improve reliability/stability of the hard- and software environment under extreme conditions
  – Learn how to use the infrastructure and prepare processes for operations

D. Kranzlmüller SSIP 2017 Workshop 22


Partnership Initiative Computational Sciences πCS

• Individualized services for selected scientific groups – flagship role
  – Dedicated point-of-contact
  – Individual support, guidance and targeted training & education
  – Planning dependability for use-case-specific optimized IT infrastructures
  – Early access to latest IT infrastructure (hard- and software) developments and specification of future requirements
  – Access to the IT competence network and expertise at CS and Math departments
• Partner contribution
  – Embedding IT experts in user groups
  – Joint research projects (including funding)
  – Scientific partnership – equal footing – joint publications
• LRZ benefits
  – Understanding the (current and future) needs and requirements of the respective scientific domain
  – Developing future services for all user groups
  – Thematic focusing: Environmental Computing

D. Kranzlmüller SSIP 2017 Workshop 23

piCS Cookbook

1. Choose focus topics to serve as a lighthouse
   – National agreement within GCS: LRZ focuses on Environment (& Energy)
2. Choose user communities
   – Already active at LRZ?
   – Not active at LRZ?
3. Invite them for introductory piCS workshops
   – Show faces & tour
   – Discussion of joint topics, requirements, interests, ...
4. Establish links between communities and specific points-of-contact
   – Whom to talk to if there are questions?
   – When to talk to them? In general, as early as possible
   – Maybe place people into the research groups (weekly, for a certain period)
5. Run joint lectures (e.g. hydrometeorology and computer science)
6. Apply for joint projects
7. Use HPC efficiently

D. Kranzlmüller SSIP 2017 Workshop 24


Partnership Initiative Computational Science – Using HPC efficiently

Dieter Kranzlmüller
[email protected]

D. Kranzlmüller SSIP 2017 Workshop 25