Are we there yet? Experiences developing and commissioning the HPC System for ASKAP Telescope CSIRO ASTRONOMY AND SPACE SCIENCE Juan Carlos (JC) Guzman | Head of ATNF Software and Computing Perth HPC Advisory Council Conference – 31 July – 1 August 2017


Page 1:

Are we there yet? Experiences developing and commissioning the HPC System for ASKAP Telescope

CSIRO ASTRONOMY AND SPACE SCIENCE

Juan Carlos (JC) Guzman | Head of ATNF Software and Computing
Perth HPC Advisory Council Conference – 31 July – 1 August 2017

Page 2:

We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site and the Noongar people as the traditional owners of the land where this meeting is being held

Page 3:

Outline

Overview of ASKAP

ASKAP Computing System history, challenges and future

Lessons learned

Page 4:

Outline

Overview of ASKAP

ASKAP Computing System history, challenges and future

Lessons learned

Page 5:

Australian SKA Pathfinder - overview
• 36-antenna multi-beam interferometer in a radio-quiet zone
• Frequency range: 700 MHz – 1.8 GHz, baselines from 23 m to 6 km
• Survey instrument – pushing wide instantaneous field of view
• 2nd generation phased-array feed (PAF) receiver + flexible beamformer
• 3-axis mount (whole antenna can rotate) – can fix or rotate beam pattern
• Automatic processing eventually – necessary for the full instrument
• Early science with 12 antennas started in October 2016
• Most reported science was with BETA (6-antenna array with Mk I PAF)
• 18 antennas have already been integrated into the array

Page 6:

Phased Array Feed – 188 single-pol receivers

Page 7:

Wide field of view

Page 8:

Murchison Radio Observatory (MRO)
• 126 km²
• 32 km roads and tracks
• 16000 km optic fibre
• >8000 fibres
• Control Building
• Power station
• Under construction

Page 9:

MRO power station

Page 10:

ASKAP – system architecture

[Diagram labels: x36; combined data rate ~21 Tb/s; ~2.5 GB/s]

Page 11:

Outline

Overview of ASKAP

ASKAP Computing System history, challenges and future

Lessons learned

Page 13:

Indirect imaging of the sky

Synthesis telescopes measure correlations between received voltages for each pair of antennas

Three different types of images are required

Continuum image
• Very accurate image
• Need multiple iterations
• Hard to parallelize

Spectral line cube
• 16200 independent images
• Each at slightly different frequency
• Embarrassingly parallel task
• One iteration may be sufficient

Transient image
• Very coarse image
• Made every 5 seconds

We need to make images in near real time, ideally all three types in parallel

To the first order (narrow FOV), the measurement equation is a 2D Fourier Transform

90% Computation Cost
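
The Fourier relationship stated on this slide can be illustrated with a small sketch. It is not ASKAPsoft code: it uses a slow direct Fourier sum rather than the gridding and FFT approach that production imagers rely on, and the uv coverage, image axes, source position and function names (`point_source_visibility`, `dirty_image`) are all hypothetical values chosen for illustration.

```python
import cmath
import math


def point_source_visibility(u, v, l0, m0, flux=1.0):
    """Narrow-field visibility of a point source at direction cosines (l0, m0)."""
    return flux * cmath.exp(-2j * math.pi * (u * l0 + v * m0))


def dirty_image(uv_points, visibilities, l_axis, m_axis):
    """Direct inverse 2D Fourier sum of sampled visibilities (no gridding, no FFT)."""
    n_vis = len(uv_points)
    image = []
    for m in m_axis:
        row = []
        for l in l_axis:
            total = 0.0
            for (u, v), vis in zip(uv_points, visibilities):
                total += (vis * cmath.exp(2j * math.pi * (u * l + v * m))).real
            row.append(total / n_vis)
        image.append(row)
    return image


# Hypothetical uv samples (in wavelengths) on a regular 9 x 9 lattice, origin excluded.
uv_points = [(100.0 * p, 100.0 * q)
             for p in range(-4, 5) for q in range(-4, 5) if (p, q) != (0, 0)]

# Hypothetical image axes (direction cosines, in radians for a narrow field).
l_axis = [0.001 * i for i in range(-5, 6)]
m_axis = [0.001 * j for j in range(-5, 6)]

# Place a unit point source on one of the image pixels.
l0, m0 = l_axis[7], m_axis[4]
visibilities = [point_source_visibility(u, v, l0, m0) for u, v in uv_points]

image = dirty_image(uv_points, visibilities, l_axis, m_axis)
peak_value, peak_row, peak_col = max(
    (value, r, c) for r, row in enumerate(image) for c, value in enumerate(row)
)
print(peak_row, peak_col, round(peak_value, 6))
```

The brightest pixel falls at the position where the source was placed. The direct sum costs one complex exponential per visibility per pixel, which is why real imagers grid the visibilities and use an FFT instead.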

Page 14:

ASKAP Key Computing Requirements
• 90% Computational Cost in gridding/degridding
• https://www.skatelescope.org/uploaded/59116_132_Memo_Humphreys.pdf
• Developed stand-alone benchmarking gridding code for testing on multiple platforms
• 10,000 cores (80% efficiency), 4 GB/core, 200 TFLOPs Peak
• Data Ingest from Correlator ~2.8 GB/s = ~10 TB/h (Raw Visibilities)
• Processing of raw visibilities (calibration & imaging) needs to keep up
• Cannot afford to keep raw visibilities
• Multiple science products after observations ~5 PB/year
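
The figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and, for the yearly estimate, an instrument observing continuously, which the slide does not state.

```python
# Figures quoted on the slide.
ingest_rate_gb_per_s = 2.8   # data ingest from the correlator
cores = 10_000
memory_per_core_gb = 4
peak_tflops = 200
science_products_pb_per_year = 5

# Assumptions for this sketch: decimal units and 24/7 observing.
GB_PER_TB = 1000
TB_PER_PB = 1000
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365

ingest_tb_per_hour = ingest_rate_gb_per_s * SECONDS_PER_HOUR / GB_PER_TB
raw_pb_per_year = ingest_tb_per_hour * HOURS_PER_YEAR / TB_PER_PB
total_memory_tb = cores * memory_per_core_gb / GB_PER_TB
peak_gflops_per_core = peak_tflops * 1000 / cores

print(f"Ingest: {ingest_tb_per_hour:.2f} TB/h")
print(f"Raw visibilities if kept: {raw_pb_per_year:.1f} PB/year")
print(f"Aggregate memory: {total_memory_tb:.0f} TB")
print(f"Peak per core: {peak_gflops_per_core:.0f} GFLOP/s")
```

Under these assumptions the raw visibility stream is more than an order of magnitude larger than the ~5 PB/year of science products, which is consistent with the statement that raw visibilities cannot be kept.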


Page 15:

The Pawsey High Performance Computing Centre for SKA Science
• AUD$80M super-computing centre
• 25% resources to support operational requirements of storage and processing of data from ASKAP and MWA
• Construction completed April 2013

Page 16:

ASKAP Central Processor @ Pawsey Centre

Ingest cluster
• 16 nodes, 2 sockets per node
• 8-core CPUs, 64 GB of RAM per node

Central Processor (Galaxy)
472 x Cray XC30 Compute Nodes
• 200 TFlop/s Peak
• 64 GB of RAM per node
• 2 sockets per node, 10 cores each

Shared storage
Cray Sonexion Lustre Storage
• 1.3 PB usable
• 480 x 4 TB Disk Drives
• Peak I/O performance: 30 Gb/s
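
As a rough consistency check, the Galaxy figures can be combined with the ~10 TB/h raw ingest rate quoted in the computing requirements. This is a sketch only: it assumes decimal units and that the whole Lustre filesystem is available to a single ingest stream, which was not the case in practice because the storage was shared.

```python
# Figures quoted on the slide.
compute_nodes = 472
sockets_per_node = 2
cores_per_socket = 10
peak_tflops = 200
disk_drives = 480
drive_size_tb = 4
usable_pb = 1.3

# Figure quoted in the computing requirements for the full array.
ingest_tb_per_hour = 10

total_cores = compute_nodes * sockets_per_node * cores_per_socket
raw_disk_pb = disk_drives * drive_size_tb / 1000
usable_fraction = usable_pb / raw_disk_pb
hours_to_fill = usable_pb * 1000 / ingest_tb_per_hour

print(f"Total cores: {total_cores}")
print(f"Raw disk capacity: {raw_disk_pb:.2f} PB, usable fraction: {usable_fraction:.2f}")
print(f"Hours of full-rate ingest to fill usable space: {hours_to_fill:.0f}")
```

The core count comes out close to the 10,000-core requirement, and the usable storage would hold under six days of full-rate raw ingest if nothing were deleted.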

Page 17:

ASKAPsoft
• In development since 2007
• Extensive re-use of core libraries
• Re-written Synthesis (parallel) code C++/MPI
• Assumptions
• Instrument stable (relatively easy to calibrate)
• Good global sky model
• Imaging model adequate
• Automated calibration and imaging (pipeline)
• ASKAP is one of the pathfinders in this domain (streaming + batch)
• Treat processing software as a part of the telescope
• Requires paradigm shift in the science community
• Commissioning requires different things to the full telescope

[Diagram: ASKAP science processing pipelines. Labels: Ingest Pipeline; Calibration Pipeline Services; Small-N (e.g. Continuum) Imager Pipeline; Large-N (e.g. Spectral Line) Imager Pipeline; Transient Detector Pipeline; UV Data, 16416 Channels (18.5 kHz); UV Data, 304 Channels (1 MHz); ~30 Channels (10 MHz); Imager (cimager); ccalibrator; Transient Imager (cfimager); Source Finder/Identifier; Transient Finder/Identifier; Source Catalog; Transient Detections; Calibration Solution; Calibration Data Service; Sky Model Service; Light Curve Service; Images; Image Cube]

ASKAP Science Processing

ASKAP-SW-0020

Version: 2.0
Date: 20/12/2011
Project: ASKAP

Prepared by: Tim Cornwell, Ben Humphreys, Emil Lenc, Maxim Voronkov, Matthew Whiting

Reviewed by: Ilana Feain
Review reference: Redmine issue 3280
Approved by: Ilana Feain
Date: 20/12/2011

Keywords: computing, science, processing
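
The channel counts in the diagram are mutually consistent, which a short calculation makes visible. This is a sketch that takes the diagram labels at face value; the reading that the 1 MHz channels are averages of the 18.5 kHz channels is an assumption, not something the slide states.

```python
# Channel counts and widths as labelled in the diagram.
fine_channels = 16416
fine_width_khz = 18.5
coarse_channels = 304
coarse_width_mhz = 1.0
calibration_channels = 30      # labelled "~30" in the diagram
calibration_width_mhz = 10.0

# Fine channels per 1 MHz channel (assuming the coarse channels are averaged fine channels).
fine_per_coarse = fine_channels // coarse_channels

fine_bandwidth_mhz = fine_channels * fine_width_khz / 1000
coarse_bandwidth_mhz = coarse_channels * coarse_width_mhz
calibration_bandwidth_mhz = calibration_channels * calibration_width_mhz

print(f"Fine channels per coarse channel: {fine_per_coarse}")
print(f"Bandwidth from fine channels: {fine_bandwidth_mhz:.1f} MHz")
print(f"Bandwidth from coarse channels: {coarse_bandwidth_mhz:.1f} MHz")
print(f"Bandwidth from calibration channels: {calibration_bandwidth_mhz:.1f} MHz")
```

All three resolutions describe roughly the same ~300 MHz of bandwidth, with 54 fine channels making up each 1 MHz channel.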

Page 18:

ASKAPsoft for Commissioning & Early Science
• Smaller datasets!
• 1 TB/hr (ASKAP-12) vs 10 TB/hr (ASKAP-36)
• Larger natural resolution (maximum baseline = 2.18 km)
• Able to do manual processing – still hard (many beams, large cubes), but tractable
• Processing team will run pipelines manually upon completion of observation
• Needed to understand and learn about the instrument!!
• Some features not available
• Processing is not automated
• No Sky Model available, nor calibration service applied in ingest
• Transient pipeline not yet developed
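
The roughly tenfold difference in data rate between ASKAP-12 and ASKAP-36 follows from the number of antenna pairs, since a synthesis telescope measures one correlation per pair. The sketch below assumes the same number of beams, channels and integration time for both arrays and ignores autocorrelations; the helper name `baseline_count` is introduced here for illustration.

```python
def baseline_count(antennas: int) -> int:
    """Number of distinct antenna pairs (cross-correlations only)."""
    return antennas * (antennas - 1) // 2


askap12 = baseline_count(12)
askap36 = baseline_count(36)
ratio = askap36 / askap12

print(f"ASKAP-12 baselines: {askap12}")
print(f"ASKAP-36 baselines: {askap36}")
print(f"Ratio: {ratio:.2f}")
```

A ratio of about 9.5 in baseline count is in line with the quoted 1 TB/hr versus 10 TB/hr.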

Page 19:

Results: ASKAPsoft: First 36 beam image

Image credit: Wasim Raja

• Continuum image with 9 antennas at 939.5 MHz
• Processing resembles an early-science experiment
• Each beam calibrated separately
• Individual deconvolution of different beams
• Only ASKAPsoft used

Page 20:

Results: NGC 7232 WALLABY Early Science

Credit: Juan Madrid – 14 Sep 2016

Page 21:

ASKAP Computing Project
• Team of 7 people distributed between Perth & Sydney
• External Reviews: Preliminary Design (2009), Critical Design (2010) and Production Readiness (2016)
• Iterative software development process, ~2-month cycles
• Continuous Integration Tool (Jenkins)
• Confluence & JIRA
• Subversion, soon to be moved to git

Page 22:

Issues
• 1.3 PB fast storage (Lustre filesystem) aka /scratch2
• Multiple users doing manual processing needed during commissioning and Early Science
• Shared with MWA users
• Shortage of space and non-deterministic performance affected the data ingest software (ingest pipeline)
• Underestimated the scratch space needed by the Early Science Program
• New 1.9 PB filesystem in May 2017
• Procured by Pawsey
• 1 PB dedicated to ASKAP real-time and 0.9 PB to MWA
• Still have a shortage of 0.5 – 1 PB to support the Early Science program, depending on the fate of /scratch2

Page 23:

Issues
• Need a stable instrument to validate our assumptions
• PAF beams stable (relatively easy to calibrate)
• Good Global Sky Model (continuum)
• Imaging performance adequate (high dynamic range)
• Early Science and Commissioning are a different use case from the full (automated) pipeline -> Scope Creep
• Under-estimated the effort on software integration, verification and support
• Sharing resources with ASKAP Commissioning and SKA pre-construction

Page 24:

Next steps
• Software Development for basic modes for full ASKAP
• Scaling testing and debugging
• Real-time services development and integration (calibration)
• Automated continuum and spectral line pipelines
• Additional Science Pipelines
• Full Polarisation Calibration
• "Postage Stamps" – small regions (10" spatial resolution)
• Transient and Zoom-mode pipeline
• Upgrade of the Galaxy platform in 12 – 24 months (TBD)
• Testing, profiling in Athena (benchmarking & data challenges)
• Evaluating GPU code
• Updating ASKAP Computing Requirements

Page 25:

Next steps – Towards SKA1 in Australia
• SKA1_LOW ~100 times larger than ASKAP
• Joint ICRAR/CSIRO SKA Science Data Processing Project (named Rialto)
• Continue our involvement in SDP consortium towards CDR
• Next generation of calibration and imaging processing software as a prototype for SKA1_LOW, ASKAP and MWA
• Re-use of ASKAPsoft and DALiuGE Execution Framework

Page 26:

Outline

Overview of ASKAP

ASKAP Computing System history, challenges and future

Lessons learned

Page 27:

Lessons learned (for SKA1)
• ASKAP operations model does not follow the traditional HPC (batch) user/support model
• Build a strong relationship with service providers: Service Agreements, co-location
• Dedicated resources at all levels for Radio Astronomy: People, Software, Hardware
• Commissioning of telescopes takes a long time and significant resources, and is different from full operations of the telescope
• Support for the transition period was underestimated
• Isolate fast shared storage (Lustre filesystem) from the "traditional" HPC user model and include more storage if you can

Page 28:

Are we there yet?

ASKAPsoft is already working!

Still lots of work to do, many challenges ahead and more to learn!

When is software really finished? ... Never?

Page 29:

CSIRO Astronomy and Space Science
Juan Carlos Guzman
Head of ATNF Software and Computing
t +61 8 6436 8569
E [email protected]
www.csiro.au/cass

CSIRO ASTRONOMY AND SPACE SCIENCE

Thank you