
Page 1: Increasing Coherence Between Simulation and Data Analytics · § Tony Hey, Stewart Tansley, and Kristin Tolle (editors), The Fourth Paradigm: Data-Intensive Scientific Discovery,

Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Increasing Coherence Between Simulation and Data Analytics
Chesapeake Large Scale Data Analytics Conference
Annapolis, MD
October 25, 2016

Rob Leland
Vice President, Science & Technology
Chief Technology Officer
Sandia National Laboratories

SAND2016-10762 C

Page 2

Outline

§ A tale of two visions

§ Some background

§ A charge from the National Strategic Computing Initiative

§ Answers to three key questions
  § Why is an increasing coherence between simulation and analytics important?
  § What is really meant by “increasing coherence” between the two?
  § How might coherence be furthered in practice?

§ A unifying vision

Page 3

Vision 1: From a scientific perspective

From The Fourth Paradigm: Data-Intensive Scientific Discovery by Jim Gray

Data analysis complements theory, experiment, and computation

Page 4

Vision 2: From a national security perspective

Graph matching example of data analytics
A key analytic primitive, used to find a specific instance of an abstract pattern of interest

From Coffman, Greenblatt, and Marcus, Graph-Based Technologies for Intelligence Analysis, Communications of the ACM, 47, March 2004.
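The graph-matching primitive named on this slide can be illustrated with a small backtracking search. This is only a minimal sketch, not the method from the cited Coffman et al. article: the function name `find_match`, the adjacency-set graph representation, and the toy graphs are all assumptions made for illustration. It looks for one assignment of pattern vertices to graph vertices such that every pattern edge lands on a graph edge.

```python
def find_match(pattern, graph):
    """Return one mapping {pattern vertex: graph vertex}, or None if none exists.

    Both arguments are undirected graphs given as {vertex: set_of_neighbors}.
    Every pattern edge must map onto a graph edge; extra graph edges are allowed.
    """
    order = list(pattern)  # fixed order in which pattern vertices are assigned

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)  # every pattern vertex has been placed
        p = order[len(mapping)]
        for g in graph:
            if g in mapping.values():
                continue  # each graph vertex is used at most once
            # Already-placed neighbors of p must be adjacent to g in the graph.
            if all(mapping[q] in graph[g] for q in pattern[p] if q in mapping):
                mapping[p] = g
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack and try the next candidate
        return None

    return extend({})


# Hypothetical pattern of interest: a triangle.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
# Hypothetical data graph: a path 1-2-3 attached to a triangle 3-4-5.
data = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
# A graph with no triangle at all: a simple path.
path = {1: {2}, 2: {1, 3}, 3: {2}}

match = find_match(triangle, data)
no_match = find_match(triangle, path)
```

The search is exponential in the worst case, which is one reason graph matching at scale stresses memory and interconnect rather than floating-point hardware.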

Page 5

Some background

§ Simulation
  § Computations to understand physical phenomena or conduct engineering

§ Large Scale Data Analytics (LSDA)
  § Data Analytics = Discovering meaningful patterns in data
  § Large Scale = Requiring leading-edge processing and storage capabilities

§ LSDA is increasing in importance
  § Pervasive
    § Commerce, finance, healthcare, science, engineering, national security, ...
  § Lasting societal significance
    § Internet search, genomics, climate modeling, Higgs particle, ...

§ LSDA is getting “harder”
  § Captured data growing exponentially with time
  § Individual analysis becoming more sophisticated
  § More people examining more data more frequently
  § Aggregate work growing much faster than Moore’s Law

[Figure: from The Economist]
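One way to read the last group of bullets is that aggregate work is the product of three growing factors, so its growth rate compounds. The sketch below shows only that arithmetic. The three annual growth factors are hypothetical placeholders, not figures from the talk, and Moore's Law is taken in its common "doubling every two years" form.

```python
def aggregate_growth(data, sophistication, usage):
    """Annual growth factor of total work when the three factors multiply."""
    return data * sophistication * usage


# Doubling every two years corresponds to a factor of sqrt(2) per year.
moore_per_year = 2 ** (1 / 2)

# Hypothetical annual growth factors, chosen only to illustrate compounding:
# captured data, per-analysis sophistication, and number/frequency of analyses.
work_per_year = aggregate_growth(data=1.4, sophistication=1.2, usage=1.2)

# Ratio by which aggregate work outpaces Moore's Law each year in this toy case.
gap_per_year = work_per_year / moore_per_year
```

Even when no single factor outpaces hardware improvement, the product can, which is the sense in which the problem gets "harder".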

Page 6

National Strategic Computing Initiative (NSCI)

Page 7

NSCI Strategic Objectives

§ (1) Accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10 petaflop systems across a range of applications representing government needs.

§ (2) Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.

§ (3) Establishing, over the next 15 years, a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era").

§ (4) Increasing the capacity and capability of an enduring national HPC ecosystem by employing a holistic approach that addresses relevant factors such as networking technology, workflow, downward scaling, foundational algorithms and software, accessibility, and workforce development.

§ (5) Developing an enduring public-private collaboration to ensure that the benefits of the research and development advances are, to the greatest extent, shared between the United States Government and industrial and academic sectors.
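The performance target in objective (1) can be checked with a line of arithmetic: roughly 100 times a 10 petaflop system is one exaflop. The variable names below are illustrative only.

```python
PETA = 10 ** 15  # floating-point operations per second in one petaflop
EXA = 10 ** 18   # floating-point operations per second in one exaflop

current_flops = 10 * PETA           # "current 10 petaflop systems"
target_flops = 100 * current_flops  # "approximately 100 times the performance"

target_in_exaflops = target_flops / EXA
```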

Page 8

Q1: Why is increasing coherence between simulation and analytics important?

§ For simulation
  § HPC simulation must ride on some commodity curve
  § Larger market forces behind analytics
  § Can exploit commodity component technology from analytics

§ For analytics
  § Large Scale Data Analytics problems becoming ever more sophisticated
  § Requiring more coupled methods
  § Can exploit architectural lessons from HPC simulation

§ For both: Integration of simulation and analytics in the same workflow
  § Automation of analysis of data from simulation
  § Creation of synthetic data via simulation to augment analysis
  § Automated generation and testing of hypotheses
  § Exploration of new scientific and technical scenarios
  § ...

Mutual inspiration, technical synergy, and economies of scale in the creation, deployment, and use of HPC resources

Page 9

A challenge because simulation and analytics differ in many respects…

Page 10

Data structures describing simulation and analytics differ
Graphs from simulations may be irregular, but have more locality than those derived from analytics

Computational simulation of physical phenomena:
[Figure panels: climate modeling; car crash]

Large Scale Data Analytics:
[Figure panels: Internet connectivity; yeast protein interactions]

Figures from Leland et al. courtesy of Yelick, LBNL.

Page 11

Computation and communication patterns differ

Black = time spent computing
Green = time spent communicating
White = time spent waiting for data to be communicated

The U.S. road map, which has spatial locality and is thus most similar of the three in structure to computational patterns that would arise in typical physical simulations.

The Erdős-Rényi graph, a well-studied example in graph theory work.

A scale-free graph, an example more reflective of real-world networks.

Figure from Leland et al. courtesy of Johnson, PNNL.
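The locality contrast between these graphs can be made concrete with a rough proxy: split the vertices into contiguous blocks, one per processor, and count the fraction of edges that cross block boundaries, since those are the edges that would require communication. The sketch below is an assumption-laden illustration, not the experiment behind the figure. A 2D mesh stands in for a simulation-style graph, an Erdős-Rényi G(n, m) random graph stands in for the analytics-style graph, and the function names and the block partition are invented for this example.

```python
import random


def grid_edges(side):
    """Edges of a side x side 2D mesh with vertices numbered row by row."""
    edges = []
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                edges.append((v, v + 1))     # neighbor to the right
            if r + 1 < side:
                edges.append((v, v + side))  # neighbor below
    return edges


def random_edges(n, m, seed=0):
    """m distinct edges drawn uniformly at random (Erdős-Rényi G(n, m))."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def cut_fraction(edges, n, parts):
    """Fraction of edges whose endpoints fall in different contiguous blocks."""
    block = n // parts

    def owner(v):
        return min(v // block, parts - 1)

    cut = sum(1 for u, v in edges if owner(u) != owner(v))
    return cut / len(edges)


side = 32
n = side * side
mesh = grid_edges(side)
rand = random_edges(n, len(mesh))  # same vertex and edge counts as the mesh

mesh_cut = cut_fraction(mesh, n, parts=4)
rand_cut = cut_fraction(rand, n, parts=4)
```

In this toy setup only the mesh edges that straddle the three block boundaries are cut, while a random graph with the same number of edges has most of its edges crossing between blocks.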

Page 12

Memory performance demands differ
A key differentiator in the performance of simulation and analytics

[Figure: simulation and analytics codes plotted alongside standard benchmarks]

Standard benchmarks include:
• LINPACK (smallest data intensiveness; barely visible on graph)
• STREAM
• SPECfp
• SPECint

Area of the circle = relative data intensiveness (i.e., total amount of unique data accessed over a fixed interval of instructions)

Figure from Murphy & Kogge with adjustment to double radius of LINPACK data point to make it visible.
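The definition of data intensiveness given for this figure (unique data accessed over a fixed interval) can be sketched on a synthetic address trace. This is an illustration under stated assumptions rather than the Murphy & Kogge methodology. The function name is invented, the interval is measured in memory accesses instead of instructions, unique data is counted in 64-byte blocks, and both traces are made up.

```python
import random


def data_intensiveness(trace, window, block_bytes=64):
    """Average number of unique bytes touched per window of `window` accesses.

    `trace` is a list of byte addresses. Unique data is counted at the
    granularity of `block_bytes` (an assumed cache-line size).
    """
    if window <= 0 or len(trace) < window:
        raise ValueError("trace must contain at least one full window")
    totals = []
    for start in range(0, len(trace) - window + 1, window):
        blocks = {addr // block_bytes for addr in trace[start:start + window]}
        totals.append(len(blocks) * block_bytes)
    return sum(totals) / len(totals)


# Hypothetical "dense kernel" trace: repeated sweeps over a 1 KiB array of
# 8-byte words, so the same 16 blocks are reused in every window.
dense_trace = [8 * (i % 128) for i in range(4096)]

# Hypothetical "analytics" trace: random lookups scattered over a 1 GiB space.
rng = random.Random(0)
sparse_trace = [rng.randrange(1 << 30) for _ in range(4096)]

dense_di = data_intensiveness(dense_trace, window=1024)
sparse_di = data_intensiveness(sparse_trace, window=1024)
```

The dense sweep touches the same kilobyte in every window, while the scattered lookups touch a fresh block on almost every access, which is the kind of gap the circle areas are described as conveying.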

Page 13

Application code characteristics differ

Contrasting properties:

Application code property   Simulation                          Analytics
Spatial locality            High                                Low
Temporal locality           Moderate                            Low
Memory footprint            Moderate                            High
Computation type            May be floating-point dominated*    Integer intensive
Input-output orientation    Output dominated                    Input dominated

*Increasingly, simulation work has become less floating-point dominated

Page 14

Q2: So what do we really mean by “increasing coherence” between simulation and analytics?

§ NOT one system ostensibly optimized for both simulation and analytics

§ Greater commonality in underlying componentry and design principles

§ Greater interoperability, allowing interleaving of both types of computations

… A more common hardware and software roadmap between simulation and analytics

Page 15

And yet, there is hope…

Page 16

Simulation and analytics are evolving to become more similar in their architectural needs

§ Current challenges for the LSDA community (… similar to HPC simulation)
  § Data movement
  § Power consumption
  § Memory/interconnect bandwidth
  § Scaling efficiency

§ Instruction mix for Sandia’s HPC engineering codes (… similar to LSDA)
  § Memory operations 40%
  § Integer operations 40%
  § Floating point 10%
  § Other 10%

§ Common design impacts of energy cost trends
  § Increased concurrency (processing threads, cores, memory depth)
  § Increased complexity and burden on
    § system software, languages, tools, runtime support, codes

Page 17

Energy cost of moving data is becoming dominant

[Figure: Energy cost for various common operations. Axis labels: energy cost, in picojoules (pJ), per 64-bit floating-point operation; cost estimates for technology year.]

From Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.

Page 18

Emerging architectural and system software synergies

Similar needs:

Architectural characteristic: Computation
  Simulation: Memory address generation dominated
  Analytics: Same

Architectural characteristic: Primary memory
  Simulation: Low power, high bandwidth, semi-random access
  Analytics: Same

Architectural characteristic: Secondary memory
  Simulation: Emerging technologies may offset cost, allowing much more memory
  Analytics: … require extremely large memory spaces

Architectural characteristic: Storage
  Simulation: Integration of another layer of memory hierarchy to support checkpoint/restart
  Analytics: … to support out-of-core data set access

Architectural characteristic: Interconnect technology
  Simulation: High bisection bandwidth (for relatively coarse-grained access)
  Analytics: … (for fine-grained access)

Architectural characteristic: System software (node-level)
  Simulation: Low dependence on system services, increasingly adaptive, resource management for structured parallelism
  Analytics: … highly adaptive, resource management for unstructured parallelism

Architectural characteristic: System software (system-level)
  Simulation: Increasingly irregular workflows
  Analytics: Irregular workflows

Page 19

Q3: How might coherence be furthered in practice?

§ Making it an element of national strategy
  § Check via the NSCI

§ Building this into exascale computing efforts
  § Also a component of the NSCI

§ Communicating with and enlisting the technical communities concerned
  § This forum and similar events

§ Further developing the vision
  § Today’s dialogue session!

Page 20

Acknowledgements

Page 21

Additional references

§ The Economist, “Data, Data, Everywhere,” Feb 25th, 2010

§ R. C. Murphy and P. M. Kogge, “On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications,” IEEE Transactions on Computers 56 (7, July 2007): 937–945.

§ R. Murphy, “Power Issues,” presentation to JASON 2012, June 2012.

§ Peter Kogge (editor) et al., ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. DARPA, 2008.

§ Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.

§ Tony Hey, Stewart Tansley, and Kristin Tolle (editors), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.

§ Jim Gray, The Fourth Paradigm: Data-Intensive Scientific Discovery