nasa big data task force february 16 minutes · 2020-04-28 · nasa advisory council ad hoc big...
NASA Advisory Council Ad Hoc Big Data Task Force, February 16, 2016
Ad Hoc Big Data Task Force of the
NASA Advisory Council Science Committee
Meeting Minutes
Inaugural Meeting February 16, 2016
NASA Headquarters Glennan Conference Room, 1Q39
Charles P. Holmes, Chair
Erin C. Smith, Executive Secretary
Report prepared by Joan M. Zimmermann, Ingenicomm, Inc.
Table of Contents

Introduction
Charter/Science Committee and Subcommittee Feedback
Legacy from NAC ITIC
Discussion
HPD Big Data
Science Committee Greetings
Big Data and Earth Science
Supercomputing and Big Data
APD and Big Data
Public comment
Other Federal Big Data Initiatives
Planetary Science Big Data
Discussion/wrap-up

Appendix A - Attendees
Appendix B - Membership roster
Appendix C - Presentations
Appendix D - Agenda
Introduction

Dr. Erin Smith, Executive Secretary of the NASA Advisory Council (NAC) Ad Hoc Big Data Task Force (BDTF), called the membership to order and made some administrative announcements. Dr. Charles Holmes, Chair of the BDTF, opened the inaugural meeting of the BDTF. Introductions were made around the table.

Charter/Subcommittee Feedback

Dr. Smith presented an overview of the Task Force, which was created in response to a number of White House directives on the Big Data concept. These directives related to the purviews of NASA’s Heliophysics and Earth Science divisions (HPD and ESD), which engage in the study of solar activity and solar storms, and weather forecasting. The administration also expressed a great deal of interest in the interoperability of data sets, and related uses of Big Data. Successful applications of science in these areas will require the breakdown of subdiscipline stovepipes, and the interoperability of NASA data sets with those of the National Oceanic and Atmospheric Administration (NOAA) and the US Geological Survey (USGS), making data available to numerous end users such as emergency response and disaster relief agencies. Big Data may also enable the identification of actionable science information, making data useful for unforeseen applications. Big Data also means different things to different users, and for specific data-handling tools, data formats, and the creation of data standards. Applications vary for the Astrophysics (supernova models), Planetary (identifying exoplanets, galaxy formation), and Heliophysics divisions (one target/many missions, coronal mass ejections, radiation environment for human exploration). NASA’s Earth Science Division has been managing and exploiting Big Data for many years in creating climate models, and for societal applications such as drought forecasting and disaster response. Many NASA spaceborne measurements are currently being used to improve air quality decision support systems in Texas, and in producing accurate cloud formation models. HPD data and engineering data are being fed into an Integrated Radiation Protection System, to help determine how to get to acceptable risk figures for radiation exposure in human exploration.

The terms of reference (TOR) for the BDTF form a broad charter, which can be described as examining what the community as a whole is doing in Big Data, as well as what other agencies are doing, and identifying what can be done better. The intent is to catalogue best practices in NASA and other federal agencies, as well as in private industry, research institutions, and academia. One of the final products may be a white paper reporting out findings and recommendations. A major challenge for the Task Force will be to define what the term ‘big data’ means to the various communities; to an astronomer it is an archive issue. To HPD and ESD, it is interoperability issues and engineering. Other challenges will be to determine the most useful and efficient architectures, storage modes, data accessibility, data rates, data security, and intellectual property requirements. How do we communicate what data sets are saying, and how do we train people in use of data sets? It is a dynamic area. To date, the BDTF has
completed its ethics training and is in the process of signing on its last two members to round out the committee.

The NAC Science Committee has provided feedback to the BDTF, namely to acquire more representation from commercial entities and other non-NASA sciences, as well as to consider ground-based sciences that may have produced scientific data; feedback was also to look at data visualization, data permanence, and data usage. The Science Committee has asked that the BDTF act as a go-between for the community, and to find links and leverage points with existing efforts on big data. The Science Committee also recommended that BDTF invite people from the NASA archives, NASA Ames Research Center, simulation experts, modelers, and industry partners. Within disciplines, practitioners should be able to understand themselves within their subfields, and to allow for cross-pollination between subfields. The BDTF has also been asked to find the best way to gather feedback so that the Science Committee and its subcommittees can benefit from this effort (e.g., surveys to industry members, town halls). The NAC Science subcommittees would like the BDTF to address data usability, management and access, utilization (including real-time), analysis and data mining of large data sets, algorithm and statistics development, data curation, archiving tools and technology, visualization (such as hyperwall), and using state-of-the-art information technology (IT) systems and tools. Other questions to address: What opportunities are there in big data? Which subject matter experts (SMEs) should be consulted? What kind of products are desirable?

Dr. Holmes noted that given the extensive shopping list, he wished to devise a work plan to use the limited time available, in order to distill the Task Force output into something valuable. As to the term “interoperability,” he challenged Dr. Smith to fine-tune this definition, as it is a wide-open topic. He believed that innovation comes from the bottom up, and worried that “interoperable” raises some red flags for the creation of top-down management. Dr. Clayton Tino worried about “needs for future use,” which would require a fundamental understanding of data formats; it is nearly a non-solvable problem to make data understandable to all communities. Dr. James Kinter commented that interoperability tends to become a catchall phrase for simulation and modeling, best practices, and interoperability between discipline scientists (including metadata and documentation). Dr. Reta Beebe noted that “data mining” connotes something magical and is a major question. Externally, people think that data mining is magically done. Data sets are so different, particularly in Planetary Science, that data mining becomes a major problem. Dr. Holmes reiterated his belief in the bottom-up approach, and in allowing successes from this approach to replicate through other scientific areas.

Legacy from NAC IT Infrastructure Committee

Dr. Holmes, who had served as vice chair of the NAC Information Technology Infrastructure Committee (ITIC), which stood from 2010-2013, gave an overview of the BDTF’s history. The ITIC’s main affiliation was with the NASA Chief Information Officer (CIO), but it had ties across NASA as well, in areas such as cybersecurity. The NAC recommended that both the ITIC and the Science Committee explore an approach to improve access to
NASA science data repositories, with that exploration to include best practices, etc., that have been translated to the present TOR for the BDTF. In Fall 2013, the NAC advisory committee structure was revamped, cybersecurity was put under the aegis of a new committee, and the work of the former ITIC now continues with the current Big Data Task Force, reporting to the Science Committee.

One of the first recommendations of the former ITIC was that NASA should take advantage of assets in the Federal government, such as GPU clusters, cloud computing under the National Science Foundation (NSF), and other sponsorship. ITIC also recommended that NASA improve the cyberinfrastructure that supports Agency science. One of the findings of the ITIC notes that NASA science data does not sit in one place but is distributed across NASA centers, at USGS, industry, and universities. NASA data centers are discipline-focused, and are managed in this way. The number of science publications coming out of these centers is growing dramatically. Education and Public Outreach continues to tap into these data stores, sometimes directly, and sometimes through a group that processes it for the general public. The Department of Energy (DOE) has set up a backbone throughout the country with many nodes not far from the NASA centers; it would be good to leverage this pipeline, as well as a 10-Gbps research network that links research innovation laboratories.

Use of NASA supercomputers at both Goddard Space Flight Center (GSFC) and Ames Research Center (ARC) is growing. The Earth Observing System Data and Information System (EOSDIS) is also growing in its data product distribution. Web services to support disaster applications, such as the Short-term Prediction Research and Transition (SPoRT) Center at Marshall, are transitioning research data to the operational weather community. The Solar Dynamics Observatory (SDO) is revolutionizing the way we understand the sun, and is collecting roughly a petabyte of data per year, with 5 petabytes per year worth of processing. There has been a two-order-of-magnitude jump in what solar physics had been ingesting previously from older missions such as Hinode. NASA’s Multimission Archive at Space Telescope (MAST) is showing almost exponential growth, and will grow even more when future telescope missions come on-line. There are 200-plus apps in the Apple iStore that will return from a search on NASA; many of these apps are in high demand from the public, and pull processed results out of NASA’s data stores. More than 250,000 people have taken part in NASA’s Galaxy Zoo program.

In 2012, the Office of Science and Technology Policy (OSTP) sent out a memo to the public announcing a Big Data Initiative, earmarking $200M to be spent on improving access to the government’s big data stores. In 2013, there were more memos and Executive Orders coming out on this issue, but NASA was missing from the list of recipients (DOE, Department of Defense, and others); so it must be asked: where did NASA miss the boat? Dr. Holmes noted an ITIC finding in November 2012, that NASA acquire fiber-optic pathways to support current and future data, and a recommendation that they buy rather than own these pathways.

Discussion

The committee discussed a draft work plan to determine how the BDTF would move forward. Dr. Holmes felt that the BDTF shouldn’t address the areas of data searchability
and availability, proprietary periods, long-term archiving, and other frequent requests that are made of NASA’s data stores, feeling that processes are already in place for this at NASA. The BDTF should break new ground instead, and should survey the community, choose 3 to 4 topics, and produce products. The BDTF should form a concise problem statement, research, organize and develop positions, form a consensus, and draft and present results in a white paper (4-6 pp) accompanied by a slide presentation. Because the BDTF expires in December 2017, there are only 4-5 more face-to-face meetings in advance of each of the future Science Committee meetings in which to develop findings and recommendations to take to the Science Committee. To this end, the Task Force should also hold teleconferences as appropriate. Dr. Holmes reviewed his duties as Chair as primarily being the representative to the Science Committee, and closed with the thought: “Do good, work hard, NASA needs us.”

Dr. Ray Walker agreed that data availability/searchability did not require a hard look, but noted that as data volumes get larger, it will be necessary to figure out the pieces we want to use; in this sense the issue is still important to consider. Dr. Holmes invited Dr. Walker to write up an actionable recommendation on the issue and send it to Dr. Smith. Dr. Tino commented that there are model-level, internal, and external use domains; what is it that we are actually trying to do? He agreed to write up an item on this question. Dr. Kinter said that it seems that by definition, Big Data means the biggest and baddest data sets; in that respect, we typically see accessibility as a way to aggregate and analyze data from an entire data set (petabytes); very few users will have the resources to operate data sets of such magnitude. The Task Force should also think about facilitating the analysis of data sets that are too big to move and too big to analyze in-situ. Dr. Holmes agreed to revise the work plan with the additions of the written contributions, and to look at areas that can be extended beyond the state of work; the BDTF needs to look at benchmarks regarding this issue.

HPD Big Data

Dr. Jeffrey Hayes presented areas of concern for the Heliophysics Division (HPD) in terms of Big Data needs. HPD studies the sun’s variance, the response of geospace, and the Sun-Earth system’s impacts on humanity. To do this, HPD engages in the science of space weather, tries to understand the interconnections between the Sun and Earth, and develops knowledge to improve the prediction of extreme events such as major coronal mass ejections (CMEs). The mission portfolio includes a research and analysis (R&A) line, an Explorers mission line, along with Living with a Star, Solar Terrestrial Probes, and the sounding rockets program. Mission investment is guided by the Decadal Surveys and NASA’s advisory bodies. The HP System Observatory includes numerous satellites such as IRIS, Wind, STEREO, the Van Allen probes, and the Interstellar Boundary Explorer (IBEX). Within the current missions and the operations budgets, there is a certain amount of funding for data archiving, and the creation of standards and accessibility. Dr. Hayes felt that most missions were able to respond quickly to decisions on data archiving and curation. Senior Reviews address the scientific merits of HPD missions every two years, and take into account the accessibility, usability and utility of data (including archiving after the mission is complete). As a result, the data pipeline is doing very well.
About 70-80% of HPD data come from extended mission phases. The sun varies in a roughly 22-year cycle; all of these HPD missions operating simultaneously are beginning to enable the understanding of a very complex system. The average cost of a Heliophysics satellite operation is $2.9M annually. The Solar Data Analysis Center (SDAC) and Space Physics Data Facility (SPDF) are the active archives for HPD and run at about $3.3M per year. There is also a ROSES element amounting to about $1M a year. Thus, the total to curate the data is about $4.5M per year, plus some money in the mission lines themselves. Dr. Hayes noted that “Scientists want all the data all the time, forever.”

In the early 2000s, the Decadal Survey came out with a priority for a Virtual Observatory, in which the idea was to collect all the data (both Astrophysics and Heliophysics) and make it universally accessible through common standards. At the time, Astrophysics had one standard, and Heliophysics had multiple standards. Over the last 20 years, NASA has been trying to get these standards in line, and Dr. Hayes felt that good progress was occurring in this area. Heliophysics has an explicit policy that established standards, which are FITS, CDF, and NetCDF. NASA is in a much better place than it was 10 years ago in terms of standardization. HPD has also restored a large fraction of data from its older missions, and has been systematically examining old archives and restoring data archives and data sets of scientific interest. For any metadata, it is necessary to get everyone to agree on keywords. HPD has gotten good buy-in, and users can now use the Space Physics Archive Search and Extract (SPASE) metadata wrappers to do an inventory, search by date or event, etc., to help do system science. The process has gotten a lot better, and appears to be going faster. HPD’s three most recent missions are successfully using the SPASE metadata wrappers. The first data from Magnetospheric Multiscale (MMS), for example, will be available on SPDF on March 1.

HPD is starting to get terabytes of data; this is a new experience. There are 800 TB from SDO to date, and the volume is growing. HPD is now looking at storing 1 PB in the SDAC; this data volume will probably triple or quadruple as future missions come online. Stanford University will not always support SDAC; at some point the data will have to be brought back to NASA. Dr. Hayes felt that putting data on the cloud was still an iffy prospect, and cited a recent accidental deletion of stored data as one of its potential drawbacks. Solar project data volume, in terms of both lifetime data volume and data rate, will continue to grow. The question is where and who will store it, and how will it be moved around? HPD can’t throw data away because Heliophysics science needs the context. Data policy is working well. HPD has a registry and inventory of the data, and is constantly updating. Legacy data sets have pretty much completed their extractions. Now HPD is concentrating on standards. A future challenge is how to use the SPASE metadata, how to use the data, and how to make it accessible to the non-expert user. Remote sensing vs. in-situ measurements are very different and these differences must be taken into account. For modeling, how do we archive useful, powerful comparisons? At this point, models do not have a standard; we are working toward it. As we move away from
the Virtual Observatory concept to a more consolidated way of getting data out, we must focus on metadata and links to generic access methods, and avoid stovepiping. The interdisciplinary aspects of data will be addressed by a larger group. Dr. Hayes noted that the Virtual Observatory concept did not fail, but the technology has since moved on.

Dr. Holmes asked Dr. Hayes to identify HPD needs from the BDTF standpoint. Dr. Hayes replied that one useful finding would be one acknowledging the value of standards. The other issue of concern for him was the unfunded mandate about keeping versions of data in perpetuity. There is a NASA policy in response to the OSTP about public accessibility and publications; however, the worrisome issue is whether the reference data in a paper has a certain pedigree that may or may not be preserved in the archive. Who owns the final data? Which version of the software? There is never enough disk space. Another useful finding would be a statement that having data active, on-line, is a good thing. Data, especially taxpayer-funded data, shouldn’t be buried in someone’s desk drawer. NASA tends to get pushback from principal investigators on this issue; they feel their data is proprietary. Dr. Hayes agreed to write up an item for Dr. Smith.

Dr. Kinter commented that there is no data standard for models, and that this is a challenge for the future; he wondered how much interaction there is between the Heliophysics community and the tropospheric and weather communities. Dr. Hayes felt there was not much interaction, certainly not at the tropospheric level. There are meetings ongoing, however, and HPD would be open to anything the other communities have that can be used. The variables may be different, but it is something that could be explored. Dr. Walker mentioned that the National Science Foundation (NSF) is looking into data assimilation. Dr. Holmes noted that the community had looked at compatibility between Earth Science and Heliophysics data ten years ago, and stopped because of data sparseness. Dr. Neal Hurlburt agreed that the effort was still at the case-study level. IRIS is a good example of where we were forced to use models. Dr. Kinter noted that there are also ocean data assimilations that have a similar problem with data sparseness. The tropospheric problem has moved well during the last decade, and can accommodate data sparseness a little better. GSFC has some expertise here. Dr. Holmes asked Dr. Kinter to provide POCs at Goddard. Dr. Walker mentioned that the Planetary Data System (PDS) has begun a study of archiving models, as well as the Community Coordinated Modeling Center (CCMC), and European work in both Heliophysics and Planetary at the University of Paris; these can provide useful Lessons Learned.

Science Committee Greetings

Science Committee Chair, Dr. Bradley Peterson, addressed the committee, thanking members for their important contributions. He noted that time was a pressing issue, and urged the BDTF to focus on finding commonalities and best practices across the subdisciplines, and building on the existing infrastructure only if it is useful. He asked the membership to regard the NASA budget as a zero-sum game, as NASA will buy into recommendations only if they are affordable, or are worth giving something up for. Eating into the budget for missions and research would be an undesirable outcome. Dr. Peterson suggested that the BDTF consult with subcommittee
chairs when useful, in order to iterate ideas across the Science Committee, subcommittees, and BDTF.

Big Data and Earth Science

Dr. Kevin Murphy presented an overview of the Earth Science Data Systems program, and stated that regardless of varying definitions of big data, Earth Science has it, as well as a large user base. Objective 2.2 of the 2014 NASA Strategic Plan informs the usage of Earth Science data to form a view of Earth that can be used across disciplines: ocean, atmosphere, cryosphere, etc. and their interactions. The Earth Observing System Data and Information System (EOSDIS) is the largest component of the Earth Science data system, and is associated with the competitively selected programs, Making Earth System data records for Use in Research Environments (MEaSUREs) and Advancing Collaborative Connections for Earth System Science (ACCESS). EOSDIS works internationally and among the federal agencies to get data to the public, and processes data from level 0 to higher products to make available to users. EOSDIS was initiated in 1990, incorporating heritage data sets in 1994 from satellites, aircraft and in-situ sensors (e.g. flux towers), and was designed to handle a terabyte of data per day. EOSDIS reprocesses data quite often as instruments deteriorate or as better signal processing methods become available. There are about 15 petabytes (PB) of data currently available, all of which interoperate with other agencies and archives through established standards. EOSDIS has a distributed framework, and has had an open data policy since 1997. The system generates biophysical products and geolocates them, and distributes to the end users. EOSDIS has an extensive volume of data represented in over 9200 data types, which range over human dimensions, land, atmosphere, ocean dynamics and the cryosphere. The system works closely with missions in formulation and development in order to prepare data plans.

EOSDIS is spread out over the US. Mission data are processed by Science Investigator-led Processing Systems (SIPS), and are then passed along to the Distributed Active Archive Centers (DAACs) to support the user base. DAACs are located at host organizations that are widely recognized by the community, and each DAAC has a working group that helps to direct how the DAACs work. There is also a Program Scientist within each DAAC that roughly aligns with each subdiscipline. The two components overseeing the DAACs are primarily Headquarters for management and the Goddard Space Flight Center (GSFC) for implementation. The Earth Science Data and Information System (ESDIS) manages the coordination of EOSDIS activities to avoid duplication of efforts. ESDIS holds annual meetings and continually takes input through weekly teleconferences and annual meetings with DAAC managers and DAAC systems engineers. Roughly 160-180 people go to the annual meetings. The EOSDIS infrastructure also ties together users and DAACs through earthdata.nasa.gov, a common metadata repository (CMR), Global Imagery Browse Services (GIBS), EOSDIS Metrics System (EMS), and various user support tools. EOSDIS performs an annual customer satisfaction survey, and also has DAAC User Working
Groups, which receive regular feedback. EOSDIS metrics from 2015 show 9462 unique data products, and 2.6M distinct users of EOSDIS data and services. EOSDIS distributes about twice as much data as it ingests. In 2015, the system received an ACSI score of 77 (considered very good). The trend for product delivery is increasing. EOSDIS converts high-value products into imagery, such as the NASA Worldview web site, which uses data from the Aqua/Terra/Moderate Resolution Imaging Spectroradiometer (MODIS) satellites, and NOAA’s Visible Infrared Imaging Radiometer Suite (VIIRS). Worldview works much like Google Earth; users can zoom in and go back in time. Users can also overlay data, such as the SO2 cloud over an erupting volcano, and find specific data such as fire hot spots. EOSDIS holds Senior Reviews to evaluate the performance and scientific merit of the various subsystems.

Dr. Walker noted the many highly derived data products, and asked how EOSDIS kept up with evolving algorithms. Dr. Murphy explained that standard products are produced in collections, and EOSDIS is currently going from MODIS collection 5 to collection 6, reprocessing data. Collection 5 will be maintained until collection 6 is complete. Science teams will determine when the new collection is done. Dr. Holmes asked what the BDTF could do for Earth Science. Dr. Murphy felt that NASA received little recognition for this important work, as it is generally not well understood. The data product ramp is currently limited by adapting to input from new instruments. EOSDIS has to put algorithms closer to the data in a way that allows unimpeded access to products; how to do this is still an open question. NASA may also need to learn how to work with commercial high-performance computing groups. Dr. Hurlburt asked how many of the 2.9M distinct users were part of the active (science) community. Dr. Murphy replied that people who use a lot of the data will frequently use all of it (operational users who use Level 1 data). The numbers of graduate students, etc., are hard to estimate. Dr. Kinter asked how EOSDIS dealt with the budget realities. Dr. Murphy noted that EOSDIS recognizes the need to develop or adopt standardized-enough components to allow people to develop their own tools, a strategy that saves both time and effort. NASA doesn’t want to be the first adopter or the last. The strategy depends on the community. EOSDIS keeps the principle of open application programming interfaces (APIs), and open access. The community is well aware of the data policy. Dr. Walker asked about the extent to which NASA provides interoperability in its joint work with NOAA. Dr. Murphy explained that NASA operates with NOAA on a catalogue level, uses open software sourcing, shares observations, and works closely with NOAA on the Climate Initiative and in the airborne program.

Supercomputing and Big Data

Dr. Tsengdar Lee, Program Manager of the Earth Science Division Supercomputing Program, presented an overview of the program, and the NASA vision for future computing services. NASA has two supercomputing centers, one at Ames Research Center (ARC), which serves the entire agency, and one at GSFC, which serves primarily Earth Science. ARC supports agency-wide activities, from launch vehicles to general relativity.
In August 2015, the NASA Flagship computer, Pleiades, reached a half billion SBUs (computing cycles) delivered cumulatively from 2008, translating to nearly $300M of services, at a cost of roughly 26 cents per SBU in 2015. NASA continues to grow the system, relying on Moore’s law to go forward (Dr. Lee noting that some argue that the law has come to its end). Scientific and engineering efforts will grow, thus NASA will have to come up with a user policy because the system has become oversubscribed. The ROSES selection process is now being tightly coupled to the availability of computing time. For Earth Science imaging and modeling, the system can push the resolution down to 1.5 km currently; the holy grail of atmospheric science is 0.5 km.

The workload is changing, shifting into data processing. As an example, the Kepler mission is using Pleiades to support validation for new exoplanets. This has become the primary avenue for producing discoveries in that area. Data assimilation systems are being used to create physically consistent long-term data sets, from 1979 to the present, and are also downscaling to higher resolution data for climate studies. The Orbiting Carbon Observatory (OCO-2) is presenting data processing challenges. NASA is doing a data re-processing campaign with new algorithms, with about 60% of this work being done on the supercomputer and 40% on the Amazon cloud. High End Capability Computing (HECC) is being used to clear 5 years of an unmanned aerial vehicle synthetic aperture radar (UAVSAR) data processing backlog, to reduce latency.

Processing is moving into the big data area, pitting high-performance computing against Large Scale Internet. Can high-performance computing (HPC) be used as a private cloud? How do we put together an architecture to process, analyze and mine data? Currently, data storage and data management is the core of the business, with data in the middle, and all the service and processing surrounding the data set. A Science Cloud architecture ideally provides an agile, high level of support, with the system owning the data, using a data management system, data analytics service, OpenStack, etc. NASA is constantly looking at new technologies: cloud and virtualization, high-performance object store, and SciDB (the latter heavily supported by DARPA). The science benefit of a science cloud has helped to validate many types of measurements, such as global fires. Coupling HPC and cloud computing can create a best-of-breed computing service environment.

HECC’s path to growth is constrained at present; NASA has maxed out the infrastructure in terms of facilities, building, water, and electricity, and is engaged in a study on how to build next-generation data centers. Drs. Holmes, Walker, and Hurlburt expressed concerns about user constraints, given that 70-80% of the program’s workload requires a tightly coupled process. Dr. Lee agreed to write a statement on this state of being for use by the BDTF. He added that certain types of workloads could be cloud-computed, and NASA is exploring those options as well. Dr. Clayton Tino asked if Dr. Lee had any sense of the capacity the program was losing due to mixed mode services. Dr. Lee replied that NASA was doing the mixed workload because of the demand. Some of the projects didn’t plan for their HPC use, and need to do a better job of such planning in the future.

Astrophysics and Big Data

Dr. Paul Hertz, Director of the Astrophysics Division (APD), presented Big Data needs as viewed by the Astrophysics community. Astrophysics addresses the evolution of the
universe, the origin of galaxies and stars and the question of whether we are alone in the universe. The APD is driven by the Decadal Surveys, science roadmaps, and implementation plans to support its ability to handle large data questions. Sixty percent of the budget supports developing space missions, 20% operations, another 5-10% is dedicated to research and development. Data archives are funded as an infrastructure investment. APD’s current suite of missions runs from many small missions such as Neutron star Interior Composition Explorer (NICER), to the large space telescopes, Hubble and the future James Webb Space Telescope (JWST). The next large flagship after JWST is Wide-Field Infrared Survey Telescope (WFIRST), whose prime science is to understand dark energy and dark matter, which can only be done by measuring the small impact these forces have had in the history of the universe, by looking at large swaths of universe; i.e. looking at large amounts of data to see small perturbations. Thus WFIRST will be computationally intensive. WFIRST will be looking at millions of galaxies, searching for evidence of microlensing, which is also computationally intensive. Euclid, a European mission with similarities to WFIRST, will also create large data sets. Another future ground-based observatory is the Large Synoptic Survey Telescope (LSST). All three of these projects will be combining their data in pixel-by-pixel analysis. The various agencies are studying the best way of carrying out this data processing, a decade in advance of the need. A white paper on this topic can be found at arxiv.org/abs/1501.07897 (Jain et al., “The Whole is Greater Than the Sum of the Parts”).

All NASA Astrophysics science data are open to the community, and all data centers go through the Senior Review process every two years. All astrophysics archives share a set of common protocols and standards, allowing the user community to combine data from multiple ground and space observatories. The NASA Astrophysics Virtual Observatory (NAVO) manages the protocols, while NSF funds the tools. The three Astrophysics archives manage the NAVO backbone. APD recently held a Senior Review of the archives, and recommended that they become more proactive and aggressive about evolving into the future (increasing bandwidth, keeping up with technological advances, preparing for large volumes of data). Some types of computing might be more expensive in the cloud, and it must be determined which are which. NASA and NSF are currently funding theoretical and computational Astrophysics networks (TCAN). Dr. Hertz was not aware of any issues thus far on getting time on NSF supercomputers. (Dr. Lee noted that NASA civil servants can’t typically get on NSF supercomputers, but university Principal Investigators can.) Another computationally intensive area is laboratory Astrophysics: interpreting x-rays from Chandra, far infrared data from Herschel, and visible-to-ultraviolet Hubble spectral lines. These atomic line calculations are needed for creating line catalogues.

Dr. Tino asked if underestimation of computing time was a theme in APD. Dr. Hertz explained that processing Kepler data has been more computationally intensive than was appreciated at the beginning of the mission, but that a new mission, Transiting Exoplanet Survey Satellite (TESS), which has a similar data product to Kepler, had planned according to Lessons Learned on the need for anticipating computing time. Dr. Lee noted that NASA is also making tighter connections between HPC and the budget-planning process. In terms of
recommendations, Dr. Hertz noted that Astrophysics was a minority user of HPC, and was interested in areas where it could leverage existing assets, or in commercial or other research that can improve Astrophysics science. APD has partnered with DOE in the past, when DOE has been interested in the science problem. DOE is not interested in exoplanets, but it is interested in dark energy and dark matter; therefore APD will be working with DOE on joint WFIRST-Euclid-LSST analysis.

Public comment period

No comments were noted from the online audience. At NASA Headquarters, Tripp Corbett made some comments from the vendor perspective, saying that he was noting a bit of a disconnect, as tools are available at NSSC that should be more widely circulated. At a recent NASA meeting, he had heard a briefing on working with the cloud-computing community in a budget-conscious way, and agreed to send more specific information to the BDTF.

Other Federal Big Data Initiatives (NSF)

The NSF Big Data Hubs Program director, Dr. Fen Zhao, briefed the BDTF by phone on her program, which is funded at about $20M a year. There are related programs at NSF that look at Big Data infrastructure, pilot and implementation efforts, and education-related activities such as the Big Data Work Force ($30M a year looking at traineeships). The Big Data Hubs program looks at the complex relationships between data projects, end users, and commercial entities, and involves cross-disciplinary efforts and data sharing across the research ecosystem. The inspiration for BD Hubs came from OSTP’s 2012 Big Data Initiative, in which a Big Data Partnerships Workshop initiative resulted in 29 new partnerships, with 90 organizations participating, representing areas such as energy, healthcare, and finance. The initiative chose various issues such as climate change and personalized healthcare, and NSF initiated the BD Hubs effort to allow these partnerships to gel. BD Hubs was launched in March 2015, with four hubs in four regions of the US, and made awards in September 2015 (Columbia University in the Northeast, Georgia Tech and x in the South, UIUC in the Midwest, and University of SD, UC Berkeley, and the University of Washington in the West). Hubs are differently constructed consortia; the current phase is allowing hubs to start up their activities. The projects are called BD Spokes, which represent specific activity within each topical area, such as a platform for sharing neuroscience data. The spokes are funded at $1M over three years, and are meant to leverage existing efforts. The Hubs are currently organizing drafts for each spoke, and full proposals are due this month. A large number of ideas came in on smart cities and the Internet of Things; the food/energy/water nexus; and human healthcare. NSF intends to fund these proposals this fiscal year, and there are latent projects waiting in the wings that can help transition some of these ideas to practice. NSF hopes to do this again next year.

Dr. Holmes offered kudos to NSF for setting up this open-ended effort. Dr. Zhao noted that there is an end goal of sorts, as each Hub is responsible for generating 29 projects at the end of three years. This idea is not completely novel at NSF. The Foundation hopes to fund each spoke for a second three years, to have them become self-sustaining. A similar effort was undertaken under US-Ignite, to support networking. The
idea is to look for the unknowns, as interesting things can happen in these large, multiple collaborations. Everyone brings their own physical infrastructure, and also tries to identify service providers. Dr. Holmes noted that most of the Hubs were geographically close to NASA PIs. Drs. Holmes and Zhao agreed that a closer collaboration would be ideal.

Planetary Science Big Data

Dr. Michael New, Program Scientist for the Planetary Data System (PDS), presented the needs of Big Data from the planetary perspective. Most planetary data work is based at GSFC. Planetary Science Division (PSD) data policies state that all science data returned from planetary missions belong to the public domain. Any exclusive data access cannot exceed six months. In funded science research, any data necessary to replicate published research results, that are also the product of a NASA award, must be made immediately available to the public. The planetary data environment includes PDS, the Planetary Cartography Program (PCP; USGS), the Minor Planet Center (MPC; Harvard), and the Astromaterials Curation Facility (ACF; Johnson Space Center). Data range from ground-based assets, individual investigators, mapping, data analysis (e.g., trajectories), sample returns, and ANSMET (Antarctic meteorites), to atmospheric dust. The output of the PDS is primarily to taxpayers, educators and talented amateurs. At the ACF, NASA stores space-exposed hardware, lunar samples, cosmic dust samples, and Hayabusa (asteroid) samples. NASA is currently re-engineering its sample catalogue to make these samples available online. The MPC is responsible for small bodies, and the orbits of minor planets and comets. The PCP maintains the cartographic capability for mapping the planets and the Moon, and develops and maintains the Integrated Software for Imagers and Spectrometers (ISIS), which enables things like spectrographic maps of Io. ISIS is preparing to incorporate an open-source visualization tool, the SPICE-based Cosmographia. (“SPICE” is a NASA information system whose use extends from mission concept through post-mission data analysis; it helps to correlate individual instrument data sets with those from other instruments on the same or on other spacecraft.)

PDS is a federated archive, with data distributed across the country; its discipline nodes were recently re-competed. Management of the system as a whole is also based on a federated model. Planetary data are managed by planetary SMEs. Data are physically stored at the nodes, and the deep archive is maintained at the NASA Space Science Data Coordinated Archive (NSSDCA). The Navigation and Ancillary Information Facility (NAIF) implements standards and tools that are needed to understand the motion of celestial objects. In planetary data sets, everything is moving relative to everything else: spacecraft, instrument, Earth, and Sun, all of which need time conversion standards. The collection of these variables is called Observation Geometry (OG). The current PDS is distributed across six nodes, which after a recent competition are now in their first year of a 5-year Cooperative Agreement. The PIs at each node collectively form a management council, and provide input about standards and decision-making. PDS-4 has just recently been rolled out. It is an XML-based, model-driven, service-oriented model, and a modern technical foundation for planetary science data. Existing PDS-3
products will be converted to PDS-4 when practical and sensible. The planetary data systems of the European Space Agency and JAXA are both adopting PDS-4 standards. The total volume of PDS is about 1 PB. Almost all computations are performed on individual workstations. PDS has just started its next 10-year roadmap, and will be announcing an opportunity to self-nominate in early March. Areas of improvement to be addressed in the roadmap include: simplifying and improving the pipeline; improving search capability; developing more useful metrics; improving tools for archiving small data sets; and improving archive preparation and documentation, especially for non-mission data providers. Relevant websites are naif.jpl.nasa.gov and pds.nasa.gov.

Dr. Hurlburt asked about PDS metrics. Dr. New admitted to having poor metrics of usage and users, and noted that the roadmap effort would help to identify the metrics PDS wants, and to adapt the system to provide them. Dr. Beebe commented that the International Planetary Data Alliance accepted SPICE as their data tool at their last meeting, a favorable indicator. Dr. New, when asked about Big Data needs, allowed that there were not many specific areas in planetary, with the exception of magnetospheric and plasma data, or when generating very high-fidelity gravity models. The lunar gravitational mapping mission, GRAIL, is currently working on a gravity field model on the HPC. He hadn’t heard about any issues with the pipeline associated with the GRAIL work. Dr. New felt the BDTF could direct a question to the Agency as to how it would like to handle the storage of grant data. PSD needs a clear, direct statement on this issue, which needs to be informed at the Agency level because it will be a response to an OSTP directive. There are 1500 grantees in PSD; it would take a labor-intensive effort to store all their data. Another question is what kind of data PDS is expected to archive. Dr. Holmes noted that the directive applies to the other disciplines as well, and instructed Dr. Smith to note this as an issue. A meeting participant noted that the grant disposition question was being addressed in the roadmapping task, entailing a community-based reappraisal of the subject over the next 6-9 months.

Discussion

Dr. Holmes followed up briefly with Dr. Lee on HPC, and asked what visibility existed for the program, and what the chances for collaboration with DOE Exascale might be. Dr. Lee identified himself as Chair of the High-End Computing Interagency Working Group (HECIWG), but noted that the Exascale computing facility is under the National Strategic Computing Initiative, a different governance. The HECIWG is meeting monthly at the moment, and Dr. Lee felt he could start vectoring the discussion in their direction. He noted that DOE sets up a process for eligibility; a task needs to have a certain profile, and x number of cores. The gate for eligibility to get on the DOE’s leadership computing systems, however, is higher than NASA’s entire system. NASA is far behind NSF and DOE in the supercomputing arena. NASA’s leading system is less than 5 Tflops. Dr. Holmes considered having the BDTF make a finding on the matter, as NASA is working on projects of national significance. Dr. Tino asked if Exascale was specifically designed to solve DOE problems, with specifically implemented architecture. Dr. Lee reported that DOE has a co-design concept, and they bring in an application that works on the exascale system.
They are considering climate change as a co-designed system. DOE doesn’t have the interoperability requirement. Dr. Walker commented that DOE has specific problems, while NASA is more broad. Dr. Holmes noted that DOE is addressing both astronomy and climate, and that while some of the scales are different, the physics are similar. Dr. Tino felt that NASA should either focus on products and services, or accept generality. Dr. Holmes suggested NASA managers address utilization models at future meetings. Dr. Kinter asked what HPC would use Big Iron for after its nominal 3 years of operation. Dr. Lee said that NASA plans to repurpose Big Iron after 3 years, back into a generalized cluster. NASA is still limited by facilities with regard to power and cooling. Dr. Holmes asked Drs. Tino and Kinter to write a talking point on the facilities issue.

BDTF members raised some general topics for further exploration. Dr. Tino noted that each of the presenters had adopted some form of standard, illustrating that people recognize that standards do matter. From a management standpoint, however, he noted that the subdisciplines had inconsistent metrics on users, and questioned why archives had to be maintained in the absence of usage. Dr. Walker explained that some data have extremely long lives; every time we get a new mission to Jupiter, for instance, Voyager and Pioneer data sets are in demand again. It’s critical that some of these data sets be safeguarded. Dr. Holmes noted that the Senior Review might be a vehicle for determining which data should be kept. Dr. Hurlburt suggested user metrics inform these sorts of judgments. Dr. Tino felt user surveys were not always effective, and that metrics on actual use would be more useful in getting smart on what data to store. Dr. Holmes asked Dr. Tino et al. to flesh out this thought and do more research in advance of the next meeting. Dr. Beebe added that one also needs to consider the intrinsic sizes of communities and their stability; they also tend to move around when major missions arise. Dr. Holmes was surprised at the lack of a clear vision for the future and asked Dr. Hurlburt to write a finding on this topic.

Dr. Holmes asked Dr. Smith to sound out the Science Mission Directorate to determine the level of concern over grant data storage. Dr. Beebe reported that it was a major concern that has already reached the top level of the administration, which had established workshops for people preparing for federal grants. Dr. Holmes gave an action to Dr. Smith to clarify Dr. Murphy’s statement on the use of open source software, and asked BDTF members to examine the NSF nodes of the BD Hub effort, to determine how close they are to co-located NASA PIs. Dr. Holmes asked that the next BDTF meeting take place at GSFC for 2.5 days in the April-May time period, and to perhaps consider a site visit to ARC in the future, to include some interaction with Silicon Valley. Dr. Smith reported that she would be working on an extension of the TOR, off-line. Dr. Holmes adjourned the meeting at 4:59 pm.
Appendix A Attendees

Ad Hoc Big Data Task Force Members
Charles P. Holmes, Chair, Big Data Task Force
Reta Beebe, New Mexico State University (via telecon/Webex)
Neal Hurlburt, Lockheed Martin
James L. Kinter, George Mason University (via telecon/Webex)
Clayton Tino, Virtustream, Inc.
Ray Walker, University of California at Los Angeles
Erin Smith, Executive Secretary, NASA HQ

NASA Attendees
Louis Barbieri, NASA
Dan Crichton, NASA JPL
Elaine Denning, NASA HQ
Deborah Diaz, OCIO NASA
John Evans, NASA
T. Jens Feeley, NASA HQ
Navid Golpayegani, NASA
Jeffrey Hayes, NASA HQ
Paul Hertz, NASA HQ
Tsengdar Lee, NASA HQ
Edward Masuoka, NASA
Duane McMahon, NASA
Tom Morgan, NASA HQ
Kevin Murphy, NASA HQ
Michael New, NASA HQ
Herbert Schilling, NASA
Grif Schilly, NASA
John Sprague, NASA OCIO
Elizabeth Yoseph, NASA

Non-NASA Attendees
Joseph Bredenkamp, NASA retired
Terry Blankenship, Booz Allen Hamilton
Jung Byun, Booz Allen Hamilton
Chiehsan Cheng, Global Science and Technology
Tripp Corbett, ESRI
Joseph Dohry, Booz Allen Hamilton
Alex Duner, Medill News, Inc.
Grace Hu, OMB
Eric Feigelson, Penn State University
Robert Kohon, Novetta
Bradley Peterson, OSU, Chair, NAC Science Committee
Amy Reis, Ingenicomm, Inc.
Alyssa Retski, Lobbyit.com
Marcia Smith, SpacePolicyOnline
Connie Spittler, Global Science and Technology
Geordan Tilley, Medill News, Inc.
Joan Zimmermann, Ingenicomm, Inc.
Appendix B Membership

Dr. Charles P. Holmes, Chair, NASA HQ (Retired)
Dr. Reta F. Beebe, New Mexico State University
Dr. Neal E. Hurlburt, Lockheed Martin Space Systems Company
Dr. James L. Kinter, George Mason University
Dr. Clayton P. Tino, Virtustream Incorporated
Dr. Raymond J. Walker, University of California, Los Angeles
Dr. Erin Smith, Executive Secretary, NASA Headquarters
Appendix C Presentations

1. Big Data Task Force Charter / Subcommittee Feedback; Erin Smith
2. Legacy for the NAC Information Technology Infrastructure Committee; Charles Holmes
3. Heliophysics Division Big Data Needs; Jeffrey Hayes
4. Big Data and Earth Science; Kevin Murphy
5. Supercomputing and Big Data at NASA; Tsengdar Lee
6. Astrophysics Division Big Data Needs; Paul Hertz
7. Other Federal Big Data Initiatives (NSF); Fen Zhao
8. Planetary Science Big Data Needs; Michael New
Appendix D Agenda
Ad Hoc Big Data Task Force
of the NASA Advisory Council Science Committee
Inaugural Meeting February 16, 2016
NASA Headquarters
Glennan Conference Room, 1Q39
Agenda (Eastern Standard Time)
Tuesday, February 16
8:00 – 8:30 Opening Remarks / Introduction of Members (Dr. Erin Smith, Dr. Charles Holmes)
8:30 – 9:15 Big Data Task Force Charter / Subcommittee Feedback (Dr. Erin Smith)
9:15 – 9:30 BREAK
9:30 – 10:15 Legacy from NAC IT Infrastructure Committee (Dr. Charles Holmes)
10:15 – 10:30 Discussion
10:30 – 10:45 BREAK
10:45 – 11:15 Planetary Science Big Data (Dr. Michael New)
11:15 – 11:45 Heliophysics Big Data (Dr. Jeffrey Hayes)
11:45 – 12:45 LUNCH
12:45 – 1:00 Greetings from the Science Committee (Dr. Bradley Peterson)
1:00 – 1:30 Earth Science Big Data (Dr. Kevin Murphy)
1:30 – 2:00 Supercomputing Big Data (Dr. Tsengdar Lee)
2:00 – 2:30 Astrophysics Big Data (Dr. Paul Hertz)
2:30 – 2:45 Public Comment
2:45 – 3:00 Other Federal Big Data Initiatives (NSF) (Dr. Fen Zhao)
3:00 – 3:10 BREAK
3:10 – 3:30 Work Plan and Future Meetings
3:30 – 5:00 Discussion / Findings / Recommendations
5:00 ADJOURN

Dial-In and WebEx Information
For entire meeting February 16, 2016
Dial-In (audio): Dial the USA toll-free conference call number 1-800-988-9663 or toll number 1-517-308-9427 and then enter the numeric participant passcode: 4718658. You must use a touch-tone phone to participate in this meeting.
WebEx (view presentations online): The web link is https://nasa.webex.com, the meeting number is 999765122, and the password is BigD@T@16.
* All times are Eastern Standard Time *