Software and Data as Scaffolds for Integrative Science


Post on 29-Jan-2018


Page 1: Software and data as scaffolds for integrative science

Software and Data as Scaffolds for Integrative Science

David LeBauer, Ph.D.
University of Illinois at Urbana-Champaign

Department of Agricultural and Biological Engineering
Carl R. Woese Institute for Genomic Biology

National Center for Supercomputing Applications

Page 2: Outline

• Overview: Problems and Approach
• Combining Information
• Models: integration across domains
• PEcAn: integration of models and data
• TERRA REF: automated data collection and analysis
• Future Directions

Page 3: Challenges we face

• Agricultural Production:
  • Feeding 9 bn by 2050
  • Climate is changing
  • Resources are becoming scarce

• Scientific Problems:
  • How do genes control traits?
  • How can we leverage data and computing?

Figure panels: Yield, Fertilizer, Pesticides (Tilman et al., Nature 2002)

Page 4: Technical Solutions for Science and Agriculture

• Knowledge is Spread Across Many Scales and Formats:
  • Expert Knowledge
  • Data
  • Mechanistic Models

• Integrating these will enable:
  • Stronger Inference and Prediction
  • More Science and Engineering

Marshall-Colon et al. 2017, Frontiers in Plant Science

Page 5: Opportunities Across Scales

Spatial Scale (figure axis): 10^-3 m, 10^2 m, 10^3 m, 10^4 m, 10^5 m

Questions:
• Which crops are viable, … and where?
• What fraction of global energy/food demand?
• County-level mean yields; supply chain optimization
• Local topography: soil, hydrology; sub-field management
• Crop architecture; row spacing/orientation; harvesting equipment; shading response

Page 6: Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn
• TERRA REF
• Future Directions

Zhu, Lynch, LeBauer, Millar, Stitt, Long, 2015, Plant Cell & Environment

Page 7: Starting Point: Conceptual Models

Evan DeLucia

Page 8: BioCro: Combining Biology, Physics, Chemistry

Inputs:
• Meteorology (energy, water)
• Soil (physics, carbon, nutrients)
• Parameters (e.g. plant traits)

Outputs:
• Yield, Biomass
• Energy Balance
• Water Use
• Nutrient Use

Humphries and Long 2005; Miguez et al. 2009, 2012; Jaiswal, DeSouza, Larsen, LeBauer, … et al. 2017; Wang, Jaiswal, LeBauer, … et al. 2015

Page 9: Scaling Photosynthesis from Leaf to Canopy

Figure axes: Photosynthesis against Light and Temperature

Page 10: Scaling Up & Predicting the Future

Climate Forecasts (2040-2050)
CMIP5: 5 climate models × 4 CO2 emissions scenarios

Figure panels: Temperature, Precipitation

IPCC AR5; Warszawski et al., PNAS

Page 11: Effects of Climate on Sugarcane Yield in Brazil, 2040-2050

Climate Impact (metric tons / ha)

Scaling leaf-level CO2 × T × H2O response

Jaiswal, DeSouza, Larsen, LeBauer, Miguez, Sparovek, Bollero, Buckeridge, Long, 2017

Page 12: Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRA REF
• Future Directions

PEcAn: Ecological Model-data Synthesis
LeBauer et al., 2013

Page 13: Combining Data and Models Is Hard, Mostly a Technical Challenge

LeBauer and Treseder, 2008
Thomas et al., 2013

Page 14: Just Running a Model is Hard

Diagram stages: Wild Data, Relevant Information, Configuration, Run Model, Outputs, Analyses

Diagram labels:
• Publications, Primary Data, Repositories
• Traits, System States, Soil, Meteorology
• Parameters, Boundary Conditions, Drivers
• Prediction
• Sensitivity, Calibration, Validation

Most of this work is model independent, so solutions can be shared

Page 15: The Standard Approach: Redundant, Labor Intensive, Error Prone

Data Sources (… m = 10): NARR, NOAA, Fluxnet, CMIP5, Met Station
Ecosystem Models (... n = 12): BioCro, ED2, CLM, SIPNET
Analyses: Prediction, Calibration, Sensitivity, Validation, Visualization
Diagram label: Converter

For Met, need one converter per driver (m) × model (n) combination

Page 16: PEcAn common formats: Many users use, reuse, test, and improve components

Diverse Met Data (… m = 10): NARR, NOAA, Fluxnet, CMIP5, Met Station
Ecosystem Models (... n = 12): BioCro, ED2, CLM, SIPNET
Analyses: Prediction, Calibration, Sensitivity, Validation, Visualization
Diagram labels: Converter, Common Format

Only need n + m (not n × m) converters
Less work, more robust and valid results
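The n + m versus n × m count can be illustrated with a minimal Python sketch. It is not PEcAn code: the source names, model names, record keys, and the choice of Kelvin as the shared unit are all assumptions made for illustration.

```python
from typing import Callable, Dict

Record = Dict[str, float]

# Hypothetical readers: each turns one source's record into a shared format.
# Assumed shared format: air temperature in Kelvin under "air_temperature".
TO_COMMON: Dict[str, Callable[[Record], Record]] = {
    "source_celsius": lambda r: {"air_temperature": r["temp_c"] + 273.15},
    "source_kelvin": lambda r: {"air_temperature": r["tair_k"]},
}

# Hypothetical writers: each turns the shared format into one model's input.
FROM_COMMON: Dict[str, Callable[[Record], Record]] = {
    "model_a": lambda r: {"T_K": r["air_temperature"]},
    "model_b": lambda r: {"T_C": r["air_temperature"] - 273.15},
}


def convert(source: str, model: str, record: Record) -> Record:
    """Route any source to any model through the shared format."""
    return FROM_COMMON[model](TO_COMMON[source](record))


def pairwise_count(m: int, n: int) -> int:
    """Converters needed with one per (source, model) pair."""
    return m * n


def common_format_count(m: int, n: int) -> int:
    """Converters needed with one per source plus one per model."""
    return m + n


if __name__ == "__main__":
    m, n = 10, 12  # counts shown on the slides
    print(pairwise_count(m, n), "pairwise vs",
          common_format_count(m, n), "via a common format")
```

With 10 met sources and 12 models, that is 22 converters instead of 120, and adding one new source requires one new reader rather than 12 new converters.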

Page 17: Parameter Estimation: Combining Literature and Field Data

LeBauer et al., 2013

Page 18: Given current data, what drives uncertainty?

3 years, 1 crop, 1 location (LeBauer et al., 2013)

PEcAn Variance Decomposition
Bars: Parameter Contribution to Uncertainty in Yield Prediction
Grey = Prior, Black = Posterior

Used to inform optimal data collection
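As a rough illustration of what a per-parameter variance decomposition computes, here is a one-at-a-time Monte Carlo sketch in Python. It is not the PEcAn implementation (see LeBauer et al., 2013 for that): the toy model, the parameter names `vcmax` and `sla`, and the normal distributions are all assumptions chosen only to show how tightening one parameter's distribution shifts its share of predictive variance.

```python
import random
import statistics
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def toy_yield_model(p: Params) -> float:
    """Hypothetical stand-in for a crop model (not BioCro)."""
    return 0.1 * p["vcmax"] + 0.5 * p["sla"] + 0.01 * p["vcmax"] * p["sla"]


def variance_contributions(
    model: Callable[[Params], float],
    dists: Dict[str, Tuple[float, float]],  # name -> (mean, sd), assumed normal
    n: int = 2000,
    seed: int = 1,
) -> Dict[str, float]:
    """Vary one parameter at a time (others held at their mean) and return
    each parameter's share of the summed output variance."""
    rng = random.Random(seed)
    means = {name: mu for name, (mu, _) in dists.items()}
    variances = {}
    for name, (mu, sd) in dists.items():
        outputs = []
        for _ in range(n):
            params = dict(means)
            params[name] = rng.gauss(mu, sd)
            outputs.append(model(params))
        variances[name] = statistics.pvariance(outputs)
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


# Illustrative distributions: the "posterior" only narrows vcmax.
PRIOR = {"vcmax": (100.0, 30.0), "sla": (20.0, 2.0)}
POSTERIOR = {"vcmax": (100.0, 5.0), "sla": (20.0, 2.0)}

if __name__ == "__main__":
    print("prior:", variance_contributions(toy_yield_model, PRIOR))
    print("posterior:", variance_contributions(toy_yield_model, POSTERIOR))
```

In this toy setup `vcmax` dominates under the prior; once it is constrained, `sla` becomes the larger contributor, which is the kind of shift that could guide where to collect data next.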

Page 19: Automation & Reuse: Uncertainty analysis

Bars/color = Parameter Contribution to Predictive Uncertainty

LeBauer et al., 2013: 3 years, 1 crop, 1 location
Dietze et al., 2014: ~1 year, 8 scientists, 17 PFTs, 6 biomes

Page 20: Targeted Field Study: Willow Water Use

Predictions: Before and After Data Collection

Wertin, LeBauer, Volk, Leakey, in prep

Page 21: Making Crop & Ecosystem Models Accessible

Add Data, Configure, Run, Analyze

LeBauer et al. 2013; Kooper et al. 2013; Dietze et al. 2013

Page 22: PEcAn is a community project

• 42 Contributors
• >50 citations
• Textbook
• 100s of students trained

Page 23: PEcAn Radiative Transfer Model Inversion

Ely, Serbin, Shiklomanov, Dietze and others

Page 24: PEcAn now provides a place for shared models, data access, and tools

Tools: Web front end, PostGIS database*, Met Scaling and Gap filling, Data Ingest, Meta-Analysis*, Sensitivity & Uncertainty Analysis*, Ensemble Prediction, Parameter Data Assimilation, State Data Assimilation, Benchmarking, Visualization*

Data Modeling: Radiative Transfer, Photosynthesis, Tree Rings

Models: BioCro*, CABLE, CLM, DALEC, ED*, FATES, G’Day, JULES, Linkages, LPJguess, MAAT, MAESPA, PRELES, SIPNET

Data: Literature*, Field Measurements, Expert Priors*, Meteorology, Soils, PalEON, Fluxnet, ORNL, NEON, TERRA REF*, LTER, …

github.com/pecanproject/pecan
pecanproject.org

Page 25: Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRA REF
• Future Directions

Page 26: High Throughput Phenotyping

• High Throughput Phenotyping:
  • Replace manual with sensor-based measurements
  • Measure more traits with higher frequency

• But … sensors are expensive and data are difficult to interpret

• TERRA program: major investment to push this forward

http://bulletin.ipm.illinois.edu/print.php?id=513

Page 27: TERRA REF

• Motivation:
  • Automated Measurements → Stronger Inference
  • Software & Data → Framework for Interdisciplinary Collaboration

• Solutions:
  • Reference Datasets
  • Modular and Interoperable
  • Open Data, Software, Computing

Page 28: A Phenomics Pipeline for Crop Improvement

Pipeline stages: Sensors, Traits, Genotypes, Genomics, Selection

• Automated Measurements: Component & Aggregate
• Genomic Prediction
• Pan Genome
• Higher Yield, Yield Stability, Nutrition, Stress Tolerance, and more …

Page 29: Diverse Scientific Disciplines

Pipeline stages: Sensors, Traits, Genotypes, Genomics, Selection

Disciplines:
• Engineering, Robotics, Computer Vision
• (Eco)Physiology, Agronomy
• Biology
• Breeding
• Statistics & Machine Learning

Page 30: ARPA-E TERRA

Open Dataset for Six Projects + Public Release

Page 31: TERRA Reference Data Sources

• Lemnatec Scanalyzer: Danforth, St. Louis
• Lemnatec Field Scanner: USDA ALRC, Maricopa, AZ
• Tractor and UAV: AZ and Kansas State

Page 32: Field Scanner Sensors

terraref.org/articles/lemnatec-scanalyzer-field-sensors/

• VNIR Imaging Spectrometer: 380-1000 nm
• SWIR Imaging Spectrometer: 900-2500 nm
• IR Temperature Sensor
• NDVI (1 down, 1 up): 650, 800 nm
• PRI Sensor: 531, 570 nm
• PAR Sensor: 410-655 nm
• Color Sensor: 410-655 nm
• 3D Scanners: 2 Side View, 1 Down
• RGB: 2 Side View, 1 Down (1)
• Active Reflectance: 670, 730, 780 nm
• PS II Fluorescence
• Environmental: wind, temperature, humidity, light, rain, CO2
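The paired wavelengths listed for the NDVI and PRI sensors correspond to normalized-difference indices. The Python sketch below uses the standard textbook definitions with the band centers from the sensor list; it is not taken from the TERRA REF code base, and the reflectance values in the example are made up.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), the common form of both indices."""
    if a + b == 0:
        raise ValueError("sum of the two reflectances must be non-zero")
    return (a - b) / (a + b)


def ndvi(r800: float, r650: float) -> float:
    """NDVI from near-infrared (800 nm) and red (650 nm) reflectance."""
    return normalized_difference(r800, r650)


def pri(r531: float, r570: float) -> float:
    """PRI from reflectance at 531 nm and 570 nm."""
    return normalized_difference(r531, r570)


if __name__ == "__main__":
    # Hypothetical reflectance values for a green canopy.
    print("NDVI:", ndvi(r800=0.5, r650=0.1))
    print("PRI:", pri(r531=0.04, r570=0.05))
```

Both indices are bounded between -1 and 1 for non-negative reflectances, which makes them convenient to compare across dates and plots.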

Page 33: Approach: Integrate Software and Databases

• What do people currently use?
• What domain specific software and databases exist?
• How can we connect these?
• What standards & conventions to adopt?

Page 34: General Framework for Cross-Domain Links

Pipeline stages: Sensors, Traits, Genotypes, Genomics, Selection

Links: Location, Time, Genotype
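One way to read this slide is that location, time, and genotype act as join keys between domains. The Python sketch below shows that idea with plain dictionaries; the record layout, field names, plot and genotype identifiers, and marker values are all hypothetical and do not reflect the TERRA REF schema.

```python
from typing import Dict, List

# Hypothetical sensor-derived observations, keyed by location (plot) and time.
sensor_obs: List[Dict] = [
    {"plot": "P1", "date": "2017-06-01", "canopy_height_cm": 85.0},
    {"plot": "P2", "date": "2017-06-01", "canopy_height_cm": 102.0},
    {"plot": "P9", "date": "2017-06-01", "canopy_height_cm": 77.0},
]

# Hypothetical experimental design: which genotype grows at which location.
plot_genotype: Dict[str, str] = {"P1": "G_A", "P2": "G_B"}

# Hypothetical genomic data, keyed by genotype.
genotype_markers: Dict[str, Dict[str, int]] = {
    "G_A": {"snp_1": 0},
    "G_B": {"snp_1": 2},
}


def link(obs: List[Dict], design: Dict[str, str],
         markers: Dict[str, Dict[str, int]]) -> List[Dict]:
    """Attach genotype and marker data to each observation via its plot.

    Observations from plots missing in the design are skipped.
    """
    linked = []
    for row in obs:
        genotype = design.get(row["plot"])
        if genotype is None:
            continue
        linked.append({**row, "genotype": genotype, **markers.get(genotype, {})})
    return linked


if __name__ == "__main__":
    for row in link(sensor_obs, plot_genotype, genotype_markers):
        print(row)
```

The same pattern extends to a database join: sensor and trait tables share location and time columns, and the design table maps location to genotype.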

Page 35: Data Formats, Standards & Conventions

Pipeline stages: Sensors, Traits, Genotypes, Genomics, Selection

• CF Conventions, OGC
• geoTIFF, NetCDF-CF, LAS
• PEcAn, Crop Ontology, AgMIP/ICASA, BRAPI
• BAM, FASTQ, VCF, BED, FASTA, GFF

Page 36: TERRA REF Databases

Pipeline stages: Sensors, Traits, Genotypes, Genomics, Selection

Page 37: Modular Software

github.com/terraref

Page 38: TERRA REF Pipeline

Diagram labels: Field measurements, Metadata, Sensor Data, Genomics, Pipeline Orchestration, Trait Data, Analysis & Development

1 TB/d; <48 h

Page 39: Data Analysis Environments

Any Linux Configuration + Large Filesystem + Databases + Compute

Workflows:
• Analyze → Share → Publish
• Develop → Deploy

workbench.terraref.org

Page 40

~/data
~/tutorials

Page 41: Web Application Developed with NDS Workbench

traitvis.workbench.terraref.org

Page 42: 3D Laser Scanner

Figure label: 218 mm

Robert Pless, Zongyang Li, Solmaz Hajmohammadi

Page 43: Hyperspectral Image at 543 nm

Figure labels: % Reflectance; 10 cm; N; scan direction; x; y

Page 44: Thermal

Pages 45-66: Automated Detection Algorithms (Time Series of Panicle Counts)

Zongyang Li and Robert Pless

Page 67: Genes That Control Growth Rate

LOD (Logarithm of Odds): genes linked to trait

Geoff Morris & Zhenbin Hu, KSU

Page 68: Get involved

• Sign up for beta release of software and data
  • terraref.org/data

• Use and provide feedback on software and data formats
  • github.com/terraref

• Collaborate
  • Field measurements
  • Software
  • Algorithms
  • Colocated Sensors

Page 69: Outline

• Conceptual Overview
• Computational Solutions
• Crop Models
• PEcAn: Linking Models and Data
• TERRA REF
• Future Directions

Page 70: Training is the bottleneck

Barone et al. 2017, bioRxiv: “Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF Principal Investigators”

Software Carpentry
XSEDE.org, Shared Clusters

Page 71: Hackathons and Training

Introduction to data science, with examples and projects from TERRA REF

• Arkansas State University
• Iowa State University
• Purdue University
• University of Arizona
• University of Illinois
• University of Nebraska
• University of Arkansas

Page 72

Topp et al., unpublished

Page 73: Sensor Modeling and Model Coupling

Topp et al., unpublished

Page 74: Modular Model Components

• Each component represents >= 1 hypothesis.
• Each parameter or output can be treated as a phenotype
• Environmental drivers can be integrated over to address GxE

Zhu, Lynch, LeBauer, Millar, Stitt, Long, 2015, Plant Cell & Environment
Marshall-Colon et al. 2017, Frontiers in Plant Science
cropsinsilico.org

Page 75: Purdue Phenomics & IoT Platforms

• Develop Cyberinfrastructure
• Make data usable
• Facilitate interdisciplinary research
• Assess existing capabilities, current roadblocks, future needs
• Work with Library, RCAC, faculty to facilitate data publishing
• QA/QC
• Community Standards and Common Interfaces

Funding:
• NSF Advances in Biological Infrastructure
• USDA NIFA Food and Agriculture Cyberinformatics and Tools

Page 76: Agricultural Technology

Once we understand how these systems work, we can engineer for ecosystem services rather than solely for yield:
• Climate control
• Soil improvement, carbon storage
• Roots, mycorrhizae, microbiome
• Pharmaceuticals
• Petrochemical Substitutes
• … anything plants can do

NASA Ames Research Center

Page 77: Team

• Todd Mockler: Project Lead
• Nadia Shakoor: Project Director
• Noah Fahlgren: Phenotyping & Bioinformatics
• Erica Fishel: Technology Transfer
• Solmaz Hajmohammadi: Sensor Fusion
• Stephen Kresovich: Breeding
• Jeremy Schmutz: Sequencing
• Geoff Morris: Gene-trait Associations
• William Rooney: Breeding
• Pedro Andrade-Sanchez: Agronomy & Phenomics
• Michael Ottman: Physiology
• Maria Newcomb: Field Measurements
• Jeff White: Agronomy
• David LeBauer: Informatics & Computing
• Robert Pless: Image Analysis
• Roman Garnett: Prediction Algorithms
• Wasit Walamu: Sensing & Physiology
• Max Burnette
• Craig Willis
• Rob Kooper
• Jeff Terstreip
• Zongyang Li
• Zhenbin Hu
• Nick Heyek
• Charlie Zender
• Henry Butowsky

Page 78

• Mike Dietze, Boston University
• David LeBauer, University of Illinois
• Shawn Serbin, Brookhaven National Lab
• Ankur Desai, University of Wisconsin
• Kenton McHenry, National Center for Supercomputing Applications
• and many other users/contributors

Page 79

David LeBauer
[email protected]

TERRA REF
terraref.org
github.com/terraref
@terra_ref

PEcAn Project
pecanproject.org
github.com/pecanproject
@pecanproject