Research Design for Collaborative Computational Approaches and Scientific Workflows
Deana Pennington
January 8, 2007
Informatics and the Research Cycle
Mental Model
Research Design
Publish
Data-intensive: Data mining, Bio-inspired algs., Exp. Data Analysis, Visualization
Compute-intensive: Parallel processing, High throughput
Knowledge-intensive: Human cognition, Ontologies, Sem. mediation
Collect Data
Inductive, Descriptive: Statistics
Deductive, Prescriptive: Mechanistic
Conduct Analyses
Scientific Workflow System
• Automation => replication
• Access to distributed resources
• Reusability & sharing
• Empowered by knowledge-intensive approaches***
Data Management: Data models, Metadata, Storage
Cyberinfrastructure: Sharing data, analyses, mental models
Scientific Workflows
• Scientists do their analyses now by:
– Focusing on data collection and the analytical steps
– Manually coordinating export and import of data among software systems
• Workflow systems collaborate with the scientist to:
– Discover existing data
– Handle data flow between components
– Document the analytical process
Query EcoGrid to find data
Archive output to EcoGrid with workflow metadata
– Not linear
– Involve multiple data sets
– Involve multiple analytical steps
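The three jobs listed for a workflow system (discover data, pass it between components, document the process) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `catalog` is an in-memory stand-in for a searchable data grid, and `discover` and `run_workflow` are illustrative helpers, not the API of EcoGrid or of any real workflow system.

```python
from typing import Any, Callable

# Hypothetical in-memory catalog standing in for a searchable data grid.
# The values are made up for illustration.
catalog = {
    "temperature": [12.0, 15.5, 9.5],
    "soil_moisture": [0.31, 0.28, 0.40],
}

def discover(keyword: str) -> list[float]:
    """Look up a data set by keyword (stand-in for a grid query)."""
    return catalog[keyword]

def run_workflow(data: Any, steps: list[Callable[[Any], Any]]) -> tuple[Any, list[str]]:
    """Pass data through each step in order and record what each step produced."""
    log = []
    for step in steps:
        data = step(data)
        log.append(f"{step.__name__} -> {data!r}")
    return data, log

def to_fahrenheit(values):
    """Transformation step: convert Celsius readings to Fahrenheit."""
    return [v * 9 / 5 + 32 for v in values]

def mean(values):
    """Analysis step: arithmetic mean."""
    return sum(values) / len(values)

result, log = run_workflow(discover("temperature"), [to_fahrenheit, mean])
print(result)
for line in log:
    print(line)
```

The log plays the role of workflow metadata: it is produced as a side effect of running the steps, so the record of the method cannot drift away from what was actually executed.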
Automated Workflows
• Scripts: Single platform
• Visual modeling environment: Single environment
• Workflows:
– Cross-platform
– Cross-environment
– Distributed data & analyses
Productivity Example
Mental Model
Biomass = f(Temp, Soil, et al.)
C = Concept
Concepts: Climate, Temp, Soil, Biomass
Steps: Merge, Model, Predict
Conceptual Workflow, Executable Workflow, Abstract Workflow
[Figure: three workflow diagrams labeled Conceptual Workflow, Executable Workflow, and Abstract Workflow, built from AS, TS, and DS boxes]
Legend: AS = Analysis Step; TS = Transformation Step; DS = Data Step; Dissemination
“View1”: Excel, GIS, SAS, GIS
“View2”: VBScript, R Script, GA, R
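One way to read the two views is that the abstract workflow fixes the sequence of analysis steps, while each view binds every step to a concrete tool. A minimal Python sketch of that reading follows. The step names `step_1` to `step_4` are placeholders (the slide labels the steps only as AS), and `bind` is a hypothetical helper.

```python
# Abstract workflow: an ordered list of analysis steps with no tool attached.
# The step names are placeholders; the slide labels them only as "AS".
abstract_steps = ["step_1", "step_2", "step_3", "step_4"]

# Tool lists copied from the slide's two views.
views = {
    "View1": ["Excel", "GIS", "SAS", "GIS"],
    "View2": ["VBScript", "R Script", "GA", "R"],
}

def bind(steps: list[str], tools: list[str]) -> list[tuple[str, str]]:
    """Pair each abstract step with the tool that executes it in one view."""
    if len(steps) != len(tools):
        raise ValueError("each step needs exactly one tool")
    return list(zip(steps, tools))

for name, tools in views.items():
    print(name, bind(abstract_steps, tools))
```

Keeping the step list separate from the tool bindings is what lets the same design be executed in different software environments.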
Scientists design their research at the conceptual workflow level
• Often done on the fly over the period of time the research is being conducted
• For automated approaches, this must be well thought out from the beginning
• HOWEVER, because of the automation it is easy to modify the analysis and rerun it many times, so you are not locked into the original design
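The point about rerunning can be shown with a toy example: once an analysis is captured as a function, changing a design choice means changing one argument. The `analysis` function and the observations below are made up for illustration.

```python
def analysis(values: list[float], threshold: float) -> int:
    """Toy analysis: count observations above a threshold."""
    return sum(1 for v in values if v > threshold)

observations = [3.2, 7.8, 5.1, 9.4, 6.0]  # made-up data for illustration

# Rerun the same analysis under three different design choices.
results = {t: analysis(observations, t) for t in (4.0, 6.0, 8.0)}
print(results)  # {4.0: 4, 6.0: 2, 8.0: 1}
```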
Benefits
• Reusable analysis steps, pipelines, and workflows
• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation
Nested workflows
[Figure: nested workflow diagram. Recoverable labels:]
Search for relevant data and analyses (Query)
SW0
Step chain: ASx, TS1, ASy, ASz, ASr, TS2
Image Processing Pipeline
Signal Processing Pipeline
Field Data, Ground Sensors, Imagery
Semantically-integrated
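Nesting can be sketched in Python by making a workflow callable in the same way as a single step, so that a whole pipeline can be dropped into a larger workflow. This is a minimal sketch: `workflow` is a hypothetical helper, and the three steps are invented because the slide does not say what each box computes.

```python
from typing import Callable

Step = Callable[[list[float]], list[float]]

def workflow(*steps: Step) -> Step:
    """Compose steps into a workflow that is itself usable as a single step."""
    def run(data: list[float]) -> list[float]:
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical steps, invented for illustration.
def drop_negatives(data):
    return [x for x in data if x >= 0]

def smooth(data):
    # Average each pair of neighbouring values.
    return [(a + b) / 2 for a, b in zip(data, data[1:])]

def rescale(data):
    top = max(data)
    return [x / top for x in data]

signal_pipeline = workflow(drop_negatives, smooth)  # inner workflow
outer = workflow(signal_pipeline, rescale)          # nests the inner one

print(outer([4.0, -1.0, 2.0, 6.0]))  # [0.75, 1.0]
```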
Ecological niche modeling conceptual workflow
[Figure: workflow diagram. Recoverable labels:]
Inputs: Species pres. & abs. points; Env. layers
Data access: EcoGrid Query; EcoGrid DataBase
Steps: Layer Integration, Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid
Intermediate data: Integrated layers, Training sample, Test sample, GARP rule set
Outputs: Model quality parameters, Native range prediction map, Selected prediction maps
Other labels: User; +A1, +A2, +A3
Spatial location; Temporal extent
Generic Workflow
[Figure: workflow diagram. Recoverable labels:]
Inputs: Occurrence Data (Binary, Categorical or Numeric); Environmental layers
Data access: EcoGrid Query; EcoGrid DataBase
Steps: Layer Integration, Physical Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid
Intermediate data: Integrated layers, Training sample, Test sample, GARP rule set
Outputs: Model quality parameters, Prediction map, Selected prediction maps
Other labels: User; +A1, +A2, +A3
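The generic workflow can be read as: sample the occurrence data into training and test sets, build a rule set from the training sample, predict, and validate against the test sample. The Python sketch below follows that outline under loud assumptions. A simple threshold rule stands in for the GARP rule set (GARP itself is not implemented here), the split is a deterministic alternating split rather than a random sample, and all data values are made up.

```python
def split_sample(records):
    """Alternate records between training and test samples (deterministic)."""
    return records[::2], records[1::2]

def fit_threshold_rule(training):
    """Stand-in for the rule-set step (NOT GARP): midpoint of the class means."""
    present = [x for x, y in training if y == 1]
    absent = [x for x, y in training if y == 0]
    return (sum(present) / len(present) + sum(absent) / len(absent)) / 2

def predict(threshold, layer_value):
    """Predict presence (1) when the layer value exceeds the threshold."""
    return 1 if layer_value > threshold else 0

def validate(threshold, test):
    """Model quality parameter: fraction of test records predicted correctly."""
    hits = sum(predict(threshold, x) == y for x, y in test)
    return hits / len(test)

def generic_workflow(occurrences):
    """Run sample -> rule -> validate on (layer value, presence) pairs."""
    training, test = split_sample(occurrences)
    rule = fit_threshold_rule(training)
    return rule, validate(rule, test)

# Two made-up data sets pushed through the same workflow.
data_a = [(2.0, 0), (8.0, 1), (3.0, 0), (9.0, 1),
          (1.0, 0), (10.0, 1), (7.0, 1), (4.0, 0)]
data_b = [(0.2, 0), (0.9, 1), (0.4, 0), (0.3, 1),
          (0.1, 0), (0.8, 1), (0.7, 1), (0.5, 0)]

for name, data in (("a", data_a), ("b", data_b)):
    rule, accuracy = generic_workflow(data)
    print(name, round(rule, 3), accuracy)
```

Only the input data changes between the two runs; the workflow itself is untouched, which is the sense in which a generic workflow is reusable.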
Temperature Interpolation Workflow
[Figure: workflow diagram. Recoverable labels:]
Inputs: Weather station temperature data; Environmental layers: elevation, aspect, land cover
Data access: EcoGrid Query; EcoGrid DataBase
Steps: Layer Integration, Physical Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid
Intermediate data: Integrated layers, Training sample, Test sample, GARP rule set
Outputs: Model quality parameters; Prediction map: Interpolated temperature grid; Selected prediction maps
Other labels: User; +A1, +A2, +A3
Sinkhole Distribution Workflow
[Figure: workflow diagram. Recoverable labels:]
Inputs: Sinkhole occurrence; Environmental layers: Groundwater level, chemistry, etc.
Data access: EcoGrid Query; EcoGrid DataBase
Steps: Layer Integration, Physical Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid
Intermediate data: Integrated layers, Training sample, Test sample, GARP rule set
Outputs: Model quality parameters; Prediction map: Sinkhole distribution; Selected prediction maps
Other labels: User; +A1, +A2, +A3
Exercise
1. Divide into groups of 4 (or so) with similar research interests
2. Pick a research topic to collaborate on
3. Construct a workflow diagram for an analysis that could be conducted
4. Discuss how it could be reused for other related or unrelated analyses