Application of Provenance for Automated and Research Driven Workflows
Tara Gibson
June 17, 2008
Motivation
Identify provenance models and architectures that will support a variety of real-world scientific research
Promote collaboration and interoperability
Review requirements identified by the community
Identify new requirements from our own use case studies that span a number of domains
Methods
Use case studies
Encountered two types of workflow
Automated (e.g. Pipelines)
User-Driven, research oriented (e.g. Digital Libraries, Data Lineage)
Use case type comparison
Automated | User-driven
Sequence of processes and data to accomplish a given task | Enables collaboration by saving context and details
Driven by workflow engine | Directed by researcher
Follows predefined pattern | Ad hoc, no set pattern
Pre-determined completion strategy | Completion determined by researcher
Sensor Analysis
SOA-based runtime intrusion detection system to prevent attacks on sensitive systems
Large scale data streaming (~30TB per day)
Recording all provenance would quickly overwhelm the system, so record only significant events
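One way to picture "record only significant events" is a bounded buffer that is persisted only when something significant happens. The sketch below is illustrative, not the system described in the talk: the class name `EventTransaction`, the `window` parameter, and the in-memory `store` list are all assumptions.

```python
from collections import deque


class EventTransaction:
    """Buffer recent provenance records; persist them only on a significant event."""

    def __init__(self, window=1000):
        # Bounded buffer: once full, the oldest records are silently discarded.
        self.pending = deque(maxlen=window)
        # Stand-in for a persistent provenance store (assumption).
        self.store = []

    def observe(self, record, significant=False):
        self.pending.append(record)
        if significant:
            self.commit()

    def commit(self):
        # Persist the significant event together with the records that led to it.
        self.store.append(list(self.pending))
        self.pending.clear()


tx = EventTransaction(window=3)
for i in range(5):
    tx.observe({"packet": i})  # routine traffic, never persisted on its own
tx.observe({"alert": "intrusion suspected"}, significant=True)
print(tx.store)
```

Routine records that age out of the window are never written, which keeps the stored provenance proportional to the number of significant events rather than to the size of the stream.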
Subsurface Modelling
Understand how contaminants react and move through environments by simulating experiments that would not be feasible otherwise
Research often follows many branches of investigation with complex relationships between simulations
[Figure: tree of simulation branches. Node labels: Homogeneous Flow, Heterogeneous Flow, Variable Flow, Alt Parameters, Alt Material Geometry, Alt Inclusion Geom]
Archive, Data Mining
Document data context and relationships to improve effectiveness of facility
Use of data extraction and harvesting to capture provenance and meta-data
Track relationships between experiments and computations
Allows for better collaboration and understanding
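As a rough illustration of harvesting meta-data from a data file, the sketch below collects a few file-system facts plus a content hash and merges in caller-supplied context. The function name `harvest_metadata`, the chosen fields, and the example `experiment` value are assumptions, not details from the talk.

```python
import hashlib
import os
import tempfile


def harvest_metadata(path, extra=None):
    """Collect basic file meta-data and merge in caller-supplied context."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    record = {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,
        "sha256": digest,  # lets later queries detect identical data
    }
    record.update(extra or {})  # arbitrary meta-data, e.g. the originating experiment
    return record


# Example on a throwaway file; the "experiment" value is hypothetical.
with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as tmp:
    tmp.write(b"abc")
meta = harvest_metadata(tmp.name, extra={"experiment": "run-1"})
os.remove(tmp.name)
print(meta["size_bytes"], meta["experiment"])
```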
Requirements Summary
Record provenance about process, data, relationships
Group items together for comparison
Record arbitrary meta-data
Standards-based search capability
Examine process and data that led to result
Identify the overall impact on a workflow due to changes in process/data
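The last two requirements amount to walking a derivation graph in opposite directions: backwards to see what led to a result, forwards to see what a change affects. A minimal sketch, assuming a simple in-memory graph; `ProvenanceGraph`, `lineage`, `impact`, and the item names are hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Derivation graph over named items (processes or data)."""

    def __init__(self):
        self.inputs = defaultdict(set)   # item -> items it was derived from
        self.outputs = defaultdict(set)  # item -> items derived from it

    def record(self, derived, source):
        self.inputs[derived].add(source)
        self.outputs[source].add(derived)

    @staticmethod
    def _walk(start, edges):
        # Depth-first traversal collecting everything reachable from start.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, item):
        """Everything that led to item."""
        return self._walk(item, self.inputs)

    def impact(self, item):
        """Everything affected if item changes."""
        return self._walk(item, self.outputs)


g = ProvenanceGraph()
g.record("simulation", "parameters")
g.record("simulation", "mesh")
g.record("plot", "simulation")
g.record("report", "plot")
print(sorted(g.lineage("plot")))
print(sorted(g.impact("parameters")))
```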
Influences on Architecture
Requirement | Influence
Record only the provenance from significant events and the processes and data that led to the identification of the event | Transaction-based event recording
Record provenance of high throughput pipelines with minimal impact on performance | Asynchronous messaging (JMS)
Extract and record customized file metadata for context searching | Meta-data extraction/harvesting
Query for derivation graph, filtering on level of detail | Create views based on level of detail
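The reason asynchronous messaging limits the performance impact is that the pipeline only enqueues a message and moves on, while a separate consumer writes to the store. The sketch below uses Python's standard `queue` and `threading` modules as a stand-in for a JMS broker; it is not JMS, and `AsyncProvenanceRecorder` and its in-memory `store` are assumptions.

```python
import queue
import threading


class AsyncProvenanceRecorder:
    """Hand provenance events to a background writer so the pipeline is not blocked."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self):
        self._messages = queue.Queue()
        self.store = []  # stand-in for the provenance store (assumption)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event):
        # Enqueue and return immediately; the slow write happens elsewhere.
        self._messages.put(event)

    def _drain(self):
        while True:
            event = self._messages.get()
            if event is self._STOP:
                break
            self.store.append(event)

    def close(self):
        # Flush: the sentinel is queued after all earlier events.
        self._messages.put(self._STOP)
        self._worker.join()


recorder = AsyncProvenanceRecorder()
for step in ("ingest", "filter", "classify"):
    recorder.record({"step": step})
recorder.close()
print(recorder.store)
```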
Challenges
Multiple language bindings
Information overload
Scalability
Should scale to billions of triples
Augmentation: user annotation
Filtering
User/Application specific views
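Filtering and user-specific views can be thought of as selecting a subset of the stored triples. The sketch below assumes each triple carries a numeric level of detail; that `detail` field, the `Triple` type, and the `view` function are illustrative inventions, and a store holding billions of triples would push this filter into the query engine rather than scan a list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    detail: int  # hypothetical: 0 = summary, larger = finer-grained


def view(triples, max_detail):
    """Return only the triples at or below the requested level of detail."""
    return [t for t in triples if t.detail <= max_detail]


triples = [
    Triple("plot", "derivedFrom", "simulation", 0),
    Triple("simulation", "used", "parameters", 1),
    Triple("simulation", "ranOnNode", "node-17", 2),
]
summary = view(triples, max_detail=0)
print([t.obj for t in summary])
```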
Questions...
Email: Tara.Gibson@pnl.gov