National Center for Supercomputing Applications
The Way Things Go
e-Science is a complex activity
Scientific knowledge is comprehensible only in the context of those activities
Adopt the Rube Goldberg view
Rube Goldberg
Grand challenge: systems-scale science
Observation and modeling of multiple systems at multiple scales
Linking data and tools from different disciplines
to get a valid global result!
“... modeling complex systems will be a major research challenge for the 21st century” - National Science Foundation
Building current practices up isn't working
Heterogeneous tools, data formats
Little global coordination of research
Little funding for sustained stewardship of tools and data
M.C. Escher, “Tower of Babel” (1928)
Proposed solutions aren't working
e-Journals – not machine-interpretable
Collaboration tools – scientists just use email like everyone else
Portals and digital libraries – typically:
  centralized
  domain-specific
The Grid – can orchestrate complex processing jobs, but that's not science
Only networks work at scale
Single researcher: ad hoc data mgt, single-user apps
Community: community tools, resources, control
Global: no global practice, tools, control
Desktop
Workgroup
Network
How do we get there?
e-Science means managing process and data
Current approaches favor one or the other
Information is getting lost
(Diagram labels: model, refine, observe, predict, data, critical interface)
Trends: process data
(Chart labels)
data axis: Data, Metadata, Semantics
process axis: Batch, Interactive, Workflow
* mainframes
* digital libraries
* portals
* ontologies
* provenance
* desktop apps
* formats
* e-notebooks
* the grid
* rules
Key technologies
Semantic web: data/metadata
  Provides means of merging descriptive information even if it only partially agrees (e.g., comes from two different communities)
Workflow: process
  Describes complex procedures independently of how they are executed
Provenance: process + data/metadata
  Links workflow, data, and any ancillary descriptive information (e.g., attribution)
Semantics: data to knowledge
Data (concrete): streams, arrays, swaths, etc. (a.k.a. files)
  aggregation, annotation
Information: collections, tags, attributes, etc. (a.k.a. metadata)
  learning, inference
Knowledge (abstract): ontologies, rules, models, etc. (a.k.a. semantics)
(cf Reagan Moore)
Semantic web: RDF triple
Declarative: asserts a fact
Subject and object URIs identify arbitrary entities (things, people, concepts, events)
Predicate identifies the relationship between them
subject --predicate--> object
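
As a minimal sketch of the triple structure described on this slide, the following Python models one statement as a (subject, predicate, object) tuple of URI strings. The `Triple` type and the `example.org` URIs are illustrative assumptions and do not come from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str    # URI identifying the entity being described
    predicate: str  # URI identifying the relationship
    object: str     # URI identifying the related entity


# Hypothetical URIs, used only for illustration.
fact = Triple(
    subject="http://example.org/animals/rex",
    predicate="http://example.org/terms/hasBreed",
    object="http://example.org/breeds/collie",
)

# A triple is declarative: it asserts a fact and prescribes no processing.
print(f"{fact.subject} --{fact.predicate}--> {fact.object}")
```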
Triples form an open network
Subject nodes aren't “owned” by any single agent or container
Any actor can add arcs to the implicit, total, world graph
Any two graphs can be joined
hasBreed
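
The claim that any two graphs can be joined can be illustrated with plain Python sets. This is only a sketch: each graph is assumed to be a set of (subject, predicate, object) tuples, and every URI and graph name below is made up for the example.

```python
# Each graph is a set of (subject, predicate, object) triples.
# All URIs below are hypothetical and exist only for this example.
REX = "http://example.org/animals/rex"

kennel_graph = {
    (REX, "http://example.org/terms/hasBreed", "http://example.org/breeds/collie"),
}
vet_graph = {
    (REX, "http://example.org/terms/hasBreed", "http://example.org/breeds/collie"),
    (REX, "http://example.org/terms/treatedBy", "http://example.org/people/dr-lee"),
}

# Joining two graphs is a set union: shared nodes line up by URI,
# and duplicate statements collapse into one.
merged = kennel_graph | vet_graph

# Any actor can add further arcs about the same subject.
merged.add((REX, "http://example.org/terms/ownedBy", "http://example.org/people/pat"))

predicates = sorted(p.rsplit("/", 1)[-1] for s, p, o in merged if s == REX)
print(predicates)
```

Because nodes are identified by URI and not by the container that holds them, neither source graph needs to know about the other before the union.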
Non satis non scire (to know is not enough)
Semantic web “layer cake”
Where do we manage process? User interface? Applications?
“Semantic Grid” (D. DeRoure, C. Goble)
(source: World Wide Web Consortium)
Workflow: process description
Describe complex operations as networks of simpler operations
Abstract operation execution from description
Can be shared (but may not be portable)
(Taverna)
(Kepler)
Anatomy of a workflow
Declarative: says what to do
Modules identify arbitrary procedures
Arcs identify flow of control and/or data (data flow is usually implicit)
“Module”
Control flow
Execution model (usu. implicit)
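
The anatomy above (modules, arcs, and a separate execution model) can be sketched in a few lines of Python. The module names and the serial `run` engine are assumptions made for illustration; they do not reflect how Taverna, Kepler, or D2K work internally.

```python
from graphlib import TopologicalSorter


# Modules: named procedures. These three are hypothetical placeholders.
def load():
    return [3, 1, 2]


def sort_values(values):
    return sorted(values)


def summarize(values):
    return {"min": values[0], "max": values[-1]}


MODULES = {"load": load, "sort": sort_values, "summarize": summarize}

# Arcs: each module maps to the modules whose output it consumes.
# This is the declarative description; it says nothing about execution.
WORKFLOW = {"load": [], "sort": ["load"], "summarize": ["sort"]}


def run(workflow, modules):
    """One possible execution model: run modules serially in dependency order."""
    results = {}
    for name in TopologicalSorter(workflow).static_order():
        inputs = [results[dep] for dep in workflow[name]]
        results[name] = modules[name](*inputs)
    return results


results = run(WORKFLOW, MODULES)
print(results["summarize"])
```

Keeping `WORKFLOW` as plain data means a different engine (parallel, remote, or grid-based) could execute the same description without changing it.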
Workflow systems
Modules representing units of computation
Language for specifying WF modules and control flow
Engine for executing WF
D2K (source: NCSA)
Work vs. workflow systems
Scientists are not WF modules
Science work also involves
  social organization, incl. funding
  field and “wet lab” manual work
  discourse: review, validation
(source: CNRS/UCSD)
Provenance: what happened
Answers critical questions
  What led to this result?
  When and how were observations made, conclusions reached?
Is a causal network of events
Complementary incomplete notions of provenance
Artifact-centric (e.g., digital libraries)
  “lineage” = events in lifecycle of artifact, e.g., custody
  IRs focus on curation events (not antecedent processes)
Process-centric (e.g., workflow)
  computational events (e.g., service invocations)
  control flow
  artifacts are either not mentioned or opaque (tool-specific)
Provenance Challenges 1 & 2
IPAW 2006, HPDC 2007
20 teams, 1 workflow, 9 queries
  major players
Interoperability?
  lots of manual work required
  call for standards
(source: gridprovenance.org)
Artifact + process provenance = “open provenance”
Can describe any process, not just WF execution (e.g., science!)
Allows alternate accounts by different observers
Rules for inferring transitive causal relationships
(source: Luc Moreau et al)
Open Provenance Model
3 node types – artifact, process, agent
5 arc types – used, generated, triggered, derived, controlled – and inference rules
Generic – extensibility via annotation
Choice of granularity and focus (e.g., artifact or process-centric)
(source: Luc Moreau et al)
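
To make the node types, arc types, and transitive inference concrete, here is a small Python sketch of an OPM-style graph. It is not the OPM toolkit: the tuple encoding, the node names, and the `derived_from` helper are assumptions, and only one inference rule (the transitive closure of "derived") is shown.

```python
from collections import defaultdict

# The three node types and five arc types listed on the slide.
NODE_TYPES = {"artifact", "process", "agent"}
ARC_TYPES = {"used", "generated", "triggered", "derived", "controlled"}

# Hypothetical nodes for a tiny analysis.
nodes = {
    "raw_data": "artifact",
    "clean_data": "artifact",
    "figure": "artifact",
    "calibrate": "process",
    "alice": "agent",
}

# Arcs are (effect, arc_type, cause), so they point back toward causes.
arcs = [
    ("calibrate", "used", "raw_data"),         # process used an artifact
    ("clean_data", "generated", "calibrate"),  # artifact was generated by a process
    ("calibrate", "controlled", "alice"),      # process was controlled by an agent
    ("clean_data", "derived", "raw_data"),     # artifact was derived from an artifact
    ("figure", "derived", "clean_data"),
]


def derived_from(arcs, artifact):
    """Return every artifact reachable by following 'derived' arcs transitively."""
    parents = defaultdict(set)
    for effect, arc_type, cause in arcs:
        if arc_type == "derived":
            parents[effect].add(cause)
    seen, stack = set(), [artifact]
    while stack:
        for cause in parents[stack.pop()]:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


print(sorted(derived_from(arcs, "figure")))
```

The query answers "what led to this result?" for `figure` even though no single arc links it directly to `raw_data`.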
NCSA Provenance Infrastructure
(Architecture diagram labels)
Open Provenance Model
Tupelo Semantic Content Repository
Context, Context, Context
OPM toolkit
Store, Store, Store
OPM toolkit
Visualization, interaction
Tracking, modeling, presentation
Abstraction, inference, storage
desktop, portal, etc.
Tupelo: semantic content
Abstracts content from storage impls (e.g., Sesame, Mulgara)
Provides location-independent addressing of content and metadata
Supports transparent mirroring, caching, failover, etc.
(tupeloproject.org)
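
The idea of separating content from storage, with transparent failover, can be sketched as follows. This is not Tupelo's API: the `Store` protocol, `MemoryStore`, `FailoverStore`, and the `urn:example:` identifier are all hypothetical names chosen for the illustration.

```python
from typing import Optional, Protocol


class Store(Protocol):
    """Hypothetical storage interface; not Tupelo's actual API."""

    def read(self, uri: str) -> Optional[bytes]: ...


class MemoryStore:
    """In-memory stand-in for a real backend such as a triple store."""

    def __init__(self, blobs: dict, available: bool = True):
        self.blobs = blobs
        self.available = available

    def read(self, uri: str) -> Optional[bytes]:
        if not self.available:
            raise ConnectionError("store unavailable")
        return self.blobs.get(uri)


class FailoverStore:
    """Tries each mirror in turn, so callers never name a specific backend."""

    def __init__(self, mirrors: list):
        self.mirrors = mirrors

    def read(self, uri: str) -> Optional[bytes]:
        for mirror in self.mirrors:
            try:
                blob = mirror.read(uri)
            except ConnectionError:
                continue  # this mirror is down; try the next one
            if blob is not None:
                return blob
        return None


uri = "urn:example:dataset-42"  # location-independent identifier (made up)
primary = MemoryStore({uri: b"observations"}, available=False)
mirror = MemoryStore({uri: b"observations"})
store = FailoverStore([primary, mirror])
print(store.read(uri))
```

The caller addresses content only by identifier; which backend answers is a detail of the wrapper.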
CyberIntegrator: workflow by example
Records what users do as provenance
  source, intermediate, and final artifacts
  steps and parameters
Can re-enact interaction as a workflow
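
"Workflow by example" can be approximated with a recorder that logs each step and its parameters, then replays them on new input. The sketch below is an assumption about the general technique, not CyberIntegrator's implementation; `Recorder`, `scale`, and `clip` are hypothetical names.

```python
class Recorder:
    """Logs each step a user performs so the session can be replayed later."""

    def __init__(self):
        self.steps = []  # (function, params) in the order performed

    def do(self, func, value, **params):
        """Run one step interactively and remember it."""
        self.steps.append((func, params))
        return func(value, **params)

    def replay(self, source):
        """Re-enact the recorded steps as a linear workflow on a new source."""
        value = source
        for func, params in self.steps:
            value = func(value, **params)
        return value


# Hypothetical analysis steps.
def scale(values, factor):
    return [v * factor for v in values]


def clip(values, limit):
    return [min(v, limit) for v in values]


session = Recorder()
intermediate = session.do(scale, [1, 2, 3], factor=10)
final = session.do(clip, intermediate, limit=25)

# The recorded steps and parameters now act as a reusable workflow.
replayed = session.replay([4, 5])
print(final, replayed)
```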
MAEviz: analysis/viz app, workflow “behind the scenes”
GIS app. platform
Earthquake hazard analysis plug-in
Data catalog
  built environment
  fragility/hazard models
Driven by workflow -> provenance
CyberCollaboratory: collaboration + provenance
User interaction with tools generates events
Events are captured using the OPM and published to Tupelo
Non-portal apps can browse / use provenance
Summary
“The way things go” is critical to e-Science at scale
Provenance is an open causal network
New infrastructure supports provenance
Resources / acknowledgements
Grid Provenance Challenge
  http://twiki.gridprovenance.org/
NCSA technologies
  Tupelo: http://tupeloproject.org/
  CyberIntegrator: http://isda.ncsa.uiuc.edu/
  MAEviz: http://maeviz.cee.uiuc.edu/
  CyberCollaboratory: http://ecid.ncsa.uiuc.edu/cybercollab/
Acknowledgements:
  Jim Myers, Luc Moreau, Juliana Freire, Patrick Paulson, Simon Miles, Bob McGrath, and more ...