![Page 1: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/1.jpg)
Brief Introduction to Provenance
“As data becomes plentiful, verifiable truth becomes scarce”
http://go-to-hellman.blogspot.com/2010/02/named-graphs-argleton-and-truth-economy.html
For JISC KeepIt course on Digital Preservation Tools for Repository Managers
Module 3, Primer on preservation workflow, formats and characterisation
Westminster-Kingsway College, London, 2 March 2010
![Page 2: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/2.jpg)
Provenance: example
The following excerpt and slides are taken with permission from Moreau, L. The Open Provenance Model: Towards inter-operability of Provenance Systems http://users.ecs.soton.ac.uk/lavm/talks/iam09.pdf
Example: The provenance of a bottle of wine includes:
• Grapes from which it is made
• Where those grapes grew
• Process in the wine’s preparation
• How the wine was stored
• Between which parties the wine was transported, e.g. producer to distributor to retailer
• Where it was auctioned
![Page 3: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/3.jpg)
Provenance Definition
• Oxford English Dictionary:
  – the fact of coming from some particular source or quarter; origin, derivation
  – the history or pedigree of a work of art, manuscript, rare book, etc.;
  – concretely, a record of the passage of an item through its various owners.
• The provenance of a piece of data is the process that led to that piece of data
![Page 4: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/4.jpg)
The Science Lifecycle
[Diagram labels: scientists; experimentation; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; Certified Experimental Results & Analyses; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Technical Reports; Reprints; Local Web Repositories; Digital Libraries; Virtual Learning Environment; Undergraduate Students; Graduate Students; Next Generation Researchers]
Adapted from David De Roure’s slides
![Page 5: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/5.jpg)
Finding the provenance of research outputs across all the systems the data transited through
![Page 6: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/6.jpg)
Open Provenance Model (OPM)
• Allows us to express all the causes of an item
• Allows for process-oriented and dataflow-oriented views
• Based on a notion of annotated causality graph
Moreau, L., et al. OPM v1.00 (Dec 2007), OPM v1.01 (Jul 2008), OPM v1.1 (Dec 2009)
![Page 7: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/7.jpg)
OPM Requirements
• To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
• To allow developers to build and share tools that operate on such a provenance model.
• To define the model in a precise, technology-agnostic manner.
• To define bindings to XML/RDF separately
• To support a digital representation of provenance for any “thing”, whether produced by computer systems or not
![Page 8: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/8.jpg)
OPM Serialisation
• OPM is an abstract data model to represent past execution and what causes data and processes to occur
• OPM can be serialised in different formats, referred to as “technology bindings” or serializations
• OPM XML schema (http://openprovenance.org/model/v1.01.a)
• OPM RDF schema
• OPM OWL ontology
• Effort underway to ensure full equivalence of representations
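The split between an abstract model and its bindings can be illustrated with a minimal Python sketch. This is not the OPM XML schema or the RDF binding: the `GRAPH` structure, the element and attribute names, and the four helper functions are all invented for illustration. The sketch only shows one in-memory graph written out in two formats, and checks that both round-trip to the same content, which is the kind of equivalence the last bullet refers to.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory graph (not an official OPM structure):
# node ids mapped to node kinds, plus (effect, relation, cause) edges.
GRAPH = {
    "nodes": {"a1": "artifact", "p1": "process"},
    "edges": [("p1", "used", "a1")],
}


def to_json(graph):
    """Serialise the graph as JSON text (illustrative layout)."""
    payload = {"nodes": graph["nodes"], "edges": [list(e) for e in graph["edges"]]}
    return json.dumps(payload, sort_keys=True)


def from_json(text):
    """Rebuild the in-memory graph from the JSON layout above."""
    data = json.loads(text)
    return {"nodes": data["nodes"], "edges": [tuple(e) for e in data["edges"]]}


def to_xml(graph):
    """Serialise the graph as XML text (made-up element names)."""
    root = ET.Element("graph")
    for node_id, kind in sorted(graph["nodes"].items()):
        ET.SubElement(root, "node", id=node_id, kind=kind)
    for effect, relation, cause in graph["edges"]:
        ET.SubElement(root, "edge", effect=effect, relation=relation, cause=cause)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the in-memory graph from the XML layout above."""
    root = ET.fromstring(text)
    nodes = {n.get("id"): n.get("kind") for n in root.findall("node")}
    edges = [
        (e.get("effect"), e.get("relation"), e.get("cause"))
        for e in root.findall("edge")
    ]
    return {"nodes": nodes, "edges": edges}


# Both serialisations carry the same information as the abstract graph.
assert from_json(to_json(GRAPH)) == GRAPH
assert from_xml(to_xml(GRAPH)) == GRAPH
```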
![Page 9: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/9.jpg)
Nodes
• Artifact: Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
• Process: Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
• Agent: Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.
[Node symbols: A (artifact), P (process), Ag (agent)]
![Page 10: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/10.jpg)
Edges
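Each edge points from an effect back to its cause:
• used(R): from process P to artifact A
• wasGeneratedBy(R): from artifact A to process P
• wasControlledBy(R): from process P to agent Ag
• wasTriggeredBy: from process P2 to process P1
• wasDerivedFrom: from artifact A2 to artifact A1
Edge labels are in the past tense because they describe past executions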
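The three node kinds and five edge types can be sketched as a small typed data model. This is a minimal illustration in Python, not an official OPM library: the `Node` and `Edge` classes, the `EDGE_ENDPOINTS` table and the field names are assumptions made for this example. The only thing taken from the model is which kind of node each edge type connects.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed endpoints per edge type, written as (effect kind, cause kind).
# Edges point from the effect back to its cause.
EDGE_ENDPOINTS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # "artifact", "process" or "agent"


@dataclass(frozen=True)
class Edge:
    effect: Node
    relation: str
    cause: Node
    role: Optional[str] = None  # the (R) annotation, where one applies

    def __post_init__(self):
        # Reject unknown edge types and edges between the wrong node kinds.
        if self.relation not in EDGE_ENDPOINTS:
            raise ValueError(f"unknown edge type: {self.relation}")
        expected = EDGE_ENDPOINTS[self.relation]
        actual = (self.effect.kind, self.cause.kind)
        if actual != expected:
            raise ValueError(
                f"{self.relation} must link {expected[0]} to {expected[1]}, "
                f"got {actual[0]} to {actual[1]}"
            )


# Usage: a process that used an artifact in the (hypothetical) role "input".
a = Node("A", "artifact")
p = Node("P", "process")
edge = Edge(effect=p, relation="used", cause=a, role="input")
```

Validating endpoints at construction time keeps malformed edges, such as an agent "used" by an artifact, out of the graph from the start.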
![Page 11: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/11.jpg)
Illustration
• Process “used” artifacts and “generated” artifacts
• Edge “roles” indicate the function of the artifact with respect to the process (akin to function parameters)
• Edges and nodes can be typed

Causation chain:
• P was caused by A1 and A2
• A3 and A4 were caused by P
• Does it mean that A3 and A4 were caused by A1 and A2?

[Diagram: process P (type=division) used artifacts A1 and A2 (roles: dividend, divisor); artifacts A3 and A4 wasGeneratedBy P (roles: quotient, rest)]
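The division example can be written down as data to make the closing question concrete. The following Python sketch is illustrative only: the tuple layout and function names are invented here, and the assignment of roles to individual artifacts (A1 as dividend, A2 as divisor, A3 as quotient, A4 as rest) is an assumption, since the flattened diagram does not show which artifact carries which role.

```python
# Edges as (effect, relation, cause, role) tuples.
# ASSUMPTION: which artifact has which role is a guess for illustration.
EDGES = [
    ("P", "used", "A1", "dividend"),
    ("P", "used", "A2", "divisor"),
    ("A3", "wasGeneratedBy", "P", "quotient"),
    ("A4", "wasGeneratedBy", "P", "rest"),
]


def direct_causes(node, edges):
    """Nodes that `node` is linked to by a single edge."""
    return {cause for effect, _rel, cause, _role in edges if effect == node}


def candidate_sources(artifact, edges):
    """Artifacts used by the process(es) that generated `artifact`.

    These are only candidates: the two-step path does not by itself
    record that the output depended on each input.
    """
    sources = set()
    for effect, rel, process, _role in edges:
        if effect == artifact and rel == "wasGeneratedBy":
            sources |= {c for e, r, c, _ in edges if e == process and r == "used"}
    return sources


def asserted_derivations(artifact, edges):
    """Artifacts that `artifact` is explicitly stated to derive from."""
    return {c for e, r, c, _ in edges if e == artifact and r == "wasDerivedFrom"}


print(direct_causes("P", EDGES))          # the two inputs of the division
print(candidate_sources("A3", EDGES))     # inputs that A3 may depend on
print(asserted_derivations("A3", EDGES))  # empty: nothing is asserted
```

Following wasGeneratedBy and then used yields A1 and A2 as possible sources of A3, but the graph as drawn contains no wasDerivedFrom edge, so that dependency is a candidate rather than a recorded fact. Recording it would take an explicit wasDerivedFrom edge between the artifacts.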
![Page 12: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/12.jpg)
Time Constraints
[Diagram: an artifact generated at T1 is used by process P at T3; P, controlled by agent Ag, starts at T2 and ends at T5; P generates an artifact at T4, which is used at T6]
• T1<T3 (artifact must exist before being used)
• T2<T3 (process must have started before using artifacts)
• T3<T5 (process uses artifacts before it ends)
• T2<T4 (process must have started before generating artifacts)
• T4<T5 (process generates artifacts before it ends)
• T4<T6 (artifact must exist before being used)
• T2<T5 (process must have started before ending)
• no constraint between T3 and T4
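The constraints above lend themselves to a mechanical check. Below is a minimal Python sketch, assuming timestamps are supplied as a dictionary keyed "T1" to "T6" with comparable values; the `violated_constraints` function and the `CONSTRAINTS` table are names made up for this example, not part of OPM.

```python
# The seven ordering constraints from the slide, as (earlier, later, reason).
CONSTRAINTS = [
    ("T1", "T3", "artifact must exist before being used"),
    ("T2", "T3", "process must have started before using artifacts"),
    ("T3", "T5", "process uses artifacts before it ends"),
    ("T2", "T4", "process must have started before generating artifacts"),
    ("T4", "T5", "process generates artifacts before it ends"),
    ("T4", "T6", "artifact must exist before being used"),
    ("T2", "T5", "process must have started before ending"),
]


def violated_constraints(times):
    """Return a description of every constraint the timestamps break.

    `times` maps "T1".."T6" to comparable values (numbers or datetimes).
    An empty list means the timestamps are consistent.
    """
    return [
        f"{earlier}<{later} ({reason})"
        for earlier, later, reason in CONSTRAINTS
        if not times[earlier] < times[later]
    ]


# Consistent timeline: nothing is reported.
consistent = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6}
print(violated_constraints(consistent))

# T3 and T4 may occur in either order, so swapping them is still fine.
swapped = {"T1": 1, "T2": 2, "T3": 4, "T4": 3, "T5": 5, "T6": 6}
print(violated_constraints(swapped))

# Using an artifact after the process has ended breaks T3<T5.
late_use = {"T1": 1, "T2": 2, "T3": 6, "T4": 4, "T5": 5, "T6": 7}
print(violated_constraints(late_use))
```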
![Page 13: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/13.jpg)
Dublin Core Profile (draft)
• To many people, provenance is primarily about attribution, citation, bibliographic information
• DC provides terms to relate resources to such information
• The DC profile aims to map Dublin Core terms to OPM concepts and graph patterns
with Simon Miles and Joe Futrelle
![Page 14: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/14.jpg)
DC to OPM example: dc:publisher
[Diagram: process P (publish) used artifact A1 (state=unpublished); artifact A2 (state=published) wasGeneratedBy P; A2 wasSameResourceAs A1; P wasActionOf agent Ag (person, name=Luc)]
![Page 15: Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau](https://reader036.vdocuments.us/reader036/viewer/2022082623/5479ef78b4af9fa5158b4952/html5/thumbnails/15.jpg)
What have we learned about provenance?
• Provenance: describes and records the results of processes on objects over time
• OPM can represent provenance as XML
• OPM can be serialised in different formats
  – RDF, Semantic Web
• OPM is a work in progress
By working with an open standard model that can pass information as XML and in standard serialisation formats (e.g. RDF), it should be possible to build provenance services into repository environments.