provenance abdul saboor department of computer science software engineering research group, berlin,...
![Page 1: PROVENANCE Abdul Saboor Department of Computer Science Software Engineering Research Group, Berlin, Germany Welcome to this Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062621/551c1b4c550346ad4f8b589b/html5/thumbnails/1.jpg)
PROVENANCE
Abdul Saboor
Department of Computer Science
Software Engineering Research Group, Berlin,
Germany
Welcome to this Presentation
Presentation Agenda
What is Provenance?
Why Provenance is important and two major strands of Provenance?
Provenance and Linked Data
Provenance Data Model
Provenance Vocabularies
The Open Provenance Model
Provenance Data Quality Assessment
Summary - Scientific and Technical Challenges of Provenance
What is Provenance?
Provenance: recording the history of data and its place of origin
Provenance Dictionary Definitions
1. The Merriam-Webster online dictionary – Origin, Source
2. Oxford English Dictionary – The place of origin or earliest known history of something; origin, derivation.
Provenance Definitions
1. Provenance refers to the source of information, such as the entities and processes involved in producing or delivering an artifact. (Yolanda)
2. Provenance is a description of how things came to be, and how they came to be in the state they are in today. Statements about provenance can themselves be considered to have provenance. (Jim M)
Continues ...
What is Provenance?
Provenance Working Definitions
3. Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance. (W3C)
Provenance Web Definition
4. On the web, provenance would include information about the creation and publication of web resources as well as information about access of those resources, and activities related to their discussion, linking, and reuse.
Continues ...
What is Provenance?
Provenance Definitions
5. Provenance is documentation of the set of artifacts, processes, and agents that have caused an artifact to be, and of the contexts of these entities. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Assertions of provenance can themselves become important records with their own provenance. (Jim M)
What kind of History?
Data Creator/Data Publisher
Data Creation Date
Data Modifier & Modification Date
Data Description
Etc.
Why is Provenance Important?
The need for provenance in data integration and reuse
Data comes from diverse data sources
Varying Quality
Different Scope
Different Assumptions
Two major strands of Provenance
Data And Workflow Provenance
Data Provenance
Information describing how data has moved through a network of databases is referred to as “fine-grain” or “data” provenance. Fine-grain provenance can be further categorized into where-, how- and why-provenance. A query execution simply copies data elements from some source to some target database, and where-provenance identifies the source elements from which the data in the target was copied. Why-provenance provides justification for the data elements appearing in the output, and how-provenance describes how some parts of the input influenced certain parts of the output.
Workflow Provenance
Information describing how derived data has been calculated from raw observations is referred to as “coarse-grain” or “workflow” provenance. The widespread use of workflow tools for processing scientific data facilitates the capture of provenance information. The workflow process describes all the steps involved in producing a given data set and hence captures its provenance information.
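The where/why distinction can be illustrated with a toy query. The sketch below is not from the slides: the table contents, the `Cell` type and the `names_in` function are hypothetical, and the why-provenance here is reduced to the single source row that satisfies the selection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """A copied value plus its where-provenance: (database, row id, column)."""
    value: object
    origin: tuple

# Hypothetical source database: row id -> row.
SOURCE = {
    1: {"name": "Ada", "city": "Berlin"},
    2: {"name": "Bob", "city": "Madrid"},
    3: {"name": "Cem", "city": "Berlin"},
}

def names_in(city):
    """Copy the names of people in `city` to a target list, tracking provenance."""
    target = []
    for row_id, row in SOURCE.items():
        if row["city"] == city:
            where = ("source_db", row_id, "name")  # where the value was copied from
            why = {("source_db", row_id)}          # source row that justifies the output
            target.append((Cell(row["name"], where), why))
    return target

result = names_in("Berlin")
for cell, why in result:
    print(cell.value, "copied from", cell.origin, "because of", sorted(why))
```

Each output value carries the location it was copied from (where-provenance) alongside the rows that explain its presence in the result (why-provenance).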
Provenance Dimensions - 1
Content of Provenance Information
Attribution - provenance as the sources or entities that were used to create a new result
Responsibility - knowing who endorses a particular piece of information or result
Origin - recorded vs reconstructed, verified vs non-verified, asserted vs inferred
Process - provenance as the process that yielded an artifact
  Reproducibility (e.g. workflows, mashups, text extraction)
  Data Access (e.g. access time, accessed server, party responsible for accessed server)
Evolution and versioning
  Republishing (e.g. re-tweeting, re-blogging, re-publishing)
  Updates (e.g. a document with content from various sources and that changes over time)
Justification for decisions – Includes argumentation, hypotheses, why-not questions
Entailment - given the results to a particular query, what tuples led to those results
Provenance Dimensions - 2
Management of Provenance Information
Publication - Making provenance information available (expose, distribute)
Access - Finding and querying provenance information
Dissemination control – Track policies specified by creator for when/how an artifact can be used
Access Control - incorporate access control policies to access provenance information
Licensing - stating what rights the object creators and users have based on provenance
Law enforcement (e.g. enforcing privacy policies on the use of personal information)
Scale - how to operate with large amounts of provenance information
Use of Provenance Information
Understanding - End user consumption of provenance
  abstraction, multiple levels of description, summary presentation, visualization
Provenance Dimensions - 3
Interoperability - combining provenance produced by multiple different systems
Comparison - finding what is common in the provenance of two or more entities
Accountability - the ability to check the provenance of an object with respect to some expectation
  Verification - of a set of requirements
  Compliance - with a set of policies
Trust - making trust judgments based on provenance
  Information quality - choosing among competing evidence from diverse sources (e.g. linked data use cases)
  Incorporating reputation and reliability ratings with attribution information
Imperfections - reasoning about provenance information that is not complete or correct
  Incomplete provenance
  Uncertain/probabilistic provenance
  Erroneous provenance
  Fraudulent provenance
Debugging
Web of Data
Adapted from Cetinia, iSOCO Innovation Lab, J.M.G Perez, Provenance: eScience to the Web of Data, 11/09
The Linked Data Paradigm
How can we exploit all the available data?
Data can be reused and remixed
Common flexible and usable APIs
Standard vocabularies to describe interlinked datasets
Various Tools
Understand the Semantic Web vision
Provenance and Linked Data
Provenance provides the ability to:
  Trace the sources of various kinds of data
  Enable the exploration of relationships between datasets, their authors and affiliations
Provenance analysis provides insight into how data is produced and exploited
Provenance creates a notion of information quality
  Is a certain dataset consistent and up to date?
  Is the connection between two datasets meaningful?
  Is a given dataset relevant for a particular domain?
Provenance to establish information trustworthiness
Provenance to provide data views relating to some criteria
The Provenance Data Model
Institutional Level
Experimental Protocol Level
Data Analysis and Significance Level
Dataset Description Level
Metadata associated with origin in terms of its data attributes (e.g. AuthorName, Title, URL, etc.)
The origin of datasets (e.g. history area, region, organisation or institution)
Statistical analysis methodology of datasets for selecting relevant attributes (e.g. datasets divided into parts, output values, versions, etc.)
Who published the datasets. The vocabulary of interlinked datasets such as Dublin Core, voiD, PRV, etc.
The Provenance Related Vocabularies
DC – Dublin Core
FOAF – Friend of a Friend
SIOC – Semantically-Interlinked Online Communities
WOT – Web of Trust Schema
OMV – Ontology Metadata Vocabulary
SWP – Semantic Web Publishing
VoiD – Vocabulary of Interlinked Datasets
PRV – Provenance Vocabulary
PML – Proof Markup Language
PAV – SWAN provenance ontology
OUZO – Provenance ontology
CS – Changeset Vocabulary
Etc.
Provenance Related Metadata
Provenance-related metadata is either attached directly to a data item or its host document, or it is available as additional data on the Web.
For example, attached metadata includes RDF statements about the RDF graph that contains those statements, the author name and creation date of blog entries added to a syndication feed, or information about an image. Detached metadata can be represented in RDF using vocabularies.
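As a rough illustration of detached metadata expressed with a vocabulary, the sketch below writes three statements about a blog entry as N-Triples lines using Dublin Core terms. The `example.org` URIs, the `triple` helper and the chosen values are assumptions for illustration only; the helper does not escape special characters in literals.

```python
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

def triple(subject, predicate, obj, literal=False):
    """Serialize one statement as an N-Triples line (no literal escaping)."""
    o = '"%s"' % obj if literal else "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, o)

post = "http://example.org/blog/post-1"  # hypothetical data item

metadata = [
    triple(post, DCT + "creator", "http://example.org/people/alice"),
    triple(post, DCT + "created", "2010-11-05", literal=True),
    triple(post, DCT + "source", "http://example.org/datasets/raw-1"),
]

print("\n".join(metadata))
```

The same statements could be embedded in the host document instead, in which case they would count as attached metadata.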
A Provenance Architecture for the Web of Data
Authoritative agencies are needed to certify data provenance and keep it secure
Application Layer
Adapted from Cetinia, iSOCO Innovation Lab, J.M.G Perez, Provenance: eScience to the Web of Data, 11/09
Main Action Points
Provenance Vocabularies
Represent and reason with trust and information quality
Extend emerging Linked Data vocabularies
VOiD
Awareness of Data Providers
W3C Provenance Incubator Group
Linked Data Standards (VOiD)
Tools for Data Providers
Generalization of Provenance Metadata
Provenance Authoritative Agencies
Provenance Visualization
Adapted from Cetinia, iSOCO Innovation Lab, J.M.G Perez, Provenance: eScience to the Web of Data, 11/09
The Open Provenance Model
The Open Provenance Model (OPM) describes how data is produced or transformed into a new state. It can also represent the transformation of one or more data items from an old state to a new state.
OPM is a graph model for provenance: it describes a graph whose edges denote the relationships between the occurrences represented by the nodes.
The main purpose of OPM is to support the assessment of various data qualities such as reliability, accuracy and timeliness.
OPM classifies nodes into three types
Artifacts
Artifacts are pieces of data of fixed value and context that may represent an entity in a given state. Edges can also carry annotations that provide information on how one occurrence caused another.
Processes
Processes are performed on artifacts in order to produce other artifacts.
Agents
Agents are the entities that control a process, such as a user.
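A minimal sketch of such a graph follows. The edge labels `used`, `wasGeneratedBy` and `wasControlledBy` come from the OPM specification rather than from these slides; the `Node` and `ProvenanceGraph` classes and the example file names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    kind: str  # "artifact", "process" or "agent"
    name: str

@dataclass
class ProvenanceGraph:
    edges: list = field(default_factory=list)  # (effect, relation, cause)

    def add(self, effect, relation, cause):
        self.edges.append((effect, relation, cause))

    def causes_of(self, node):
        """Return every node reachable by following edges from effect to cause."""
        seen, stack = [], [node]
        while stack:
            current = stack.pop()
            for effect, _, cause in self.edges:
                if effect == current and cause not in seen:
                    seen.append(cause)
                    stack.append(cause)
        return seen

raw = Node("artifact", "raw.csv")
clean = Node("artifact", "clean.csv")
job = Node("process", "cleaning run")
user = Node("agent", "alice")

g = ProvenanceGraph()
g.add(job, "used", raw)              # the process consumed an artifact
g.add(clean, "wasGeneratedBy", job)  # the process produced a new artifact
g.add(job, "wasControlledBy", user)  # an agent controlled the process

print([n.name for n in g.causes_of(clean)])
```

Edges point from effect to cause, so walking them from an artifact collects everything that contributed to it.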
Model of Web Data Provenance
Provenance Graph – It describes the provenance of data items:
Nodes
Provenance elements (pieces of provenance information)
Edges
Relating provenance elements to each other
Sub-graphs
Related data items if possible
Main Focus of Provenance of Web Data
Provenance Models Define
Types of provenance elements (roles)
Relationships between those elements
Adapted from Olaf Hartig, Humboldt University Berlin, Provenance Information in the Web of Data, 04/09
Provenance Data Quality Assessment
The Quality of Information
The main objective is assessing the quality of datasets
Quality of datasets from multidimensional perspectives
Relevance of criteria is determined by preferences and by the tasks performed on the available datasets
Provenance Data Quality
Data Trustworthiness
  Data Authenticity
  Data Reliability
Dimensions of Believability
  Trustworthiness of source
  Data Lineage – The origin of data
  Related artifacts and actors
  Reasonableness of data
    Possibility – The extent to which a data value is possible
    Consistency – The extent to which a data value is consistent with other values of the same data
Quality of Data Provenance has three dimensions:
  Correctness
  Completeness
  Relevancy
Provenance Data Quality
Quality of Datasets
Timeliness
Consistency between datasets
  Consistency over source – The extent to which a data value is consistent with other values of the same data
  Consistency over time – The extent to which the data value is consistent with past data values
Stable and meaningful data
Temporality of Data
  Transaction and valid times closeness – The extent to which a data value is credible based on proximity of transaction time to valid times.
  Transaction time overlap – The extent to which a data value is derived from data values with overlapping valid times.
Trust Evaluation
Some questions need to be considered when evaluating trust in provenance data:
1. Who created the content (author or attribution)?
2. Was the content manipulated? If yes, by what process or source?
3. Who is providing the content (repositories)?
Quality of Data Assessment
Assign numeric values to the quality criteria of datasets, or use scoring/rating systems
Proactive Approach
Precision vs Practicality
Manual Approach
Questionnaire-based system
Semi-Automatic Approach
Rating-based system
Reputation-based system
Reasons of Assessment
Main Reasons
Provenance of assessed data on the web
Primary Objectives
Identify the methods / approaches to automatically assess the quality of data on the web
Or identify methods to automatically assess the quality criteria of web data.
A Generalized Assessment Approach
Step 1: Generate a provenance graph for the data item
Step 2: Annotate the provenance graph with impact values
Step 3: Execute the assessment function/program (script)
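The three steps can be sketched end to end as follows. Everything in this example is an assumption made for illustration: the element names, the use of reliability weights as impact values, and the weakest-link rule used as the assessment function. The slides do not prescribe a particular formula.

```python
# Step 1: a provenance graph for one data item, as element -> source elements.
graph = {
    "report": ["table_a", "table_b"],
    "table_a": [],
    "table_b": ["sensor_feed"],
    "sensor_feed": [],
}

# Step 2: annotate each element with an impact value
# (here: a reliability weight between 0 and 1).
impact = {"report": 1.0, "table_a": 0.9, "table_b": 0.8, "sensor_feed": 0.5}

# Step 3: an assessment function. This one returns the weakest weight found
# anywhere in the provenance of the item.
def assess(item):
    score = impact[item]
    for source in graph[item]:
        score = min(score, assess(source))
    return score

print(assess("report"))
```

With these values the report is only as reliable as its least reliable ancestor, the sensor feed.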
Generate a Provenance Graph
1. What types of provenance elements are required?
2. What level of detail (i.e. granularity) is required?
3. Where and how do we get provenance information?
Two complementary options:
  Recording
  Analyzing the metadata
Annotation with Impact Values
1. How might each provenance element influence the quality of data?
Each type of element has to be analyzed systematically
2. What kinds of impact values are necessary and how to represent the influence through impact values?
Impact values do not have to be numeric
It also depends on the assessment functions
3. How do we determine the impact values?
Determine the Impact Values
1. From provenance information
2. From user input
  Rating-based systems, or reputation-based systems
  Configuration options
3. Through content analysis
  Comparison of data contents
  Adoption of information retrieval methods
  Adoption of data cleansing techniques
4. Through context analysis
  Further metadata
  Domain knowledge
Annotation with Impact Values
How might each provenance element influence the quality of data?
Provenance Element Type
Creation Date
Creation Guidelines
Source data items
Data creator
Impact Values
Creation time
Weights
Expiry time
Assessment Function(s)
1. What does the assessment function look like?
Develop function together with impact values
Take incompleteness into consideration
Provenance graph could be fragmentary
Annotation could be missing
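One way to take incompleteness into account is to let the assessment function skip elements whose annotations are missing and report how many it skipped. The sketch below does this for timeliness, using creation time and expiry time as impact values. The linear decay formula, the function names and the annotation keys are assumptions, not the method from the slides.

```python
from datetime import datetime

def timeliness(annotation, now):
    """Score in [0, 1], or None when the needed annotations are missing."""
    created = annotation.get("creation_time")
    expires = annotation.get("expiry_time")
    if created is None or expires is None or expires <= created:
        return None  # incomplete or unusable annotation
    remaining = (expires - now) / (expires - created)
    return max(0.0, min(1.0, remaining))

def assess_timeliness(annotations, now):
    """Average the scores that could be computed; count the missing ones."""
    scores = [timeliness(a, now) for a in annotations]
    known = [s for s in scores if s is not None]
    missing = len(scores) - len(known)
    average = sum(known) / len(known) if known else None
    return average, missing

now = datetime(2024, 1, 6)
annotations = [
    {"creation_time": datetime(2024, 1, 1), "expiry_time": datetime(2024, 1, 11)},
    {"creation_time": datetime(2024, 1, 1)},  # expiry time was never recorded
]
print(assess_timeliness(annotations, now))
```

Returning the number of skipped elements alongside the score lets a consumer judge how much of the provenance graph the result actually rests on.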
Scientific and Technical Challenges of Provenance – 1 (Summary)
Provenance information needs to be:
Represented
Captured and recorded
Stored and secured, queried and reasoned about
Visualized and browsed
Scientific and Technical Challenges of Provenance – 2
Vocabularies for representation of provenance contents
Need representation of processes (workflows), entity roles, data collections, meta-assertions, etc.
The open provenance model (OPM)
Granularity of provenance records
How much detail is useful, manageable/scalable in practice?
Size of provenance can be orders of magnitude larger than base data.
Provenance evaluation for information quality and trust management
Scientific and Technical Challenges of Provenance – 2a
Evaluation and updates
Shelf timeliness of data
Determine when data becomes obsolete based on provenance information
Versioning of data sources
Relate updates of data based on provenance information
Provenance-aware visualization, navigation and resource consumption
Scientific and Technical Challenges of Provenance and Trust – 3
Policies based on provenance information
Association-based policies
  Source is cited in Spiegel
  Source is cited in Wikipedia
Bias-based policies
  Source is an oil company
Distrust policies
  Source is a blog
Policies may be restricted to a context
  Topic of search, topics of pages, tags of page
Trust policies may be shared across users
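The policy types above can be read as simple predicates over provenance information. The sketch below is a hypothetical encoding: the dictionary keys `kind` and `cited_in`, the way the policies are combined, and the restriction of the bias policy to the topic "energy" are all assumptions chosen to mirror the examples on the slide.

```python
def association_policy(source):
    # Trust a source that is cited by a well-known site.
    return bool({"Spiegel", "Wikipedia"} & set(source.get("cited_in", [])))

def bias_policy(source, topic):
    # Flag a source with a likely stake in the topic (restricted to a context).
    return source.get("kind") == "oil company" and topic == "energy"

def distrust_policy(source):
    return source.get("kind") == "blog"

def trust(source, topic):
    """Combine the policies: distrust and bias override association."""
    if distrust_policy(source) or bias_policy(source, topic):
        return False
    return association_policy(source)

news = {"kind": "newspaper", "cited_in": ["Wikipedia"]}
oil = {"kind": "oil company", "cited_in": ["Spiegel"]}
blog = {"kind": "blog", "cited_in": ["Wikipedia"]}

print(trust(news, "energy"), trust(oil, "energy"), trust(oil, "sports"), trust(blog, "energy"))
```

Because the policies are plain functions over shared provenance fields, one user's policy set could be handed to another user unchanged.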
Thanks for your attention!
Freie Universität Berlin
Computer Science Department
Software Engineering Research Group
Takustr. 9, Berlin, Germany.
Any Questions?
References
1. W3C Website, What is provenance? Modified November 2010, http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
2. W3C Website, A Working Definition of Provenance, Modified November 2010, http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#A_Working_Definition_of_Provenance
3. Hartig, O. Provenance information in the Web of data. In Proceedings of LDOW 2009 (Madrid, Spain, April 2009).
4. O. Hartig and J. Zhao. Using web data provenance for quality assessment. Proceedings of the 1st Int. Workshop on the Role of Semantic Web in Provenance
5. D. Brickley and L. Miller, FOAF Vocabulary Specification, November 2007. http://xmlns.com/foaf/spec
6. U. Bojars and J. G. Breslin. SIOC Core Ontology Specification, Revision 1.30, Jan. 2009. http://rdfs.org/sioc/spec/
7. Luc Moreau, Juliana Freire, Joe Futrelle, Robert E. McGrath, Jim Myers, and Patrick Paulson. The open provenance model: An overview. In IPAW, pages 323–326, 2008.
8. L. L. Pipino, Y. W. Lee, and R. Y. Wang, “Data Quality Assessment,” Communications of the ACM, vol. 45, no. 4, p. 211-218, 2002.
9. You-Wei Cheah, Beth Plale. Provenance Analysis: Towards quality provenance. In Proceedings of the 8th IEEE International Conference on eScience, Chicago, Illinois, Oct. 2012. http://www.ci.uchicago.edu/escience2012/pdf/Provenance_Analysis-Towards_Quality_Provenance.pdf
10. Yogesh Simmhan, Beth Plale, and Dennis Gannon. A survey of data provenance in e-science. SIGMOD Record, 34(3):31–36, 2005.
11. Prat, N., and Madnick, S. Evaluating and aggregating data believability across quality sub-dimensions and data lineage. In Proceedings of WITS 2007 (Montreal, Canada, December 2007), p. 169-174.
12. Y. Simmhan, B. Plale, and D. Gannon. A Survey of Data Provenance in e-Science. SIGMOD Record, Computer Science Department, Indiana University. Vol. 34, no. 3, p. 31–36, ACM, Sept. 2005.
13. P. Buneman, S. Khanna, and W. C. Tan. Data Provenance: Some Basic Issues. In Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FST TCS), p. 87-93, Springer, Dec. 2000.
14. Prat, N., and Madnick, S. Measuring data believability: A provenance approach. Proceedings of HICSS-41 (Big Island, HI, January 2008), IEEE, p. 1-10.
15. Jose Manuel Gomez-Perez, Invited Lectures on Programmable web and the web of data, November 2009, URJC, Campus de Mostoles, Departmental II, Salon de grados, Madrid, Spain, Website, http://www.cetinia.urjc.es/es/node/331
16. Website: http://www.w3.org/2005/Incubator/prov/wiki/images/0/02/Provenance-XG-Overview.pdf
17. http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Dimensions
18. http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki