Linked Science 2011 talk on the importance of modeling sources and their usage, such as PML's source usage, and how it can be generalized using FRBR.
Where did you hear that?
Information and the Sources They Come From
James P. McCusker,1 Timothy Lebo,1 Li Ding,1 Cynthia Chang,1 Paulo Pinheiro da Silva,2 and Deborah L. McGuinness1
1 Tetherless World Constellation
2 CyberShARE Center, University of Texas at El Paso
Linked Science 2011, Bonn, Germany
Background
To do Linked Science, you need to know where your data comes from.
Many studies show that people will not use applications if they cannot access the information those applications rely on.
Studies done for DARPA (CALO), IARPA (NIMD), NSF (VSTO), AT&T (PROSE & FindUR), …
Selected Dimensions of Provenance
Derivational Provenance
• x derived_from y using rule z
• agent initiated and controlled process
• process used x
• y generated_by process
• subprocess triggered_by process
This work grew out of our work on provenance languages and their environments, in particular:
• PML: Proof Markup Language
• Inference Web
• And now PROV: Provenance Model (W3C, in progress)
One perspective on Abstractive Provenance:
• Work
• Expression
• Manifestation
• Item
FRBR: Functional Requirements for Bibliographic Records
PROV: wasComplementOf
Also being explored in another area: Cell Lines vs. Cell Line Colonies
FRBR Stack for PML Primer
Work – The web page URL.
Expression – The page content.
Manifestation – The bytes that were downloaded.
Item – The specific physical copy.
Information, Source, SourceUsage in PML
PML provides an explicit mechanism for linking Information to its Source.
Includes:
• Location (URI)
• Date of access
• Extraction method (file offsets, cell contents, etc.)
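The three elements listed above can be pictured as a small record attached to each piece of information. The following is a minimal sketch, not PML itself: the class names, field names, URL, date, and cell reference are all hypothetical and chosen only to mirror the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceUsage:
    """Hypothetical record of how a source was used (not the PML vocabulary)."""
    source_uri: str                          # Location (URI) of the source
    accessed: datetime                       # Date of access
    extraction_method: Optional[str] = None  # e.g. file offsets or a cell reference


@dataclass(frozen=True)
class Information:
    """Hypothetical piece of information that carries a link to its source usage."""
    value: str
    source_usage: SourceUsage


# Illustrative values only; none of them come from the talk.
usage = SourceUsage(
    source_uri="http://example.org/data.csv",
    accessed=datetime(2011, 10, 24, tzinfo=timezone.utc),
    extraction_method="row 3, column B",
)
info = Information(value="42", source_usage=usage)
```

Keeping the usage record separate from the source itself means the same source can be accessed many times, each access with its own date and extraction method.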
Why FRBR for Information Sources?
Work – Source
Expression – Abstract Information
Manifestation – Concrete Information
Item – Specific Copy
These constructs can be reused for other provenance:
• File copy
• Format conversion
• Reproducibility
Source and Information in FRBR
Discussion
• Abstractive + derivational provenance is really powerful.
• Ability to identify content regardless of format and across multiple copies of data.
• Reproducibility is verifiable across different file formats and algorithms as long as the Expressions are the same.
• We have found that FRBR generalizes to information resources.
• Could be considered FRIR (Functional Requirements for Information Resources)
• We’ve implemented it in our LOGD converter (csv2rdf4lod).
• Maintain content links (same Expression) even when manually changing the data, like converting Excel to CSV.
• Particularly useful in much of our Linked Open Data work.
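The point about Expressions staying the same across formats can be illustrated with two digests: one over the raw bytes (Manifestation level) and one over the parsed content (Expression level). This is only a sketch under stated assumptions; the function names and the choice of a row-by-row JSON canonical form are hypothetical and are not taken from csv2rdf4lod.

```python
import csv
import hashlib
import io
import json


def manifestation_digest(payload: bytes) -> str:
    """Digest of the exact bytes; changes whenever the serialization changes."""
    return hashlib.sha256(payload).hexdigest()


def expression_digest(payload: bytes, delimiter: str) -> str:
    """Digest of the parsed table content, independent of the delimiter used."""
    rows = csv.reader(io.StringIO(payload.decode("utf-8")), delimiter=delimiter)
    digest = hashlib.sha256()
    for row in rows:
        # Assumed canonical form: each row as a JSON array of strings.
        digest.update(json.dumps(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# The same small table serialized two ways (illustrative data).
csv_bytes = b"gene,score\nTP53,0.9\n"
tsv_bytes = b"gene\tscore\nTP53\t0.9\n"

same_bytes = manifestation_digest(csv_bytes) == manifestation_digest(tsv_bytes)
same_content = expression_digest(csv_bytes, ",") == expression_digest(tsv_bytes, "\t")
```

Here the two files differ as Manifestations but agree as Expressions, which is the property that lets a content link survive a format conversion.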
Conclusions
• Any serious provenance model for linked science must provide a mechanism for describing information sources and their usage.
  – Achieved with modeling primitives in PML.
• Abstractive + derivational provenance can express nuanced explanations of data access, transformation, and analysis.
• Links from information to source can be modeled using a combination of FRBR and a derivational provenance model.
  – Allows for unambiguous descriptions of data and information access and transformation.
Questions?
Thanks!
Also, come to SemantAqua demo on Tues and talk on Wed aft. 5:30
The Tetherless World Constellation is partially funded by DARPA, U.S. Department of Energy, Fujitsu, LGS, Lockheed Martin, Microsoft Research,
NASA, National Ecological Observatory Network (NEON), the National Science Foundation, Qualcomm, and the Woods Hole Oceanographic
Institution (WHOI). This research was partially funded by the National Science Foundation under CREST Grant No. HRD-0734825.
Source, Information, and SourceUsage in PML
Implementation: pcurl.py
Part of csv2rdf4lod:
$ pcurl.py --help
usage: pcurl.py [--help|-h] [--format|-f xml|turtle|n3|nt] [url ...]
Download a URL and compute Functional Requirements for Bibliographic Resources (FRBR) stacks using cryptograhic digests for the resulting content.
Refer to http://purl.org/twc/pub/mccusker2012parallel for more information and examples.
optional arguments:
url url to compute a FRBR stack for.
-h, --help Show this help message and exit.
-f, --format File format for FRBR stacks. One of xml, turtle, n3, or nt.
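To make the idea of a digest-based FRBR stack concrete, here is a rough sketch of how the four levels might be identified for one download. It is not the pcurl.py implementation: the function name, the dictionary layout, the use of SHA-256, the optional normalize hook, and the way the Item is derived from the URL and access time are all assumptions made for illustration. The bytes are passed in directly, so nothing is fetched from the network.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict


def frbr_stack(
    url: str,
    payload: bytes,
    accessed: datetime,
    normalize: Callable[[bytes], bytes] = lambda data: data,
) -> Dict[str, str]:
    """Build a hypothetical FRBR stack for one download of `url`."""
    # Manifestation: the exact bytes that were downloaded.
    manifestation = hashlib.sha256(payload).hexdigest()
    # Expression: the content after format-specific normalization.
    # With the default identity hook this equals the Manifestation digest.
    expression = hashlib.sha256(normalize(payload)).hexdigest()
    # Item: this specific copy, tied to where and when it was obtained.
    item_key = f"{url}|{accessed.isoformat()}|{manifestation}".encode("utf-8")
    item = hashlib.sha256(item_key).hexdigest()
    return {
        "work": url,  # Work: the source itself, named by its URL
        "expression": "sha256:" + expression,
        "manifestation": "sha256:" + manifestation,
        "item": "sha256:" + item,
    }


# Two downloads of identical bytes at different times (illustrative values).
page = b"<html><body>PML Primer</body></html>"
first = frbr_stack("http://example.org/primer", page,
                   datetime(2011, 10, 23, tzinfo=timezone.utc))
second = frbr_stack("http://example.org/primer", page,
                    datetime(2011, 10, 24, tzinfo=timezone.utc))
```

In this sketch the two downloads share a Work, Expression, and Manifestation but are distinct Items, matching the "specific copy" reading of Item used in the talk.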