Reproducible science via semantics and provenance for ecological data
@metamattj [email protected] 0000-0003-0077-4738
Matthew B. Jones, Christopher Jones, Lauren Walker, Peter Slaughter, Benjamin Leinfelder
National Center for Ecological Analysis and Synthesis (NCEAS)
Science
Smith, Melinda D., Alan K. Knapp, and Scott L. Collins. "A framework for assessing ecosystem dynamics in response to chronic resource alterations induced by global change." Ecology 90.12 (2009): 3279-3289. doi:10.1890/08-1815.1
Reproducible Science
Capturing provenance is crucial for transparency, interpretation, debugging, … => repeatable experiments => reproducible science
Slide credit: B. Ludaescher
Kinds of Provenance
• Prospective Provenance
  • Method/workflow description (“workflow-land”)
• Retrospective Provenance
  • Runtime tracking (“trace-land”)
  • “This created from that”
Common Uses of Provenance
• Audit trail: data trace and possible errors
• Attribution: credit and responsibility for data and scientific results
• Data quality: assess input data, computation
• Discovery: find versions, derived products
• Replication: computations are repeatable
• Re-use: adapt and adopt for new uses
• Goal: Facilitate reproducible science
• Track data derivation history
• Track data inputs and outputs of analyses
• Track analysis and model executions
• Preserve and document software
• Link all of these to publications
Provenance: time travel in DataONE
Using a common model
W3C has published the ‘PROV’ family of recommendations
Entity
Activity
Agent
wasAssociatedWith
wasAttributedTo
used wasGeneratedBy
See w3.org/TR/prov-o/
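The core model above can be sketched as typed triples. This is only an illustration in plain Python, not the DataONE or W3C implementation: the `Statement` tuple, the `is_well_typed` helper, and the node names are all hypothetical, and the subject/object types follow the domains and ranges given for these four properties in PROV-O.

```python
from collections import namedtuple

# One provenance statement: subject --relation--> object
Statement = namedtuple("Statement", ["subject", "relation", "object"])

# The three node types of the PROV core model
ENTITY, ACTIVITY, AGENT = "Entity", "Activity", "Agent"

# Expected (subject type, object type) for each relation on the slide
RELATION_TYPES = {
    "used": (ACTIVITY, ENTITY),
    "wasGeneratedBy": (ENTITY, ACTIVITY),
    "wasAssociatedWith": (ACTIVITY, AGENT),
    "wasAttributedTo": (ENTITY, AGENT),
}


def is_well_typed(statement, node_types):
    """Check a statement's endpoints against the expected PROV types."""
    expected = RELATION_TYPES.get(statement.relation)
    if expected is None:
        return False  # relation not covered by this sketch
    actual = (node_types.get(statement.subject),
              node_types.get(statement.object))
    return actual == expected


# Hypothetical node names, for illustration only
node_types = {"output.png": ENTITY, "run-1": ACTIVITY, "alice": AGENT}
good = Statement("output.png", "wasGeneratedBy", "run-1")
bad = Statement("alice", "used", "output.png")  # an Agent cannot "use"

print(is_well_typed(good, node_types), is_well_typed(bad, node_types))
```

A check like this is one way to catch a trace that, for example, records an agent rather than an activity as the user of a data file.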
Using a common model
Example: Scientific workflow
map image
R script Execution
Scientist
wasAssociatedWith
wasAttributedTo
wasGeneratedBy
Using a common model
Example: Scientific workflow
map image
R script Execution
Scientist
wasAssociatedWith
wasAttributedTo
wasGeneratedBy
CSV data used
wasDerivedFrom
Using a common model
Example: Scientific workflow
map image
R script Execution
Scientist
wasAssociatedWith
wasAttributedTo
wasGeneratedBy
CSV data used
wasDerivedFrom
< “map image” wasDerivedFrom “CSV data” >
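The step from "used" and "wasGeneratedBy" to "wasDerivedFrom" in this example can be sketched as a small rule over triples. The `infer_derivations` function is hypothetical, and it rests on a strong assumption: every output of an execution depends on every input of that execution. PROV itself does not treat this as a valid inference in general, so a real trace may need to assert derivations explicitly.

```python
def infer_derivations(statements):
    """Return (output, 'wasDerivedFrom', input) triples.

    Assumption: every entity generated by an execution depends on
    every entity that the same execution used.
    """
    used = {}       # execution -> set of input entities
    generated = {}  # execution -> set of output entities
    for subject, relation, obj in statements:
        if relation == "used":
            used.setdefault(subject, set()).add(obj)
        elif relation == "wasGeneratedBy":
            generated.setdefault(obj, set()).add(subject)

    derived = set()
    for execution, outputs in generated.items():
        for output in outputs:
            for source in used.get(execution, ()):
                derived.add((output, "wasDerivedFrom", source))
    return derived


# The trace from the slide, written as (subject, relation, object)
trace = [
    ("R script Execution", "used", "CSV data"),
    ("map image", "wasGeneratedBy", "R script Execution"),
    ("R script Execution", "wasAssociatedWith", "Scientist"),
    ("map image", "wasAttributedTo", "Scientist"),
]
derived = infer_derivations(trace)
print(derived)  # {('map image', 'wasDerivedFrom', 'CSV data')}
```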
Data package with ProvONE trace
resource map
science metadata
system metadata
science data
system metadata
system metadata
ProvONE trace showing relationships
figures
system metadata
software
system metadata
Provenance search and browse
DataONE harvests provenance information and indexes it
Repository DataONE
ITK Client
publish
harvest
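To make the "harvest and index" idea concrete, here is a minimal sketch of a provenance index that supports browsing in both directions: upstream (what was this derived from?) and downstream (what was derived from this?). It is not the DataONE indexer. The function names and file identifiers are hypothetical, and the harvested records are assumed to arrive as simple (derived, source) pairs.

```python
from collections import defaultdict, deque


def build_index(derivations):
    """Index (derived, source) pairs in both directions."""
    sources_of = defaultdict(set)   # item -> what it was derived from
    products_of = defaultdict(set)  # item -> what was derived from it
    for derived, source in derivations:
        sources_of[derived].add(source)
        products_of[source].add(derived)
    return sources_of, products_of


def reachable(index, start):
    """All items reachable from start by following the index transitively."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in index.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical harvested records: (derived item, source item)
harvested = [
    ("clean.csv", "raw.csv"),
    ("map.png", "clean.csv"),
    ("summary.csv", "clean.csv"),
]
sources_of, products_of = build_index(harvested)

print(sorted(reachable(sources_of, "map.png")))
print(sorted(reachable(products_of, "raw.csv")))
```

Keeping both directions in the index means a search can start from a figure and walk back to raw data, or start from a dataset and find every derived product.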
R ‘recordr’ package
# Generate map of locations by type
library(recordr)
# Create the run manager object
recordr <- new("Recordr")
# Run the script and record the run under the tag "loc-by-type-png"
pkg <- record(recordr, "./hcdbSites.R", "loc-by-type-png")
‘recordr’ functions
record()
startRecord()
endRecord()
listRuns()
deleteRuns()
viewRun()
publish()
set()
get()
saveConfig()
loadConfig()
listConfig()
See: Run Manager API document
R: managing script runs
> listRuns(recordr)
Script StartTime EndTime Published Tag RunID
hcdbSites.R 18:53:09 18:53:09 unpublished loc-by-type-png C85A ...
> deleteRuns(recordr, "loc-by-type-png")
C85A188-B72E-49F1-AEF4-7BFC24DA186B
> viewRun(recordr, "loc-by-type-png")
… details about the run listed here ...
> publishRun(recordr, "loc-by-type-png")
C85A188-B72E-49F1-AEF4-7BFC24DA186B
Transitive credit
• Now, when a user cites a pub, we know:
  • Which data produced it
  • What software produced it
  • What was derived from it
  • Who to credit down the attribution stack
• Katz & Smith. 2014. Implementing Transitive Credit with JSON-LD. arXiv:1407.5117
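Crediting "down the attribution stack" can be sketched as fractional credit that passes through intermediate products to the people behind them. This is an illustrative reading of the transitive credit idea, not code from Katz & Smith or from DataONE. The `transitive_credit` function, the credit map, and all the weights are hypothetical, and the sketch assumes each product's fractions sum to 1 and that the graph has no cycles.

```python
def transitive_credit(credit_map, product):
    """Resolve fractional credit for a product down to people.

    credit_map maps a product to {contributor: fraction}. A contributor
    that is itself a key of credit_map is another product, and its share
    is passed on to its own contributors. Assumes no cycles.
    """
    totals = {}
    for contributor, fraction in credit_map[product].items():
        if contributor in credit_map:
            # Contributor is a product: split its share among its creators
            nested = transitive_credit(credit_map, contributor)
            for person, share in nested.items():
                totals[person] = totals.get(person, 0.0) + fraction * share
        else:
            totals[contributor] = totals.get(contributor, 0.0) + fraction
    return totals


# Hypothetical credit assignments, for illustration only
credit_map = {
    "paper": {"author A": 0.5, "analysis script": 0.25, "dataset": 0.25},
    "analysis script": {"developer B": 1.0},
    "dataset": {"collector C": 0.5, "author A": 0.5},
}
credit = transitive_credit(credit_map, "paper")
print(credit)
# {'author A': 0.625, 'developer B': 0.25, 'collector C': 0.125}
```

In this example the software developer and the data collector receive credit for the paper even though neither is an author, which is the point of linking provenance to publications.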
Open Software and Specifications
R Packages (in testing)
recordr: https://github.com/NCEAS/recordr
dataone: https://github.com/DataONEorg/rdataone
datapackage: https://github.com/ropensci/datapackage
Matlab Toolbox (in development)
matlab-dataone: https://github.com/DataONEorg/matlab-dataone
ProvONE Specification (draft)
http://purl.dataone.org/provone-v1-dev
Image credit: @ESAOpenSci
Thank You
DataONE Provenance Contributors
• Matt Jones
• Chris Jones
• Lauren Walker
• Peter Slaughter
• Ben Leinfelder
• Mark Schildhauer
• Steve Aulenbach
• Christopher Schwalm
• Paolo Missier
• Bertram Ludäscher
• Rachel Volentine
• Susanna Yang Cao
• Dave Vieglais
• Yaxing Wei
• Tim McPhillips
• Phase I Provenance group
Funding
DataONE: NSF Grant # 0830944 and 1430508
Community Dynamics: NSF Grant # 1262463