Oscar Corcho (with contributions from Olga Giraldo, Alexander García,
and Idafen Santana)
http://www.oeg-upm.net/index.php/en/researchareas/3-semanticscience/index.html
Ontology Engineering Group
Universidad Politécnica de Madrid, Spain
Towards Reproducible Science: a
few building blocks from my
personal experience
@ocorcho
22/10/2017
S4BioDiv2017, Vienna
Towards Reproducible Science
Introduction
HYPOTHESIS
CONVINCE
AUDIENCE
REPEATABLE
SCIENTIFIC EXPERIMENTS
SCIENTIFIC EXPERIMENTS
IN VIVO/VITRO IN SILICO
REPEATABILITY
Alison’s
biodiversity
scientists
Before continuing...
What does reproducibility mean to you?
And to your colleagues?
And to colleagues from other disciplines?
The R* brouhaha
Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.
My own take on terminology
PRESERVATION
CONSERVATION
REPLICABILITY
REPRODUCIBILITY
Experiment components
DATA SCIENTIFIC PROCEDURE EQUIPMENT
IN VIVO/VITRO
IN SILICO
This has attracted most
of the attention so far
Block 1. Experimental Protocols
Olga Giraldo
Alexander García
Explore alternative ways of documenting and
retrieving information from experimental protocols
Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O. ICBO, 2015.
Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O. J Biomedical Semantics, to appear.
SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O. Linked Science, 2014.
What is an experimental protocol?
Experimental protocols are like cooking recipes.
They have ingredients: reagents and samples.
They have appliances: equipment.
They have a list of instructions.
They have a total time.
They have critical steps.
Protocols should contain complete information that allows anybody to recreate an experiment.
Some of the issues we aim to address
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20C overnight.
Some protocols present insufficient granularity.
The instructions can be imprecise or ambiguous due to the use of natural language.
The protocols lack structure.
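The example instructions above leave out parameters that a reader needs in order to repeat the step. A minimal sketch of flagging such gaps, assuming a simple regular-expression heuristic: the patterns and the function name `missing_parameters` are illustrative assumptions and are not part of SMART Protocols.

```python
import re

# Illustrative patterns only (an assumption of this sketch); real protocols
# need far richer handling of units and ranges.
TEMPERATURE = re.compile(r"-?\d+(?:\.\d+)?\s*°?\s*C\b")
DURATION = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:s|sec|min|h|hr|hours?|minutes?|seconds?)\b",
    re.IGNORECASE,
)


def missing_parameters(instruction: str) -> list[str]:
    """Return the parameters that the instruction does not state explicitly."""
    missing = []
    if not TEMPERATURE.search(instruction):
        missing.append("temperature")
    if not DURATION.search(instruction):
        missing.append("duration")
    return missing


for step in [
    "Incubate the centrifuge tubes in a water bath.",
    "Incubate the samples for 5 min with gentle shaking.",
    "Incubate at -20C overnight.",
]:
    print(step, "->", missing_parameters(step))
```

A vague duration such as "overnight" is deliberately reported as missing, since it gives no explicit number of hours.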
Ingredients for Improving Reproducibility
Bio-ontologies: OBI, EXPO, EXACT, BAO, IAO, ERO…
Data repositories for making data available
Resources for reporting guidelines or Minimum Information standards
Few efforts focus on representing and standardizing experimental protocols.
For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.
Main research question
How to formalize the information from
laboratory protocols as a knowledge base?
Our approach
• Ontology model representing lab protocols
• Gazetteer-based method: use existing lists of named entities (lists of proper nouns that refer to real-life entities)
• Rule-based approaches: write manual extraction rules
• Development of a Gold Standard of protocols annotated manually
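The gazetteer-based and rule-based steps above can be sketched in a few lines. This is a simplified plain-Python illustration, not the GATE/JAPE pipeline used in the project: the term lists reuse entities mentioned elsewhere in this talk, and the names `GAZETTEER`, `ACTION_RULE` and `annotate` are hypothetical.

```python
import re

# Hypothetical mini-gazetteers; the entries are examples mentioned in this talk.
GAZETTEER = {
    "instrument": ["centrifuge", "microscope", "water bath"],
    "reagent": ["glucose", "DMEM/F12", "B27 supplement"],
    "sample": ["neural stem cells", "insect cells"],
}

# One hand-written rule (an assumption of this sketch): an imperative verb at
# the start of an instruction is treated as the experimental action.
ACTION_RULE = re.compile(r"^\s*(incubate|rinse|centrifuge|wash|add|mix)\b", re.IGNORECASE)


def annotate(sentence: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs found in one instruction."""
    annotations = []
    action = ACTION_RULE.match(sentence)
    if action:
        annotations.append(("action", action.group(1).lower()))
    lowered = sentence.lower()
    for category, terms in GAZETTEER.items():
        for term in terms:
            if term.lower() in lowered:
                annotations.append((category, term))
    return annotations


print(annotate("Incubate the centrifuge tubes in a water bath."))
```

Plain substring lookup tags "centrifuge" inside "centrifuge tubes", which hints at why list lookup alone is not enough and grammar rules are layered on top.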
SMART Protocols ontology
http://vocab.linkeddata.es/SMARTProtocols/
https://smartprotocols.github.io/
The SIRO model
Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.)
Instruments (equipment, devices, consumables, software)
Reagents (chemical compounds, mixtures)
Objective (purpose)
The SIRO model supports search, retrieval and classification of experimental protocols.
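To make the search and retrieval use case concrete, here is a minimal sketch that treats the four SIRO facets as a plain record. The class `SiroRecord`, the function `search` and the protocol identifiers are hypothetical; the actual model is an ontology, not a Python class.

```python
from dataclasses import dataclass, field


@dataclass
class SiroRecord:
    """Hypothetical container for the four SIRO facets of one protocol."""
    title: str
    samples: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)
    reagents: list[str] = field(default_factory=list)
    objective: str = ""


def search(records: list[SiroRecord], *, sample: str = "", reagent: str = "") -> list[str]:
    """Return titles of protocols whose facets contain the given terms (case-insensitive)."""
    def has(values: list[str], term: str) -> bool:
        return not term or any(term.lower() in value.lower() for value in values)

    return [r.title for r in records if has(r.samples, sample) and has(r.reagents, reagent)]


# Made-up identifiers; the facet values are entities mentioned in this talk.
records = [
    SiroRecord("protocol-A", ["neural stem cells (NSCs)"], ["Microscope"], ["glucose", "DMEM/F12"]),
    SiroRecord("protocol-B", ["Insect cells"], ["Centrifuge"], ["Phenol:chloroform"]),
]
print(search(records, sample="insect"))
```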
Design of semantic Gazetteers and JAPE rules
Design of semantic Gazetteers
• Facilitate the annotation of instances related to:
Experimental actions
Instruments
Samples/organisms
Reagents
Design of grammar rules
• Facilitate the annotation of instructions
Development of a Gold Standard
Materials: 100 protocols published in several repositories
• BioTechniques
• CSH-Protocols
• Current protocols
• Genet and Mol. Res
• Journal of Biolog. Methods
• Jove
• MethodsX
• Nature protocols exchange
• Nature protocols
Annotators: experts in life sciences
• Curso BIOS 2016, Colombia
• Universidad del Valle, Colombia
• Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo)
• Universidad Santiago de Cali, Colombia
The SMART Protocols Annotation Tool
http://smart-protocols.labs.linkingdata.io/dist/dev/#/login
Guidelines about what and how to annotate
Preliminary results
Number of annotators (out of 3) assigning each category:

Entities                                           sample  instrument  reagent  objective
Sample
  Neural cell                                      3       0           0        0
  neural stem cells (NSCs)                         3       0           0        0
Instrument
  Cell culture centrifuge                          0       3           0        0
  cell culture incubator                           0       3           0        0
  Microscope                                       0       3           0        0
  Millicell culture plate inserts 8-µm pore size   0       3           0        0
Reagent
  B27 supplement                                   0       0           3        0
  DMEM/F12                                         0       0           3        0
  FGF2 neutralizing antibody                       0       0           3        0
  glucose                                          0       0           3        0
Objective
  "Here we describe two migration assays, a matrigel migration assay
  and a Boyden chamber migration assay, which allow the in
  vitro assessment of neural migration under defined conditions"
  (Ladewig, Koch and Brüstle, 2014).               0       0           0        3
Fleiss Kappa for 3 raters = 1.0

Entities                     sample  instrument  reagent
Reagent - Sample/Organism
  Ac-omega viral DNA         1                   2
  baculoviral                1                   2
  DNA insert                 2                   1
  I-Sce I meganuclease       1                   2
Sample/Organism
  Insect cells               3
Instrument
  spinner                            3
  Centrifuge                         3
  Flask                              3
Reagent
  IPL-41 powdered                                3
  Liposome formulation                           3
  Phenol:chloroform                              3
Fleiss Kappa for 3 raters = 0.755
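For readers unfamiliar with the agreement measure, the following is a small sketch of the standard Fleiss' kappa computation, assuming the input is a table of per-item category counts like the ones above. The function name and the `unanimous` sample are introduced here for illustration; the sample encodes only the fully agreed ratings (every entity placed in one category by all 3 raters), for which kappa is 1.0.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa, where counts[i][j] is the number of raters who assigned
    item i to category j (every row must sum to the same number of raters)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the per-item agreement P_i.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement, from the overall share of each category.
    total = n_items * n_raters
    p_categories = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_expected = sum(p * p for p in p_categories)

    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Columns: sample, instrument, reagent, objective; all 3 raters agree on every item.
unanimous = [[3, 0, 0, 0]] * 2 + [[0, 3, 0, 0]] * 4 + [[0, 0, 3, 0]] * 4 + [[0, 0, 0, 3]]
print(fleiss_kappa(unanimous))
```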
Our ongoing work
So far, this is fine for handling protocols that have
already been reported in papers.
Can we actually change the way in which
these protocols are produced?
Platform for publishing semantic protocols
Features:
Open semantic publishing platform
o The protocols are born semantic
Self-describing documents
o Meaningful entities
o Machine-processable workflows
Documents will reference existing URIs
o Samples/organisms
o Reagents/chemical compounds
o Instruments
SMART Protocols Ontology /
Gazetteers / Grammar rules
UniProt
NCBI taxonomy
PubChem
Vendors
The platform
Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols
Organisms come from the UniProt Taxon API
After selecting an organism, the corresponding ID is automatically recorded.
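The selection step can be sketched offline as follows. This is an assumption-laden illustration: `TAXA` is a hypothetical local stand-in for the taxonomy service (no network call is made), and `suggest` and `record_organism` are made-up names; only the taxon identifiers and the `purl.uniprot.org/taxonomy/` URI pattern are real.

```python
# Hypothetical local stand-in for a taxonomy lookup service; no network call.
# The numbers are NCBI Taxonomy identifiers, which UniProt uses for organisms.
TAXA = {
    "Homo sapiens": 9606,
    "Mus musculus": 10090,
    "Arabidopsis thaliana": 3702,
    "Drosophila melanogaster": 7227,
}


def suggest(prefix: str) -> list[str]:
    """Organism names starting with the typed prefix (case-insensitive)."""
    return sorted(name for name in TAXA if name.lower().startswith(prefix.lower()))


def record_organism(protocol: dict, name: str) -> dict:
    """Store the chosen organism together with its taxon ID and URI."""
    taxon_id = TAXA[name]
    protocol["organism"] = {
        "label": name,
        "taxon_id": taxon_id,
        "uri": f"http://purl.uniprot.org/taxonomy/{taxon_id}",
    }
    return protocol


print(suggest("mus"))
print(record_organism({"title": "example"}, "Mus musculus"))
```

Recording the URI rather than the free-text name is what lets the document reference an existing, resolvable identifier.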
Block 2. Computational Environments
Idafen Santana
Is it possible to describe the main properties of the
Execution Environment of a Computational Scientific
Experiment and, based on this description, derive a
reproduction process for generating an equivalent
environment using virtualization techniques?
Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016.
http://oa.upm.es/39520/
Experiment components
DATA SCIENTIFIC PROCEDURE EQUIPMENT
IN SILICO
A Research Object bundles and relates digital resources of a scientific experiment
or investigation using standard mechanisms, “tool middleware”
http://www.w3.org/community/rosc/
http://www.researchobject.org/
Open Research Problems
Computational Infrastructures are usually a predefined
element of a Computational Scientific Workflow.
Execution Environments are poorly described.
Current reproducibility approaches for computational
experiments consider mostly data and procedure.
Representation
Describing execution environments
[Diagram: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD]
WICUS ontology network
o Workflow Infrastructure Conservation Using Semantics
o http://purl.org/net/wicus
o 5 ontologies
• WICUS Workflow Execution Requirements ontology
• WICUS Software Stack ontology
• WICUS Hardware Specs ontology
• WICUS Scientific Virtual Appliance ontology
• WICUS Ontology: links the previous ontologies
WICUS Workflow Execution Requirements ontology
o http://purl.org/net/wicus-reqs
WICUS Software Stack ontology
o http://purl.org/net/wicus-stack
WICUS Scientific Virtual Appliance ontology
o http://purl.org/net/wicus-sva
WICUS Hardware Specs ontology
o http://purl.org/net/wicus-hwspecs
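How such descriptions could drive the reproduction step can be illustrated with a toy planner. This is only a sketch under stated assumptions: the classes `Appliance` and `Requirements` and the function `plan` are hypothetical stand-ins loosely inspired by the requirements, software stack, hardware specs and virtual appliance ontologies; they use neither the WICUS vocabulary nor the algorithm from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    """Hypothetical stand-in for a virtual appliance description."""
    name: str
    cpu_cores: int
    ram_gb: int
    installed: frozenset[str]


@dataclass(frozen=True)
class Requirements:
    """Hypothetical stand-in for a workflow's execution requirements."""
    min_cpu_cores: int
    min_ram_gb: int
    software: frozenset[str]


def plan(req: Requirements, appliances: list[Appliance]) -> tuple[str, list[str]]:
    """Pick the appliance that needs the fewest extra packages and list them."""
    candidates = [
        a for a in appliances
        if a.cpu_cores >= req.min_cpu_cores and a.ram_gb >= req.min_ram_gb
    ]
    if not candidates:
        raise ValueError("no appliance satisfies the hardware requirements")
    best = min(candidates, key=lambda a: len(req.software - a.installed))
    return best.name, sorted(req.software - best.installed)


# Made-up example values.
appliances = [
    Appliance("small", 1, 2, frozenset({"python"})),
    Appliance("large", 4, 16, frozenset({"python", "java"})),
]
requirements = Requirements(2, 4, frozenset({"python", "java", "pegasus"}))
print(plan(requirements, appliances))
```

The design choice here is to separate what the workflow needs from what an appliance already provides, so the remaining installation steps fall out as a set difference.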
Evaluation
Workflows reproduced
o 3 scientific domains
o 3 workflow management systems
o 6 different workflows
Domain: Seismic, Astronomy, Bio
WMS: dispel4py, Pegasus, Makeflow
Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
(2003) (2014) (2014) (2015) (2011) (2011)
Results
[Diagram: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD, COMPARE]
• Non-deterministic
• Standard and error output
• Generated files equivalent
• Same results
• Results from Internal Extinction may vary
• Genomic data
• Exact match
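The COMPARE step for deterministic workflows can be sketched as a byte-level comparison of output files. This is a minimal illustration assuming that both runs write their outputs into directories with the same layout; `digest` and `compare_outputs` are hypothetical names, and non-deterministic workflows would need a domain-specific notion of equivalence instead of an exact match.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_outputs(original: Path, reproduced: Path) -> dict[str, str]:
    """Classify each output file of the original run as 'match', 'differs' or 'missing'."""
    report = {}
    for file in sorted(p for p in original.rglob("*") if p.is_file()):
        relative = file.relative_to(original)
        twin = reproduced / relative
        if not twin.is_file():
            report[str(relative)] = "missing"
        elif digest(file) == digest(twin):
            report[str(relative)] = "match"
        else:
            report[str(relative)] = "differs"
    return report
```

Calling `compare_outputs(Path("run_original"), Path("run_reproduced"))` (directory names made up) returns one status per file, so an exact match shows up as every entry being "match".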
Summarizing
Two building blocks towards reproducibility of
scientific experiments
o In vivo/vitro
• Focus on providing structured descriptions of methods
(laboratory protocols)
• Our tools: ontologies, gazetteers, NLP tools, and
automatic and manual annotation tools
• Challenge: make protocols more structured (and
semantic) from the beginning
o In silico
• Focus on the equipment (computational infrastructure)
for workflow-based experiments
• Our tools: ontologies, automatic and manual annotation tools, and
an execution environment
• Challenge: keep track of all types of appliances, and
get scientists to provide annotations
Is this enough?
o Clearly not, but it is a step forward towards ensuring reproducibility
(with a focus on methods)