Using Semantics and NLP in Experimental Protocols
Olga Giraldo★
Alexander Garcia★, PhD
José Figueredo†
Oscar Corcho★, PhD
★ Universidad Politécnica de Madrid, Spain
† Universidad Simón Bolívar, Venezuela
Agenda
• Background
• Our approach
• Preliminary results
• Future work
Improving the reproducibility
We are exploring an alternative for documenting and retrieving information from experimental protocols.
• Making data available: data repository
• Standards for reporting biomedical investigations: OBI, EXPO, EXACT…
• The methods: a way to understand the published results (Source: Plant Mol Biol. 2015; 89(3): 215-227)
For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.
What is an experimental protocol
• Experimental protocols are like cooking recipes:
  ✓ They have ingredients: reagents and sample
  ✓ They have appliances: equipment
  ✓ They have a list of instructions
  ✓ They have a total time
  ✓ They have critical steps…
The protocols should have complete information that allows anybody to recreate an experiment.
Goal
How to accurately document and retrieve meaningful information from experimental protocols
• Cell disruption: Grind the leaf tissue using a mortar and pestle.
• Precipitation reaction: Precipitate the DNA with 0.6 mL of 2-propanol.
• Instrument: Mortar and pestle
• Reagent: Triton X-100
• Sample: DNA
Our approach
• Ontology model representing lab protocols
• Gazetteer-based method: use existing lists of named entities
  ✓ Lists of proper nouns, which refer to real-life entities
• Rule-based approaches: write manual extraction rules
• Combination of the above
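To make the combination of gazetteer lookup and hand-written rules concrete, here is a minimal Python sketch. It is not the authors' pipeline: the tiny term lists, the `annotate` function and the "first word of a step is the action" rule are all assumptions made for illustration, using the example sentences from the Goal slide.

```python
import re

# Toy gazetteers (illustrative assumption; real lists hold far more terms).
GAZETTEERS = {
    "Instrument": ["mortar and pestle"],
    "Reagent": ["2-propanol", "Triton X-100"],
    "Sample": ["DNA", "leaf tissue"],
}

# Hand-written rule (an assumption): a protocol step is written as an
# imperative sentence, so its first word is treated as the action.
ACTION_RULE = re.compile(r"^\s*([A-Za-z]+)")


def annotate(step):
    """Return a sorted list of (start, end, label, text) annotations."""
    found = []
    for label, terms in GAZETTEERS.items():
        for term in terms:
            # Match the term as a whole token, ignoring case.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            for m in re.finditer(pattern, step, flags=re.IGNORECASE):
                found.append((m.start(), m.end(), label, m.group(0)))
    action = ACTION_RULE.match(step)
    if action:
        found.append((action.start(1), action.end(1), "Action", action.group(1)))
    return sorted(found)


for start, end, label, text in annotate("Precipitate the DNA with 0.6 mL of 2-propanol."):
    print(f"{label}: {text} [{start}:{end}]")
```

The gazetteer handles entities that can be listed in advance, while the rule covers a pattern that depends on sentence position rather than on a fixed vocabulary.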
SMART Protocols: Document module
sp:RNA extraction protocol
• sp:title of the protocol (sp:title1): Extraction of total RNA from fresh/frozen tissue (FT)
• sp:protocol identifier (sp:protocol identifier1): DOI: 10.2144/000113260
• sp:application of the protocol (sp:application1): Methods comparison for high-resolution transcriptional analysis of archival material on Affymetrix Plus 2.0 and Exon 1.0 microarrays
• iao:author list (sp:author list1)
  sp:author name: Kim M. Linton, Yvonne Hey, Sian Dibben, Crispin J. Miller, Anthony J. Freemont, John A. Radford, Stuart D. Pepper
• sp:provenance of the protocol (sp:provenance1): The extraction method (steps 2–21) is taken from the method supplied with TRIzol reagent (Invitrogen, Paisley, UK).
• sp:reagent list (sp:reagentList1)
  sp:reagent name: Chloroform, Ethyl alcohol, Isopropyl alcohol, TRIzol
  sp:manufacturer name: Invitrogen, Sigma-Aldrich
• sp:equipment and supplies list (sp:equipmentList1)
  sp:equipment name: Tissue storage container, Homogenizer blades, Forceps, Scalpel, Scalpel holder
• sp:specimen name: sp:tumor tissue
http://vocab.linkeddata.es/SMARTProtocols/sp-documentV2.0.htm
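As a rough illustration of how the document module's metadata could be held and queried, the sketch below stores a few of the slide's labels as subject-predicate-object triples in plain Python. Treating labels such as `sp:reagent list` as predicates is a simplification assumed here for readability; it does not reproduce the ontology's actual classes and properties, and the helper `objects` is hypothetical.

```python
# Labels are copied from the slide and used as plain strings.
PROTOCOL = "sp:RNA extraction protocol"

# Simplified triples (assumption: list labels act as predicates).
triples = [
    (PROTOCOL, "sp:title of the protocol",
     "Extraction of total RNA from fresh/frozen tissue (FT)"),
    (PROTOCOL, "sp:protocol identifier", "DOI: 10.2144/000113260"),
    (PROTOCOL, "sp:reagent list", "sp:reagentList1"),
    ("sp:reagentList1", "sp:reagent name", "Chloroform"),
    ("sp:reagentList1", "sp:reagent name", "Ethyl alcohol"),
    ("sp:reagentList1", "sp:reagent name", "Isopropyl alcohol"),
    ("sp:reagentList1", "sp:reagent name", "TRIzol"),
    (PROTOCOL, "sp:specimen name", "sp:tumor tissue"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the protocol to its reagent list, then list the names.
for reagent_list in objects(PROTOCOL, "sp:reagent list"):
    print(objects(reagent_list, "sp:reagent name"))
```

A real deployment would more likely use an RDF library and the published vocabulary; the point here is only that each piece of protocol metadata becomes an individually addressable statement.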
SMART Protocols: workflow module
sp:RNA extraction protocol
sp:RNA extraction procedure
• sp:cell disruption (sp:cell disruption1): Homogenize sample using tissue homogenizer.
• sp:denaturation reaction (sp:denaturation reaction 1): Add 0.2 mL chloroform per 1 mL TRIzol and cap tube tightly.
• sp:precipitation reaction (sp:precipitation reaction 1): Add 0.5 mL isopropyl alcohol per 1 mL TRIzol.
• sp:washing nucleic acids (sp:washing nucleic acids1): Add 1 mL 75% ethanol per 1 mL TRIzol and vortex for 10 s.
Annotated elements: Actions, Sample/Specimen/Organism, Instrument, Reagent
http://vocab.linkeddata.es/SMARTProtocols/sp-workflowV2.0.htm
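The workflow module can be pictured as an ordered list of instructions, each typed with an action class. The following Python sketch assumes that the order shown on the slide is the execution order; the `Step` class and `steps_of` helper are hypothetical names, not part of SMART Protocols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action_class: str  # class label taken from the slide
    instruction: str   # instruction text taken from the slide


# Assumption: the slide lists the steps in the order they are carried out.
procedure = [
    Step("sp:cell disruption", "Homogenize sample using tissue homogenizer."),
    Step("sp:denaturation reaction",
         "Add 0.2 mL chloroform per 1 mL TRIzol and cap tube tightly."),
    Step("sp:precipitation reaction",
         "Add 0.5 mL isopropyl alcohol per 1 mL TRIzol"),
    Step("sp:washing nucleic acids",
         "Add 1 mL 75% ethanol per 1 mL TRIzol and vortex for 10 s."),
]


def steps_of(action_class):
    """Return (position, instruction) pairs for steps of the given class."""
    return [
        (position, step.instruction)
        for position, step in enumerate(procedure, start=1)
        if step.action_class == action_class
    ]


print(steps_of("sp:precipitation reaction"))
```

Typing each step makes it possible to ask for, say, every precipitation reaction across many protocols without parsing the instruction text again.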
SIRO model
Sample, Instrument, Reagent, Objective
• Commonalities across our corpus of protocols were evaluated and characterized
  Collection of protocols available here: https://github.com/oxgiraldo/SMART-Protocols/tree/master/corpus_of_protocols
• What imaging analysis software is used for quantitative analysis of locomotor movements, buccal pumping and cardiac activity on X. tropicalis?
  Highlighted terms: imaging analysis software; quantitative analysis of locomotor movements, buccal pumping and cardiac activity; X. tropicalis
• How to prepare the stock solutions of the H2DCF and DHE dyes?
  Highlighted terms: H2DCF; DHE
• What bacteria have been used in protocols for persister cells isolation?
  Highlighted terms: bacteria; persister cells isolation
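One way to read the SIRO model is as a set of facets that questions like those above can be matched against. The sketch below is an assumption-laden illustration: the three records are hypothetical placeholders built from the highlighted terms, the assignment of each term to a facet is our reading of the slide, and `SiroRecord` and `search` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class SiroRecord:
    protocol_id: str
    sample: set = field(default_factory=set)
    instrument: set = field(default_factory=set)
    reagent: set = field(default_factory=set)
    objective: set = field(default_factory=set)


# Hypothetical records; the ids and facet assignments are assumptions.
records = [
    SiroRecord("protocol-A", sample={"X. tropicalis"},
               instrument={"imaging analysis software"},
               objective={"quantitative analysis of locomotor movements"}),
    SiroRecord("protocol-B", reagent={"H2DCF", "DHE"}),
    SiroRecord("protocol-C", sample={"bacteria"},
               objective={"persister cells isolation"}),
]


def search(records, **facets):
    """Return ids of records that contain every requested facet term."""
    hits = []
    for record in records:
        if all(
            term.lower() in {value.lower() for value in getattr(record, facet)}
            for facet, term in facets.items()
        ):
            hits.append(record.protocol_id)
    return hits


print(search(records, sample="bacteria", objective="persister cells isolation"))
```

Each question then becomes a combination of facet constraints, for example a sample plus an objective, rather than a free-text search.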
Semantic annotation of protocols
NLP layer:
• ANNIE Gazetteer: Action, Instrument
• Large KB Gazetteer: Reagent, Organism
• JAPE rules: Sample
Color code: ACTION, SAMPLE, INSTRUMENT, REAGENT
Future work
• Testing, editing and generating new JAPE rules.
• Evaluating precision, error rate and recall in annotated protocols.
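For the planned evaluation, precision and recall can be computed by comparing automatic annotations against a manually annotated gold standard. The sketch below assumes exact-match scoring on (start, end, label) spans, which is only one possible criterion; the example spans are made up, and the error rate is left out because the slides do not define it.

```python
def precision_recall(gold, predicted):
    """Score predicted annotations against gold ones using exact matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans for illustration: (start, end, label).
gold = {(0, 11, "Action"), (16, 19, "Sample"), (35, 45, "Reagent")}
predicted = {(16, 19, "Sample"), (35, 45, "Reagent"),
             (25, 28, "Reagent"), (29, 31, "Instrument")}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Relaxed matching, such as counting overlapping spans as correct, would give higher scores and is a common alternative when annotation boundaries are fuzzy.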
Lessons learnt
Overcoming some issues such as:
• Problem: uploading gazetteers including around a million terms and their corresponding features.
• Solution: we developed a database and connected it with GATE.
Retrieving the “objective” from the protocols
• Problem: complex rhetoric
• Solution: manual annotation
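The slides do not say which database was used for the large gazetteers or how it was connected to GATE. Purely as an illustration of the idea of keeping terms in an indexed store instead of loading them all into memory, here is a sketch using Python's built-in `sqlite3` module; the table layout, the sample rows and the `lookup` helper are assumptions, and no GATE integration is shown.

```python
import sqlite3

# In-memory database for the example; a real gazetteer would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gazetteer ("
    "term TEXT PRIMARY KEY COLLATE NOCASE, "  # indexed, case-insensitive
    "label TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gazetteer (term, label) VALUES (?, ?)",
    [
        ("chloroform", "Reagent"),
        ("isopropyl alcohol", "Reagent"),
        ("tissue homogenizer", "Instrument"),
    ],
)
conn.commit()


def lookup(term):
    """Return the label stored for a term, or None if it is not listed."""
    row = conn.execute(
        "SELECT label FROM gazetteer WHERE term = ?", (term,)
    ).fetchone()
    return row[0] if row else None


print(lookup("Chloroform"))
```

Because the primary key is indexed, each lookup stays fast even when the table grows to around a million terms.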