OntoGene in the BioNLP Shared Task
and in BioCreative II.5
Fabio Rinaldi
Outline
- Background: our work on GENIA; participation in BioCreative II
- The IntAct activity: Interactors (AIME 09), Methods (SMBM08), Organisms (BioNLP 09), Interactions (CICLING 09)
- Recent work: BioNLP shared task; BioCreative II.5
OntoGene: the beginnings
- BioNLP identified as a 'hot' area for research
- Leverage the work done on terminology structuring
  - original focus: ontology learning
  - later refocused on ontology usage
- Gradually moved into relation extraction
  - leverage dependency structures (Pro3Gres)
  - organize different tools into an NLP pipeline
OG-RM
How does it work?
- The pipeline delivers:
  - tokens with unique identifiers
  - terms and their heads
  - chunks and their heads
  - dependency relations, encoded as (sentence-id, type, head, dependent)
  - output can be delivered either as CSV or XML
- The OG-RM application uses this information (stored in a Prolog database) to extract domain relations by means of cascading rules
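A minimal Python sketch of how dependency relations encoded as (sentence id, type, head, dependent) could be written to and read from CSV. The field names, the token identifiers, and the `obj` label are assumptions made for illustration; the slides do not show the actual column layout, and the original system stores this information in a Prolog database rather than in Python.

```python
import csv
import io
from typing import NamedTuple


class Dependency(NamedTuple):
    sentence_id: str
    rel_type: str   # dependency label, e.g. "subj" or "pobj"
    head: str       # token identifier of the head
    dependent: str  # token identifier of the dependent


def to_csv(deps):
    """Serialise dependency tuples, one CSV row per relation."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for dep in deps:
        writer.writerow(dep)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV rows back into Dependency tuples."""
    return [Dependency(*row) for row in csv.reader(io.StringIO(text))]


# Hypothetical identifiers for a sentence such as "A regulates B".
deps = [
    Dependency("s1", "subj", "t2", "t1"),
    Dependency("s1", "obj", "t2", "t3"),
]
csv_text = to_csv(deps)
```

Keeping each relation as a flat four-field row makes the same data easy to emit as CSV, XML attributes, or Prolog facts.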
NLP Pipeline
- sentence splitting (mxterminator), tokenizer (Penn Treebank tokenizer), POS tagger (MXPOST), lemmatizer (morpha), NG/VG chunker (LTCHUNK), dependency parser (Pro3Gres)
- each tool has a wrapper to make its input/output XML-based
- other outputs are possible: CSV, Prolog
- integrates LingPipe, Term Detection...
- supports various postprocessing of dependency relations
- Performance: 1 hour to parse the GENIA corpus
  - dual-core AMD Opteron 2.5 GHz, 8 GB RAM
  - 45 min for parsing
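The wrapper idea can be sketched in Python as a chain of stages that each take an XML document and return it with more annotation. The two stages below are trivial stand-ins, not the real mxterminator or Penn Treebank tokenizer, and the element names (`doc`, `text`, `s`, `w`) are assumptions rather than the actual OntoGene schema.

```python
import xml.etree.ElementTree as ET


def split_sentences(doc):
    """Stand-in sentence splitter: one <s> element per full stop."""
    text = doc.findtext("text", default="")
    parts = [p.strip() for p in text.split(".") if p.strip()]
    for i, part in enumerate(parts):
        ET.SubElement(doc, "s", id=f"s{i + 1}").text = part
    return doc


def tokenize(doc):
    """Stand-in tokenizer: whitespace tokens, each with a unique id."""
    for sent in doc.findall("s"):
        words = (sent.text or "").split()
        sent.text = None
        for j, word in enumerate(words):
            tok_id = f"{sent.get('id')}.t{j + 1}"
            ET.SubElement(sent, "w", id=tok_id).text = word
    return doc


def run_pipeline(text, stages):
    """Wrap raw text in XML and pass it through each stage in order."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "text").text = text
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline("A regulates B. C binds D.", [split_sentences, tokenize])
xml_out = ET.tostring(doc, encoding="unicode")
```

Because every stage reads and writes the same XML structure, a tool can be swapped or a new one inserted without touching the others.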
Pro3Gres parse example
OG-RM: cascading rules
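[Figure: syntactic pattern over nodes X1, X2, X3 with dependency labels subj, prep (by, through, via), and pobj, mapped to a relation with head H, agent A, and target B]

Syntactic variants of the same relation:
- A regulates B
- B is regulated by A
- the regulation of B by A

    semRel(xrel([H,A,B]), direct_transitive([H,A,B])).
    semRel(xrel([H,A,B]), passive([H,B,A])).
    semRel(xrel([H,A,B]), nominalization([H,B,A])).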
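The three `semRel` facts normalise different surface patterns to one `xrel([H,A,B])` structure by fixing the argument order. A hedged Python rendering of that idea follows; the original is Prolog, and the table and function names here are invented for illustration.

```python
# Argument order of each syntactic pattern, mirroring the three semRel facts.
PATTERN_ARGS = {
    "direct_transitive": ("H", "A", "B"),  # A regulates B
    "passive": ("H", "B", "A"),            # B is regulated by A
    "nominalization": ("H", "B", "A"),     # the regulation of B by A
}


def to_xrel(pattern, args):
    """Reorder the arguments of a syntactic pattern into (H, A, B)."""
    roles = dict(zip(PATTERN_ARGS[pattern], args))
    return (roles["H"], roles["A"], roles["B"])


# The three variants, with arguments in the order each pattern lists them.
variants = [
    ("direct_transitive", ["regulate", "A", "B"]),
    ("passive", ["regulate", "B", "A"]),
    ("nominalization", ["regulate", "B", "A"]),
]
xrels = {to_xrel(pattern, args) for pattern, args in variants}
```

All three variants collapse to the single tuple `("regulate", "A", "B")`, so later rules only need to handle one form.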
OG-RM: cascading rules
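[Figure: the same syntactic pattern (nodes X1, X2, X3; labels subj, prep (by, through, via), pobj; head H with agent A and target B), cascaded with a second rule in which a trigger with agent A takes a nominalisation H linked to B by a preposition (of, ..); edge labels xrel and deprel]

Example: A triggers the H of B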
References
Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Martin Romacker. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics 2006, 7(Suppl 3):S3. doi:10.1186/1471-2105-7-S3-S3
Krallinger et al., Overview of the protein-protein interaction annotation task of BioCreative II, Genome Biology (2008), vol. 9, suppl. 2, pp. S4
IPS: Our Approach
- Protein Name Detection and Disambiguation
  - identification of proteins
  - organism-based disambiguation
  - further disambiguation
- Interaction Detection
  - generation of potential interactions
  - filtering of candidate interactions
    - Syntax-based filter
    - Novelty filter
- Evaluation
UniProt
NEWT
Annotated Abstract
IMS
- Detection of experimental methods, based on the PSI-MI taxonomy
- Best official results!
References
Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
Detection of Biological Interactions from Biomedical Literature [SNF 100014 / 118396]
- Duration: 18 months (April 2008 – September 2009)
- SNF funding: 114'046 CHF
- Novartis funding: ~ 70'000 CHF
- University funding: 50% of Fabio's position
IntAct
- Can be used as a source of interactions, interactors, methods, organisms, “snippets”
- Used to derive distributional frequencies
- Used to derive a gold standard for testing purposes (for IMS and TX): 621 PubMed-indexed articles
- Subtasks:
  - IMS: Experimental Methods
  - TX: Organism Detection
  - PID: Protein Identification and Disambiguation
  - PPI: Protein Interactions
Balance so far
- Highlights: IntAct, BioNLP shared task, BioCreative
- Publications:
  - Genome Biology paper finally published
  - 4 poster presentations (G2S, LREC, CICLING, ISWC)
  - 4 conference papers (SMBM, OWLED, CICLING, AIME)
  - 2 workshop presentations [BioNLP workshop & shared task]
- Invited presentations: FBK, Trento; DBTA, Basel; CCP, Denver.
Introduction
Approach originally developed for participation in the BioCreative protein-protein interaction task
Also used in an internal project based on the IntAct dataset of protein interactions
Adaptation to the BioNLP shared task took approximately one month
Based on straightforward rewriting of syntactic structures to event structures, taking statistics from training data into account
Preprocessing
- LingPipe for sentence splitting, tokenization, and PoS tagging (GENIA training model)
- Term annotation: only terms provided in a1, a2 files (in 10 cases not compatible with tokenization)
- Lemmatization (morpha), used only by the dependency parser
- Chunking using LTCHUNK and detecting chunk heads
- Dependency parsing with Pro3Gres, only among chunks
Data format
- Tokens (tokID -> lemma, PoS tag, offset)
- Chunks (tokID -> chunk, chunk type, head)
- Terms (tokID -> term ID)
- Sentences (sent ID -> token IDs)
- Dependencies (dependent ID -> head ID)
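The five mappings listed on this slide can be pictured as keyed tables. The Python sketch below is one possible in-memory layout under that reading; the class names, field names, and the `head_chain` helper are assumptions for illustration and do not come from the OntoGene code.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    lemma: str
    pos: str
    offset: int


@dataclass
class Chunk:
    chunk_id: str
    chunk_type: str
    head: str  # token ID of the chunk head


@dataclass
class Annotations:
    tokens: dict = field(default_factory=dict)        # tokID -> Token
    chunks: dict = field(default_factory=dict)        # tokID -> Chunk
    terms: dict = field(default_factory=dict)         # tokID -> term ID
    sentences: dict = field(default_factory=dict)     # sent ID -> [tokID]
    dependencies: dict = field(default_factory=dict)  # dependent ID -> head ID

    def head_chain(self, tok_id):
        """Follow dependent -> head links from tok_id up to the root."""
        chain = [tok_id]
        while chain[-1] in self.dependencies:
            nxt = self.dependencies[chain[-1]]
            if nxt in chain:  # guard against accidental cycles
                break
            chain.append(nxt)
        return chain


# Hypothetical annotation of "A regulates B".
ann = Annotations()
ann.tokens = {
    "t1": Token("A", "NN", 0),
    "t2": Token("regulate", "VBZ", 2),
    "t3": Token("B", "NN", 12),
}
ann.sentences = {"s1": ["t1", "t2", "t3"]}
ann.dependencies = {"t1": "t2", "t3": "t2"}
```

Storing dependencies as dependent-to-head links makes walking up the tree a plain dictionary lookup, which is all that is needed to check whether one token dominates another.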
Data from training
- word_to_freq(+Word, F)
- eword_to_event(+EventWord, EventType, EventArgs, F1, F2)
  - F1: frequency of EventWord, EventType, EventArgs
  - F2: frequency of EventWord as trigger
- Domination path
  - Direct domination: “regulates expression”
  - Chunk-internal domination: “inducible Oct2 expression”
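As a rough illustration of the two frequency tables, the Python sketch below derives F, F1, and F2 from toy data using counters. The corpus, the event tuples, and the `trigger_ratio` score are all invented for this example; the slides do not say how the original system combines these frequencies.

```python
from collections import Counter

# Toy training data: every value below is invented for illustration.
corpus_words = ["expression", "regulates", "expression", "binding", "expression"]
# One (event word, event type, argument signature) entry per annotated event;
# in this toy data each event has its own trigger occurrence.
events = [
    ("expression", "Gene_expression", ("Theme",)),
    ("expression", "Gene_expression", ("Theme",)),
    ("regulates", "Regulation", ("Theme", "Cause")),
]

word_freq = Counter(corpus_words)  # word_to_freq: Word -> F
event_freq = Counter(events)       # F1 per (EventWord, EventType, EventArgs)
trigger_freq = Counter(word for word, _, _ in events)  # F2 per EventWord


def eword_to_event(event_word):
    """Return (EventType, EventArgs, F1, F2) rows for one event word."""
    return [
        (etype, eargs, f1, trigger_freq[word])
        for (word, etype, eargs), f1 in event_freq.items()
        if word == event_word
    ]


def trigger_ratio(word):
    """Assumed score: share of a word's occurrences annotated as triggers."""
    total = word_freq[word]
    return trigger_freq[word] / total if total else 0.0
```

A ratio of this kind is one plausible way to decide whether a word should be proposed as an event trigger, but it is only an assumption here.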
Trigger generation
Event structure generation
Event argument filling
BioNLP shared task
BioNLP shared task
Our approach
- Core pipeline delivers rich annotation format
  - Used to process training and test data
- Entities
  - Detection and Disambiguation (IntAct approach) [Org-based disambiguation]
- Candidate interactions
  - Initial training based on GENIA (IntAct approach)
  - Statistics adjusted using training data
- Results: “impressively good”
  - AUC training: ~ 22%
BioCreative II.5
Acknowledgments
- Kaarel Kaljurand, Gerold Schneider, Thomas Kappeler, Simon Clematide
- Therese Vachon, Martin Romacker, Josef Scheiber

www.ontogene.org