Download - Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions
![Page 1: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/1.jpg)
Improving Text Mining with Controlled Natural Language:
A Case Study for Protein Interactions
Tobias Kuhn (speaker)Loïc Royer
Norbert E. FuchsMichael Schroeder
DILS'06, Hinxton (UK)21 July 2006
![Page 2: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/2.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 2
Cooperation of
University of Zurich(Norbert E. Fuchs, Tobias Kuhn)
and
TU Dresden(Loïc Royer, Michael Schroeder)
![Page 3: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/3.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 3
Introduction
Biomedical literature is growing at a tremendous pace
PubMed contains 16 million articles and grows by over 600'000 articles per year
Computational support is needed!
![Page 4: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/4.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 4
Today's Solution
NLP, manual annotation
![Page 5: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/5.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 5
Our Approach
Let the researchers express their own results in a formal language
Perfect processing of scientific results by computers
This formal language has to be ... easy to learn and understand
expressive enough to express even complicated scientific results
![Page 6: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/6.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 6
Knowledge Representation Languages
OWL with RDF/XML
Description Logics
first-order logic
ACE
UML
has
![Page 7: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/7.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 7
Attempto Controlled English (ACE)
Formal language that looks like natural English
Unambiguously translatable into first-order logic
Restricted grammar
Unlimited vocabulary
www.ifi.unizh.ch/attempto
![Page 8: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/8.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 8
Formal Summaries
![Page 9: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/9.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 9
Formal Summaries
BubR1 interacts-with a trunk-domain of Beta2-Adaptin.
[A, B, C, D]named(A, BubR1)-1object(A, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1named(B, Beta2-Adaptin)-1object(B, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1object(C, atomic, trunk-domain, unspecified, cardinality, count_unit, eq, 1)-1relation(C, trunk-domain, of, B)-1predicate(D, unspecified, interact_with, A, C)-1
ACE text
Logical representation (DRS)
![Page 10: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/10.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 10
Ontology for Protein Interactions
![Page 11: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/11.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 11
Empirical Study
“How suitable is ACE together with our ontology to express scientific results of protein interactions?”
Manual translation of 273 facts about protein interactions
These facts are subheadings of the “Results”-sections of 89 articles (journals by Elsevier)
![Page 12: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/12.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 12
Empirical Study
154
57
62
matched perfectly
matched partially
unmatched not covered by the model
relations of relations
fuzzy21
5611
31
not understood
Total: Non-perfect:
![Page 13: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/13.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 13
Authoring tool
Helps writing ACE sentences
Shows step by step the possible continuations of the sentence
New words can be created on-the-fly
Awareness of the underlying ontology
The users do not need to know the details of the ACE syntax and of the underlying ontology
![Page 14: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/14.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 14
Authoring tool:Prototype demo
http://gopubmed.biotec.tu-dresden.de/AceWiki/
![Page 15: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/15.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 15
Benefits of our Approach
Consistency / redundancy checks “Is there a paper that contradicts my results?”
“Is there a paper that comes to the same or similar results?”
Answer extraction “Which proteins interact with a certain domain of
protein X?”
Automatically updated knowledge bases “Give me an overview of the relations of a protein X
to other proteins!”
![Page 16: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/16.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 16
Conclusions
Formal summaries for scientific articles can make text mining easier and more powerful
ACE combines the power of ontologies with the convenience of natural language
Let the researchers formalize their own results!
![Page 17: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/17.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 17
Thank you for your attention!
Questions&
Discussion
![Page 18: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/18.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 18
Subheadings: Example
![Page 19: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/19.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 19
Degree of Matching: Examples
Matched perfectly: Interaction of Act1 with TRAF6
→ Act1 interacts-with TRAF6.
Matched partially: The mtFabD protein is part of the core of the FAS-II
complex
→ MtFabD is a subunit of FAS-II.
Unmatched: Cav1 interacts differentially with distinct Dyn2 forms
![Page 20: Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions](https://reader038.vdocuments.us/reader038/viewer/2022100600/554e816eb4c905f66a8b5538/html5/thumbnails/20.jpg)
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 20
Reasons for Non-perfect Matching: Examples
Not covered by the model: Daxx Potentiates Fas-Mediated Apoptosis
Relations of relations: Kal-GEF1 activation of Pak does not require GEF activity
Fuzzy: ANKRD1 contains potential CASQ2 binding sequences
located in both its NT- and CT-regions
Not understood: hSrb7 does not interact with other nuclear receptors