Towards Building A Database of Phosphorylate Interactions
Extracting Information from the Literature
M. Narayanaswamy & K. E. Ravikumar
AU-KBC Center, Chennai, India
&
K. Vijay-Shanker
University of Delaware
Information Extraction from the Literature
• Wealth of information available (only) in unstructured form (scientific literature)
• Need to store data in structured form (databases) for bioinformatics applications
• Information extraction is an active field.
• Focus in the biological domain -- extracting information on protein pairs that interact
Phosphorylation Extraction
• <Agent = Frp-1
Theme = p53
Site = Ser 15>
• <Agent = JNK
Theme = c-Jun
Site = unk>
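A record such as the two above could be held in a small typed structure before it is loaded into a database. The following is a minimal Python sketch; the class name `PhosphorylationEvent` and its field names are illustrative assumptions, not something taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhosphorylationEvent:
    """One extracted record. Class and field names are hypothetical."""
    agent: Optional[str]  # what phosphorylates, None when not stated
    theme: Optional[str]  # what is phosphorylated
    site: Optional[str]   # residue/position, None when unknown ("unk")

    def as_template(self) -> str:
        # Render a missing slot as "unk", mirroring the slide notation.
        def show(value: Optional[str]) -> str:
            return value if value is not None else "unk"
        return (f"<Agent = {show(self.agent)}, "
                f"Theme = {show(self.theme)}, Site = {show(self.site)}>")


events = [
    PhosphorylationEvent("Frp-1", "p53", "Ser 15"),
    PhosphorylationEvent("JNK", "c-Jun", None),
]
for event in events:
    print(event.as_template())
```

Keeping an unknown slot as `None` rather than the string "unk" makes it easier to fill the slot later from another sentence.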
Why Phosphorylation
• Central role in signal transduction.
• One of the more widely studied post-translational modifications
• IE approach may generalize to other post-translational modifications and to binding.
• Different challenges
– Agent and target: not just proteins
– Site of phosphorylation
Steps in Text Processing and Information Extraction
• Basic Text Processing – e.g., sentence boundary detection
• Part of Speech Tagging
• Name/term Detection
• Phrase (esp., Noun and Verb Phrase) chunking
• Type Classification of Terms and noun phrases
• Template Pattern Matching
BioNEx (PSB 2003)
• Detects Names/Terms of the following types:
– Protein/gene
– Protein/gene parts
– Chemicals
– Source
– Others
• Two tasks – Detection and Classification
Classification -- F-Term
• Names are often descriptive NPs
– Simian immunodeficiency virus
– T cell
– Mitogen-activated protein kinase
– Ras guanine nucleotide exchange factor
• Description of function/class of entities
• Useful to assign types
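The idea of typing a descriptive name by the functional word it contains can be sketched as a lexicon lookup. This is a minimal illustration only: the `F_TERMS` entries and the type labels are assumptions made for the example, not the lexicon or the algorithm used in BioNEx.

```python
# Hypothetical f-term lexicon mapping a functional word to a type.
# The entries are illustrative assumptions, not the BioNEx lexicon.
F_TERMS = {
    "kinase": "protein/gene",
    "factor": "protein/gene",
    "receptor": "protein/gene",
    "domain": "protein/gene part",
    "virus": "source",
    "cell": "source",
}


def classify_by_f_term(noun_phrase: str) -> str:
    """Type an NP by its rightmost word that is a known f-term."""
    for word in reversed(noun_phrase.lower().split()):
        if word in F_TERMS:
            return F_TERMS[word]
    return "unknown"


for name in ("Simian immunodeficiency virus", "T cell",
             "Mitogen-activated protein kinase",
             "Ras guanine nucleotide exchange factor"):
    print(name, "->", classify_by_f_term(name))
```

Scanning from the right means a trailing proper name, as in "Ras guanine nucleotide exchange factor Sos", still picks up the type of "factor".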
Additional Sources for Classification
• Using context – h-terms such as “expression”
– “…IL-2 expression…”
• Appositives
– “Mek1, a tyrosine kinase,…”
• Acronyms
– Mitogen-activated protein kinase (MAPK) … MAPK …
• Coordination
• High precision and recall (PSB 2003)
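The acronym cue can be illustrated with a simple initial-letter match between a parenthesised short form and the words before it. This is a sketch under stated assumptions: the function name, the regular expressions, and the matching rule are choices made for the example, and the talk does not say how acronyms were actually resolved.

```python
import re

# A short form in parentheses, e.g. "(MAPK)".
PAREN = re.compile(r"\(([A-Za-z0-9-]{2,10})\)")
WORD = re.compile(r"[A-Za-z0-9]+")


def find_acronym_definitions(text):
    """Return {short_form: long_form} using an initial-letter match.

    Hyphenated words count as separate words, so "Mitogen-activated"
    contributes both M and A.
    """
    found = {}
    for m in PAREN.finditer(text):
        short = m.group(1)
        tokens = list(WORD.finditer(text[:m.start()]))
        candidate = tokens[-len(short):]
        if len(candidate) < len(short):
            continue
        initials = "".join(t.group(0)[0] for t in candidate)
        if initials.upper() == short.upper():
            found[short] = text[candidate[0].start():candidate[-1].end()]
    return found


text = "Mitogen-activated protein kinase (MAPK) was assayed; MAPK was active."
print(find_acronym_definitions(text))
```

Once the pair is known, a type assigned to the long form (for instance from the word "kinase") can be carried over to later bare mentions of the short form.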
Phrase Chunking
• Detect BaseNPs
– Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
• Verb groups (passive vs. active forms)
Phrase Chunking
• Detect BaseNPs
– Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
• Verb groups
• Appositives
– … Sic1, an inhibitor …, is phosphorylated
• Relative Clauses
– … Ser38 which is phosphorylated …
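Base NP and verb group detection can be sketched as a regular expression over part-of-speech tags. The toy chunker below is an assumption-laden illustration, not the chunker used in the talk: it expects Penn Treebank style tags supplied by hand, and the `symbol` mapping and `CHUNK` pattern are invented for the example.

```python
import re


def symbol(tag):
    """Collapse a POS tag to one character so a regex can scan the sentence."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("VB"):
        return "V"
    if tag.startswith("RB"):
        return "R"
    return {"DT": "D", "TO": "T", "MD": "M"}.get(tag, "O")


# VG: verbs, modals, "to", adverbs, plus an adjective followed by "to" ("able to").
# NP: optional determiner, adjectives, then one or more nouns.
CHUNK = re.compile(r"(?P<VG>(?:[VMTR]|J(?=T))+)|(?P<NP>D?J*N+)")


def chunk(tagged):
    """Return (chunk_kind, text) pairs for a list of (word, tag) pairs."""
    symbols = "".join(symbol(tag) for _, tag in tagged)
    chunks = []
    for m in CHUNK.finditer(symbols):
        words = " ".join(word for word, _ in tagged[m.start():m.end()])
        chunks.append((m.lastgroup, words))
    return chunks


sentence = [("Active", "JJ"), ("p90Rsk2", "NN"), ("was", "VBD"),
            ("found", "VBN"), ("to", "TO"), ("be", "VB"), ("able", "JJ"),
            ("to", "TO"), ("phosphorylate", "VB"), ("histone", "NN"),
            ("H3", "NN"), ("at", "IN"), ("Ser10", "NN")]
for kind, words in chunk(sentence):
    print(kind, "|", words)
```

Treating "was found to be able to phosphorylate" as one verb group is what lets a later pattern see the sentence as NP, verb group, NP, preposition, NP.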
Why type classification?
• A phosphorylated B in C
– ATR/FRP-1 also phosphorylated p53 in Ser 15 …
– Active Chk2 phosphorylated the SQ/TQ sites in Chk2 SCD …
– cdk9/cyclinT2 could phosphorylate the retinoblastoma gene (pRb) in human cell lines
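The three sentences share one surface form but fill the template differently, which is why the type of each phrase matters. The sketch below assigns roles from types alone; the function name, the type labels, and the hand-supplied types are assumptions for illustration, standing in for the output of a real classifier.

```python
def assign_roles(agent, b, c):
    """Fill Theme and Site for 'A phosphorylated B in C'.

    b and c are (text, type) pairs. The type labels are hypothetical
    and are given by hand here.
    """
    roles = {"Agent": agent, "Theme": None, "Site": None}
    for text, kind in (b, c):
        if kind == "site" and roles["Site"] is None:
            roles["Site"] = text
        elif kind in ("protein", "protein part") and roles["Theme"] is None:
            roles["Theme"] = text
        # Other types, such as "source", fill no slot.
    return roles


print(assign_roles("ATR/FRP-1", ("p53", "protein"), ("Ser 15", "site")))
print(assign_roles("Active Chk2", ("the SQ/TQ sites", "site"),
                   ("Chk2 SCD", "protein part")))
print(assign_roles("cdk9/cyclinT2",
                   ("the retinoblastoma gene (pRb)", "protein"),
                   ("human cell lines", "source")))
```

In the second sentence the grammatical object is the site and the "in" phrase is the theme, and in the third the "in" phrase is neither, so position alone would give wrong records.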
Type Classification
• Extensive use of type information in rules
• Typing done by means of
– Phrase internal – e.g., Ras guanine nucleotide exchange factor Sos
– Contextual – e.g., homolog of TAF(II)
– Syntactic information – appositive, coordination etc.
Patterns and Templates
• <Agent> <VG-active-phosphorylate> <Target> (in/at <SITE>)?
– Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
• Active, Passive, Adjectival forms for phosphorylate/phosphorylated
• Different orders and optionality of arguments
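The active-voice pattern can be sketched as a match over a sequence of typed chunks. Everything named here (`Chunk`, `match_active`, the type labels) is hypothetical, and restricting agent and target to the "protein" type is a simplifying assumption; the talk notes that agents and targets are not always proteins.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    kind: str       # "NP", "VG" or "PREP"
    text: str
    type: str = ""  # NP: "protein", "site", ...; VG: "active" or "passive"


def match_active(chunks):
    """Match <Agent> <VG-active-phosphorylate> <Target> (in/at <Site>)?"""
    for i in range(len(chunks) - 2):
        agent, vg, target = chunks[i:i + 3]
        if not (agent.kind == "NP" and agent.type == "protein"):
            continue
        if not (vg.kind == "VG" and vg.type == "active"
                and vg.text.split()[-1].startswith("phosphorylat")):
            continue
        if not (target.kind == "NP" and target.type == "protein"):
            continue
        site = None
        rest = chunks[i + 3:i + 5]
        # The site is optional: a preposition "in"/"at" plus a site-typed NP.
        if (len(rest) == 2 and rest[0].kind == "PREP"
                and rest[0].text in ("in", "at")
                and rest[1].kind == "NP" and rest[1].type == "site"):
            site = rest[1].text
        return {"Agent": agent.text, "Theme": target.text, "Site": site}
    return None


chunks = [
    Chunk("NP", "Active p90Rsk2", "protein"),
    Chunk("VG", "was found to be able to phosphorylate", "active"),
    Chunk("NP", "histone H3", "protein"),
    Chunk("PREP", "at"),
    Chunk("NP", "Ser10", "site"),
]
print(match_active(chunks))
```

Passive and adjectival forms would need their own functions with the argument order changed, which is one reason the number of patterns grows quickly.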
Patterns for Phosphorylation
• Non-verbal patterns: not commonly addressed, but frequent
• Phosphorylation of <Target> (by <Agent>)? (in/at <Site>)?
• Phosphorylation of <Site> …
• <Agent> <VG-active> <Target> by/via phosphorylation (at <Site>)?
• Altogether, a large number of patterns, derived from examining 300 abstracts and 10 journal articles.
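The first nominal pattern, with its two optional arguments, can be approximated directly as a regular expression over raw text. This is a deliberately simplified sketch: `NAME` and `SITE` are assumed stand-ins for typed noun phrases (here a name is a single token), and the test sentences are constructed for the example rather than drawn from the corpus.

```python
import re

NAME = r"[A-Za-z0-9/\-]+"          # assumed: a one-token protein name
SITE = r"(?:Ser|Thr|Tyr)[ -]?\d+"  # assumed: residue plus position

# Phosphorylation of <Target> (by <Agent>)? (in/at <Site>)?
NOMINAL = re.compile(
    rf"[Pp]hosphorylation of (?P<theme>{NAME})"
    rf"(?: by (?P<agent>{NAME}))?"
    rf"(?: (?:in|at) (?P<site>{SITE}))?"
)


def match_nominal(sentence):
    """Return an Agent/Theme/Site dict, or None when the pattern is absent."""
    m = NOMINAL.search(sentence)
    if m is None:
        return None
    return {"Agent": m.group("agent"), "Theme": m.group("theme"),
            "Site": m.group("site")}


print(match_nominal("Phosphorylation of p53 by ATR at Ser 15 was observed."))
print(match_nominal("We studied phosphorylation of c-Jun by JNK."))
```

A regex on raw text cannot tell "phosphorylation of p53" from "phosphorylation of Ser 15", which is where typed noun phrases are needed to choose between the Target and Site variants.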
Sentence-Based Evaluation
            Precision   Recall   F-measure
Agent          91         88        89
Theme          96         87        92
Site           94         73        82
Relation       89         77        83

Agent          89         89        81
Theme          98         93        96
Site          100         96        96
Relation       96         89        92
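For reference, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below assumes the balanced form (equal weight on both), which the slide does not state explicitly. Because the precision and recall shown are already rounded, a value recomputed this way can differ from a reported entry.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute a few rows from the rounded percentages shown above.
for label, p, r in (("Agent", 91, 88), ("Site", 94, 73), ("Relation", 89, 77)):
    print(label, round(f_measure(p, r)))
```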
Utility in Building Databases
• IE on 1000 abstracts – 5m/3s
• Precision on 200 abstracts – Relation > 92%
• Scales up well
• Useful for constructing DBs.
Discussion
• High precision and recall
• Beyond protein-protein (e.g., site)
• Non-verbal
• Generalizes to other post-translational modifications? (acetylation, methylation, …)
Future Work
• Piecemeal information specification
• X phosphorylates Y + phosphorylation of Y at Z = X phosphorylates Y at Z
• Fusion/Information Merging
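The merging step described here can be sketched as combining two partial records that share a theme and do not contradict each other. This is a hypothetical illustration of the idea listed as future work, not an implemented component; the function name and the rule that the theme must match are assumptions.

```python
SLOTS = ("Agent", "Theme", "Site")


def merge(a, b):
    """Merge two partial records, or return None if they cannot be fused.

    Assumed rule: both records must name the same Theme, and no slot
    may hold two different values.
    """
    if a.get("Theme") is None or a.get("Theme") != b.get("Theme"):
        return None
    merged = {}
    for slot in SLOTS:
        x, y = a.get(slot), b.get(slot)
        if x is not None and y is not None and x != y:
            return None  # conflicting values for the same slot
        merged[slot] = x if x is not None else y
    return merged


first = {"Agent": "X", "Theme": "Y", "Site": None}   # X phosphorylates Y
second = {"Agent": None, "Theme": "Y", "Site": "Z"}  # phosphorylation of Y at Z
print(merge(first, second))
```

A real system would also have to decide whether two mentions of Y in different sentences refer to the same entity before fusing them.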