1
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
Ryan McDonald, Fernando Pereira, Seth Kulick
CIS and IRCS, University of Pennsylvania, Philadelphia, PA
Scott Winters, Yang Jin, Pete White
Division of Oncology, Children’s Hospital of Pennsylvania, Philadelphia, PA
ACL 2005
2
Abstract
• Simple two-stage method for extracting complex relations between named entities in text.
  – n-ary relations
  – First stage: create a graph from pairs of related entities
  – Second stage: find maximal cliques in the graph
• Experiments on biomedical text
3
Introduction - 1/2
• n-ary relation
  – The relation is defined by the schema (t1,…, tn)
    • Each ti is an entity type
  – A tuple in the relation is a list of entities (e1,...,en)
    • Type(ei)=ti, or ei=⊥ when the i-th element is missing
• Example:
  – Types: {person, job, company}
    • “John Smith is the CEO at Inc. Corp.”
    • (John Smith, CEO, Inc. Corp.)
    • “Everyday John Smith goes to his office at Inc. Corp.”
    • (John Smith, ⊥, Inc. Corp.)
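The definition above can be sketched as a small validity check. This is only an illustration: `None` stands in for a missing element, and the names `Entity`, `SCHEMA`, and `is_valid_tuple` are hypothetical, not taken from the paper.

```python
from typing import NamedTuple, Optional, Sequence

class Entity(NamedTuple):
    text: str
    etype: str  # entity type, e.g. "person"

# Hypothetical schema (t1, t2, t3) for the example relation.
SCHEMA = ("person", "job", "company")

def is_valid_tuple(tup: Sequence[Optional[Entity]],
                   schema: Sequence[str] = SCHEMA) -> bool:
    """True if every slot is missing (None) or holds an entity of that slot's type."""
    if len(tup) != len(schema):
        return False
    return all(e is None or e.etype == t for e, t in zip(tup, schema))

john = Entity("John Smith", "person")
ceo = Entity("CEO", "job")
inc = Entity("Inc. Corp.", "company")

complete = (john, ceo, inc)   # all three slots filled
partial = (john, None, inc)   # job slot missing
swapped = (ceo, john, inc)    # types do not match the schema
```

Here `complete` and `partial` both satisfy the schema, while `swapped` does not.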
4
Introduction - 2/2
• Applications:
  – Question answering
  – Automatic database generation
  – Intelligent document searching and indexing
• Most relation extraction systems focus on:
  – Binary relations, such as
    • the employee-of relation
    • the protein-protein interaction relation
  – Extracting keyphrases to represent relations in social networks from the Web (Matsuo et al., IJCAI-07)
5
Previous Work
• Zelenko et al., 2003
  – Binary relations in news text
    • “John Smith, not Jane Smith, works at IBM.”
    • (John Smith, IBM): positive
    • (Jane Smith, IBM): negative
• Miller et al., 2000
  – Identify all relations
    • Relation extraction from probabilistic parse trees
• Rosario and Hearst, 2004
  – Extracting seven relationships between treatments and diseases
6
Definitions
• n-ary relation
  – The relation is defined by the schema (t1,…, tn)
    • Each ti is an entity type
  – A tuple in the relation is a list of entities (e1,...,en)
    • Type(ei)=ti, or ei=⊥ when the i-th element is missing
• Maximal clique
  – An undirected graph G=(V,E)
    • V: a set of vertices, E: a set of edges
  – A clique C of G is a subgraph of G in which there is an edge between every pair of vertices.
  – A maximal clique of G is a clique C=(Vc, Ec) such that there is no other clique C’=(Vc’, Ec’) with Vc ⊂ Vc’.
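The two graph definitions can be checked directly. The sketch below is illustrative only: it stores edges as a set of `frozenset` pairs, and the graph `V`, `E` is a made-up toy example. It relies on the fact that a clique is maximal exactly when no single outside vertex can be added to it.

```python
from itertools import combinations

def is_clique(vertices, edges):
    """Every pair of vertices must be joined by an edge (edges: set of frozensets)."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))

def is_maximal_clique(vertices, all_vertices, edges):
    """A clique is maximal if adding any outside vertex breaks the clique property."""
    vs = set(vertices)
    if not is_clique(vs, edges):
        return False
    return not any(is_clique(vs | {v}, edges) for v in set(all_vertices) - vs)

# Toy graph: a triangle a-b-c plus a pendant edge c-d.
V = {"a", "b", "c", "d"}
E = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
```

In this toy graph, {a, b} is a clique but not maximal, while {a, b, c} and {c, d} are both maximal.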
7
Methods : Classifying Binary Relations-1/3
• Example: {person, job, company}
  – “John and Jane are CEOs at Inc. Corp. and Biz. Corp. respectively.”
  – 12 possible tuples
• Problems with building a classifier over all possible tuples
  – Exponential run time
  – How to handle incomplete but correct instances, e.g. (John, ⊥, Inc. Corp.)
    • If it is marked as negative, the model might incorrectly disfavor features that correlate John to Inc. Corp.
    • If it is labeled as positive, the model may tend to prefer the shorter and more compact incomplete relations.
    • If we ignore instances of this form, the data would be heavily skewed towards negative instances.
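The slide does not say how the 12 candidates are counted. One reading that reproduces the number, assumed here, is that each slot holds either one entity of its type or nothing, and that a candidate needs at least two filled slots. A minimal sketch under that assumption:

```python
from itertools import product

people = ["John", "Jane"]
jobs = ["CEO"]
companies = ["Inc. Corp.", "Biz. Corp."]

# None marks a missing element; each slot is one entity of its type or missing.
slots = [people + [None], jobs + [None], companies + [None]]

# Assumption: a candidate tuple needs at least two filled slots.
candidates = [t for t in product(*slots)
              if sum(e is not None for e in t) >= 2]

print(len(candidates))  # 12
```

The number of raw combinations is the product of (entities of each type + 1) over all slots, so it grows exponentially with the arity of the relation, which is the run-time problem named above.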
8
Methods : Classifying Binary Relations-2/3
• Solution:
  – The set of all possible pairs is much smaller than the set of all possible complex relation instances.
  – Train a classifier to identify pairs of related entities.
• Positive:
  – (John, CEO), (John, Inc. Corp.), (CEO, Inc. Corp.), (CEO, Biz. Corp.), (Jane, CEO) and (Jane, Biz. Corp.)
• Negative:
  – (John, Biz. Corp.) and (Jane, Inc. Corp.)
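The positive and negative pairs listed above can be derived mechanically from the gold tuples. The sketch below is a hedged illustration: it assumes that only pairs of entities with different types are candidates (which matches the eight pairs on the slide), and the variable names are hypothetical.

```python
from itertools import combinations

# Gold complex relations for the example sentence, as (person, job, company).
gold = [("John", "CEO", "Inc. Corp."), ("Jane", "CEO", "Biz. Corp.")]

entities = {
    "John": "person", "Jane": "person",
    "CEO": "job",
    "Inc. Corp.": "company", "Biz. Corp.": "company",
}

# Assumption: only pairs of entities with different types are candidates.
candidate_pairs = [
    (a, b) for a, b in combinations(entities, 2) if entities[a] != entities[b]
]

def is_positive(a, b):
    """A pair is positive if both entities occur together in some gold tuple."""
    return any(a in t and b in t for t in gold)

positive = [p for p in candidate_pairs if is_positive(*p)]
negative = [p for p in candidate_pairs if not is_positive(*p)]
```

With this labelling, `positive` holds six pairs and `negative` holds (John, Biz. Corp.) and (Jane, Inc. Corp.).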
9
Methods : Classifying Binary Relations-3/3
• Learning a binary relation classifier:
  – A standard maximum entropy classifier (Berger et al., 1996) implemented as part of MALLET (McCallum, 2002)
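For a two-class problem, a maximum entropy classifier is equivalent to logistic regression. The sketch below is a pure-Python stand-in for illustration, not MALLET's API: the feature set, the training data, and the functions `train` and `predict` are all made up, and the optimiser is plain stochastic gradient ascent without regularisation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability that the entity pair described by features x is related."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(examples, n_features, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            p = predict(w, x)
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w

# Made-up binary features for an entity pair:
# [bias, same clause, linking verb between them, "not" between them]
train_set = [
    ([1, 1, 1, 0], 1),
    ([1, 1, 0, 0], 1),
    ([1, 0, 0, 1], 0),
    ([1, 1, 1, 1], 0),
    ([1, 0, 0, 0], 0),
]
w = train(train_set, n_features=4)
```

The probability returned by `predict` plays the role of the edge weight in the later graph-based steps.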
10
Methods : Reconstructing Complex Relations
• Example: pairs accepted by the binary classifier
  – (John, CEO), (John, Inc. Corp.), (John, Biz. Corp.), (CEO, Inc. Corp.), (CEO, Biz. Corp.) and (Jane, CEO)
  – Relation graph: Figure 2a
  – Cliques: Figure 2b
• Algorithm for finding all maximal cliques:
  – Bron and Kerbosch, 1973
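A minimal sketch of the basic Bron-Kerbosch recursion (the simple form without pivoting), run on the six pairs from this slide. The function names and the adjacency-dictionary representation are illustrative choices, not the authors' implementation.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting): collect all maximal cliques.

    r: current clique, p: candidate vertices, x: already-processed vertices.
    """
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def maximal_cliques(edges):
    """Build an adjacency map from undirected edges and enumerate maximal cliques."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

edges = [
    ("John", "CEO"), ("John", "Inc. Corp."), ("John", "Biz. Corp."),
    ("CEO", "Inc. Corp."), ("CEO", "Biz. Corp."), ("Jane", "CEO"),
]
cliques = maximal_cliques(edges)
```

On this graph the maximal cliques are {John, CEO, Inc. Corp.}, {John, CEO, Biz. Corp.} and {Jane, CEO}; the last one corresponds to a tuple whose company slot is missing.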
11
Methods : Probabilistic Cliques
• The above approach has a major shortcoming in that it assumes the output of the binary classifier to be absolutely correct.
• Weight of a clique C: w(C) = (∏_{e∈Ec} w(e))^(1/|Ec|), the geometric mean of its edge weights
  – w(e): weight (probability) of edge e
• A valid tuple: w(C) ≥ 0.5
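A minimal sketch of the thresholding step, assuming w(C) is the geometric mean of the edge weights inside the clique. The pairwise probabilities below are made up for illustration, and `clique_weight` is a hypothetical helper name.

```python
import math
from itertools import combinations

def clique_weight(vertices, w):
    """Geometric mean of the edge weights inside the clique (needs >= 2 vertices)."""
    edges = [frozenset(pair) for pair in combinations(vertices, 2)]
    return math.prod(w[e] for e in edges) ** (1.0 / len(edges))

# Made-up pairwise probabilities, for illustration only.
w = {
    frozenset({"John", "CEO"}): 0.9,
    frozenset({"John", "Inc. Corp."}): 0.8,
    frozenset({"CEO", "Inc. Corp."}): 0.7,
    frozenset({"John", "Biz. Corp."}): 0.2,
    frozenset({"CEO", "Biz. Corp."}): 0.6,
}

good = clique_weight(["John", "CEO", "Inc. Corp."], w)  # (0.9 * 0.8 * 0.7) ** (1/3)
bad = clique_weight(["John", "CEO", "Biz. Corp."], w)   # (0.9 * 0.2 * 0.6) ** (1/3)

valid = [name for name, weight in [("good", good), ("bad", bad)] if weight >= 0.5]
```

With these numbers the first clique clears the 0.5 threshold and the second does not, because a single weak edge (John, Biz. Corp.) pulls the geometric mean down.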
12
Experiments-1/2
• Extracting genomic variation events from biomedical text (McDonald et al., 2004)
• (var-type, location, initial-state, altered-state)
  – “At codons 12 and 61, the occurrence of point mutations from G/A to T/G were observed”
  – (point mutation, codon 12, G, T)
  – (point mutation, codon 61, A, G)
• 447 abstracts selected from MEDLINE
  – 4691 sentences
  – 4773 entities and 1218 relations
  – Of the 1218 relations:
    • 760 have two ⊥ arguments, 283 have one ⊥ argument, 175 have no ⊥ arguments
    • 38% cannot be handled using binary relations
    • 4% of the annotated relations are non-sentential
    • Maximum recall: 96%
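The percentages above follow from the counts, assuming the 760 / 283 / 175 split refers to relations with two, one, and no missing arguments in the four-slot schema. Under that reading, a relation with at most one missing argument has three or four filled slots and is not a plain binary relation. A short arithmetic check:

```python
total = 1218
two_missing, one_missing, none_missing = 760, 283, 175
assert two_missing + one_missing + none_missing == total

# Relations with three or four filled arguments are not plain binary relations.
non_binary = one_missing + none_missing   # 458
share = 100 * non_binary / total          # about 37.6, reported as 38%

# Relations spanning sentences are out of reach for a sentence-level extractor.
non_sentential_pct = 4
max_recall_pct = 100 - non_sentential_pct  # 96
```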
13
Experiments-2/2
• MC:
  – Uses the maximum entropy binary classifier coupled with the maximal clique complex relation reconstructor.
• PC:
  – Same as above, except it uses the probabilistic clique complex relation reconstructor.
• NE:
  – A maximum entropy classifier that naively enumerates all possible relation instances, as described on page 7.
14
Experiments : Results-1/2
15
Experiments : Results-2/2
16
Conclusions and Future Work
• Complex relation extraction:
  – Binary relation learning: maximum entropy classifier
  – Finding maximal cliques in the graph
  – Genomic variation relations
• Future work
  – Parse trees
  – Learn how to cluster vertices into relational groups
  – A vertex/entity can participate in one or more relations
17
• Learning Field Compatibilities to Extract Database Records from Unstructured Text – M Wick, A Culotta, A McCallum (EMNLP 2006)
• Using Dependency Parsing and Probabilistic Inference to Extract Relationships between Genes – B Goertzel, H Pinto, A Heljakka, IF Goertzel, M (BioNLP 2006)
• Relation Extraction for Semantic Intranet Annotations – L Specia, C Baldassarre, E Motta (Technical Report, kmi.open.ac.uk)