INARC Report
Dan Roth, UIUC
March 2011
Local and Global Algorithms for
Disambiguation to Wikipedia
Lev Ratinov & Dan Roth, Department of Computer Science, University of Illinois at Urbana-Champaign
INARC Activities I: Dan Roth, UIUC
I1.1: Fundamentals of Context-aware Real-time Data Fusion
Advances in Learning & Inference of Constrained Conditional Models
CCM: A computational framework for learning and inference with interdependent variables in constrained settings.
Formulating Information Fusion as CCMs; preliminary theoretical and experimental work on Information Fusion.
Key Publications:
R. Samdani and D. Roth, Efficient Learning for Constrained Structured Prediction, submitted.
G. Kundu, D. Roth and R. Samdani, Constrained Classification Models for Information Fusion, submitted.
M. Chang, M. Connor and D. Roth, The Necessity of Combining Adaptation Methods, EMNLP’10.
M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision, ICML’10.
M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations, NAACL’10.
INARC Activities II: Dan Roth, UIUC
I3.2: Modeling and Mining of Text-Rich Information Networks
Large heterogeneous information networks of structured and unstructured data.
State-of-the-art algorithmic tools for knowledge acquisition and information extraction, using the content & structure of the network.
Make use of both explicit network structure and hidden ‘ontological’ structure (e.g., category structure).
Acquire and extract information from heterogeneous information networks when data is noisy, volatile, uncertain, and incomplete.
Key Publications:
L. Ratinov, D. Downey, M. Anderson and D. Roth, Local and Global Algorithms for Disambiguation to Wikipedia, ACL’11.
Q. Do and D. Roth, Constraints based Taxonomic Relation Classification, EMNLP’10.
Y. Chan and D. Roth, Exploiting Background Knowledge for Relation Extraction, COLING’10.
Y. Chan and D. Roth, Exploiting Syntactico-Semantic Structures for Relation Extraction, ACL’11.
J. Pasternack and D. Roth, Knowing What to Believe (when you already know something), COLING’10.
J. Pasternack and D. Roth, Generalized Fact-Finding, WWW’10.
J. Pasternack and D. Roth, Comprehensive Trust Metrics for Information Networks, Army Science Conference’10.
Information overload
Organizing knowledge
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Cross-document co-reference resolution
Reference resolution (disambiguation to Wikipedia)
The “reference” collection has structure
(Figure: Wikipedia titles for the example mentions, connected by relations such as Used_In, Is_a, Succeeded and Released.)
Analysis of Information Networks
Here, Wikipedia is used as the knowledge resource, but other resources can be used.
(Figure: the same title graph, with relations Used_In, Is_a, Succeeded and Released.)
Talk outline
High-level algorithmic approach. Bipartite graph matching with global and local inference.
Local Inference. Experiments & Results
Global Inference. Experiments & Results
Results, Conclusions
Demo
Problem formulation: a matching/ranking problem
(Figure: a bipartite graph between mentions in text documents (news, blogs, …) and Wikipedia articles.)
Local approach
Γ is a solution to the problem: a set of pairs (m, t), where m is a mention in the document and t is the matched Wikipedia title.
Each pair in Γ contributes a local score of matching the mention to the title.
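Written out, the local approach is an argmax over solutions Γ. This is a sketch in the slides' notation; the slide's own equation did not survive extraction, and the symbol φ(m, t) for the local score is our label:

```latex
\Gamma^{*} = \arg\max_{\Gamma} \sum_{(m,t) \in \Gamma} \phi(m, t)
\quad\Longrightarrow\quad
t^{*}(m) = \arg\max_{t} \phi(m, t) \ \text{for each mention } m
```

Because the sum has no terms that couple two pairs, each mention can be resolved on its own.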
Local + Global: using the Wikipedia structure
A “global” term evaluates how good the structure of the solution is.
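Adding the global term to the local objective gives the form below. This is a sketch: φ is the same assumed local score, and writing the slide's global term as Ψ(Γ) is our notation (the slides later use Ψ(ti, tj) for its pairwise form):

```latex
\Gamma^{*} = \arg\max_{\Gamma} \Big[ \sum_{(m,t) \in \Gamma} \phi(m, t) + \Psi(\Gamma) \Big]
```

Unlike the local objective, Ψ(Γ) couples the choices for different mentions, so they can no longer be made independently.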
The global problem is NP-hard (an NP-hard problem can be reduced to it)
A tractable variation
1. Invent a surrogate solution Γ': disambiguate each mention independently.
2. Evaluate the structure based on pairwise coherence scores Ψ(ti, tj).
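The two steps can be sketched as follows: each candidate ti is rescored as its local score plus the sum of Ψ(ti, tj) over the titles tj in Γ'. The functions `phi` and `psi`, the name `two_stage_disambiguate`, and all toy scores are hypothetical stand-ins for the local ranker and the relatedness measure, not the talk's implementation:

```python
from typing import Callable, Dict, List

# Hypothetical signatures: phi(m, t) is the local score, psi(t_i, t_j) the pairwise coherence.
LocalScore = Callable[[str, str], float]
Coherence = Callable[[str, str], float]


def two_stage_disambiguate(mentions: List[str],
                           candidates: Dict[str, List[str]],
                           phi: LocalScore,
                           psi: Coherence) -> Dict[str, str]:
    # Step 1: surrogate solution Gamma' -- each mention disambiguated independently.
    surrogate = {m: max(candidates[m], key=lambda t: phi(m, t)) for m in mentions}
    result = {}
    for m in mentions:
        # Titles chosen for the other mentions act as the document's global context.
        context = [t for other, t in surrogate.items() if other != m]

        def total(t: str) -> float:
            return phi(m, t) + sum(psi(t, tj) for tj in context)

        # Step 2: re-rank this mention's candidates by local score plus coherence with Gamma'.
        result[m] = max(candidates[m], key=total)
    return result


# Toy example (all scores invented for illustration).
LOCAL = {("Chicago", "Chicago_city"): 0.6, ("Chicago", "Chicago_font"): 0.4,
         ("Macintosh", "Macintosh"): 1.0}
RELATED = {frozenset({"Chicago_font", "Macintosh"}): 0.9}

answer = two_stage_disambiguate(
    ["Chicago", "Macintosh"],
    {"Chicago": ["Chicago_city", "Chicago_font"], "Macintosh": ["Macintosh"]},
    phi=lambda m, t: LOCAL[(m, t)],
    psi=lambda a, b: RELATED.get(frozenset({a, b}), 0.0),
)
print(answer)  # {'Chicago': 'Chicago_font', 'Macintosh': 'Macintosh'}
```

Locally the city wins (0.6 vs. 0.4), but once the unambiguous "Macintosh" title enters Γ', its coherence with the font flips the decision.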
Talk outline
High-level algorithmic approach. Bipartite graph matching with global and local inference.
Local Inference. Experiments & Results
Global Inference. Experiments & Results
Results, Conclusions
Demo
I. Baseline: P(Title | Surface Form)
P(Title | “Chicago”)
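The baseline prior can be estimated by counting link targets. A minimal sketch, assuming the counts come from (anchor text, target title) pairs harvested from Wikipedia hyperlinks; `title_given_surface` and the toy counts are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def title_given_surface(anchors: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(title | surface form) from (anchor text, link target) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for surface, title in anchors:
        counts[surface][title] += 1
    return {surface: {title: n / sum(c.values()) for title, n in c.items()}
            for surface, c in counts.items()}


# Toy link counts (invented): "Chicago" links to the city 3 times, to the font once.
links = [("Chicago", "Chicago_city")] * 3 + [("Chicago", "Chicago_font")]
prior = title_given_surface(links)
print(prior["Chicago"])  # {'Chicago_city': 0.75, 'Chicago_font': 0.25}
```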
II. Context(Title)
Context(Charcoal) += “a font called __ is used to”
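One way to accumulate Context(Title) is to append the words around every link that points to the title. A sketch under assumptions: the three-token window on each side and the function name `add_link_context` are hypothetical choices, picked so that the toy sentence reproduces the slide's example:

```python
from collections import defaultdict
from typing import Dict, List


def add_link_context(contexts: Dict[str, List[str]], tokens: List[str],
                     start: int, end: int, title: str, window: int = 3) -> None:
    """Append the words around tokens[start:end] (a link to `title`) to Context(title).

    The linked mention itself is replaced by "__", as in the slide's example.
    """
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    contexts[title].append(" ".join(left + ["__"] + right))


contexts: Dict[str, List[str]] = defaultdict(list)
sentence = "In Mac OS 8 a font called Charcoal is used to draw menus".split()
add_link_context(contexts, sentence, start=7, end=8, title="Charcoal")
print(contexts["Charcoal"])  # ['a font called __ is used to']
```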
III. Text(Title)
Just the text of the title's Wikipedia page (one per title).
Putting it all together

| Candidate title | ScoreBaseline | ScoreContext | ScoreText |
|---|---|---|---|
| Chicago_city | 0.99 | 0.01 | 0.03 |
| Chicago_font | 0.0001 | 0.2 | 0.01 |
| Chicago_band | 0.001 | 0.001 | 0.02 |

Pairwise comparisons (first title's scores minus second's):
City vs. Font: (0.99-0.0001, 0.01-0.2, 0.03-0.01)
Band vs. Font: (0.001-0.0001, 0.001-0.2, 0.02-0.01)
Training a ranking SVM:
Consider all title pairs.
Train a ranker on the pairs (learn to prefer the correct solution).
Inference = knockout tournament.
Key: abstracts over the text; learns which scores are important.
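The ranking step can be sketched with a linear pairwise ranker and one simple form of knockout, in which the current winner meets each remaining candidate once. The scores come from the table above; the weight vectors are hypothetical, not learned values, and `prefers`/`knockout` are illustrative names:

```python
from typing import Dict, List, Sequence

# Scores from the table: (ScoreBaseline, ScoreContext, ScoreText).
FEATURES: Dict[str, Sequence[float]] = {
    "Chicago_city": (0.99, 0.01, 0.03),
    "Chicago_font": (0.0001, 0.2, 0.01),
    "Chicago_band": (0.001, 0.001, 0.02),
}


def prefers(w: Sequence[float], a: Sequence[float], b: Sequence[float]) -> bool:
    """Linear pairwise ranker: prefer a over b when w . (a - b) > 0."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) > 0


def knockout(w: Sequence[float], features: Dict[str, Sequence[float]]) -> str:
    """Sequential knockout: the current winner meets each remaining candidate once."""
    titles: List[str] = list(features)
    winner = titles[0]
    for challenger in titles[1:]:
        if prefers(w, features[challenger], features[winner]):
            winner = challenger
    return winner


# Hypothetical weights: one trusts the baseline prior, the other trusts context.
print(knockout((1.0, 1.0, 1.0), FEATURES))  # Chicago_city
print(knockout((0.1, 5.0, 1.0), FEATURES))  # Chicago_font
```

With equal weights the strong baseline prior picks the city; a weight vector that emphasizes ScoreContext picks the font, which is the sense in which the ranker "learns which scores are important".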
Example: font or city?
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.
Text(Chicago_city), Context(Chicago_city)
Text(Chicago_font), Context(Chicago_font)
Lexical matching
Cosine similarity with TF-IDF weighting
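A minimal sketch of this lexical similarity, assuming raw term frequency and IDF = log(N/df); the slides do not specify the exact TF-IDF variant, and the toy documents standing in for Text(Chicago_city) and Text(Chicago_font) are invented:

```python
import math
from collections import Counter
from typing import Dict, List


def idf_from_docs(docs: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency log(N / df) over a small document collection."""
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(len(docs) / count) for word, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term frequency times IDF; words unseen in the collection get weight 0."""
    return {word: n * idf.get(word, 0.0) for word, n in Counter(tokens).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(x * v.get(word, 0.0) for word, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy stand-ins for Text(Chicago_city) and Text(Chicago_font), plus one extra page.
city = "chicago city illinois lake michigan city".split()
font = "chicago font macintosh menu font apple".split()
other = "boston city massachusetts".split()
idf = idf_from_docs([city, font, other])

mention = tfidf("classic macintosh menu font chicago".split(), idf)
print(cosine(mention, tfidf(city, idf)) < cosine(mention, tfidf(font, idf)))  # True
```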
Ranking: font vs. city
Four lexical similarity scores per candidate:
Chicago_city: (0.5, 0.2, 0.1, 0.8)
Chicago_font: (0.3, 0.2, 0.3, 0.5)
Train a ranking SVM
Chicago_city: (0.5, 0.2, 0.1, 0.8)
Chicago_font: (0.3, 0.2, 0.3, 0.5)
Training example (city minus font, labeled -1 because the font is the correct title): [(0.2, 0, -0.2, 0.3), -1]
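The training vector above is the difference between the two candidates' feature vectors. A sketch of this pairwise transform, assuming Chicago_font is the gold title; `pairwise_examples` is a hypothetical name, and feeding the resulting (vector, label) pairs to a linear SVM such as scikit-learn's `LinearSVC` is our assumption rather than the tool used in the talk:

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def pairwise_examples(features: Dict[str, Vector], gold: str) -> List[Tuple[Vector, int]]:
    """Pairwise transform for a ranking SVM.

    For every wrong candidate, emit (gold - wrong, +1) and (wrong - gold, -1).
    """
    g = features[gold]
    examples: List[Tuple[Vector, int]] = []
    for title, x in features.items():
        if title == gold:
            continue
        examples.append((tuple(a - b for a, b in zip(g, x)), +1))
        examples.append((tuple(a - b for a, b in zip(x, g)), -1))
    return examples


features = {"Chicago_city": (0.5, 0.2, 0.1, 0.8),
            "Chicago_font": (0.3, 0.2, 0.3, 0.5)}
for vector, label in pairwise_examples(features, gold="Chicago_font"):
    print([round(v, 2) for v in vector], label)
# [-0.2, 0.0, 0.2, -0.3] 1
# [0.2, 0.0, -0.2, 0.3] -1
```

Pairs between two wrong candidates are skipped here, since neither should be preferred; emitting both orientations keeps the two classes balanced.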
Scaling issues: one of our key contributions
The Text and Context representations of all titles are big, and are loaded into memory from disk.
Improving performance
Rather than computing TF-IDF weighted cosine similarity, we want to train a classifier on the fly. But because of the aggressive feature pruning, we choose PrTFIDF (a probabilistic TF-IDF classifier).
Performance (local only): ranking accuracy (%) on solvable mentions

| Dataset | Baseline | +Local TFIDF | +Local PrTFIDF |
|---|---|---|---|
| ACE | 94.05 | 95.67 | 96.21 |
| MSN News | 81.91 | 84.04 | 85.10 |
| AQUAINT | 93.19 | 94.38 | 95.57 |
| Wikipedia Test | 85.88 | 92.76 | 93.59 |
Talk outline
High-level algorithmic approach. Bipartite graph matching with global and local inference.
Local Inference. Experiments & Results
Global Inference. Experiments & Results
Results, Conclusions
Demo
Co-occurrence(Title1, Title2)
The city senses of Boston and Chicago appear together often.
Rock music and albums also appear together often.
Global ranking
How to approximate the “global semantic context” in the document? (What is Γ'?)
Use only non-ambiguous mentions for Γ'.
Use the top baseline disambiguation for NER surface forms.
Use the top baseline disambiguation for all the surface forms.
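The three options for Γ' differ only in which mentions contribute their top baseline title. A sketch with hypothetical names (`surrogate_solution`, `baseline_top`, `ner_mentions`), where the NER tags are assumed to come from an external tagger and the toy data is invented:

```python
from typing import Dict, List, Set


def surrogate_solution(mentions: List[str], candidates: Dict[str, List[str]],
                       baseline_top: Dict[str, str], ner_mentions: Set[str],
                       strategy: str) -> Dict[str, str]:
    """Build Gamma' from the baseline's top title under one of the three strategies."""
    if strategy == "unambiguous":   # only mentions with a single candidate title
        keep = [m for m in mentions if len(candidates[m]) == 1]
    elif strategy == "ner":         # only surface forms tagged by an NER system
        keep = [m for m in mentions if m in ner_mentions]
    elif strategy == "all":         # every surface form
        keep = list(mentions)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {m: baseline_top[m] for m in keep}


mentions = ["Chicago", "Mac OS", "Chicago VIII"]
candidates = {"Chicago": ["Chicago_city", "Chicago_font", "Chicago_band"],
              "Mac OS": ["Mac_OS"], "Chicago VIII": ["Chicago_VIII"]}
baseline_top = {"Chicago": "Chicago_city", "Mac OS": "Mac_OS",
                "Chicago VIII": "Chicago_VIII"}
for strategy in ("unambiguous", "ner", "all"):
    print(strategy, surrogate_solution(mentions, candidates, baseline_top,
                                       {"Chicago"}, strategy))
```

The trade-off is coverage versus noise: unambiguous mentions give a small but reliable Γ', while using all mentions gives more context at the cost of some baseline errors.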
How to define relatedness between two titles? (What is Ψ?)
Ψ: Pairwise relatedness between two titles:
Normalized Google Distance
Pointwise Mutual Information
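Both measures can be computed from sets of links. A sketch assuming `a` and `b` are the sets of pages linking to each title and `n_pages` is the number of pages in Wikipedia (the slides do not say which links are used). The NGD form follows Milne & Witten's link-based adaptation, and `pmi_ratio` omits the logarithm, which does not change the ranking; the toy sets are invented:

```python
import math
from typing import Set


def ngd(a: Set[str], b: Set[str], n_pages: int) -> float:
    """Normalized Google Distance between two link sets; smaller means more related."""
    common = len(a & b)
    if common == 0:
        return math.inf
    big, small = max(len(a), len(b)), min(len(a), len(b))
    denominator = math.log(n_pages) - math.log(small)
    if denominator == 0:
        return 0.0
    return (math.log(big) - math.log(common)) / denominator


def pmi_ratio(a: Set[str], b: Set[str], n_pages: int) -> float:
    """P(a, b) / (P(a) P(b)) estimated from link sets; PMI is the log of this ratio."""
    if not a or not b:
        return 0.0
    return (len(a & b) / n_pages) / ((len(a) / n_pages) * (len(b) / n_pages))


# Toy inlink sets (page ids are invented) in a 100-page collection.
links_to_boston = {"p1", "p2", "p3", "p4"}
links_to_chicago = {"p3", "p4", "p5"}
print(round(ngd(links_to_boston, links_to_chicago, 100), 3))        # 0.198
print(round(pmi_ratio(links_to_boston, links_to_chicago, 100), 3))  # 16.667
```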
What is the best Γ'? (ranker accuracy, solvable mentions)

| Dataset | Baseline | Baseline+Global (unambiguous) | Baseline+Global (NER) | Baseline+Global (all mentions) |
|---|---|---|---|---|
| ACE | 94.05 | 94.56 | 96.21 | 96.75 |
| MSN News | 81.91 | 84.46 | 84.04 | 88.51 |
| AQUAINT | 93.19 | 95.40 | 94.04 | 95.91 |
| Wikipedia Test | 85.88 | 89.67 | 89.59 | 89.79 |
Results: ranker accuracy (solvable mentions)

| Dataset | Baseline | Baseline+Lexical | Baseline+Global (all mentions) |
|---|---|---|---|
| ACE | 94.05 | 96.21 | 96.75 |
| MSN News | 81.91 | 85.10 | 88.51 |
| AQUAINT | 93.19 | 95.57 | 95.91 |
| Wikipedia Test | 85.88 | 93.59 | 89.79 |
Results: Local + Global

| Dataset | Baseline | Baseline+Lexical | Baseline+Lexical+Global |
|---|---|---|---|
| ACE | 94.05 | 96.21 | 97.83 |
| MSN News | 81.91 | 85.10 | 87.02 |
| AQUAINT | 93.19 | 95.57 | 94.38 |
| Wikipedia Test | 85.88 | 93.59 | 94.18 |
Talk outline
High-level algorithmic approach. Bipartite graph matching with global and local inference.
Local Inference. Experiments & Results
Global Inference. Experiments & Results
Results, Conclusions
Demo
Conclusions:
Dealt with a very large-scale knowledge acquisition and extraction problem.
State-of-the-art algorithmic tools that exploit the content & structure of the network.
Formulated a framework for local & global reference resolution and disambiguation into knowledge networks.
Proposed local and global algorithms with state-of-the-art performance.
Addressed the scaling issue, a major one.
Identified key remaining challenges (next slide).
Future: We want to know what we don’t know
Mentions whose referent has no Wikipedia page are not dealt with well in the literature:
“As Peter Thompson, a 16-year-old hunter, said …”
“Dorothy Byrne, a state coordinator for the Florida Green Party …”
We train a separate SVM classifier to identify such cases. The features are:
All the baseline, lexical and semantic scores of the top candidate.
The score assigned to the top candidate by the ranker.
The “confidence” of the ranker on the top candidate with respect to the second-best disambiguation.
The Good-Turing probability of out-of-Wikipedia occurrence for the mention.
Limited success; future research.
Comparison to the previous state of the art (all mentions, including out-of-Wikipedia (OOW) mentions)

| Dataset | Baseline | Milne&Witten | Our system (GLOW) |
|---|---|---|---|
| ACE | 69.52 | 72.76 | 77.25 |
| MSN News | 72.83 | 68.49 | 74.88 |
| AQUAINT | 82.64 | 83.61 | 83.94 |
| Wikipedia Test | 81.77 | 80.32 | 90.54 |
Demo