Named Entity Recognition in Tweets: An Experimental Study. Alan Ritter, Sam Clark, Mausam, Oren Etzioni.
TRANSCRIPT
Named Entity Recognition In Tweets: An Experimental Study
Alan Ritter, Sam Clark, Mausam, Oren Etzioni
University of Washington
Information Extraction: Motivation

Status Updates = short, realtime messages
• Low Overhead: can be created quickly, even on mobile devices
• Realtime: users report events in progress; often the most up-to-date source of information
• Huge Volume of Users: people tweet about things they find interesting; redundancy can be used as a measure of importance
Related Work (Applications)
• Extracting music performers and locations (Benson et al. 2011)
• Predicting polls (O'Connor et al. 2010)
• Product sentiment (Brody et al. 2011)
• Outbreak detection (Aramaki et al. 2011)
Outline
• Motivation
• Error Analysis of Off-the-Shelf Tools
• POS Tagger
• Named Entity Segmentation
• Named Entity Classification
  – Distant Supervision Using Topic Models
• Tools available: https://github.com/aritter/twitter_nlp
Off The Shelf NLP Tools Fail
Twitter Has Noisy & Unique Style
Noisy Text: Challenges
• Lexical Variation (misspellings, abbreviations): '2m', '2ma', '2mar', '2mara', '2maro', '2marrow', '2mor', '2mora', '2moro', '2morow', '2morr', '2morro', '2morrow', '2moz', '2mr', '2mro', '2mrrw', '2mrw', '2mw', 'tmmrw', 'tmo', 'tmoro', 'tmorrow', 'tmoz', 'tmr', 'tmro', 'tmrow', 'tmrrow', 'tmrrw', 'tmrw', 'tmrww', 'tmw', 'tomaro', 'tomarow', 'tomarro', 'tomarrow', 'tomm', 'tommarow', 'tommarrow', 'tommoro', 'tommorow', 'tommorrow', 'tommorw', 'tommrow', 'tomo', 'tomolo', 'tomoro', 'tomorow', 'tomorro', 'tomorrw', 'tomoz', 'tomrw', 'tomz'
• Unreliable Capitalization: "The Hobbit has FINALLY started filming! I cannot wait!"
• Unique Grammar: "watchng american dad."
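One common way to attack the lexical-variation problem is a normalization lookup built from variant lists like the "tomorrow" set above. This is a hypothetical sketch of that idea, not the system described in the talk; the variants in the table are a small subset of the list above.

```python
# Minimal lexical-normalization sketch (hypothetical; not the talk's system).
# Map observed out-of-vocabulary variants to a canonical form before tagging.

VARIANTS = {
    "tomorrow": ["2m", "2ma", "2mar", "2mara", "2maro", "2morrow",
                 "2mrw", "tmrw", "tmw", "tomoz", "tommorow", "tomoro"],
}

# Invert the lists into a single variant -> canonical lookup table.
NORMALIZE = {v: canon for canon, vs in VARIANTS.items() for v in vs}

def normalize(tokens):
    """Replace known variants with their canonical form; keep other tokens."""
    return [NORMALIZE.get(t.lower(), t) for t in tokens]

print(normalize(["c", "u", "2mrw"]))  # -> ['c', 'u', 'tomorrow']
```

A real normalizer would also need to handle variants never seen in a list, e.g. via edit distance to in-vocabulary words.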
PART OF SPEECH TAGGING
Part Of Speech Tagging: Accuracy Drops on Tweets
• Most Common Tag baseline: 76% (90% on the Brown corpus)
• Stanford POS tagger: 80% (97% on news)
• Most common errors:
  – Confusing common/proper nouns
  – Misclassifying interjections as nouns
  – Misclassifying verbs as nouns
POS Tagging
• Labeled 800 tweets with POS tags (about 16,000 tokens)
• Also used labeled news + IRC chat data (Forsyth and Martell 2007)
• CRF + standard set of features:
  – Contextual
  – Dictionary
  – Orthographic
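Features of the three kinds listed above can be sketched as a per-token feature dictionary, the usual input format for CRF toolkits. The feature names and the tiny gazetteer here are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch of per-token CRF features: orthographic, dictionary, contextual.
# Feature names and the gazetteer are illustrative, not the paper's exact set.

DICTIONARY = {"american", "dad"}  # stand-in for a real gazetteer

def token_features(tokens, i):
    """Feature dict for token i of a tokenized tweet."""
    w = tokens[i]
    return {
        # Orthographic
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isupper": w.isupper(),
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],
        # Dictionary
        "in_dict": w.lower() in DICTIONARY,
        # Contextual (previous/next token, with sentence-boundary markers)
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<S>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "</S>",
    }

print(token_features(["watchng", "american", "dad"], 1))
```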
Results
[Bar chart: error rates (roughly 0 to 0.45) of the Stanford tagger vs. T-POS on the five most common confusions: NN/NNP, UH/NN, VB/NN, NNP/NN, UH/NNP.]
XX/YY = XX is misclassified as YY
Named Entity Segmentation
• Off-the-shelf taggers perform poorly
• Stanford NER: F1 = 0.44 (segmentation only, not including classification)
Annotating Named Entities
• Annotated 2,400 tweets (about 34K tokens)
• Trained on in-domain data
Learning
• Sequence labeling task, IOB encoding
• Conditional Random Fields
• Features:
  – Orthographic
  – Dictionaries
  – Contextual
| Word | Label |
|---|---|
| T-Mobile | B-ENTITY |
| to | O |
| release | O |
| Dell | B-ENTITY |
| Streak | I-ENTITY |
| 7 | I-ENTITY |
| on | O |
| Feb | O |
| 2nd | O |
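Decoding IOB labels like those in the table above into entity spans is a small deterministic step; a minimal sketch:

```python
# Decode IOB labels into entity strings: B- starts a span, I- continues it,
# O ends any open span.

def iob_to_spans(words, labels):
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            if start is not None:
                spans.append(" ".join(words[start:i]))
            start = i
        elif lab == "O":
            if start is not None:
                spans.append(" ".join(words[start:i]))
                start = None
        # I- labels continue the current span, so nothing to do
    if start is not None:
        spans.append(" ".join(words[start:]))
    return spans

words = "T-Mobile to release Dell Streak 7 on Feb 2nd".split()
labels = ["B-ENTITY", "O", "O", "B-ENTITY", "I-ENTITY", "I-ENTITY", "O", "O", "O"]
print(iob_to_spans(words, labels))  # -> ['T-Mobile', 'Dell Streak 7']
```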
Performance (Segmentation Only)
NAMED ENTITY CLASSIFICATION
Challenges
• Plethora of distinctive, infrequent types
  – Bands, movies, products, etc.
  – Very little training data for these
  – Can't simply rely on supervised classification
• Tweets are very terse (often contain insufficient context)
Weakly Supervised NE Classification
(Collins and Singer 99; Etzioni et al. 05; Kozareva 06)
• Freebase lists provide a source of supervision
• But entities often appear in many different lists; for example, "China" could be:
  – A country
  – A band
  – A person (member of the band "metal boys")
  – A film (released in 1943)
We need some way to disambiguate.
Distant Supervision With Topic Models
• Treat each entity as a "document": the words in the document are those that co-occur with the entity
• LabeledLDA (Ramage et al. 2009)
  – A constrained topic model
  – Each entity is associated with a distribution over topics, constrained using Freebase dictionaries
  – Each topic is associated with a type (in Freebase)
Generative Story
• For each type, pick a random distribution over words:
  – Type 1 (TEAM): P(victory|T1) = 0.02, P(played|T1) = 0.01, …
  – Type 2 (LOCATION): P(visiting|T2) = 0.05, P(airport|T2) = 0.02, …
• For each entity, pick a distribution over types, constrained by Freebase:
  – Seattle: P(TEAM|Seattle) = 0.6, P(LOCATION|Seattle) = 0.4
• For each position, first pick a type, then pick a word based on that type:
  – e.g., TEAM is picked, yielding "victory"; at another position LOCATION is picked, yielding "airport"
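The generative story above can be simulated directly. LabeledLDA draws these distributions from Dirichlet priors; the values below are the slides' toy numbers, renormalized so each distribution sums to 1.

```python
import random

random.seed(0)

# Forward simulation of the LabeledLDA-style generative story.
# Toy distributions from the slides, renormalized to sum to 1.

word_dist = {  # per-type distribution over words
    "TEAM":     {"victory": 0.67, "played": 0.33},
    "LOCATION": {"visiting": 0.71, "airport": 0.29},
}
type_dist = {  # per-entity distribution over types (Freebase-constrained)
    "Seattle": {"TEAM": 0.6, "LOCATION": 0.4},
}

def draw(dist):
    """Draw one key from a {outcome: probability} dict."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate(entity, n_words):
    """For each position: first pick a type, then pick a word from it."""
    words = []
    for _ in range(n_words):
        t = draw(type_dist[entity])       # e.g. TEAM
        words.append(draw(word_dist[t]))  # e.g. "victory"
    return words

print(generate("Seattle", 4))
```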
Data/Inference
• Gather entities and the words that co-occur with them
  – Extracted entities from about 60M status messages
• Used a set of 10 types from Freebase
  – Commonly occur in tweets
  – Good coverage in Freebase
• Inference: collapsed Gibbs sampling
  – Constrain types using Freebase
  – For entities not in Freebase, don't constrain
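The Freebase constraint amounts to restricting each sampling step to the entity's allowed types. This is a hypothetical simplification of one such step (the real sampler is collapsed Gibbs sampling over entity "documents"); the type names and counts are made up.

```python
import random

random.seed(1)

# Sketch of one Freebase-constrained sampling step (hypothetical
# simplification; scores stand in for the collapsed-Gibbs conditionals).

ALL_TYPES = ["TEAM", "LOCATION", "PERSON"]
FREEBASE = {"Seattle": {"TEAM", "LOCATION"}}  # allowed type sets

def sample_type(entity, unnorm_scores):
    """Sample a type for one mention, restricted to the entity's Freebase
    types. Entities not in Freebase are left unconstrained."""
    allowed = FREEBASE.get(entity, set(ALL_TYPES))
    scores = {t: s for t, s in unnorm_scores.items() if t in allowed}
    total = sum(scores.values())
    r, acc = random.random() * total, 0.0
    for t, s in scores.items():
        acc += s
        if r < acc:
            return t
    return t  # fallback for floating-point edge cases

# PERSON has the highest score but is excluded by the constraint:
print(sample_type("Seattle", {"TEAM": 3.0, "LOCATION": 1.0, "PERSON": 9.0}))
```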
Type Lists
• KKTNY = Kourtney and Kim Take New York
• RHOBH = Real Housewives of Beverly Hills
Evaluation
• Manually annotated the 2,400 tweets with the 10 entity types
  – Used only for testing
  – No labeled examples for LabeledLDA & co-training
Classification Results: 10 Types (Gold Segmentation)
[Bar chart: F1 (roughly 0 to 0.7) for Majority Baseline, Freebase Baseline, Supervised Baseline, DL-Cotrain, and LabeledLDA.]
[The chart is repeated with an annotation: Precision = 0.85, Recall = 0.24.]
Why is LDA winning?
• Shares type information across mentions
  – Unambiguous mentions help disambiguate ambiguous ones
  – Unlabeled examples provide an entity-specific prior
• Explicitly models ambiguity
  – Each "entity string" is modeled as a (constrained) distribution over types
  – Takes better advantage of ambiguous training data
Segmentation + Classification
Related Work
• Named Entity Recognition (Liu et al. 2011)
• POS Tagging (Gimpel et al. 2011)
Calendar Demo
http://statuscalendar.com
• Extract entities from millions of tweets, using NER trained on labeled tweets
• Extract and resolve temporal expressions, e.g. "Next Friday" = 02-24-11
• Count entity/day co-occurrences (G² log-likelihood ratio)
• Plot the top 20 entities for each day
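The G² log-likelihood ratio used for the entity/day association can be computed from a 2x2 contingency table of co-occurrence counts; a sketch with made-up counts:

```python
import math

# G² log-likelihood ratio for a 2x2 entity/day contingency table
# (counts below are made up for illustration).

def g2(k11, k12, k21, k22):
    """k11: entity mentions on the day; k12: entity mentions on other days;
    k21: other mentions that day; k22: everything else."""
    total = k11 + k12 + k21 + k22
    row = [k11 + k12, k21 + k22]
    col = [k11 + k21, k12 + k22]
    obs = [[k11, k12], [k21, k22]]
    g = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / total  # expected count under independence
            if obs[i][j] > 0:            # 0 * log(0) is taken as 0
                g += obs[i][j] * math.log(obs[i][j] / e)
    return 2 * g

# An entity strongly associated with one day scores high:
print(g2(40, 10, 60, 890))
```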
Contributions
• Analysis of challenges in noisy text
• Adapted NLP tools to Twitter
• Distant supervision using topic models
• Tools available: https://github.com/aritter/twitter_nlp
THANKS!
Classification Results (Gold Segmentation)
Classification Results By Type (Gold Segmentation)
Performance (Segmentation Only)
[Bar chart: F1 (roughly 0 to 0.8) for Stanford NER, T-Seg, T-Seg (T-Pos), and T-Seg (All Features).]