DAY 8: INFORMATION EXTRACTION AND ADVANCED STATISTICAL NLP
PART II: ADVANCED STATISTICAL NLP
Mark Granroth-Wilding
HELSINGIN YLIOPISTO · HELSINGFORS UNIVERSITET · UNIVERSITY OF HELSINKI. NLP 2019, D8: IE and Advanced Stat NLP, Mark Granroth-Wilding
OUTLINE
Statistical NLP
Advanced Statistical Methods
Closing remarks
REMINDER: AMBIGUITY OF INTERPRETATION
What is the mean temperature in Kumpula?
SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';

SELECT day_mean FROM weekly_forecast
WHERE station = 'Helsinki Kumpula'
AND week = 'w22';

SELECT MEAN(day_temp)
FROM weather_history
WHERE station = 'Helsinki Kumpula'
AND year = '2019';
. . . ?
• Many forms of ambiguity
• Every level/step of analysis
REMINDER: AMBIGUITY IN SYNTAX
[Two parse trees for "Alice saw the duck with the telescope": in one, the PP "with the telescope" attaches to the VP (Alice uses the telescope to see); in the other, it attaches to the NP "the duck" (the duck has the telescope).]
REMINDER: STATISTICAL MODELS
• Statistics over previous analyses can help estimate confidences
• Often use probabilistic models
• Local ambiguity: probabilities/confidences
• Multiple hypotheses about meaning/structure
• Update hypotheses as larger units are combined
STATISTICS IN NLP
• Statistical models hit NLP in 1990s
• Now almost ubiquitous
• Seen already: rule-based systems augmented with statistics, e.g. PCFGs
• Also: derive rules from data, e.g. treebank parsers
• So far: carefully defined models for specific sub-tasks
• Derived from linguistic theory
• Some exceptions: word embeddings, topic models
DISAMBIGUATION WITH A PCFG
[The same two parse trees, now with a probability attached to each grammar rule. Multiplying the rule probabilities gives p(t) = 2.07 × 10⁻⁴ for the VP-attachment tree and p(t) = 2.96 × 10⁻⁴ for the NP-attachment tree, so the PCFG prefers the NP-attachment reading.]
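The disambiguation above boils down to multiplying the probabilities of all rules used in each tree. A minimal sketch in Python, using a small hypothetical PCFG (the rule and lexical probabilities below are invented for illustration, not the ones on the slide, so here the VP-attachment parse happens to win):

```python
from math import isclose

# Hypothetical PCFG: all probabilities invented for illustration.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("PropN",)): 0.3,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.2,   # NP attachment of a PP
    ("VP", ("Vt", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,   # VP attachment of a PP
    ("PP", ("Prep", "NP")): 1.0,
}
LEXICON = {
    ("PropN", "Alice"): 1.0, ("Vt", "saw"): 1.0, ("Det", "the"): 1.0,
    ("N", "duck"): 0.7, ("N", "telescope"): 0.3, ("Prep", "with"): 1.0,
}

def tree_prob(tree):
    """Probability of a parse tree: the product of the probabilities
    of every rule (and lexical entry) used in the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[(label, children[0])]          # leaf: POS -> word
    p = RULES[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

# "Alice saw the duck with the telescope", two attachments of the PP
the_duck = ("NP", ("Det", "the"), ("N", "duck"))
the_scope = ("NP", ("Det", "the"), ("N", "telescope"))
pp = ("PP", ("Prep", "with"), the_scope)
alice = ("NP", ("PropN", "Alice"))

vp_attach = ("S", alice, ("VP", ("VP", ("Vt", "saw"), the_duck), pp))
np_attach = ("S", alice, ("VP", ("Vt", "saw"), ("NP", the_duck, pp)))

p_vp, p_np = tree_prob(vp_attach), tree_prob(np_attach)
assert isclose(p_vp, 0.00378) and isclose(p_np, 0.00189)
assert p_vp > p_np   # with these invented numbers, VP attachment wins
```

The same recursive product underlies real PCFG parsers; they additionally search over all possible trees (e.g. with CKY) rather than scoring two given ones.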
PCFGs
Advantages:
• Linguistic structures (relatively) easy to understand
• Simple statistical model, easy to estimate (treebank)
• Further learning by semi-supervised estimation
• Quite efficient parsing with beam search
Disadvantages:
• Not expressive enough to capture all dependencies
• Not much like syntactic formalisms linguists use
• Statistical model limited:
• Independence assumptions
• Can extend, but efficient parsing becomes complicated
MORE DATA, LESS LINGUISTICS
• Can use more advanced models
• For parsing and other tasks
• More sophisticated modelling/learning/inference
• Use less linguistic motivation and knowledge
• E.g. neural end-to-end NLG architecture
• Potential advantages:
• Learn from data instead of specifying by hand, e.g. tools for new languages
• Discover new phenomena we didn't know about
• Learn structures too complex to write by hand
• Maybe we don't care about linguistics or psycholinguistics (how the brain processes language): just do well on the task
• Helps with the long tail in some cases, e.g. rare words
DANGERS OF FORGETTING LINGUISTICS
A little linguistics takes you a long way
• Linguists know a lot about language!
• Easily fall into traps they've known about for decades/centuries
• Or waste valuable insight
• Don’t claim linguistic insight without knowledge of linguistics
• Data helps with some types of long tail, but:
• easy to focus too much on common phenomena (words, syntactic structures, word classes, . . . )
• get caught out by less common ones – more informative
CLASSIFICATION AND REGRESSION
• Many practical NLP tasks
• Classification: text classification, sentiment analysis, NER
• Regression: sentiment analysis, other continuous predictions
• Formulate task as one of classification/regression → apply existing advanced techniques
• SVMs, logistic regression, polynomial regression, neural networks, random forests, . . .
• Often, input features come from other NLP components: POS tags, syntactic dependencies, lemmas, word embeddings
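Formulating a task this way starts with turning each text into a feature representation that a standard classifier or regressor can consume. A minimal sketch (the feature names and the toy POS tags are illustrative, not from any particular tagset):

```python
from collections import Counter

def extract_features(tokens, pos_tags):
    """Build a feature dict for one text, combining bag-of-words
    counts with POS-tag counts and a simple length feature.
    Real systems would add dependencies, lemmas, embeddings, etc."""
    features = Counter()
    for tok in tokens:
        features[f"word={tok.lower()}"] += 1
    for tag in pos_tags:
        features[f"pos={tag}"] += 1
    features["length"] = len(tokens)
    return dict(features)

feats = extract_features(
    ["Alice", "saw", "the", "duck"],
    ["PropN", "Vt", "Det", "N"],
)
assert feats["word=alice"] == 1 and feats["pos=Det"] == 1
assert feats["length"] == 4
```

Feature dicts like this are what library vectorizers (e.g. a "dict vectorizer") turn into the fixed-length numeric vectors that SVMs, logistic regression, and the like expect.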
SENTIMENT ANALYSIS
• Input: text (article, review, . . . )
• Goal: predict positive/negative sentiment
• E.g. good or bad review of product?
I just saw #CaptainMarvel and I really enjoyed it from beginning to end. I thought the humor was great, and I loved Brie Larson. Sure there were some changes from the comic books, but I liked them. The changes were necessary.
I wanted to like this movie, but it flopped for me... They should have went with a different route, instead of starting her off as a strong hero already, just like every hero. Larson's acting was a bit stale and lacked more personality.
SENTIMENT ANALYSIS
I just saw #CaptainMarvel and I really enjoyed it from beginning to end. I thought the humor was great, and I loved Brie Larson. Sure there were some changes from the comic books, but I liked them. The changes were necessary.
I wanted to like this movie, but it flopped for me... They should have went with a different route, instead of starting her off as a strong hero already, just like every hero. Larson's acting was a bit stale and lacked more personality.
• Standard ML problem
• Positive / negative labels
• What are input features? Words?
SENTIMENT ANALYSIS
• Apply standard classification (or regression) methods
• NLP-based features
• Word embeddings, POS tags, . . . what else?
• Limitation of BOW model
• Solve with better features? Multi-word features?
• Or better model? Compose word meanings
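One way to make the bullet points above concrete: a tiny bag-of-words sentiment classifier. The sketch below uses Naive Bayes with add-one smoothing on invented toy reviews (a real system would train on far more data and richer features, and BOW's word-order blindness is exactly the limitation noted above):

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """Train a bag-of-words Naive Bayes model: count words per label."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()
    for tokens, label in labelled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, tokens):
    """Pick the label maximising log P(label) + sum log P(word|label),
    with add-one (Laplace) smoothing; unseen words are skipped."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        n_tokens = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue
            p = (word_counts[label][tok] + 1) / (n_tokens + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data
train = [
    ("enjoyed it great humor loved".split(), "pos"),
    ("great acting loved it".split(), "pos"),
    ("stale acting lacked personality".split(), "neg"),
    ("flopped stale boring".split(), "neg"),
]
model = train_nb(train)
print(predict(model, "loved the great humor".split()))    # pos
print(predict(model, "stale and boring acting".split()))  # neg
```

Because the features are unordered word counts, this model cannot distinguish "wanted to like it, but flopped" from "flopped, but wanted to like it"; multi-word features or compositional models address that.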
OUTLINE
Statistical NLP
Advanced Statistical Methods
Closing remarks
ADVANCED STATISTICAL METHODS
• SVMs: another classifier
• Topic modelling: model details of LDA
• Deep learning: hot topic, example use in NLP
SUPPORT VECTOR MACHINES (SVMs)
• Probably seen before. . . ?
• Commonly used classifier
• Still often best choice for classification tasks
• Well developed, efficient libraries
SVMs
• Linearly separable data
• Find hyperplane to separate data with maximum margin
• High dimensional space
Original image: Fabian Burger
SVMs
• For non-linearly separable data: use kernel trick
• Map space to new dimensions via non-linear kernels
• Find linear separation
• Different kernels: different types of hyperplane
• Capture interactions between (original) dimensions
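The mapping idea behind the kernel trick can be illustrated in plain numpy, with a hand-picked feature map standing in for an implicit kernel: XOR-style data is not linearly separable in 2-D, but adding the interaction dimension x1·x2 makes it so.

```python
# Kernel-trick intuition: map non-linearly separable data into a space
# where a linear separator exists. The feature map is chosen by hand
# for this sketch (a real kernel computes such maps implicitly).
import numpy as np

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])  # XOR labels: not linearly separable in 2-D

def phi(x):
    # Map (x1, x2) -> (x1, x2, x1*x2): adds the interaction term
    return np.array([x[0], x[1], x[0] * x[1]])

# In the mapped 3-D space, f(x) = x1 + x2 - 2*x1*x2 separates the classes
w = np.array([1.0, 1.0, -2.0])
scores = np.array([w @ phi(x) for x in X])
pred = (scores > 0.5).astype(int)
print(pred)  # -> [0 0 1 1]
```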
SVM APPLICATIONS
• Apply to sentiment analysis
• Separate positive examples from negative
• Many dimensions (features): words, word-POS pairs, embeddings, dependencies, ...
• With kernels, word features no longer independent
• Overcomes some problems with BOW
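A minimal sketch of this setup, assuming scikit-learn; the tiny corpus and labels are invented for illustration:

```python
# Sketch: bag-of-words features + linear SVM for sentiment classification.
# Assumes scikit-learn; the toy corpus is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["great film , loved it", "wonderful acting", "really enjoyable",
        "terrible plot", "boring and awful", "hated every minute"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Each document becomes a high-dimensional sparse feature vector;
# the SVM finds a separating hyperplane in that space.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["loved the acting", "awful boring film"]))
```

Richer features (POS tags, embeddings, dependency paths) would be added as extra dimensions in the same way.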
SVM APPLICATIONS
• Apply to other classification problems
• E.g. email spam detection: spam vs ham
• Similar features
TOPIC MODELLING
• Topic modelling: LDA
• Example of unsupervised learning in NLP
• Type of clustering of docs
• Based on their words: BOW model
• More detail on modelling now
LDA
• Latent Dirichlet Allocation (LDA)
• Bayesian modelling: generative probability distribution
• Simplifying assumptions:
  • Each document generated by a simple process
  • Fixed number of topics in corpus
  • Bag-of-words model
  • Document covers a few topics
  • Topic associated (mainly) with a small number of words
• Words that often appear in same doc belong to same topic
LDA
Generative process (‘story’):
(1) For each topic t:
  (1) Choose words associated with t: a probability distribution over words
(2) For each document d:
  (1) Choose a distribution over topics
  (2) For each word in d:
    (1) Choose a topic t from d’s distribution
    (2) Choose a word from t’s distribution
• Process assumed by model to underlie data
• Not process followed by inference!
• Only words and documents are observed: topics are latent variables discovered by the model
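The generative story can be simulated directly with numpy (illustration only: the vocabulary, topic count and hyperparameter values are invented):

```python
# Simulating LDA's generative story. Vocabulary, K, alpha and eta
# are invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["bank", "money", "loan", "match", "goal", "team"]
K, n_docs, doc_len = 2, 3, 8
alpha, eta = 0.5, 0.5

# (1) For each topic: a distribution over words (a Dirichlet draw)
beta = rng.dirichlet([eta] * len(vocab), size=K)  # shape K x |V|

docs = []
for _ in range(n_docs):
    # (2.1) For each document: a distribution over topics
    theta = rng.dirichlet([alpha] * K)
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)             # (2.2.1) topic for this word
        w = rng.choice(len(vocab), p=beta[z])  # (2.2.2) word from that topic
        words.append(vocab[w])
    docs.append(words)

print(docs[0])
```

Inference runs in the opposite direction: given only `docs`, recover plausible `beta`, `theta` and `z`.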
REMINDER: PLATE DIAGRAMS
t0
p(t0|Start)
t1
p(t1|t0)
t2
p(t2|t1)
t3
p(t3|t2). . .
w0
p(this|t0)
w1
p(is|t1)
w2
p(a|t2)
w3
p(sentence|t3)
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 26/44
![Page 73: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/73.jpg)
REMINDER: PLATE DIAGRAMS
t0
p(t0|Start)
t1
p(t1|t0)
t2
p(t2|t1)
t3
p(t3|t2). . .
w0
p(this|t0)
w1
p(is|t1)
w2
p(a|t2)
w3
p(sentence|t3)
Variables
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 26/44
![Page 74: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/74.jpg)
REMINDER: PLATE DIAGRAMS
t0
p(t0|Start)
t1
p(t1|t0)
t2
p(t2|t1)
t3
p(t3|t2). . .
w0
p(this|t0)
w1
p(is|t1)
w2
p(a|t2)
w3
p(sentence|t3)
Variables
Hidden
Observed
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 26/44
![Page 75: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/75.jpg)
REMINDER: PLATE DIAGRAMS
t0
p(t0|Start)
t1
p(t1|t0)
t2
p(t2|t1)
t3
p(t3|t2). . .
w0
p(this|t0)
w1
p(is|t1)
w2
p(a|t2)
w3
p(sentence|t3)
Variables
Hidden
Observed
Conditional distributions
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 26/44
REMINDER: PLATE DIAGRAMS

[Diagram: plate notation for the same model: a box labelled N drawn around ti−1 → ti → wi, meaning the enclosed structure repeats N times: N words.]

More compact notation: box around repeated elements
LDA

[Plate diagram: hyperparameter η → per-topic word distributions βk, inside a plate K; hyperparameter α → per-document topic distributions θd, inside a plate D; within each document, a plate N over words contains the topic assignment zd,n (drawn from θd) and the observed word wd,n (drawn from βz).]

• βk : per-topic distribution over words
• η : hyperparameter for β (a distribution over distributions)
• θd : per-document distribution over topics
• α : hyperparameter for θ (a distribution over distributions)
• zd,n : topic for each word
• wd,n : word, drawn from z’s word distribution (βz)

(1) For each topic t:
  (1) Choose words associated with t: a probability distribution over words
(2) For each document d:
  (1) Choose a distribution over topics
  (2) For each word in d:
    (1) Choose a topic t from d’s distribution
    (2) Choose a word from t’s distribution
TRAINING LDA
• LDA defines prob of words given topics
• Topics are unknown – words are observed
• Can’t estimate p(w|t) from counts, as with HMM POS tagger
• Bayes’ rule: prob of topic given word (+ other topics)
• Rough idea behind training:
  • See lots of docs (words only)
  • Try selecting some topics based on the words and the current guess at the distributions (initially random)
• Keep iterating – distributions start to look consistent
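In practice a library handles the iterative estimation. A minimal sketch with scikit-learn's implementation (which uses variational inference; other libraries use Gibbs sampling); the toy corpus is invented for illustration:

```python
# Sketch: fitting LDA on a toy corpus with scikit-learn.
# The documents are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["bank money loan interest", "money bank account loan",
        "team goal match win", "match team player goal"]

# LDA works on bag-of-words counts
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions
print(doc_topics.shape)            # -> (4, 2)

# lda.components_ holds the (unnormalised) per-topic word weights
```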
TRAINING LDA
• Over time, discover words that tend to occur together
• Grouped into topics
• E.g. many docs use bank and money → one topic covers these docs
TRAINING LDA

• Over time, discover words that tend to occur together
• Key to estimation: α and η specify the prior belief that:
  • each document talks about few topics
  • each topic is discussed using few words
• How few depends on α and η

[Plate diagram repeated: η → βk (plate K); α → θd → zd,n → wd,n (plates D, N).]
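The effect of these hyperparameters can be seen directly by sampling from a Dirichlet (plain numpy; the concrete values are just for illustration): a small α concentrates each document's mass on few topics, a large α spreads it out.

```python
# How alpha controls the sparsity of per-document topic distributions.
# K and the alpha values are chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
K = 10  # number of topics

sparse = rng.dirichlet([0.1] * K, size=1000)   # small alpha: few topics
spread = rng.dirichlet([10.0] * K, size=1000)  # large alpha: near-uniform

# With small alpha, most probability mass sits on one or two topics,
# so the largest component is typically close to 1
print(sparse.max(axis=1).mean())
print(spread.max(axis=1).mean())  # much smaller: mass spread over topics
```

The same reasoning applies to η and the per-topic word distributions βk.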
NEURAL NETWORKS / DEEP LEARNING
• Neural networks: been around a long time
• Recent explosion:
  • Clever new training tricks
  • Lots of data
  • Faster processors (GPUs)
• Can now train huge networks, many hidden layers→ deep learning
• Loads of applications to NLP
• Couple of examples here
• No details of modelling, training, ML theory, maths, . . . : seeother courses!
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 32/44
![Page 110: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/110.jpg)
SEQUENCE MODELS
• In NLP, lots of sequence processing!
• Words – sentences, POS tags, NER tags, . . .
• Not the full story: syntax, semantics (long-range dependencies)
• Many applications of recurrent neural networks (RNNs)
• Learning problems with traditional RNNs
• Overcome by recent models: LSTMs, GRUs
• But same basic idea
![Page 117: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/117.jpg)
RNN LANGUAGE MODELLING
• One application: language modelling
• Earlier: Markov LMs, n-grams
• Some problems:
  • Markov assumption: bad, but practical
  • Limited n-gram size: data sparsity
  • Fixed n-gram size: lose long-range dependencies
• Modern RNNs can help! (LSTMs, . . . )
[Diagram: a Markov LM predicts p(w0|START), p(w1|w0), p(w2|w1), . . . ; an RNN LM can condition on the full history: p(w0|START), p(w1|START, w0), p(w2|w0, w1), . . . ]
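The "lose long-range dependencies" point can be made concrete with a toy bigram model (the corpus and test sentences below are invented for illustration): when scoring "was" vs "were", a bigram model only sees the adjacent word "cats", not the true subject "dog" several words back.

```python
import math
from collections import Counter

# Toy training corpus (invented): "cats were" and "dog was" both occur.
corpus = "the cats were tired . the dog was tired .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(set(corpus))

def bigram_logprob(sentence):
    # Chain rule truncated to Markov order 1, with add-one smoothing:
    # log p(w_1..w_T) ≈ Σ_t log p(w_t | w_{t-1})
    ws = sentence.split()
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(ws, ws[1:]))

# The verb should agree with the distant subject "dog", but the bigram
# model conditions only on the adjacent "cats", so it prefers "were":
good = "the dog that chased the cats was tired"
bad = "the dog that chased the cats were tired"
assert bigram_logprob(bad) > bigram_logprob(good)
```

An RNN LM keeps the whole prefix in its hidden state, so it can in principle carry the subject "dog" across the relative clause to the verb.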
![Page 122: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/122.jpg)
RNNs
[Diagram: hidden states h0 h1 h2 h3 . . . over the inputs "this is a sentence", producing outputs o0 o1 o2 o3; each hi is a hidden state (layer/vector)]
• Looks rather like an HMM
• States: theoretically ‘remember’ distant inputs/states
• Modern RNNs (LSTMs, GRUs) make this work in practice
• Outputs: based on the sequence so far
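A minimal sketch of the recurrence behind the diagram (weights and vocabulary are made up): the hidden state h_t is computed from the current input and the previous hidden state, which is how earlier inputs can be 'remembered'.

```python
import math

def rnn_step(W_xh, W_hh, b_h, x, h):
    """One Elman-style recurrence: h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b).
    The new hidden state mixes the current input with the previous state,
    so information from earlier inputs can persist."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(row_x, x)) +
                      sum(wh * hi for wh, hi in zip(row_h, h)) + b)
            for row_x, row_h, b in zip(W_xh, W_hh, b_h)]

# Tiny 2-word vocabulary as one-hot vectors; weights are arbitrary.
vocab = {"this": [1.0, 0.0], "sentence": [0.0, 1.0]}
W_xh = [[0.5, -0.3], [0.2, 0.8]]    # input -> hidden
W_hh = [[0.9, 0.1], [-0.2, 0.7]]    # hidden -> hidden (the recurrence)
b_h = [0.0, 0.0]

h = [0.0, 0.0]                       # state before any input
for word in ["this", "sentence"]:
    h = rnn_step(W_xh, W_hh, b_h, vocab[word], h)

# Because h_t feeds back into h_{t+1}, the final state depends on the
# whole sequence, not just the last word: a different first word gives
# a different final state even though the last input is the same.
h_other = rnn_step(W_xh, W_hh, b_h, vocab["sentence"],
                   rnn_step(W_xh, W_hh, b_h, vocab["sentence"], [0.0, 0.0]))
assert h != h_other
```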
![Page 128: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/128.jpg)
RNNs FOR LM
[Diagram: input words w0 w1 w2 w3 feed hidden states h0 h1 h2 h3 . . . ; each output is a prediction of the next word, w1 w2 w3 w4]
• Inputs: words
• Outputs: prediction of next word
• Softmax output: probability distribution
• Train on sentences
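The output layer can be sketched as follows (the vocabulary, weights and hidden state are made-up toy values): project the hidden state to one score per vocabulary word, then apply softmax so the scores form a probability distribution over the next word.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_dist(h, W_ho, vocab):
    """Output layer of an RNN LM: project hidden state h to one score
    per vocabulary word, then softmax to a probability distribution."""
    scores = [sum(w * hi for w, hi in zip(row, h)) for row in W_ho]
    return dict(zip(vocab, softmax(scores)))

vocab = ["this", "is", "a", "sentence"]      # toy vocabulary
W_ho = [[0.2, -0.1], [1.5, 0.3], [0.1, 0.4], [-0.5, 0.9]]  # arbitrary weights
h = [0.8, -0.2]                              # hidden state after "this"

dist = next_word_dist(h, W_ho, vocab)
assert abs(sum(dist.values()) - 1.0) < 1e-9  # a valid distribution
# With these arbitrary weights, "is" gets the highest probability
# (raw scores: this 0.18, is 1.14, a 0.0, sentence -0.58).
```

Training then maximizes the probability assigned to the word that actually came next at each position of each training sentence.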
![Page 131: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/131.jpg)
RNNs FOR LM
[Diagram: input words w0 w1 w2 w3 feed a first hidden layer h0 h1 h2 h3 . . . , which feeds a second hidden layer h′0 h′1 h′2 h′3 . . . ; outputs predict the next words w1 w2 w3 w4]
• ‘Deep’ hidden representations
• Stack hidden layers (arbitrarily many)
![Page 134: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/134.jpg)
OTHER TASKS
[Diagram: the same RNN, but each output oi is a label for input word wi – e.g. POS tags (JJ NNP VBI NNS) or NER tags (O Plc O Prs)]
• Not just language modelling
• Trivial to apply to any supervised labelling task
• E.g. POS tagging, NER
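Switching from language modelling to a labelling task changes only what the per-token outputs mean: one score vector per tag instead of per vocabulary word. A sketch, with invented scores standing in for the RNN's outputs:

```python
def tag_sequence(per_token_scores, tagset):
    """Sequence labelling with an RNN reduces to emitting one output per
    input token; here we simply take the highest-scoring tag for each."""
    return [tagset[max(range(len(tagset)), key=lambda i: scores[i])]
            for scores in per_token_scores]

tagset = ["O", "Plc", "Prs"]     # NER tags as on the slide
# Invented per-token output scores for a four-word input:
scores = [[2.0, 0.1, 0.3],
          [0.2, 1.7, 0.4],
          [1.9, 0.0, 0.1],
          [0.3, 0.2, 2.2]]
print(tag_sequence(scores, tagset))   # ['O', 'Plc', 'O', 'Prs']
```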
![Page 137: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/137.jpg)
RNNs FOR SENTIMENT
[Diagram: input words w0 w1 w2 w3 feed hidden states h0 h1 h2 h3; a single Score output is produced after the final word]
• Apply to sentiment analysis
• Feed in input text
• Only predict at end
![Page 140: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/140.jpg)
RNNs FOR SENTIMENT
• RNN sees full input: prediction can depend on any word
• Supervised training: known sentiment scores/classes
• Potential advantages:
  • No more bags of words! RNN can learn phrases, multi-word expressions, . . .
  • Can generalize based on word similarity: what is learned for great transfers to awesome
  • Might learn long-range dependencies
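A sketch of the "only predict at end" idea (the hidden states and weights below are invented): discard the intermediate states and attach a single logistic output to the final one.

```python
import math

def sentiment_score(hidden_states, w, b):
    """'Only predict at end': ignore all intermediate hidden states and
    apply one logistic output unit to the final state h_T."""
    h_final = hidden_states[-1]
    z = sum(wi * hi for wi, hi in zip(w, h_final)) + b
    return 1.0 / (1.0 + math.exp(-z))   # probability of positive class

# Pretend these came from running an RNN over a review (invented values):
states = [[0.1, -0.2], [0.4, 0.1], [0.7, 0.5]]
w, b = [1.2, 0.8], -0.1
p_pos = sentiment_score(states, w, b)
assert 0.0 < p_pos < 1.0
```

Because h_T depends on the whole input sequence, the supervised signal at the end can, in principle, shape how any earlier word is represented.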
![Page 146: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/146.jpg)
MORE NNs
• Lots more work in this area:
  • Bidirectional RNNs
  • Convolutional NNs (CNNs)
  • Context-sensitive word embeddings
  • Seq2seq models: machine translation, etc.
• Other courses:
  • Deep Learning
  • Deep Learning for NLP (seminar)
![Page 147: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/147.jpg)
PART II SUMMARY
• Statistics in NLP
• Data-driven NLP, less linguistics
• Classification / regression
• Sentiment analysis
• SVMs
• Topic modelling: details of LDA
• Neural networks
• RNNs, applications
![Page 148: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/148.jpg)
READING MATERIAL
• All statistical methods: Eisenstein
• Neural networks: Neural Network Methods in NLP (e-book, available through Helka)
![Page 149: Day 8: Information Extraction and Advanced Statistical NLP](https://reader030.vdocuments.us/reader030/viewer/2022020623/61f093e8e713287b9b7bd03e/html5/thumbnails/149.jpg)
NEXT UP
After lunch: practical assignments in BK107
9:15 – 12:00    Lectures
12:00 – 13:15   Lunch
13:15 – ∼13:30  Introduction
13:30 – 16:00   Practical assignments
• Building an IE system
• Regular expression-based methods
• Building on other NLP components: pipeline