DAY 8: INFORMATION EXTRACTION AND ADVANCED STATISTICAL NLP



PART II: ADVANCED STATISTICAL NLP

Mark Granroth-Wilding

HELSINGIN YLIOPISTO / HELSINGFORS UNIVERSITET / UNIVERSITY OF HELSINKI. NLP 2019, D8: IE and Advanced Stat NLP, Mark Granroth-Wilding

OUTLINE

Statistical NLP

Advanced Statistical Methods

Closing remarks


REMINDER: AMBIGUITY OF INTERPRETATION

What is the mean temperature in Kumpula?

SELECT day_mean FROM daily_forecast
WHERE station = 'Helsinki Kumpula'
AND date = '2019-05-21';

SELECT day_mean FROM weekly_forecast
WHERE station = 'Helsinki Kumpula'
AND week = 'w22';

SELECT MEAN(day_temp)
FROM weather_history
WHERE station = 'Helsinki Kumpula'
AND year = '2019';

. . . ?

• Many forms of ambiguity

• Every level/step of analysis


REMINDER: AMBIGUITY IN SYNTAX

[Figure: two parse trees for "Alice saw the duck with the telescope". In one, the PP "with the telescope" attaches to the VP (Alice used the telescope to see); in the other, it attaches to the NP "the duck" (the duck has the telescope).]


REMINDER: STATISTICAL MODELS

• Statistics over previous analyses can help estimate confidences

• Often use probabilistic models

• Local ambiguity: probabilities/confidences

• Multiple hypotheses about meaning/structure

• Update hypotheses as larger units are combined


STATISTICS IN NLP

• Statistical models hit NLP in 1990s

• Now almost ubiquitous

• Seen already: rule-based systems augmented with statistics, e.g. PCFG

• Also: derive rules from data, e.g. treebank parser

• So far: carefully defined models for specific sub-tasks

• Derived from linguistic theory

• Some exceptions: word embeddings, topic models


DISAMBIGUATION WITH A PCFG

[Figure: the two parse trees for "Alice saw the duck with the telescope", now annotated with PCFG rule probabilities. The probability of each tree is the product of the probabilities of the rules used in its derivation: p(t) = 2.07 × 10⁻⁴ for the VP-attachment tree and p(t) = 2.96 × 10⁻⁴ for the NP-attachment tree, so the model prefers the NP attachment.]
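A minimal sketch of the computation behind the figure: a tree's probability under a PCFG is simply the product of the probabilities of all rules used in its derivation. The numbers below are the rule probabilities read off the VP-attachment tree in the figure (OCR of the slide, so treat them as illustrative).

```python
from math import prod

# Rule probabilities along the VP-attachment parse of
# "Alice saw the duck with the telescope", as annotated in the figure.
rule_probs = [0.2, 1.0, 1.0, 0.7, 0.5, 0.55, 0.6, 0.7,
              0.5, 0.45, 1.0, 0.35, 0.65, 0.5, 1.0, 0.5]

# p(t) is the product of all rule probabilities in the derivation.
p_t = prod(rule_probs)
print(f"p(t) = {p_t:.2e}")  # p(t) = 2.07e-04, matching the slide
```

The same computation over the other tree's rules yields its probability, and the parser simply picks the tree with the larger product.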


PCFGs

Advantages:

• Linguistic structures (relatively) easy to understand

• Simple statistical model, easy to estimate (treebank)

• Further learning by semi-supervised estimation

• Quite efficient parsing with beam search

Disadvantages:

• Not expressive enough to capture all dependencies

• Not much like syntactic formalisms linguists use

• Statistical model limited

• Independence assumptions

• Can extend, but efficient parsing becomes complicated


MORE DATA, LESS LINGUISTICS

• Can use more advanced models, for parsing and other tasks

• More sophisticated modelling/learning/inference

• Use less linguistic motivation and knowledge

• E.g. neural end-to-end NLG architecture

• Potential advantages:

• Learn from data instead of specifying by hand, e.g. tools for new languages

• Discover new phenomena we didn't know about

• Learn structures too complex to write by hand

• Maybe don't care about linguistics or psycholinguistics (how the brain processes language): just do well on the task

• Help with long tail in some cases: e.g. rare words


DANGERS OF FORGETTING LINGUISTICS

A little linguistics takes you a long way

• Linguists know a lot about language!

• Easily fall into traps they've known about for decades/centuries

• Or waste valuable insight

• Don't claim linguistic insight without knowledge of linguistics

• Data helps with some types of long tail (words, syntactic structures, word classes, . . . ), but:

• easy to focus too much on common phenomena

• get caught out by less common ones – more informative


CLASSIFICATION AND REGRESSION

• Many practical NLP tasks:

• Classification: text classification, sentiment analysis, NER

• Regression: sentiment analysis, other continuous predictions

• Formulate task as one of classification/regression → apply existing advanced techniques

• SVMs, logistic regression, polynomial regression, neural networks, random forests, . . .

• Often, input features from other NLP components: POS tags, syntactic dependencies, lemmas, word embeddings

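To make the formulation concrete, here is a minimal sketch (plain Python, with an invented two-document corpus) of turning texts into bag-of-words count vectors that any off-the-shelf classifier or regressor could consume:

```python
from collections import Counter

# Toy corpus; the documents and labels are invented for illustration.
docs = ["I loved the movie", "I hated the movie"]
labels = [1, 0]  # 1 = positive, 0 = negative

# Build a vocabulary over the whole corpus.
vocab = sorted({w.lower() for doc in docs for w in doc.split()})

def bow_vector(text):
    """Map a text to a bag-of-words count vector over the vocabulary."""
    counts = Counter(w.lower() for w in text.split())
    return [counts[w] for w in vocab]

X = [bow_vector(d) for d in docs]
print(vocab)  # ['hated', 'i', 'loved', 'movie', 'the']
print(X[0])   # [0, 1, 1, 1, 1]
```

In practice the vector would be extended with the NLP-derived features listed above (POS tags, lemmas, embeddings) before being handed to an SVM, logistic regression, or similar model.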

SENTIMENT ANALYSIS

• Input: text (article, review, . . . )

• Goal: predict positive/negative sentiment

• E.g. good or bad review of product?

I just saw #CaptainMarvel and I really enjoyed it from beginning to end. I thought the humor was great, and I loved Brie Larson. Sure there were some changes from the comic books, but I liked them. The changes were necessary.

I wanted to like this movie, but it flopped for me... They should have went with a different route, instead of starting her off as a strong hero already, just like every hero. Larson's acting was a bit stale and lacked more personality.


SENTIMENT ANALYSIS

• Standard ML problem

• Positive / negative labels

• What are input features? Words?


SENTIMENT ANALYSIS

• Apply standard classification (or regression) methods

• NLP-based features

• Word embeddings, POS tags, . . . what else?

• Limitation of BOW model

• Solve with better features? Multi-word features?

• Or better model? Compose word meanings
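One cheap route past the bag-of-words limitation mentioned above is multi-word features: adding bigrams lets a classifier distinguish "not good" from "good", which a pure unigram model cannot. A minimal sketch (example phrases invented):

```python
def ngram_features(text):
    """Extract unigram and bigram features from a text."""
    tokens = text.lower().split()
    feats = list(tokens)  # unigrams
    feats += [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]  # bigrams
    return feats

# A unigram model sees the same 'good' in both phrases;
# the bigrams 'not good' vs 'very good' separate them.
print(ngram_features("not good"))   # ['not', 'good', 'not good']
print(ngram_features("very good"))  # ['very', 'good', 'very good']
```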


OUTLINE

Statistical NLP

Advanced Statistical Methods

Closing remarks


ADVANCED STATISTICAL METHODS

• SVMs: another classifier

• Topic modelling: model detail on LDA

• Deep learning: hot topic, example use in NLP


SUPPORT VECTOR MACHINES (SVMs)

• Probably seen before. . . ?

• Commonly used classifier

• Still often best choice for classification tasks

• Well developed, efficient libraries


SVMs

• Linearly separable data

• Find hyperplane to separate data with maximum margin

• High dimensional space

Original image: Fabian Burger


SVMs

• For non-linearly separable data: use kernel trick

• Map space to new dimensions via non-linear kernels

• Find linear separation

• Different kernels: different types of hyperplane

• Capture interactions between (original) dimensions
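The kernel trick can be illustrated with the quadratic kernel k(x, y) = (x·y)²: evaluated directly in the original 2-D space, it equals the dot product in an explicit 3-D feature space φ(x) = (x₁², √2·x₁x₂, x₂²), without ever constructing φ. A numerical check (the points are chosen arbitrarily):

```python
from math import sqrt, isclose

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def quad_kernel(x, y):
    """Quadratic kernel: (x . y)^2, computed in the original space."""
    return dot(x, y) ** 2

def phi(x):
    """Explicit feature map for the quadratic kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
# The kernel equals the dot product in the mapped space:
assert isclose(quad_kernel(x, y), dot(phi(x), phi(y)))
print(quad_kernel(x, y))  # 1.0
```

The squared term √2·x₁x₂ is exactly the kind of interaction between original dimensions mentioned above; the SVM gets it for free by swapping in a kernel, at no extra cost per dot product.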


SVM APPLICATIONS

• Apply to sentiment analysis

• Separate positive examples from negative

• Many dimensions (features): words, word-POSs, embeddings, dependencies, . . .

• With kernels, word features no longer independent

• Overcomes some problems with BOW


SVM APPLICATIONS

• Apply to other classification problems

• E.g. email spam detection: spam vs ham

• Similar features
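As a toy illustration of the spam-vs-ham setup (SVM training itself is more involved, so this sketch substitutes a simple perceptron, i.e. a linear separator without the maximum margin, over bag-of-words features; the example messages are invented):

```python
from collections import Counter

# Invented toy training data: (message, label), 1 = spam, 0 = ham.
train = [("win money now", 1), ("free prize win", 1),
         ("meeting at noon", 0), ("see you at lunch", 0)]

vocab = sorted({w for text, _ in train for w in text.split()})

def vec(text):
    """Bag-of-words count vector over the training vocabulary."""
    c = Counter(text.split())
    return [c[w] for w in vocab]

# Train a perceptron: mistake-driven updates to a linear separator.
w = [0.0] * len(vocab)
b = 0.0
for _ in range(10):
    for text, label in train:
        x = vec(text)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        if pred != label:
            sign = 1 if label == 1 else -1
            w = [wi + sign * xi for wi, xi in zip(w, x)]
            b += sign

def classify(text):
    x = vec(text)
    return "spam" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "ham"

print(classify("win a free prize"))  # spam
print(classify("lunch meeting"))     # ham
```

An SVM would find the same kind of separating hyperplane, but the maximum-margin one, which generalises better on real data.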


TOPIC MODELLING

• Topic modelling: LDA

• Example of unsupervised learning in NLP

• Type of clustering of docs

• Based on their words: BOW model

• More detail on modelling now


LDA

• Latent Dirichlet Allocation (LDA)

• Bayesian modelling: generative probability distribution

• Simplifying assumptions:

• Each document generated by simple process

• Fixed number of topics in corpus

• Bag of words model

• Document covers a few topics

• Topic associated (mainly) with small number of words

• Words that often appear in same doc belong to same topic


LDA

Generative process (‘story’):

(1) For each topic t:
    (1) Choose words associated with t
    (2) Prob dist over words

(2) For each document d:
    (1) Choose a distribution over topics
    (2) For each word in d:
        (1) Choose a topic t from d's distribution
        (2) Choose a word from t's distribution

• Process assumed by model to underlie data

• Not process followed by inference!

• Only words and documents are observed: topics are latent variables discovered by model
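The generative story above can be sketched directly as sampling code. The topics, words, and fixed distributions here are invented toy values; real LDA draws the topic-word and document-topic distributions from Dirichlet priors rather than hard-coding them:

```python
import random

random.seed(0)

# Step 1 of the story: each topic is a distribution over words (toy values).
topics = {
    "sports":  {"game": 0.5, "team": 0.3, "win": 0.2},
    "finance": {"bank": 0.4, "money": 0.4, "win": 0.2},
}

def sample(dist):
    """Draw one item from a {item: probability} distribution."""
    items, probs = zip(*dist.items())
    return random.choices(items, weights=probs)[0]

def generate_document(n_words):
    # Step 2.1: choose this document's distribution over topics (toy values;
    # LDA samples these from a Dirichlet prior).
    topic_dist = {"sports": 0.7, "finance": 0.3}
    doc = []
    for _ in range(n_words):
        t = sample(topic_dist)         # step 2.2.1: pick a topic
        doc.append(sample(topics[t]))  # step 2.2.2: pick a word from that topic
    return doc

print(generate_document(6))
```

Inference runs the other way: given only the generated words, it recovers the latent topic assignments and distributions that best explain them.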

HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI NLP 2019. D8: IE and Advanced Stat NLP Mark Granroth-Wilding 25/44

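The generative story above can be sketched directly in code. Everything concrete here (the vocabulary, the topic count, the hyperparameter values) is an illustrative assumption, not from the slides:

```python
import random

def sample_dirichlet(alpha, k):
    # Symmetric Dirichlet draw via normalized Gamma samples
    g = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs, items):
    # Draw one item according to the given probabilities
    r = random.random()
    cum = 0.0
    for p, item in zip(probs, items):
        cum += p
        if r < cum:
            return item
    return items[-1]

VOCAB = ["bank", "money", "loan", "river", "water", "fish"]  # toy vocabulary
K = 2       # fixed number of topics (assumption)
ETA = 0.1   # topic-word prior: small => each topic uses few words
ALPHA = 0.1 # doc-topic prior: small => each doc covers few topics

# (1) For each topic: a probability distribution over words
beta = [sample_dirichlet(ETA, len(VOCAB)) for _ in range(K)]

# (2) For each document: a distribution over topics, then words
def generate_document(n_words):
    theta = sample_dirichlet(ALPHA, K)  # per-doc topic distribution
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, list(range(K)))  # topic for this word
        words.append(sample_categorical(beta[z], VOCAB))  # word from topic's dist
    return words

print(generate_document(8))
```

This is the assumed data-generating process only; inference runs in the opposite direction, recovering `theta` and `beta` from observed words.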

REMINDER: PLATE DIAGRAMS

[Diagram: HMM-style chain of hidden tags t0 → t1 → t2 → t3 . . . with transition distributions p(t0|Start), p(t1|t0), p(t2|t1), p(t3|t2); each ti emits an observed word wi with emission probabilities p(this|t0), p(is|t1), p(a|t2), p(sentence|t3). Annotations: variables; hidden (ti) vs. observed (wi); conditional distributions.]


REMINDER: PLATE DIAGRAMS

[Diagram: plate notation: nodes ti−1 → ti → wi inside a box labelled N. Repeat N times: N words.]

More compact notation: box around repeated elements


LDA

[Plate diagram: η → βk (plate K); α → θd → zd,n → wd,n ← βk (document plate D, word plate N). Annotations: βk: per-topic distribution over words; η: hyperparameter for β (a distribution over distributions); θd: per-document distribution over topics; α: hyperparameter for θ (a distribution over distributions); zd,n: topic for each word; wd,n: word, drawn from z’s word distribution (βz).]

(1) For each topic t:
    (1) Choose words associated with t
    (2) Probability distribution over words
(2) For each document d:
    (1) Choose a distribution over topics
    (2) For each word in d:
        (1) Choose a topic t from d’s distribution
        (2) Choose a word from t’s distribution


TRAINING LDA

• LDA defines the probability of words given topics

• Topics are unknown; words are observed

• Can’t estimate p(w|t) from counts, as with an HMM POS tagger

• Bayes’ rule: probability of topic given word (+ other topics)

• Rough idea behind training:
  • See lots of docs (words only)
  • Try selecting some topics based on the words and the current guess at the distributions (initially random)
  • Keep iterating: distributions start to look consistent

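The iterative training idea can be sketched as a collapsed Gibbs sampler, one common way to fit LDA (the slides do not commit to a specific algorithm). The toy corpus, hyperparameters, and number of sweeps are all assumptions:

```python
import random
random.seed(1)

docs = [["bank", "money", "loan", "bank"],
        ["river", "water", "fish", "river"],
        ["money", "bank", "loan", "money"]]
vocab = sorted({w for d in docs for w in d})
wid = {w: i for i, w in enumerate(vocab)}
K, V = 2, len(vocab)
ALPHA, ETA = 0.1, 0.1

# Random initial topic for every token, plus count tables
z = [[random.randrange(K) for _ in d] for d in docs]
n_dk = [[0] * K for _ in docs]          # doc-topic counts
n_kw = [[0] * V for _ in range(K)]      # topic-word counts
n_k = [0] * K                           # tokens per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d][k] += 1; n_kw[k][wid[w]] += 1; n_k[k] += 1

for _ in range(200):  # Gibbs sweeps: resample every token's topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove this token's current assignment from the counts
            n_dk[d][k] -= 1; n_kw[k][wid[w]] -= 1; n_k[k] -= 1
            # p(z=t | everything else) ∝ (n_dk + α) * (n_kw + η) / (n_k + Vη)
            weights = [(n_dk[d][t] + ALPHA) * (n_kw[t][wid[w]] + ETA)
                       / (n_k[t] + V * ETA) for t in range(K)]
            r = random.uniform(0, sum(weights))
            acc = 0.0
            for t, wt in enumerate(weights):
                acc += wt
                if r <= acc:
                    k = t
                    break
            z[d][i] = k
            n_dk[d][k] += 1; n_kw[k][wid[w]] += 1; n_k[k] += 1

print(z)  # words that co-occur in the same docs tend to share a topic
```

Each resampling step is exactly the "topic given word + other topics" computation from Bayes' rule; iterating makes the doc-topic and topic-word counts mutually consistent.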

TRAINING LDA

• Over time, discover words that tend to occur together

• Grouped into topics

• E.g. many docs use bank and money → one topic covers these docs


TRAINING LDA

• Over time, discover words that tend to occur together

• Key to estimation: α and η specify the prior belief that
  • each document talks about few topics
  • each topic is discussed using few words

• How few depends on α and η

[Plate diagram repeated: η → βk (plate K); α → θd → zd,n → wd,n (plates D, N).]


NEURAL NETWORKS / DEEP LEARNING

• Neural networks: been around a long time

• Recent explosion:
  • Clever new training tricks
  • Lots of data
  • Faster processors (GPUs)

• Can now train huge networks with many hidden layers → deep learning

• Loads of applications to NLP

• A couple of examples here

• No details of modelling, training, ML theory, maths, . . . : see other courses!


SEQUENCE MODELS

• In NLP, lots of sequence processing!

• Words: sentences, POS tags, NER tags, . . .

• Not the full story: syntax, semantics (long-range dependencies)

• Many applications of recurrent neural networks (RNNs)

• Learning problems with traditional RNNs

• Overcome by recent models: LSTMs, GRUs

• But same basic idea


RNN LANGUAGE MODELLING

• One application: language modelling

• Earlier: Markov LMs, n-grams

• Some problems:
  • Markov assumption: bad, but practical
  • Limited n-gram size: data sparsity
  • Fixed n-gram size: lose long-range dependencies

• Modern RNNs can help! (LSTMs, . . . )

[Diagram: word chain w0 → w1 → w2 . . . with distributions p(w0|START), p(w1|START, w0), p(w2|w0, w1), conditioning on the full history.]


RNNs

[Diagram: hidden states h0 → h1 → h2 → h3 . . . (each a hidden layer/vector); inputs “this is a sentence”; outputs o0 . . . o3.]

• Looks rather like an HMM

• States: theoretically ‘remember’ distant inputs/states

• Modern RNNs (LSTMs, GRUs) make this work in practice

• Outputs: based on the sequence so far


RNNs FOR LM

[Diagram: hidden states h0 . . . h3 over input words w0 . . . w3; the output at each step is a prediction of the next word (w1, w2, w3, w4).]

• Inputs: words

• Outputs: prediction of the next word

• Softmax output: probability distribution

• Train on sentences


RNNs FOR LM

[Diagram: a second hidden layer h′0 . . . h′3 stacked on top of h0 . . . h3, still predicting the next word.]

• ‘Deep’ hidden representations

• Stack hidden layers (arbitrarily many)

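A single step of a plain (Elman-style) RNN language model might look like this. The sizes and random weights are toy assumptions; a real model would learn them by training on sentences:

```python
import math
import random

random.seed(0)
H, V = 4, 5  # hidden size, vocabulary size (toy values)

def randmat(r, c):
    return [[random.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]

Wxh, Whh, Who = randmat(V, H), randmat(H, H), randmat(H, V)

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def rnn_step(h, word_id):
    # h_t = tanh(Wxh[x] + Whh·h_{t-1}); x is one-hot, so Wxh[x] is a row lookup
    h_new = []
    for j in range(H):
        acc = Wxh[word_id][j]
        for i in range(H):
            acc += Whh[i][j] * h[i]
        h_new.append(math.tanh(acc))
    # Output: softmax over the vocabulary = distribution over the next word
    logits = [sum(h_new[i] * Who[i][k] for i in range(H)) for k in range(V)]
    return h_new, softmax(logits)

h = [0.0] * H
for w in [0, 3, 1]:  # feed a toy word-id sequence
    h, p_next = rnn_step(h, w)
print(sum(p_next))  # a proper probability distribution sums to 1
```

Because `h` is carried forward at every step, the prediction at position t can in principle depend on the whole prefix, not just the previous n−1 words.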

OTHER TASKS

[Diagram: same RNN over words w0 . . . w3; the per-step outputs are now labels, e.g. POS tags (JJ NNP VBI NNS) or NER tags (O Plc O Prs).]

• Not just language modelling

• Trivial to apply to any supervised labelling task

• E.g. POS tagging, NER


RNNs FOR SENTIMENT

[Diagram: hidden states h0 . . . h3 over the input words; a single Score output at the final step.]

• Apply to sentiment analysis

• Feed in the input text

• Only predict at the end


RNNs FOR SENTIMENT

[Diagram: hidden states h0 . . . h3 over input words w0 . . . w3; a single Score output at the final step.]

• RNN sees the full input: prediction can depend on any word

• Supervised training: known sentiment scores/classes

• Potential advantages:
  • No more BOWs! The RNN can learn phrases, MWEs, . . .
  • Can generalize based on word similarity: great ≈ awesome
  • Might learn long-range dependencies


MORE NNs

• Lots more work in this area:
  • Bidirectional RNNs
  • Convolutional NNs (CNNs)
  • Context-sensitive word embeddings
  • Seq2seq models: machine translation, etc.

• Other courses:
  • Deep Learning
  • Deep Learning for NLP (seminar)
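As a taste of one of these extensions: a bidirectional RNN simply runs one RNN left-to-right and another right-to-left, then combines the two final states, so the representation sees context from both directions. Again a toy NumPy sketch with made-up dimensions, not a full implementation.

```python
import numpy as np

def rnn_final_state(xs, W_h, W_x, b):
    """Final hidden state of a simple RNN run over the sequence xs."""
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x + b)
    return h

def bidirectional_encode(xs, fwd_params, bwd_params):
    """Concatenate final states of a forward and a backward pass."""
    h_fwd = rnn_final_state(xs, *fwd_params)        # left-to-right
    h_bwd = rnn_final_state(xs[::-1], *bwd_params)  # right-to-left
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(1)
xs = [rng.standard_normal(4) for _ in range(3)]
params = lambda: (rng.standard_normal((5, 5)) * 0.1,
                  rng.standard_normal((5, 4)) * 0.1,
                  np.zeros(5))
h = bidirectional_encode(xs, params(), params())
print(h.shape)  # (10,): forward and backward states concatenated
```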


PART II SUMMARY

• Statistics in NLP

• Data-driven NLP, less linguistics

• Classification / regression

• Sentiment analysis

• SVMs

• Topic modelling: details of LDA

• Neural networks

• RNNs, applications


READING MATERIAL

• All statistical methods: Eisenstein

• Neural networks: Neural Network Methods in NLP (e-book, available through Helka)


NEXT UP

After lunch: practical assignments in BK107

9:15 – 12:00   Lectures
12:00 – 13:15  Lunch
13:15 – ∼13:30 Introduction
13:30 – 16:00  Practical assignments

• Building an IE system

• Regular expression-based methods

• Building on other NLP components: pipeline

