language models continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and...

41
Language Models Continued Introduction to Natural Language Processing Computer Science 585—Fall 2009 University of Massachusetts Amherst David Smith 1

Upload: others

Post on 01-Apr-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Language ModelsContinued

Introduction to Natural Language ProcessingComputer Science 585—Fall 2009

University of Massachusetts Amherst

David Smith

1

Page 2: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

The Story So Far

• Last time: simple LMs

• Markov assumptions: bigrams, trigrams,...

• Generating text from an n-gram model

• This time

• More on probability

• Bayes theorem and naive Bayes classifiers

• Smoothing: expecting the unseen

2

Page 3: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Axioms of Probability

⋃i Fi = Ω• Define event space

• Probability function, s.t.

• Disjoint events sum

• All events sum to one

• Show that:

P : F → [0, 1]

A ∩B = ∅ ⇔ P (A ∪B) = P (A) + P (B)

P (Ω) = 1

P (A) = 1− P (A)

3

Page 4: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Conditional Probability

P (A | B) =P (A,B)P (B)

P (A,B) = P (B)P (A | B) = P (A)P (B | A)

P (A1, A2, . . . , An) = P (A1)P (A2 | A1)P (A3 | A1, A2)· · · P (An | A1, . . . , An−1)Chain rule

A

BA∩B

4

Page 5: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Independence

P (A,B) = P (A)P (B)⇔

P (A | B) = P (A) ∧ P (B | A) = P (B)

In coding terms, knowing B doesn’t help in decoding A, and vice versa.

5

Page 6: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Another View of Markov Models

p(w1, w2, . . . , wn) = p(w1)p(w2 | w1)p(w3 | w1, w2)p(w4 | w1, w2, w3) · · · p(wn | p1, . . . , pn−1)

6

Page 7: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Another View of Markov Models

p(w1, w2, . . . , wn) = p(w1)p(w2 | w1)p(w3 | w1, w2)p(w4 | w1, w2, w3) · · · p(wn | p1, . . . , pn−1)

p(wi | w1, . . . wi−1) ≈ p(wi | wi−1)

Markov independence assumption

6

Page 8: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Another View of Markov Models

p(w1, w2, . . . , wn) = p(w1)p(w2 | w1)p(w3 | w1, w2)p(w4 | w1, w2, w3) · · · p(wn | p1, . . . , pn−1)

p(w1, w2, . . . , wn) ≈ p(w1)p(w2 | w1)p(w3 | w2)p(w4 | w3) · · · p(wn | pn−1)

p(wi | w1, . . . wi−1) ≈ p(wi | wi−1)

Markov independence assumption

6

Page 9: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

7

Page 10: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The

7

Page 11: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

Thep(w2|The)

7

Page 12: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The resultsp(w2|The)

7

Page 13: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The resultsp(w2|The) p(w3|results)

7

Page 14: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results havep(w2|The) p(w3|results)

7

Page 15: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results havep(w2|The) p(w3|results) p(w4|have)

7

Page 16: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have)

7

Page 17: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have) p(w5|shown)

7

Page 18: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have) p(w5|shown)

Directed graphical models: lack of edge means conditional independence

7

Page 19: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have) p(w5|shown)

The results have shown

Directed graphical models: lack of edge means conditional independence

7

Page 20: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have) p(w5|shown)

The results have shown

Directed graphical models: lack of edge means conditional independence

7

Page 21: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Yet Another View

w1 w2 w3 w4

The results have shownp(w2|The) p(w3|results) p(w4|have) p(w5|shown)

The results have shownp(w2|The) p(w3|The,results) p(w4|results,have) p(w5|have,shown)

Directed graphical models: lack of edge means conditional independence

7

Page 22: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Classifiers:Language under

Different Conditions

8

Page 23: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviews

9

Page 24: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

9

Page 25: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

9

Page 26: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

seen at : amc old pasadena 8 , pasadena , ca ( in sdds ) paulverhoeven ' s last movie , showgirls , had a bad script , bad acting ,and a " plot " ( i use the word in its loosest possible sense ) thatserved only to allow lots of sex and nudity . it stank . starshiptroopers has a bad script , bad acting , and a " plot " that servesonly to allow lots of violence and gore . it stinks . nobody willwatch this movie for the plot , ...

9

Page 27: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

seen at : amc old pasadena 8 , pasadena , ca ( in sdds ) paulverhoeven ' s last movie , showgirls , had a bad script , bad acting ,and a " plot " ( i use the word in its loosest possible sense ) thatserved only to allow lots of sex and nudity . it stank . starshiptroopers has a bad script , bad acting , and a " plot " that servesonly to allow lots of violence and gore . it stinks . nobody willwatch this movie for the plot , ...

9

Page 28: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

seen at : amc old pasadena 8 , pasadena , ca ( in sdds ) paulverhoeven ' s last movie , showgirls , had a bad script , bad acting ,and a " plot " ( i use the word in its loosest possible sense ) thatserved only to allow lots of sex and nudity . it stank . starshiptroopers has a bad script , bad acting , and a " plot " that servesonly to allow lots of violence and gore . it stinks . nobody willwatch this movie for the plot , ...

the rich legacy of cinema has left us with certain indelible images .the tinkling christmas tree bell in " it ' s a wonderful life . "bogie ' s speech at the airport in " casablanca . " little elliott ' sflying bicycle , silhouetted by the moon in " e . t . " and now , "starship troopers " director paul verhoeven adds one more image thatwill live in our memories forever : doogie houser doing a vulcan mindmeld with a giant slug . " starship troopers , " loosely based on

9

Page 29: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Movie Reviewsthere ' s some movies i enjoy even though i know i probably shouldn 't and have a difficult time trying to explain why i did . " luckynumbers " is a perfect example of this because it ' s such a blatantrip - off of " fargo " and every movie based on an elmore leonardnovel and yet it somehow still works for me . i know i ' m in theminority here but let me explain . the film takes place in harrisburg, pa in 1988 during an unseasonably warm winter . ...

seen at : amc old pasadena 8 , pasadena , ca ( in sdds ) paulverhoeven ' s last movie , showgirls , had a bad script , bad acting ,and a " plot " ( i use the word in its loosest possible sense ) thatserved only to allow lots of sex and nudity . it stank . starshiptroopers has a bad script , bad acting , and a " plot " that servesonly to allow lots of violence and gore . it stinks . nobody willwatch this movie for the plot , ...

the rich legacy of cinema has left us with certain indelible images .the tinkling christmas tree bell in " it ' s a wonderful life . "bogie ' s speech at the airport in " casablanca . " little elliott ' sflying bicycle , silhouetted by the moon in " e . t . " and now , "starship troopers " director paul verhoeven adds one more image thatwill live in our memories forever : doogie houser doing a vulcan mindmeld with a giant slug . " starship troopers , " loosely based on

9

Page 30: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

10

Page 31: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

• What we want:

p( | w1, w2, ..., wn) > p( | w1, w2, ..., wn) ?

10

Page 32: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

• What we want:

p( | w1, w2, ..., wn) > p( | w1, w2, ..., wn) ?

• What we know how to build:

10

Page 33: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

• What we want:

p( | w1, w2, ..., wn) > p( | w1, w2, ..., wn) ?

• What we know how to build:

• A language model for each class

10

Page 34: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

• What we want:

p( | w1, w2, ..., wn) > p( | w1, w2, ..., wn) ?

• What we know how to build:

• A language model for each class

• p(w1, w2, ..., wn | )

10

Page 35: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Setting up a Classifier

• What we want:

p( | w1, w2, ..., wn) > p( | w1, w2, ..., wn) ?

• What we know how to build:

• A language model for each class

• p(w1, w2, ..., wn | )

• p(w1, w2, ..., wn | )

10

Page 36: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Bayes’ Theorem

P (A,B) = P (B)P (A | B) = P (A)P (B | A)

P (A | B) =P (B | A)P (A)

P (B)

By the definition of conditional probability:

we can show:

Seemingly trivial result from 1763; interesting consequences...

11

Page 37: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

A “Bayesian” Classifier

PriorLikelihood

maxR∈!,"

p(R | w1, w2, . . . , wn) = maxR∈!,"

p(R)p(w1, w2, . . . , wn | R)

Posterior

p(R | w1, w2, . . . , wn) =p(R)p(w1, w2, . . . , wn | R)

p(w1, w2, . . . , wn)

12

Page 38: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Naive Bayes Classifier

w1 w2 w3 w4

R

No dependencies among words!

13

Page 39: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

NB on Movie Reviews

>>> classifier.show_most_informative_features(5)

classifier.show_most_informative_features(5)Most Informative Features contains(outstanding) = True pos : neg = 14.1 : 1.0 contains(mulan) = True pos : neg = 8.3 : 1.0 contains(seagal) = True neg : pos = 7.8 : 1.0 contains(wonderfully) = True pos : neg = 6.6 : 1.0 contains(damon) = True pos : neg = 6.1 : 1.0

• Train models for positive, negative

• For each review, find higher posterior

• Which word probability ratios are highest?

14

Page 40: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

What’s Wrong With NB?

• What happens for word dependencies are strong?

• What happens when some words occur only once?

• What happens when the classifier sees a new word?

15

Page 41: Language Models Continueddasmith/inlp2009/lect4-cs585.pdf · rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know

Summing Up

• Exploit rules of probability to condition events

• Exploit Bayes rule for classification

• Smooth to avoid zeroes

• Read Manning & Schütze 2.1 and chap. 6

16