Language Models Continued
Introduction to Natural Language Processing
Computer Science 585 — Fall 2009
University of Massachusetts Amherst
David Smith
1
The Story So Far
• Last time: simple LMs
• Markov assumptions: bigrams, trigrams,...
• Generating text from an n-gram model
• This time
• More on probability
• Bayes theorem and naive Bayes classifiers
• Smoothing: expecting the unseen
2
Axioms of Probability
• Define event space F, s.t. ⋃i Fi = Ω
• Probability function P : F → [0, 1], s.t.
• Disjoint events sum: A ∩ B = ∅ ⇒ P(A ∪ B) = P(A) + P(B)
• All events sum to one: P(Ω) = 1
• Show that: P(Ā) = 1 − P(A)
3
Conditional Probability
P(A | B) = P(A, B) / P(B)
P(A, B) = P(B) P(A | B) = P(A) P(B | A)
Chain rule:
P(A1, A2, . . . , An) = P(A1) P(A2 | A1) P(A3 | A1, A2) · · · P(An | A1, . . . , An−1)
[Venn diagram: events A and B, overlapping in A ∩ B]
4
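The chain rule can be checked numerically. A minimal Python sketch, using a made-up joint distribution over three binary variables (the numbers are arbitrary, chosen only to sum to one):

```python
from itertools import product

# Hypothetical joint distribution over three binary variables A1, A2, A3.
probs = [0.20, 0.05, 0.10, 0.15, 0.05, 0.25, 0.10, 0.10]
joint = dict(zip(product([0, 1], repeat=3), probs))

def marginal(fixed):
    """P(A_i = v for each (i, v) in fixed), summing out the rest."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == v for i, v in fixed))

# Chain rule: P(a1, a2, a3) = P(a1) P(a2 | a1) P(a3 | a1, a2)
a = (1, 0, 1)
p_chain = (marginal([(0, a[0])])
           * marginal([(0, a[0]), (1, a[1])]) / marginal([(0, a[0])])
           * joint[a] / marginal([(0, a[0]), (1, a[1])]))
assert abs(p_chain - joint[a]) < 1e-12
```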
Independence
P(A, B) = P(A) P(B) ⇔ P(A | B) = P(A) ∧ P(B | A) = P(B)
In coding terms, knowing B doesn’t help in decoding A, and vice versa.
5
Another View of Markov Models
p(w1, w2, . . . , wn) = p(w1) p(w2 | w1) p(w3 | w1, w2) p(w4 | w1, w2, w3) · · · p(wn | w1, . . . , wn−1)
Markov independence assumption: p(wi | w1, . . . , wi−1) ≈ p(wi | wi−1)
p(w1, w2, . . . , wn) ≈ p(w1) p(w2 | w1) p(w3 | w2) p(w4 | w3) · · · p(wn | wn−1)
6
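The bigram approximation can be sketched directly from counts. A toy Python example (the three-sentence corpus and the `<s>`/`</s>` boundary markers are made up for illustration):

```python
from collections import Counter

# Toy corpus; <s> and </s> are hypothetical sentence-boundary symbols.
corpus = [
    "<s> the results have shown </s>",
    "<s> the results are in </s>",
    "<s> the data have shown </s>",
]
unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def p_bigram(w, prev):
    """MLE estimate p(w | prev) = c(prev, w) / c(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(sent):
    """Bigram approximation: product of p(w_i | w_{i-1})."""
    toks = sent.split()
    p = 1.0
    for prev, w in zip(toks, toks[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_sentence("<s> the results have shown </s>"))  # ≈ 1/3
```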
Yet Another View
[Graphical model: chain w1 → w2 → w3 → w4]
The results have shown
Bigram factors: p(w2|The) p(w3|results) p(w4|have) p(w5|shown)
Trigram factors: p(w2|The) p(w3|The,results) p(w4|results,have) p(w5|have,shown)
Directed graphical models: lack of edge means conditional independence
7
Classifiers: Language under Different Conditions
8
Movie Reviews

there ' s some movies i enjoy even though i know i probably shouldn ' t and have a difficult time trying to explain why i did . " lucky numbers " is a perfect example of this because it ' s such a blatant rip - off of " fargo " and every movie based on an elmore leonard novel and yet it somehow still works for me . i know i ' m in the minority here but let me explain . the film takes place in harrisburg , pa in 1988 during an unseasonably warm winter . ...

seen at : amc old pasadena 8 , pasadena , ca ( in sdds ) paul verhoeven ' s last movie , showgirls , had a bad script , bad acting , and a " plot " ( i use the word in its loosest possible sense ) that served only to allow lots of sex and nudity . it stank . starship troopers has a bad script , bad acting , and a " plot " that serves only to allow lots of violence and gore . it stinks . nobody will watch this movie for the plot , ...

the rich legacy of cinema has left us with certain indelible images . the tinkling christmas tree bell in " it ' s a wonderful life . " bogie ' s speech at the airport in " casablanca . " little elliott ' s flying bicycle , silhouetted by the moon in " e . t . " and now , " starship troopers " director paul verhoeven adds one more image that will live in our memories forever : doogie houser doing a vulcan mind meld with a giant slug . " starship troopers , " loosely based on
9
Setting up a Classifier
• What we want:
p(pos | w1, w2, ..., wn) > p(neg | w1, w2, ..., wn) ?
• What we know how to build:
• A language model for each class
• p(w1, w2, ..., wn | pos)
• p(w1, w2, ..., wn | neg)
10
Bayes’ Theorem
By the definition of conditional probability:
P(A, B) = P(B) P(A | B) = P(A) P(B | A)
we can show:
P(A | B) = P(B | A) P(A) / P(B)
Seemingly trivial result from 1763; interesting consequences...
11
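One of those consequences shows up with a quick numeric check. A Python sketch with made-up numbers for a rare condition A and a noisy test B (all values hypothetical):

```python
# Numeric check of Bayes' theorem: a test (B) for a rare condition (A).
p_a = 0.01           # prior P(A): the condition is rare
p_b_given_a = 0.9    # likelihood P(B | A): test usually fires when A holds
p_b_given_not_a = 0.05  # false-positive rate P(B | ¬A)

# Total probability: P(B) = P(B|A) P(A) + P(B|¬A) P(¬A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes: P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # → 0.1538: a positive test still leaves A unlikely
```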
A “Bayesian” Classifier
Posterior: p(R | w1, w2, . . . , wn) = p(R) p(w1, w2, . . . , wn | R) / p(w1, w2, . . . , wn)
(numerator: prior × likelihood)
Since the denominator doesn’t depend on R:
argmax over R ∈ {pos, neg} of p(R | w1, w2, . . . , wn) = argmax over R of p(R) p(w1, w2, . . . , wn | R)
12
Naive Bayes Classifier
[Graphical model: class R with an edge to each of w1, w2, w3, w4]
No dependencies among words!
13
NB on Movie Reviews
• Train models for positive, negative
• For each review, find higher posterior
• Which word probability ratios are highest?

>>> classifier.show_most_informative_features(5)
Most Informative Features
   contains(outstanding) = True    pos : neg = 14.1 : 1.0
         contains(mulan) = True    pos : neg =  8.3 : 1.0
        contains(seagal) = True    neg : pos =  7.8 : 1.0
   contains(wonderfully) = True    pos : neg =  6.6 : 1.0
         contains(damon) = True    pos : neg =  6.1 : 1.0
14
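The output above comes from NLTK’s built-in classifier; the same idea can be sketched from scratch. A minimal Python version on a handful of made-up toy “reviews” (with add-one smoothing so unseen words don’t zero out the product):

```python
from collections import Counter
import math

# Hypothetical toy training data standing in for the movie-review corpus.
train = [
    ("pos", "outstanding film wonderfully acted"),
    ("pos", "wonderfully moving and outstanding"),
    ("neg", "bad script bad acting"),
    ("neg", "it stinks bad plot"),
]

class_counts = Counter()
word_counts = {"pos": Counter(), "neg": Counter()}
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts["pos"]) | set(word_counts["neg"])

def log_posterior(label, text):
    """log p(label) + sum_i log p(w_i | label), with add-one smoothing."""
    lp = math.log(class_counts[label] / sum(class_counts.values()))
    total = sum(word_counts[label].values())
    for w in text.split():
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(text):
    # Naive Bayes decision: pick the class with the higher posterior.
    return max(("pos", "neg"), key=lambda c: log_posterior(c, text))

print(classify("wonderfully outstanding"))  # → pos
print(classify("bad bad plot"))             # → neg
```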
What’s Wrong With NB?
• What happens when word dependencies are strong?
• What happens when some words occur only once?
• What happens when the classifier sees a new word?
15
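The last two questions are the zero-count problem, which motivates smoothing. A minimal add-one (Laplace) sketch for bigrams in Python, on a made-up two-sentence corpus (note the simplification of counting across the sentence boundary):

```python
from collections import Counter

# Toy corpus; <s> and </s> are hypothetical boundary symbols.
corpus = "<s> the results have shown </s> <s> the data have shown </s>".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_mle(w, prev):
    """Unsmoothed MLE: zero for any unseen bigram."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_laplace(w, prev):
    """Add one to every count: unseen bigrams get a small nonzero probability."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_mle("data", "results"))      # → 0.0 (unseen bigram kills the product)
print(p_laplace("data", "results"))  # → 0.125
```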
Summing Up
• Exploit rules of probability to condition events
• Exploit Bayes rule for classification
• Smooth to avoid zeroes
• Read Manning & Schütze 2.1 and chap. 6
16