CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27 – SMT Assignment; HMM recap; Probabilistic Parsing contd.) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 17th March, 2011


Page 1: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 27 – SMT Assignment; HMM recap; Probabilistic Parsing contd.)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

17th March, 2011

Page 2: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

CMU Pronunciation Dictionary Assignment

Page 3: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Data

The Carnegie Mellon University Pronouncing Dictionary is a machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their transcriptions. The current phoneme set contains 39 phonemes.

Page 4: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

“Parallel” Corpus

    Phoneme   Example   Translation
    -------   -------   -----------
    AA        odd       AA D
    AE        at        AE T
    AH        hut       HH AH T
    AO        ought     AO T
    AW        cow       K AW
    AY        hide      HH AY D
    B         be        B IY

Page 5: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

“Parallel” Corpus contd.

    Phoneme   Example   Translation
    -------   -------   -----------
    CH        cheese    CH IY Z
    D         dee       D IY
    DH        thee      DH IY
    EH        Ed        EH D
    ER        hurt      HH ER T
    EY        ate       EY T
    F         fee       F IY
    G         green     G R IY N
    HH        he        HH IY
    IH        it        IH T
    IY        eat       IY T
    JH        gee       JH IY

Page 6: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

The tasks

1. First obtain the Carnegie Mellon University's Pronouncing Dictionary.
2. Create the phrase table using GIZA++.
3. For language modeling use SRILM.
4. For decoding use Moses.
5. Calculate precision, recall and F-score (a sketch of this scoring step follows below).
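A minimal sketch of the last step, assuming the decoder output and the reference transcriptions are plain-text files with one phoneme sequence per line (the file names and the per-word averaging are illustrative assumptions, not part of the assignment statement):

    # Per-word phoneme precision/recall/F-score, averaged over the test set.
    from collections import Counter

    def prf(hyp_phones, ref_phones):
        """Precision, recall and F-score of one hypothesis against one reference."""
        hyp, ref = hyp_phones.split(), ref_phones.split()
        common = sum((Counter(hyp) & Counter(ref)).values())   # matched phonemes
        p = common / len(hyp) if hyp else 0.0
        r = common / len(ref) if ref else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    def corpus_prf(hyp_file, ref_file):
        scores = [prf(h, r) for h, r in zip(open(hyp_file), open(ref_file))]
        return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))

    # print(corpus_prf("moses.out", "reference.phonemes"))     # hypothetical file names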

Page 7: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probabilistic Parsing

Page 8: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Bridging Classical and Probabilistic Parsing

The bridge between probabilistic parsing and classical parsing is the concept of domination.

Frequency: P(NP -> DT NN) = 0.5 means that in the corpus 50% of noun phrases are composed of a determiner and a noun.

Phenomenon: P(NP -> DT NN) is actually P(DT NN | NP), i.e., the joint probability of domination by DT and NN giving rise to domination by NP.

The concept of domination is the bridge between Frequency (probabilistic parsing) and Phenomenon (classical parsing).
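A small illustration of the Frequency view: rule probabilities of this kind are relative frequencies of rule counts. The tiny counts below are made up for the example, not taken from a real treebank.

    # Relative-frequency estimate of P(NP -> DT NN) = count(NP -> DT NN) / count(NP -> anything).
    from collections import Counter

    rule_counts = Counter({
        ("NP", ("DT", "NN")): 50,
        ("NP", ("NNS",)):     30,
        ("NP", ("NP", "PP")): 20,
    })

    def rule_prob(lhs, rhs):
        lhs_total = sum(c for (l, _), c in rule_counts.items() if l == lhs)
        return rule_counts[(lhs, rhs)] / lhs_total

    print(rule_prob("NP", ("DT", "NN")))      # 0.5, i.e. P(DT NN | NP)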

Page 9: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Calculating Probability of a Sentence

We can calculate P(s = w_{1m}) either using a naive n-gram based approach, or by summing the probabilities of all parse trees that yield the sentence:

    P(s = w_{1m}) = Σ_{t : yield(t) = w_{1m}} P(t)

Which approach to choose?

The velocity of waves rises near the shore.

A plural noun immediately followed by a singular verb ("waves rises") is unlikely in the corpus, so the n-gram model gives this sentence a low probability value.

Page 10: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Parse Tree

No other parse tree is possible for this sentence.

    (S
      (NP (DT The) (NN velocity) (PP (P of) (NP (NNS waves))))
      (VP (V rises) (PP (P near) (NP (DT the) (NN shore)))))

Page 11: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Various ways to calculate the probability of a sentence:
- Naive n-gram based
- Syntactic level (parse tree)
- Semantic level
- Pragmatics
- Discourse

Page 12: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probabilistic Context Free Grammars

S -> NP VP     1.0
NP -> DT NN    0.5
NP -> NNS      0.3
NP -> NP PP    0.2
PP -> P NP     1.0
VP -> VP PP    0.6
VP -> VBD NP   0.4

DT -> the        1.0
NN -> gunman     0.5
NN -> building   0.5
VBD -> sprayed   1.0
NNS -> bullets   1.0
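A minimal sketch of this grammar as a Python dictionary (probabilities exactly as above; the rule P -> with is implied by the example parses that follow). The tree-scoring and inside-probability sketches further below reuse it.

    # The toy PCFG as a dictionary mapping (LHS, RHS) -> probability.
    PCFG = {
        ("S",   ("NP", "VP")):   1.0,
        ("NP",  ("DT", "NN")):   0.5,
        ("NP",  ("NNS",)):       0.3,
        ("NP",  ("NP", "PP")):   0.2,
        ("PP",  ("P", "NP")):    1.0,
        ("VP",  ("VP", "PP")):   0.6,
        ("VP",  ("VBD", "NP")):  0.4,
        ("DT",  ("the",)):       1.0,
        ("NN",  ("gunman",)):    0.5,
        ("NN",  ("building",)):  0.5,
        ("VBD", ("sprayed",)):   1.0,
        ("NNS", ("bullets",)):   1.0,
        ("P",   ("with",)):      1.0,   # implied by the parses of the example sentence
    }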

Page 13: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Example Parse t1

The gunman sprayed the building with bullets.

    (S_1.0
      (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
      (VP_0.6
        (VP_0.4 (VBD_1.0 sprayed)
                (NP_0.5 (DT_1.0 the) (NN_0.5 building)))
        (PP_1.0 (P_1.0 with)
                (NP_0.3 (NNS_1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.00225

Page 14: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Another Parse t2

The gunman sprayed the building with bullets.

    (S_1.0
      (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
      (VP_0.4 (VBD_1.0 sprayed)
              (NP_0.2
                (NP_0.5 (DT_1.0 the) (NN_0.5 building))
                (PP_1.0 (P_1.0 with) (NP_0.3 (NNS_1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
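A minimal sketch of how such tree probabilities are obtained mechanically: encode each parse as nested tuples and multiply the probabilities of every rule used, with the PCFG dictionary sketched after the grammar slide.

    # Score a parse tree by multiplying the probabilities of all rules it uses.
    # Trees are nested tuples (label, child, child, ...); leaves are words.
    # Assumes the PCFG dictionary from the earlier sketch.
    def tree_prob(tree, grammar):
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = grammar[(label, rhs)]
        for c in children:
            if not isinstance(c, str):
                p *= tree_prob(c, grammar)
        return p

    t1 = ("S",
          ("NP", ("DT", "the"), ("NN", "gunman")),
          ("VP",
           ("VP", ("VBD", "sprayed"), ("NP", ("DT", "the"), ("NN", "building"))),
           ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))))

    t2 = ("S",
          ("NP", ("DT", "the"), ("NN", "gunman")),
          ("VP", ("VBD", "sprayed"),
                 ("NP", ("NP", ("DT", "the"), ("NN", "building")),
                        ("PP", ("P", "with"), ("NP", ("NNS", "bullets"))))))

    print(tree_prob(t1, PCFG), tree_prob(t2, PCFG))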

Page 15: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probability of a parse tree (contd.)

Consider a parse tree t for s = w1 ... wl in which S_{1,l} -> NP_{1,2} VP_{3,l}; NP_{1,2} -> DT_{1,1} N_{2,2}; VP_{3,l} -> V_{3,3} PP_{4,l}; PP_{4,l} -> P_{4,4} NP_{5,l} (the subscripts give the word span dominated by each non-terminal; DT_{1,1}, N_{2,2}, V_{3,3} and P_{4,4} derive w1, w2, w3 and w4, and NP_{5,l} derives w5 ... wl).

    P(t | s) = P(t | S_{1,l})
             = P(NP_{1,2}, DT_{1,1}, w1, N_{2,2}, w2, VP_{3,l}, V_{3,3}, w3, PP_{4,l}, P_{4,4}, w4, NP_{5,l}, w5...l | S_{1,l})
             = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w1 | DT_{1,1}) * P(w2 | N_{2,2})
               * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w4 | P_{4,4}) * P(w5...l | NP_{5,l})

(using the chain rule, context freeness and ancestor freeness)

Page 16: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

HMM ↔ PCFG

    O  observed sequence  ↔  w_{1m}  sentence
    X  state sequence     ↔  t       parse tree
    μ  model              ↔  G       grammar

Three fundamental questions

Page 17: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

HMM ↔ PCFG

How likely is a certain observation given the model? ↔ How likely is a sentence given the grammar?

    P(O | μ)  ↔  P(w_{1m} | G)

How to choose a state sequence which best explains the observations? ↔ How to choose a parse which best supports the sentence?

    argmax_X P(X | O, μ)  ↔  argmax_t P(t | w_{1m}, G)

Page 18: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

HMM ↔ PCFG

How to choose the model parameters that best explain the observed data? ↔ How to choose rule probabilities which maximize the probabilities of the observed sentences?

    argmax_μ P(O | μ)  ↔  argmax_G P(w_{1m} | G)

Page 19: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Recap of HMM

Page 20: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

HMM Definition
- Set of states: S, where |S| = N
- Start state: S0   /* P(S0) = 1 */
- Output alphabet: O, where |O| = M
- Transition probabilities: A = {a_ij}   /* state i to state j */
- Emission probabilities: B = {b_j(o_k)}   /* prob. of emitting or absorbing o_k from state j */
- Initial state probabilities: Π = {p1, p2, p3, ..., pN}, where each p_i = P(o0 = ε, S_i | S0)

Page 21: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Markov Processes: Properties

Limited horizon: given the previous k states, a state at time t is independent of the earlier states (an order-k Markov process):

    P(X_t = i | X_{t-1}, X_{t-2}, ..., X_0) = P(X_t = i | X_{t-1}, X_{t-2}, ..., X_{t-k})

Time invariance (shown for k = 1):

    P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = ... = P(X_n = i | X_{n-1} = j)

Page 22: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Three basic problems (contd.)
- Problem 1: Likelihood of a sequence (Forward procedure, Backward procedure)
- Problem 2: Best state sequence (Viterbi algorithm)
- Problem 3: Re-estimation (Baum-Welch, i.e. the Forward-Backward algorithm)

Page 23: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probabilistic Inference

O: observation sequence; S: state sequence.

Given O, find S* = argmax_S P(S | O); this is called probabilistic inference.

Infer the "hidden" from the "observed". How is this inference different from logical inference based on propositional or predicate calculus?

Page 24: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Essentials of Hidden Markov Model

1. Markov + Naive Bayes
2. Uses both transition and observation probability
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probability

    p(S_k --O_k--> S_{k+1}) = P(O_k | S_k) . P(S_{k+1} | S_k)

Page 25: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probability of Observation Sequence

Without any restriction, the search space size is |S|^|O|.

    P(O) = Σ_S P(O, S) = Σ_S P(S) . P(O | S)

Page 26: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Continuing with the Urn example: colored ball choosing

    Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
    Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
    Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Page 27: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Example (contd.)

Given the transition probabilities

          U1    U2    U3
    U1   0.1   0.4   0.5
    U2   0.6   0.2   0.2
    U3   0.3   0.4   0.3

and the observation/output probabilities

          R     G     B
    U1   0.3   0.5   0.2
    U2   0.1   0.4   0.5
    U3   0.6   0.1   0.3

Observation: RRGGBRGR
What is the corresponding state sequence?
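A minimal sketch of these parameters in Python (the initial distribution 0.5/0.3/0.2 is read off the Viterbi slide further below and is an assumption about that figure; all names are illustrative). The Viterbi and forward/backward sketches later reuse these tables.

    # Urn HMM parameters from the slide.  States are urns, observations are colours.
    STATES = ["U1", "U2", "U3"]
    INIT = {"U1": 0.5, "U2": 0.3, "U3": 0.2}      # P(S1 | S0), assumed from the Viterbi slide
    TRANS = {                                      # A[i][j] = P(next state j | current state i)
        "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
        "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
        "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},
    }
    EMIT = {                                       # B[j][o] = P(colour o | state j)
        "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
        "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
        "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
    }
    OBS = list("RRGGBRGR")                         # observation sequence from the slide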

Page 28: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Diagrammatic representation (1/2)

[State-transition diagram over U1, U2 and U3: each arc carries the transition probability from the table above, and each state carries its emission probabilities for R, G and B.]

Page 29: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Diagrammatic representation (2/2)

[The same diagram with each arc labelled by the combined probabilities for R, G and B, i.e. the transition probability multiplied by the emission probability of the source urn; e.g. the U1 -> U3 arc (transition 0.5) carries R: 0.15, G: 0.25, B: 0.10.]

Page 30: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probabilistic FSM

[Two-state probabilistic FSM over S1 and S2, arcs labelled (output symbol : probability):
    S1 -> S1: (a1: 0.1), (a2: 0.2)      S1 -> S2: (a1: 0.3), (a2: 0.4)
    S2 -> S1: (a1: 0.2), (a2: 0.3)      S2 -> S2: (a1: 0.3), (a2: 0.2)]

The question here is: "What is the most likely state sequence given the output sequence seen?"

Page 31: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Developing the tree

Start: P(S1) = 1.0, P(S2) = 0.0.

Reading a1: from S1 (1.0), go to S1 with 0.1 and to S2 with 0.3; from S2 (0.0), both continuations stay 0.0. The values are S1: 1 * 0.1 = 0.1 and S2: 1 * 0.3 = 0.3.

Reading a2: from S1 (0.1), to S1: 0.1 * 0.2 = 0.02 and to S2: 0.1 * 0.4 = 0.04; from S2 (0.3), to S1: 0.3 * 0.3 = 0.09 and to S2: 0.3 * 0.2 = 0.06.

Choose the winning sequence per state per iteration.

Page 32: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Tree structure contd.

After a1 a2 the winners are S1: 0.09 and S2: 0.06.

Reading a1: from S1 (0.09), to S1: 0.09 * 0.1 = 0.009 and to S2: 0.09 * 0.3 = 0.027; from S2 (0.06), to S1: 0.06 * 0.2 = 0.012 and to S2: 0.06 * 0.3 = 0.018. Winners: S1: 0.012, S2: 0.027.

Reading a2: from S2 (0.027), to S1: 0.027 * 0.3 = 0.0081 and to S2: 0.027 * 0.2 = 0.0054; from S1 (0.012), to S1: 0.012 * 0.2 = 0.0024 and to S2: 0.012 * 0.4 = 0.0048.

The problem being addressed by this tree is

    S* = argmax_S P(S | a1 a2 a1 a2, μ)

where a1 a2 a1 a2 is the output sequence and μ the model (the machine).

Page 33: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Viterbi Algorithm for the Urn problem (first two symbols)

[Trellis for the first two symbols ε and R: the start state S0 expands to U1, U2 and U3 with initial probabilities 0.5, 0.3 and 0.2; on reading R each urn again expands to U1, U2 and U3 and the path scores are accumulated (0.015, 0.075, 0.018, 0.006, 0.036, 0.048, ...). The values marked with * (0.075, 0.036, 0.048) are the winner sequences kept per state.]
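A minimal Viterbi sketch over the urn HMM defined earlier, following the slides' convention that state S_k emits O_k and then transitions to S_{k+1} (with O_0 = ε emitted from the start state S_0, which is what the initial distribution accounts for).

    # Viterbi decoding for the urn HMM (STATES, INIT, TRANS, EMIT, OBS as sketched above).
    def viterbi(obs):
        # delta[s]: best probability of any path currently in state s
        delta = dict(INIT)                         # after emitting epsilon from S0
        back = {s: [s] for s in STATES}            # best state sequence ending in s
        for k, o in enumerate(obs):
            if k == len(obs) - 1:
                # last symbol: emitted from the current state, no further transition
                delta = {s: delta[s] * EMIT[s][o] for s in STATES}
                break
            new_delta, new_back = {}, {}
            for j in STATES:
                # emit o from state i, then move i -> j; keep the best predecessor
                best_i = max(STATES, key=lambda i: delta[i] * EMIT[i][o] * TRANS[i][j])
                new_delta[j] = delta[best_i] * EMIT[best_i][o] * TRANS[best_i][j]
                new_back[j] = back[best_i] + [j]
            delta, back = new_delta, new_back
        best = max(STATES, key=lambda s: delta[s])
        return back[best], delta[best]

    print(viterbi(OBS))    # most likely urn sequence for R R G G B R G R, and its score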

Page 34: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Markov process of order > 1 (say 2)

The same theory works:

    P(S) . P(O|S) = P(O0|S0) . P(S1|S0)
                    . [P(O1|S1) . P(S2|S1S0)] . [P(O2|S2) . P(S3|S2S1)]
                    . [P(O3|S3) . P(S4|S3S2)] . [P(O4|S4) . P(S5|S4S3)]
                    . [P(O5|S5) . P(S6|S5S4)] . [P(O6|S6) . P(S7|S6S5)]
                    . [P(O7|S7) . P(S8|S7S6)] . [P(O8|S8) . P(S9|S8S7)]

We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1. O0 is an ε-transition.

    Obs:    ε   R   R   G   G   B   R   G   R
    State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

Page 35: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Adjustments
- The transition probability table will have tuples on the rows and states on the columns.
- The output probability table will remain the same.
- In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (ε R R).
- There will be 27 leaves, out of which only 9 will remain.
- Sequences ending in the same tuple will be compared: instead of U1, U2 and U3, we compare U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2 and U3U3 (a small sketch of this tuple-state view follows below).
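A small sketch of the tuple-state adjustment. The order-2 transition table below is purely illustrative (the slides do not give one); the point is only that pairs of urns become the states of an ordinary order-1 model while the output table is unchanged.

    # Treat pairs of urns as states so the order-2 process becomes an order-1 HMM.
    import itertools

    URNS = ["U1", "U2", "U3"]
    PAIR_STATES = list(itertools.product(URNS, repeat=2))   # U1U1, U1U2, ..., U3U3 (9 states)

    # Hypothetical P(next urn | previous two urns); emission still depends only
    # on the current urn, so the output probability table stays the same.
    TRANS2 = {pair: {c: 1.0 / 3 for c in URNS} for pair in PAIR_STATES}

    def pair_transition(prev_pair, next_pair):
        """P((b, c) | (a, b)): zero unless the two pairs overlap on the shared urn."""
        (a, b), (b2, c) = prev_pair, next_pair
        return TRANS2[(a, b)][c] if b == b2 else 0.0

    print(pair_transition(("U1", "U2"), ("U2", "U3")))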

Page 36: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Forward and Backward Probability Calculation

Page 37: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Forward probability F(k,i)

Define F(k,i) = probability of being in state S_i having seen o0 o1 o2 ... ok, i.e.

    F(k,i) = P(o0 o1 o2 ... ok, S_i)

With m as the length of the observed sequence:

    P(observed sequence) = P(o0 o1 o2 ... om)
                         = Σ_{p=0..N} P(o0 o1 o2 ... om, S_p)
                         = Σ_{p=0..N} F(m, p)

Page 38: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Forward probability (contd.)

    F(k,q) = P(o0 o1 o2 ... ok, S_q)
           = P(o0 o1 o2 ... ok-1, ok, S_q)
           = Σ_{p=0..N} P(o0 o1 o2 ... ok-1, S_p, ok, S_q)
           = Σ_{p=0..N} P(o0 o1 o2 ... ok-1, S_p) . P(ok, S_q | o0 o1 o2 ... ok-1, S_p)
           = Σ_{p=0..N} F(k-1, p) . P(ok, S_q | S_p)
           = Σ_{p=0..N} F(k-1, p) . P(S_p --ok--> S_q)

    O0  O1  O2  O3  ...  Ok  Ok+1  ...  Om-1  Om
    S0  S1  S2  S3  ...  Sp  Sq    ...  Sm    Sfinal
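A minimal sketch of this forward recurrence over the urn HMM sketched earlier. F(0, q) is taken to be the initial distribution, since o_0 = ε is emitted from S_0, and P(o_k, S_q | S_p) factors as P(o_k | S_p) P(S_q | S_p).

    # Forward procedure: F(k, q) = sum_p F(k-1, p) * P(o_k | S_p) * P(S_q | S_p).
    def forward(obs):
        F = dict(INIT)                             # F(0, q), after the epsilon symbol
        for o in obs[1:]:                          # o_1 ... o_m
            F = {q: sum(F[p] * EMIT[p][o] * TRANS[p][q] for p in STATES)
                 for q in STATES}
        return sum(F.values())                     # P(o_0 ... o_m) = sum_p F(m, p)

    print(forward(["eps"] + OBS))                  # P(eps R R G G B R G R)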

Page 39: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Backward probability B(k,i)

Define B(k,i) = probability of seeing ok ok+1 ok+2 ... om given that the state was S_i, i.e.

    B(k,i) = P(ok ok+1 ok+2 ... om | S_i)

With m as the length of the observed sequence:

    P(observed sequence) = P(o0 o1 o2 ... om) = P(o0 o1 o2 ... om | S0) = B(0,0)

Page 40: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Backward probability (contd.)

    B(k,p) = P(ok ok+1 ok+2 ... om | S_p)
           = P(ok+1 ok+2 ... om, ok | S_p)
           = Σ_{q=0..N} P(ok+1 ok+2 ... om, ok, S_q | S_p)
           = Σ_{q=0..N} P(ok, S_q | S_p) . P(ok+1 ok+2 ... om | ok, S_q, S_p)
           = Σ_{q=0..N} P(ok+1 ok+2 ... om | S_q) . P(ok, S_q | S_p)
           = Σ_{q=0..N} B(k+1, q) . P(S_p --ok--> S_q)

    O0  O1  O2  O3  ...  Ok  Ok+1  ...  Om-1  Om
    S0  S1  S2  S3  ...  Sp  Sq    ...  Sm    Sfinal
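The corresponding backward sketch, under the same assumptions: B(m+1, q) = 1, and the start state S_0 is folded in at the end through the initial distribution, which plays the role of B(0,0) on the slide.

    # Backward procedure: B(k, p) = sum_q P(o_k | S_p) * P(S_q | S_p) * B(k+1, q).
    def backward(obs):
        B = {q: 1.0 for q in STATES}               # after the last symbol
        for o in reversed(obs[1:]):                # o_m ... o_1
            B = {p: sum(EMIT[p][o] * TRANS[p][q] * B[q] for q in STATES)
                 for p in STATES}
        # fold in S0: emit epsilon, move to q with probability INIT[q]
        return sum(INIT[q] * B[q] for q in STATES)

    print(backward(["eps"] + OBS))                 # same value as forward(["eps"] + OBS)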

Page 41: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Back to PCFG

Page 42: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Interesting Probabilities

The gunman sprayed the building with bullets
  1      2       3     4      5      6     7

Inside probability β_NP(4,5): What is the probability of having an NP at this position such that it will derive "the building"?

Outside probability α_NP(4,5): What is the probability of starting from N1 and deriving "The gunman sprayed", an NP, and "with bullets"?

Page 43: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Interesting Probabilities

Random variables to be considered:
- The non-terminal being expanded, e.g. NP.
- The word-span covered by the non-terminal, e.g. (4,5) refers to the words "the building".

While calculating probabilities, consider:
- The rule to be used for expansion, e.g. NP -> DT NN.
- The probabilities associated with the RHS non-terminals, e.g. the DT subtree's and the NN subtree's inside/outside probabilities.

Page 44: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Outside Probability

α_j(p,q): the probability of beginning with N1 and generating the non-terminal N^j_{pq} and all words outside w_p ... w_q:

    α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

[Figure: the sentence w1 ... w_{p-1}  w_p ... w_q  w_{q+1} ... w_m, with N1 at the root and N^j dominating the span w_p ... w_q.]

Page 45: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Inside Probabilities

β_j(p,q): the probability of generating the words w_p ... w_q starting with the non-terminal N^j_{pq}:

    β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

[Same figure: N1 at the root, N^j dominating the span w_p ... w_q.]

Page 46: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Outside & Inside Probabilities: example

The gunman sprayed the building with bullets
  1      2       3     4      5      6     7

    α_NP(4,5) for "the building" = P(The gunman sprayed, NP_{4,5}, with bullets | G)

    β_NP(4,5) for "the building" = P(the building | NP_{4,5}, G)

Page 47: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Calculating Inside probabilities β_j(p,q)

Base case:

    β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j -> w_k | G)

The base case is used for rules which derive the words or terminals directly. E.g., suppose N^j = NN is being considered and NN -> building is one of the rules, with probability 0.5:

    β_NN(5,5) = P(building | NN_{5,5}, G) = P(NN -> building | G) = 0.5

Page 48: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Induction Step: Assuming Grammar in Chomsky Normal Form

Induction step:

    β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
             = Σ_{r,s} Σ_{d=p..q-1} P(N^j -> N^r N^s) * β_r(p,d) * β_s(d+1,q)

[Figure: N^j dominates w_p ... w_q and is split into N^r dominating w_p ... w_d and N^s dominating w_{d+1} ... w_q.]

- Consider different splits of the words, indicated by d (e.g. for "the huge building", the split can come after "the" or after "huge").
- Consider the different non-terminals that can be used in the rule: NP -> DT NN and NP -> DT NNS are available options.
- Consider the summation over all of these.
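A minimal sketch of this bottom-up inside computation for the toy PCFG defined earlier. The slide assumes Chomsky Normal Form; since the toy grammar also has the unary rule NP -> NNS, the sketch additionally applies unary rules once per cell.

    # Inside (CYK-style) probabilities: beta[(p, q)][label] = P(w_p..w_q | label, G).
    def inside(words, grammar):
        n = len(words)
        beta = {}
        def add(cell, label, prob):
            cell[label] = cell.get(label, 0.0) + prob
        def apply_unaries(cell):
            for (lhs, rhs), rp in grammar.items():
                if len(rhs) == 1 and rhs[0] in cell:           # unary non-terminal rule
                    add(cell, lhs, rp * cell[rhs[0]])
        # base case: lexical rules, spans of one word
        for k, w in enumerate(words, start=1):
            cell = {}
            for (lhs, rhs), rp in grammar.items():
                if rhs == (w,):
                    add(cell, lhs, rp)
            apply_unaries(cell)
            beta[(k, k)] = cell
        # induction: combine two smaller spans with a binary rule
        for span in range(2, n + 1):
            for p in range(1, n - span + 2):
                q = p + span - 1
                cell = {}
                for d in range(p, q):                           # split point
                    for (lhs, rhs), rp in grammar.items():
                        if (len(rhs) == 2 and rhs[0] in beta[(p, d)]
                                and rhs[1] in beta[(d + 1, q)]):
                            add(cell, lhs,
                                rp * beta[(p, d)][rhs[0]] * beta[(d + 1, q)][rhs[1]])
                apply_unaries(cell)
                beta[(p, q)] = cell
        return beta

    chart = inside("the gunman sprayed the building with bullets".split(), PCFG)
    print(chart[(1, 7)].get("S"))                  # beta_S(1, 7) = P(w_1..7 | G)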

Page 49: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

The Bottom-Up Approach

The idea of induction: consider "the gunman".

Base cases: apply the unary (lexical) rules
    DT -> the       Prob = 1.0
    NN -> gunman    Prob = 0.5

Induction: the probability that an NP covers these 2 words
    = P(NP -> DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
    = 0.5 * 1.0 * 0.5 = 0.25

    (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))

Page 50: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Parse Triangle

• A parse triangle is constructed for calculating β_j(p,q).
• Probability of a sentence using β_j(p,q):

    P(w_{1m} | G) = P(N^1 => w_{1m} | G) = P(w_{1m} | N^1_{1m}, G) = β_1(1,m)

Page 51: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Parse Triangle

Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7); rows and columns 1..7 index word positions.

• Fill the diagonal with β_j(k,k):

    (1,1): β_DT = 1.0    (2,2): β_NN = 0.5    (3,3): β_VBD = 1.0    (4,4): β_DT = 1.0
    (5,5): β_NN = 0.5    (6,6): β_P = 1.0     (7,7): β_NNS = 1.0

Page 52: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Parse Triangle

• Calculate the remaining entries using the induction formula, e.g.:

    β_NP(1,2) = P(the gunman | NP_{1,2}, G)
              = P(NP -> DT NN) * β_DT(1,1) * β_NN(2,2)
              = 0.5 * 1.0 * 0.5 = 0.25

The diagonal is as before (DT 1.0, NN 0.5, VBD 1.0, DT 1.0, NN 0.5, P 1.0, NNS 1.0); cell (1,2) now holds β_NP = 0.25.

Page 53: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Example Parse t1

The gunman sprayed the building with bullets.

    (S_1.0
      (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
      (VP_0.6
        (VP_0.4 (VBD_1.0 sprayed)
                (NP_0.5 (DT_1.0 the) (NN_0.5 building)))
        (PP_1.0 (P_1.0 with)
                (NP_0.3 (NNS_1.0 bullets)))))

The rule used here (the upper VP expansion) is VP -> VP PP.

Page 54: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Another Parse t2

The gunman sprayed the building with bullets.

    (S_1.0
      (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
      (VP_0.4 (VBD_1.0 sprayed)
              (NP_0.2
                (NP_0.5 (DT_1.0 the) (NN_0.5 building))
                (PP_1.0 (P_1.0 with) (NP_0.3 (NNS_1.0 bullets))))))

The rule used here (the VP expansion) is VP -> VBD NP.

Page 55: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Parse Triangle

With the diagonal as before and the cells β_NP(1,2) = 0.25, β_NP(4,5) = 0.25, β_PP(6,7) = 0.3, β_VP(3,5) = 1.0 and β_NP(4,7) = 0.015 filled in:

    β_VP(3,7) = P(sprayed the building with bullets | VP_{3,7}, G)
              = P(VP -> VP PP) * β_VP(3,5) * β_PP(6,7) + P(VP -> VBD NP) * β_VBD(3,3) * β_NP(4,7)
              = 0.6 * 1.0 * 0.3 + 0.4 * 1.0 * 0.015 = 0.186

    β_S(1,7) = 0.0465

Page 56: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Different Parses

Consider:
- Different splitting points, e.g. the 5th and the 3rd position.
- Using different rules for VP expansion, e.g. VP -> VP PP and VP -> VBD NP.

Different parses for the VP "sprayed the building with bullets" can be constructed this way.

Page 57: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Outside Probabilities α_j(p,q)

Base case:

    α_1(1,m) = 1   for the start symbol
    α_j(1,m) = 0   for j ≠ 1

Inductive step for calculating α_j(p,q) (summation over f, g and e):

    α_j(p,q) = Σ_{f,g,e} α_f(p,e) * P(N^f -> N^j N^g) * β_g(q+1,e)

[Figure: N^f dominates w_p ... w_e and is rewritten as N^j (dominating w_p ... w_q) followed by N^g (dominating w_{q+1} ... w_e); α_f(p,e) is the outside probability of N^f and β_g(q+1,e) the inside probability of N^g.]

Page 58: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay   17 th  March, 2011

Probability of a Sentence

• The joint probability of a sentence w_{1m} and that there is a constituent spanning words w_p to w_q is given as:

    P(w_{1m}, N_{pq} | G) = Σ_j P(w_{1m}, N^j_{pq} | G) = Σ_j α_j(p,q) * β_j(p,q)

The gunman sprayed the building with bullets
  1      2       3     4      5      6     7

    P(The gunman ... bullets, N_{4,5} | G)
      = Σ_j P(The gunman ... bullets, N^j_{4,5} | G)
      = α_NP(4,5) β_NP(4,5) + α_VP(4,5) β_VP(4,5) + ...