Speech, NLP and the Web Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 7,9, 10: Theoretical Underpinnings- Maximum Likelihood and Maximum Entropy Principles (lecture 8 was on NLTK by Abhijit) 4 Aug, 2014 Pushpak Bhattacharyya: ML and ME 1


Page 1: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri

Speech, NLP and the Web

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

Lecture 7, 9, 10: Theoretical Underpinnings: Maximum Likelihood and Maximum Entropy Principles
(lecture 8 was on NLTK by Abhijit)

4 Aug, 2014 Pushpak Bhattacharyya: ML and ME 1


Fundamental principles of machine learning

Learning in a vacuum is impossible: importance of prior knowledge

Inductive Bias: what to learn, and in what form to learn it, are pre-decided


Structure learning and parameter learning

Structure: parts and their relationships

Parameter: probabilities


Example (1/2): transition table

      ^     N     V     O     .
^     0     0.6   0.2   0.2   0
N     0     0.1   0.4   0.3   0.2
V     0     0.3   0.1   0.3   0.3
O     0     0.3   0.2   0.3   0.2
.     1     0     0     0     0

This transition table will change from language to language due to language divergences.

[Figure: Partial sequence graph; node labels: ^, N N, N V, .]
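The table can be held as a nested dictionary. The sketch below assumes that rows are the previous tag and columns the next tag (an interpretation, supported by each row summing to 1); the helper names `row_sums` and `sequence_probability` are illustrative only.

```python
# TRANSITION[prev][next] = P(next | prev), copied from the table.
# Assumption: rows are the previous tag, columns the next tag.
TRANSITION = {
    "^": {"^": 0.0, "N": 0.6, "V": 0.2, "O": 0.2, ".": 0.0},
    "N": {"^": 0.0, "N": 0.1, "V": 0.4, "O": 0.3, ".": 0.2},
    "V": {"^": 0.0, "N": 0.3, "V": 0.1, "O": 0.3, ".": 0.3},
    "O": {"^": 0.0, "N": 0.3, "V": 0.2, "O": 0.3, ".": 0.2},
    ".": {"^": 1.0, "N": 0.0, "V": 0.0, "O": 0.0, ".": 0.0},
}


def row_sums(table):
    """Total outgoing probability for every previous tag."""
    return {prev: sum(row.values()) for prev, row in table.items()}


def sequence_probability(tags, table=TRANSITION):
    """Product of transition probabilities along a tag sequence."""
    prob = 1.0
    for prev, nxt in zip(tags, tags[1:]):
        prob *= table[prev][nxt]
    return prob


print(row_sums(TRANSITION))
print(sequence_probability(["^", "N", "V", "."]))
```

Each row is a probability distribution over the next tag, so a tag sequence such as ^ N V . is scored by multiplying one entry per transition.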


Example (2/2): Lexical Probability Table

Size of this table = # POS tags in tagset × vocabulary size

vocabulary size = # unique words in corpus

      ε     people     laugh      ...   ...
^     1     0          0          ...   0
N     0     1×10^-3    1×10^-5    ...   ...
V     0     1×10^-6    1×10^-3    ...   ...
O     0     0          1×10^-9    ...   ...
.     1     0          0          0     0


Structure and parameter

N people N    (1 × 10^-3) × 0.1
N laugh N     (1 × 10^-5) × 0.1
N people V    (1 × 10^-3) × 0.4
N laugh V     (1 × 10^-5) × 0.4
…
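Each entry above multiplies a lexical probability by a transition probability. A minimal sketch of that product, using only the N-row values from the two example tables (the names `LEXICAL`, `TRANSITION_FROM_N` and `edge_weight` are illustrative, not from the lecture):

```python
# P(word | tag) for tag N, from the lexical probability table.
LEXICAL = {("N", "people"): 1e-3, ("N", "laugh"): 1e-5}

# P(next tag | N), from the N row of the transition table.
TRANSITION_FROM_N = {"N": 0.1, "V": 0.4}


def edge_weight(tag, word, next_tag):
    """Lexical probability of the word times transition to the next tag."""
    return LEXICAL[(tag, word)] * TRANSITION_FROM_N[next_tag]


for word in ("people", "laugh"):
    for next_tag in ("N", "V"):
        print("N", word, next_tag, edge_weight("N", word, next_tag))
```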


PCFG rules (structure + parameter)

• S → NP VP 1.0
• NP → DT NN 0.5
• NP → NNS 0.3
• NP → NP PP 0.2
• PP → P NP 1.0
• VP → VP PP 0.6
• VP → VBD NP 0.4

• DT → the 1.0
• NN → gunman 0.5
• NN → building 0.5
• VBD → sprayed 1.0
• NNS → bullets 1.0
• P → with 1.0
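With structure (rules) and parameters (probabilities) in place, the probability of a parse tree is the product of the probabilities of the rules it uses. The sketch below assumes the example sentence "the gunman sprayed the building with bullets" and two hand-built trees; the sentence, the trees and the function name `tree_probability` are illustrative choices, not taken from the slide.

```python
# (LHS, RHS) -> probability, copied from the rule list.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,
}


def tree_probability(tree):
    """tree = (label, child, ...); a child is a subtree or a word string."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULES[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob


np_gunman = ("NP", ("DT", "the"), ("NN", "gunman"))
np_building = ("NP", ("DT", "the"), ("NN", "building"))
pp_bullets = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

# Reading 1: the PP attaches to the verb phrase.
vp_attach = ("S", np_gunman,
             ("VP", ("VP", ("VBD", "sprayed"), np_building), pp_bullets))
# Reading 2: the PP attaches to the object noun phrase.
np_attach = ("S", np_gunman,
             ("VP", ("VBD", "sprayed"), ("NP", np_building, pp_bullets)))

print(tree_probability(vp_attach), tree_probability(np_attach))
```

Under these rule probabilities the verb-phrase attachment scores higher than the noun-phrase attachment, which is how a PCFG expresses a preference between competing structures.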


Expectation Maximization

One of the key ideas of Statistical AI, ML, NLP, CV

• Iterative procedure
• Find parameters
• Find hidden variables
• Maximize data likelihood


The coin tossing problem

Case of 1 coin: Suppose there are N tosses of a coin.
$N_H$ = the number of heads.
What is the probability of a head, i.e. $P_H$ = ?


Observed variable

#Observation = N

$$X: x_1, x_2, x_3, \ldots, x_N$$

where $x_i = 1$ when the toss produces a head, and $x_i = 0$ otherwise.

Therefore,

$$P_H = \frac{\sum_{i=1}^{N} x_i}{N}$$


Each observation is a Bernoulli trial where

$P_H$ is the probability of success, i.e., getting a head

$1 - P_H$ is the probability of failure, i.e., getting a tail


Likelihood of X

• Likelihood of X, i.e., the probability of the observation sequence X, is:

$$L(X; P_H) = \prod_{i=1}^{N} P_H^{x_i} (1 - P_H)^{1 - x_i}$$

Each trial is identical and independent. Maximum likelihood of the data requires us to set

$$\frac{dL}{dP_H} = 0$$

and thus get the expression for $P_H$.


Mathematical Convenience

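Take log of the likelihood.

$$LL(X; P_H) = \sum_{i=1}^{N} \left[ x_i \log P_H + (1 - x_i) \log(1 - P_H) \right]$$

Differentiating w.r.t. $P_H$:

$$\frac{dLL}{dP_H} = \sum_{i=1}^{N} \left[ \frac{x_i}{P_H} - \frac{1 - x_i}{1 - P_H} \right]$$

To get the expression for $P_H$, set

$$\frac{dLL}{dP_H} = 0$$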


Equating to 0, expression for $P_H$

$$\frac{\sum_{i=1}^{N} x_i}{P_H} = \frac{N - \sum_{i=1}^{N} x_i}{1 - P_H}$$

$$P_H = \frac{\sum_{i=1}^{N} x_i}{N}$$
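The closed form can be checked numerically. The sketch below uses a made-up sequence of ten tosses (hypothetical data) and compares the closed-form estimate with a brute-force search of the log likelihood over a grid of candidate values; the function names are illustrative.

```python
import math


def log_likelihood(xs, p_h):
    """LL(X; P_H) = sum_i [x_i log P_H + (1 - x_i) log(1 - P_H)]."""
    return sum(x * math.log(p_h) + (1 - x) * math.log(1 - p_h) for x in xs)


def mle_head_probability(xs):
    """Closed-form maximum likelihood estimate: (sum of x_i) / N."""
    return sum(xs) / len(xs)


# Hypothetical observations: 1 = head, 0 = tail (7 heads in 10 tosses).
tosses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

p_hat = mle_head_probability(tosses)

# Brute-force check: the grid point with the highest log likelihood.
grid = [k / 100 for k in range(1, 100)]
best_on_grid = max(grid, key=lambda p: log_likelihood(tosses, p))

print(p_hat, best_on_grid)
```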


Maximum Entropy

Suppose we do not know how to get the MLE, or the likelihood expression is impossible to get; then we use Maximum Entropy.

Example: in problems like co-reference resolution.

$$\text{Entropy} = -P_H \log P_H - (1 - P_H) \log(1 - P_H)$$

To be elaborated later.
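As a small illustration of the entropy expression for a coin, the sketch below evaluates it for a few values of $P_H$. The base of the logarithm is not fixed by the slide; base 2 is assumed here so that the result is in bits.

```python
import math


def binary_entropy(p_h):
    """-P_H log P_H - (1 - P_H) log(1 - P_H), with log base 2 (assumed)."""
    if p_h in (0.0, 1.0):
        return 0.0  # limit value: a certain outcome carries no uncertainty
    return -p_h * math.log2(p_h) - (1 - p_h) * math.log2(1 - p_h)


for p in (0.1, 0.5, 0.9):
    print(p, binary_entropy(p))
```

The value is largest for a fair coin and drops towards 0 as the coin becomes more biased.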


Case for Expectation Maximization

Instead of one coin we toss two coins.

Parameters <P, P1, P2>
• P = probability of choosing the first coin
• P1 = probability of getting a head from the first coin
• P2 = probability of getting a head from the second coin

We do not know which coin the observation came from.

$$X: x_1, x_2, x_3, \ldots, x_N$$


EM continued..

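$Z_1, Z_2, Z_3, \ldots, Z_N$ is the hidden sequence running alongside $X_1, X_2, X_3, \ldots, X_N$

where $Z_i = 1$ if the $i$th observation came from coin 1, and $Z_i = 0$ otherwise.

$$\Pr(X; \theta) = \sum_{Z} \Pr(X, Z; \theta)$$

$$Z: z_1, z_2, z_3, \ldots, z_N \qquad \theta: \langle P, P_1, P_2 \rangle$$

$$Y: \langle x_1, z_1 \rangle, \langle x_2, z_2 \rangle, \langle x_3, z_3 \rangle, \ldots, \langle x_N, z_N \rangle$$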


Cntd.

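We want to work with

$$LL(X; \theta) = \log\Big(\sum_{Z} P(X, Z; \theta)\Big)$$

Invoke convexity/concavity and expectation of $Z_i$ and work with $\log(\Pr(Y; \theta))$:

$$\Pr(Y; \theta) = P(X, Z; \theta) = \prod_{i=1}^{N} \left( P \cdot P_1^{x_i} (1 - P_1)^{1 - x_i} \right)^{z_i} * \left( (1 - P) \cdot P_2^{x_i} (1 - P_2)^{1 - x_i} \right)^{1 - z_i}$$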


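Log Likelihood of the Data

$$LL(X; \theta) = \sum_{i=1}^{N} \Big[ E(z_i) \big( \log P + x_i \log P_1 + (1 - x_i) \log(1 - P_1) \big) + (1 - E(z_i)) \big( \log(1 - P) + x_i \log P_2 + (1 - x_i) \log(1 - P_2) \big) \Big]$$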


IMPORTANT POINTS TO NOTE

• Log moves inside the product term.
• Σ disappears, giving rise to $E(Z_i)$ in place of $Z_i$.
• Differentiate w.r.t. p, p1, p2, equate to 0 and get the results.


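P, P1, P2

$$p_1 = \frac{\sum_{i=1}^{N} E(z_i)\, x_i}{\sum_{i=1}^{N} E(z_i)}$$

$$p_2 = \frac{M - \sum_{i=1}^{N} E(z_i)\, x_i}{N - \sum_{i=1}^{N} E(z_i)}$$

M = observed no. of heads

$$p = \frac{\sum_{i=1}^{N} E(z_i)}{N}$$

$$E(z_i) = P(z_i = 1 \mid x = x_i) = \frac{P(z_i = 1) \cdot P(x = x_i \mid z_i = 1)}{P(x = x_i)} = \frac{P\, P_1^{x_i} (1 - P_1)^{1 - x_i}}{P\, P_1^{x_i} (1 - P_1)^{1 - x_i} + (1 - P)\, P_2^{x_i} (1 - P_2)^{1 - x_i}}$$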
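The update formulas for p, p1, p2 and E(z_i) translate directly into an iterative procedure. The following is a minimal sketch, not a definitive implementation: the toss data and starting values are hypothetical, and the function names are illustrative.

```python
import math


def bernoulli(x, p):
    """p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p if x == 1 else 1.0 - p


def e_step(xs, p, p1, p2):
    """E(z_i): posterior probability that toss i came from coin 1."""
    expectations = []
    for x in xs:
        from_coin1 = p * bernoulli(x, p1)
        from_coin2 = (1.0 - p) * bernoulli(x, p2)
        expectations.append(from_coin1 / (from_coin1 + from_coin2))
    return expectations


def m_step(xs, ez):
    """Re-estimate p, p1, p2 from the expectations E(z_i)."""
    n, m = len(xs), sum(xs)  # m = observed number of heads
    sum_ez = sum(ez)
    sum_ez_x = sum(e * x for e, x in zip(ez, xs))
    return sum_ez / n, sum_ez_x / sum_ez, (m - sum_ez_x) / (n - sum_ez)


def observed_log_likelihood(xs, p, p1, p2):
    """log Pr(X; theta), with the hidden coin choice summed out."""
    return sum(
        math.log(p * bernoulli(x, p1) + (1.0 - p) * bernoulli(x, p2))
        for x in xs
    )


def run_em(xs, p, p1, p2, iterations=20):
    history = [observed_log_likelihood(xs, p, p1, p2)]
    for _ in range(iterations):
        ez = e_step(xs, p, p1, p2)
        p, p1, p2 = m_step(xs, ez)
        history.append(observed_log_likelihood(xs, p, p1, p2))
    return (p, p1, p2), history


# Hypothetical data (7 heads in 10 tosses) and hypothetical starting values.
tosses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
(p, p1, p2), history = run_em(tosses, p=0.4, p1=0.6, p2=0.3)
print(p, p1, p2, history[0], history[-1])
```

With a single toss per draw, the data only determine the mixture head probability p·p1 + (1 − p)·p2, so the individual values reached depend on the starting point; the log likelihood of the observed data does not decrease from one iteration to the next.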


Another application of EM

WSD

Mitesh Khapra, Salil Joshi and Pushpak Bhattacharyya, It takes two to Tango: A Bilingual Unsupervised Approach for estimating Sense Distributions using Expectation Maximization, 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, November 2011.


Definition: WSD

Given a context, get the “meaning” of
• a set of words (targeted WSD), or
• all words (all-words WSD)

The “meaning” is usually given by the id of a sense in a sense repository, usually the wordnet.


Example: “operation” (from Princeton Wordnet)

Operation, surgery, surgical operation, surgical procedure, surgical process -- (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body; "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery") TOPIC->(noun) surgery#1

Operation, military operation -- (activity by a military or naval force (as a maneuver or campaign); "it was a joint operation of the navy and air force") TOPIC->(noun) military#1, armed forces#1, armed services#1, military machine#1, war machine#1

mathematical process, mathematical operation, operation --((mathematics) calculation by mathematical methods; "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1


WSD for ALL Indian languages: Critical resource: INDOWORDNET

[Figure: IndoWordnet, linking the Hindi, Dravidian Language, North East Language, Marathi, Sanskrit, English, Bengali, Punjabi, Konkani, Urdu, Gujarati, Oriya and Kashmiri wordnets]


Synset Based Multilingual Dictionary

Expansion approach for creating wordnets [Mohanty et al., 2008]

Instead of creating from scratch, link to the synsets of an existing wordnet

Relations get borrowed from the existing wordnet

[Figure: the same synset graph (nodes S1 to S7) repeated three times, one copy per linked wordnet; panel labels: Hindi, Marathi. Caption: A sample entry from the MultiDict]


Hypothesis

Sense distributions are invariant across languages!! The proportion of times a sense appears in a language is uniform across languages!

E.g., the proportion of times the sense of “sun” appears in any language, through “sun” and its synonyms, remains the same!


ESTIMATING SENSE DISTRIBUTIONS

If sense tagged Marathi corpus were available, we could have estimated

But such a corpus is not available


EM for estimating sense distributions

Problem: ‘galaa’ itself is ambiguous. Its raw count cannot be used as it is.

Solution: Its count should be weighted by


Word correspondences

Sense in English | S^mar (Marathi sense number) | words^mar (partial list) | S^hin = π(S^mar) (projected Hindi sense number) | words^hin (partial list of words in projected Hindi sense)
Neck | 1 | maan, greeva | 1 | gardan, galaa
Respect | 2 | maan, satkaar, sanmaan | 3 | izzat, aadar
Voice | 3 | awaaz, swar | 2 | galaa


EM for estimating sense distributions

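E-Step

$$P(S_1^{hin} \mid galaa) = \frac{P(S_1^{mar} \mid maan) \cdot \#(maan) + P(S_1^{mar} \mid greeva) \cdot \#(greeva)}{P(S_1^{mar} \mid maan) \cdot \#(maan) + P(S_1^{mar} \mid greeva) \cdot \#(greeva) + P(S_3^{mar} \mid awaaz) \cdot \#(awaaz) + P(S_3^{mar} \mid swar) \cdot \#(swar)}$$

M-Step

$$P(S_1^{mar} \mid maan) = \frac{P(S_1^{hin} \mid gardan) \cdot \#(gardan) + P(S_1^{hin} \mid galaa) \cdot \#(galaa)}{P(S_1^{hin} \mid gardan) \cdot \#(gardan) + P(S_1^{hin} \mid galaa) \cdot \#(galaa) + P(S_3^{hin} \mid aadar) \cdot \#(aadar) + P(S_3^{hin} \mid izzat) \cdot \#(izzat)}$$

Sense numbers follow the word correspondence table: neck is sense 1 in both languages, voice is Marathi sense 3, and respect is Hindi sense 3.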
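To make the weighting concrete, the sketch below evaluates the estimate of P(neck sense | galaa) from Marathi-side quantities. All probabilities and corpus counts are hypothetical placeholders, and the function name `projected_sense_probability` is illustrative; only the word lists come from the word correspondence table.

```python
def projected_sense_probability(target, senses, words_of, p_sense_given_word, count):
    """Estimate P(target sense | ambiguous word) from the other language."""
    def mass(sense):
        # Sum of P(sense | word) * #(word) over the words of the projected sense.
        return sum(p_sense_given_word[(sense, w)] * count[w] for w in words_of[sense])

    return mass(target) / sum(mass(s) for s in senses)


# Marathi words for the two senses of Hindi 'galaa', keyed by Marathi sense
# number (1 = neck, 3 = voice), as in the word correspondence table.
words_of = {1: ["maan", "greeva"], 3: ["awaaz", "swar"]}

# Hypothetical current estimates of P(S^mar | word) and hypothetical counts.
p_sense_given_word = {
    (1, "maan"): 0.5,
    (1, "greeva"): 1.0,
    (3, "awaaz"): 1.0,
    (3, "swar"): 1.0,
}
count = {"maan": 20, "greeva": 5, "awaaz": 30, "swar": 15}

p_neck = projected_sense_probability(1, [1, 3], words_of, p_sense_given_word, count)
print(p_neck)
```

The count of the ambiguous word 'maan' contributes only in proportion to its current probability of carrying the neck sense, which is the weighting idea behind the E and M steps.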


General Algo

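E-step:

$$P(S_i^{L_1} \mid u) = \frac{\sum_{v} P(\pi(S_i^{L_1}) \mid v) \cdot \#(v)}{\sum_{S_j^{L_1}} \sum_{x} P(\pi(S_j^{L_1}) \mid x) \cdot \#(x)} \qquad (2)$$

M-step:

$$P(S_k^{L_2} \mid v) = \frac{\sum_{y} P(\pi(S_k^{L_2}) \mid y) \cdot \#(y)}{\sum_{S_m^{L_2}} \sum_{y} P(\pi(S_m^{L_2}) \mid y) \cdot \#(y)} \qquad (3)$$

[The index sets under the sums and the "where" clause relating the senses of the two languages are not recoverable from the extracted text.]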


Results

Algorithm                                                         Marathi P %   R %     F %
IWSD (training on self corpora; no parameter projection)          81.29         80.42   80.85
IWSD (training on Hindi and projecting parameters for Marathi)    73.45         70.33   71.86
EM (no sense corpora in either Hindi or Marathi)                  68.57         67.93   68.25
Wordnet Baseline                                                  58.07         58.07   58.07


Results & Discussions

Performance of projection using manual cross linkages is within 7% of Self-Training

Performance of projection using probabilistic cross linkages is within 10-12% of Self-Training – remarkable since no additional cost incurred in target language

Both MCL and PCL give 10-14% improvement over Wordnet First Sense Baseline

Not prudent to stick to knowledge based and unsupervised approaches –they come nowhere close to MCL or PCL

[Figure legend: Manual Cross Linkages; Probabilistic Cross Linkages; Skyline - self training data is available; Wordnet first sense baseline; S-O-T-A Knowledge Based Approach; S-O-T-A Unsupervised Approach; Our values]


Convexity


Motivation: argmax computation

• Statistical Spell Checking
• Automatic Speech Recognition
• Part of Speech Tagging
• Probabilistic Parsing
• Statistical Machine Translation


Some general observations

$$A^* = \arg\max_{A} [P(A \mid B)] = \arg\max_{A} [P(A) \cdot P(B \mid A)]$$

Computing and using P(A) and P(B|A) both need
(i) looking at the internal structures of A and B
(ii) making independence assumptions
(iii) putting together a computation from smaller parts


Problem 1: Spell checker: apply Bayes Rule

$$W^* = \arg\max_{W} [P(W \mid T)] = \arg\max_{W} [P(W) \cdot P(T \mid W)]$$

W = correct word, T = misspelt word

Why apply Bayes rule? Finding p(w|t) vs. p(t|w)?

Assumptions:
• t is obtained from w by a single error.
• The words consist of only alphabets.

(Jurafsky and Martin, Speech and NLP, 2000)
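The argmax over candidate words can be sketched in a few lines. The prior P(W) and channel probability P(T|W) values below are made-up toy numbers, and `best_correction` is an illustrative name; a real spell checker would estimate both tables from corpora and error data.

```python
def best_correction(typo, prior, channel):
    """Return argmax_W P(W) * P(T | W) over the candidate words W."""
    candidates = [w for w in prior if (typo, w) in channel]
    return max(candidates, key=lambda w: prior[w] * channel[(typo, w)])


# Toy numbers (hypothetical): prior[w] = P(W = w).
prior = {"actress": 0.3, "across": 0.6, "acres": 0.1}

# channel[(t, w)] = P(T = t | W = w), also hypothetical.
channel = {
    ("acress", "actress"): 0.4,
    ("acress", "across"): 0.1,
    ("acress", "acres"): 0.2,
}

best = best_correction("acress", prior, channel)
print(best)
```

In this toy setting the most frequent word is not the winner: the product of prior and channel probability decides, which is the point of decomposing P(W|T) with Bayes rule.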


Problem-2: Isolated word recognition

Problem Definition: Given a sequence of speech signals, identify the words.

2 steps:
• Segmentation (Word Boundary Detection)
• Identify the word

Isolated Word Recognition: Identify W given SS (speech signal)

$$\hat{W} = \arg\max_{W} P(W \mid SS)$$


Problem-3: Statistical MT

“Find the English translation e corresponding to a given Foreign sentence f”

Thus, we seek $e_{best}$ such that

$$e_{best} = \arg\max_{e} P(e \mid f) = \arg\max_{e} [P(e) * P(f \mid e)]$$

• Language Model: P(e)
• Translation Model: P(f|e)

Translations are produced on the basis of a statistical model.

Parameters are estimated using bilingual parallel corpora.


Convexity: utility

Jensen’s inequality

Kullback–Leibler distance/divergence

EM formulation


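[Figure: graph of a convex function with the chord between $x_1$ and $x_2$; labelled points: $f(x_1)$, $f(x_2)$, $\lambda f(x_1) + (1 - \lambda) f(x_2)$, $f(\lambda x_1 + (1 - \lambda) x_2)$, and $z = \lambda x_1 + (1 - \lambda) x_2$]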


Criteria for convexity

A function f(x) is said to be convex in the interval [a,b] iff

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

$$\forall x_1, x_2 \in [a, b], \quad 0 \le \lambda \le 1$$


Jensen’s inequality

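For any convex function f(x)

$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i)$$

where $\sum_{i=1}^{n} \lambda_i = 1$ and $\forall i,\ 0 \le \lambda_i \le 1$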
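A quick numerical check of the inequality can help build intuition before the proof. The sketch below measures the gap between the two sides for the convex function −log x; the points and weights are arbitrary illustrative values, and `jensen_gap` is an illustrative name.

```python
import math


def jensen_gap(f, xs, lambdas):
    """sum_i lambda_i f(x_i) - f(sum_i lambda_i x_i); >= 0 when f is convex."""
    if abs(sum(lambdas) - 1.0) > 1e-12 or any(l < 0.0 or l > 1.0 for l in lambdas):
        raise ValueError("lambdas must lie in [0, 1] and sum to 1")
    weighted_point = sum(l * x for l, x in zip(lambdas, xs))
    weighted_values = sum(l * f(x) for l, x in zip(lambdas, xs))
    return weighted_values - f(weighted_point)


def neg_log(x):
    return -math.log(x)


gap = jensen_gap(neg_log, [0.5, 2.0, 4.0], [0.2, 0.3, 0.5])
gap_equal_points = jensen_gap(neg_log, [3.0, 3.0, 3.0], [0.2, 0.3, 0.5])
print(gap, gap_equal_points)
```

The gap is strictly positive when the points differ and vanishes when all points coincide.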


Proof of Jensen´s inequality

Method: by induction on N

Base case: N = 1

$$f(\lambda_1 x_1) \le \lambda_1 f(x_1), \quad \text{where } \lambda_1 = 1$$

i.e. $f(x) \le f(x)$, trivially true.


Another base case

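N = 2

$$
\begin{aligned}
f(\lambda_1 x_1 + \lambda_2 x_2) &= f(\lambda_1 x_1 + (1 - \lambda_1) x_2) && \text{since } \lambda_1 + \lambda_2 = 1 \\
&\le \lambda_1 f(x_1) + (1 - \lambda_1) f(x_2) && \text{since } f(x) \text{ is convex}
\end{aligned}
$$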


Hypothesis

Suppose true for n = k, i.e.

$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i)$$


Induction Step

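Show that

$$f\Big(\sum_{i=1}^{k+1} \lambda_i x_i\Big) \le \sum_{i=1}^{k+1} \lambda_i f(x_i)$$

given $\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_{k+1} = 1$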


Proof

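$$
\begin{aligned}
f(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \cdots + \lambda_{k+1} x_{k+1})
&= f\Big((1 - \lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} x_i + \lambda_{k+1} x_{k+1}\Big) \\
&\le (1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} x_i\Big) + \lambda_{k+1} f(x_{k+1}) && \text{by convexity} \\
&= (1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \mu_i x_i\Big) + \lambda_{k+1} f(x_{k+1}) && \text{where } \mu_i = \frac{\lambda_i}{1 - \lambda_{k+1}}
\end{aligned}
$$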


Continued...

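Examine the $\mu_i$s

$$
\begin{aligned}
\sum_{i=1}^{k} \mu_i &= \frac{\lambda_1}{1 - \lambda_{k+1}} + \frac{\lambda_2}{1 - \lambda_{k+1}} + \frac{\lambda_3}{1 - \lambda_{k+1}} + \cdots + \frac{\lambda_k}{1 - \lambda_{k+1}} \\
&= \frac{\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_k}{1 - \lambda_{k+1}} \\
&= \frac{1 - \lambda_{k+1}}{1 - \lambda_{k+1}} = 1
\end{aligned}
$$

because $\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_{k+1} = 1$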


Continued...

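Therefore,

$$
\begin{aligned}
(1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \mu_i x_i\Big) + \lambda_{k+1} f(x_{k+1})
&\le (1 - \lambda_{k+1}) \sum_{i=1}^{k} \mu_i f(x_i) + \lambda_{k+1} f(x_{k+1}) \\
&= \sum_{i=1}^{k} \lambda_i f(x_i) + \lambda_{k+1} f(x_{k+1})
\end{aligned}
$$

Finally, at the induction step,

$$f\Big(\sum_{i=1}^{k+1} \lambda_i x_i\Big) \le \sum_{i=1}^{k+1} \lambda_i f(x_i)$$

Thus Jensen's inequality is proved.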


KL -divergence

We will work with the discrete form of probability distributions.

Given two probability distributions P, Q on the random variable

$$X: x_1, x_2, x_3, \ldots, x_N$$

$$P: p_1 = p(x_1),\ p_2 = p(x_2),\ \ldots,\ p_N = p(x_N)$$

$$Q: q_1 = q(x_1),\ q_2 = q(x_2),\ \ldots,\ q_N = q(x_N)$$


KLD definition

$$D_{KL}(P, Q) = \sum_{i=1}^{N} p_i \log \frac{p_i}{q_i}$$

$D_{KL}$ is asymmetric and $\ge 0$; also written as

$$D_{KL}(P, Q) = E_p(\log P) - E_p(\log Q)$$
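The two stated properties, asymmetry and non-negativity, are easy to observe numerically. The sketch below uses two arbitrary three-outcome distributions (illustrative values) and the natural logarithm; the base is an assumption, since the slide does not fix it.

```python
import math


def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i p_i log(p_i / q_i), natural log (assumed base)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Arbitrary example distributions over three outcomes.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]

d_pq = kl_divergence(P, Q)
d_qp = kl_divergence(Q, P)
d_pp = kl_divergence(P, P)
print(d_pq, d_qp, d_pp)
```

Swapping the arguments changes the value, and the divergence of a distribution from itself is 0.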


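Proof: KLD >= 0

To show:

$$KL(P, Q) = \sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} \ge 0$$

Proof:

$$\sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} = -\sum_{i=1}^{N} p_i \log \frac{q_i}{p_i} = \sum_{i=1}^{N} p_i \Big(-\log \frac{q_i}{p_i}\Big)$$

$-\log x$ is convex in $(0, \infty)$. So

$$-\log\Big(\sum_{i=1}^{N} p_i x_i\Big) \le \sum_{i=1}^{N} p_i (-\log x_i)$$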


Proof cntd.

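Apply Jensen's inequality with $x_i = q_i / p_i$:

$$\sum_{i=1}^{N} p_i \Big(-\log \frac{q_i}{p_i}\Big) \ge -\log\Big(\sum_{i=1}^{N} p_i \frac{q_i}{p_i}\Big) = -\log\Big(\sum_{i=1}^{N} q_i\Big) = -\log 1 = 0$$

since $\sum_{i=1}^{N} q_i = 1$. So

$$\sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} \ge 0$$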


Convexity of –log x

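$$
\begin{aligned}
& -\log(\lambda x_1 + (1 - \lambda) x_2) \le \lambda (-\log x_1) + (1 - \lambda)(-\log x_2) \\
\text{i.e. } & \log(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda \log x_1 + (1 - \lambda) \log x_2 \\
\text{i.e. } & \lambda x_1 + (1 - \lambda) x_2 \ge x_1^{\lambda} x_2^{1 - \lambda} \\
\text{i.e. } & \lambda x_1^{1 - \lambda} x_2^{-(1 - \lambda)} + (1 - \lambda) x_1^{-\lambda} x_2^{\lambda} \ge 1 \\
\text{i.e. } & \lambda \Big(\frac{x_1}{x_2}\Big)^{1 - \lambda} + (1 - \lambda) \Big(\frac{x_2}{x_1}\Big)^{\lambda} \ge 1 \\
\text{i.e. } & \lambda y^{1 - \lambda} + (1 - \lambda) y^{-\lambda} \ge 1, \quad \text{where } y = \frac{x_1}{x_2}
\end{aligned}
$$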


Interesting problem

Try to prove:

$$\frac{w_1 x_1 + w_2 x_2}{w_1 + w_2} \ge \left( x_1^{w_1} x_2^{w_2} \right)^{\frac{1}{w_1 + w_2}}$$


2nd definition of convexity

Theorem:

If $f(x)$ is twice differentiable in $[a,b]$ and $f''(x) \ge 0\ \forall x \in [a,b]$, then $f(x)$ is convex in $[a,b]$. So $-\log x$ is convex.
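The claim about $-\log x$ follows from a direct computation (natural logarithm assumed; any other base only rescales by a positive constant):

$$
\begin{aligned}
f(x) &= -\log x \\
f'(x) &= -\frac{1}{x} \\
f''(x) &= \frac{1}{x^2} > 0 \quad \text{for all } x > 0
\end{aligned}
$$

so the condition of the theorem holds on any interval $[a,b]$ with $a > 0$.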


Lemma 1

If $f''(x) \ge 0$ in $[a,b]$, then $f'(t) \ge f'(s)$, $\forall t, s \in [a,b]$ and $t \ge s$.

[Figure: number line with points a, s, z, t, b in that order]


Mean Value Theorem

For any function $f(x)$,

$$f(n) - f(m) = (n - m) f'(p), \quad \text{where } m < p < n$$

$$f(z) - f(a) = (z - a) f'(s), \quad s \in (a, z)$$


Alternative form of z

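$$z = \lambda x_1 + (1 - \lambda) x_2$$

Add $-\lambda z$ to both sides:

$$(1 - \lambda) z = \lambda (x_1 - z) + (1 - \lambda) x_2$$

$$(1 - \lambda)(x_2 - z) = \lambda (z - x_1)$$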


Alternative form of convexity

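$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

Add $-\lambda f(z)$ to both sides:

$$f(z) - \lambda f(z) \le \lambda f(x_1) + (1 - \lambda) f(x_2) - \lambda f(z)$$

$$(1 - \lambda) f(z) \le \lambda (f(x_1) - f(z)) + (1 - \lambda) f(x_2)$$

$$\lambda (f(z) - f(x_1)) \le (1 - \lambda)(f(x_2) - f(z))$$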


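Proof: second derivative >= 0 implies convexity (1/2)

We have that

$$z = \lambda x_1 + (1 - \lambda) x_2$$

$$f(z) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

is the same as

$$(1 - \lambda)[f(x_2) - f(z)] \ge \lambda [f(z) - f(x_1)] \qquad (1)$$

and that

$$(1 - \lambda)[x_2 - z] = \lambda [z - x_1] \qquad (2)$$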


Second derivative >=0 implies convexity (2/2)

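By the Mean Value Theorem, (1) is equivalent to

$$(1 - \lambda)\, f'(t) \cdot (x_2 - z) \ge \lambda\, f'(s) \cdot (z - x_1)$$

for some s and t, where

$$x_1 < s < z < t < x_2$$

Now since $f''(x) \ge 0$,

$$f'(t) \ge f'(s)$$

Combining this with (2), whose two sides are equal and non-negative, the result is proved.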


Why all this

• In EM, we maximize the expectation of the log likelihood of the data
• Log is a concave function
• We have to take iterative steps to get to the maximum
• There are two unknown values: Z (unobserved data) and θ (parameters)
• From θ, get new value of Z (E-step)
• From Z, get new value of θ (M-step)


Recap: a simple EM situation

Toss of two coins:

Parameters <P, P1, P2>
• P = probability of choosing the first coin
• P1 = probability of getting a head from the first coin
• P2 = probability of getting a head from the second coin

We do not know which coin the observation came from.

$$X: x_1, x_2, x_3, \ldots, x_N$$


EM continued..

Z1, Z2, Z3,…, ZN is the hidden sequence running alongside X1, X2, X3,…, XN

Where, Zi =1, if the ith observation came from coin 1, =0, otherwise

21

321

,,,....,,,

),,Pr();Pr(

PPPzzzzZ

ZXX

N

Z

NN zxzxzxzxY ,......,,,: 332211

4 Aug, 2014Pushpak Bhattacharyya: ML and

ME 67

Page 68: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri

Cntd.

We want to work with

Invoke convexity/concavity and expectation of Zi and work with log(Pr(Y;θ))

N

i

zxxzxx iiiiii PPPPPP

ZXPY

1

1122

111 ))1.().1((*))1.(.(

);,();Pr(

));,(log();( Z

ZXPXLL

4 Aug, 2014Pushpak Bhattacharyya: ML and

ME 68

Page 69: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri

N

iiii PxPxPzEXLL

1

11 ))1log()1(log)(log([);(

))]1log()1(log)1))(log((1( 22 PxPxPziE ii

Log Likelihood of the Data

4 Aug, 2014Pushpak Bhattacharyya: ML and

ME 69

Page 70: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri

IMPORTANT POINTS TO NOTE

Log moves inside the product term. Σ disappears giving rise to E(Zi) in place

of Zi

Differentiate wrt p, p1, p2, equate to 0 and get the results

4 Aug, 2014Pushpak Bhattacharyya: ML and

ME 70

Page 71: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri

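P, P1, P2

$$p_1 = \frac{\sum_{i=1}^{N} E(z_i)\, x_i}{\sum_{i=1}^{N} E(z_i)}$$

$$p_2 = \frac{M - \sum_{i=1}^{N} E(z_i)\, x_i}{N - \sum_{i=1}^{N} E(z_i)}$$

M = observed no. of heads

$$p = \frac{\sum_{i=1}^{N} E(z_i)}{N}$$

$$E(z_i) = P(z_i = 1 \mid x = x_i) = \frac{P(z_i = 1) \cdot P(x = x_i \mid z_i = 1)}{P(x = x_i)} = \frac{p\, p_1^{x_i} (1-p_1)^{1-x_i}}{p\, p_1^{x_i} (1-p_1)^{1-x_i} + (1-p)\, p_2^{x_i} (1-p_2)^{1-x_i}}$$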
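The update rules for p, p1, p2 and E(zi) can be tried out numerically. The following is a minimal Python sketch, not the lecture's own code: the function names, the toy toss sequence `xs` and the starting values in `theta0` are illustrative assumptions. Heads are coded as 1 and tails as 0.

```python
import math

def e_step(xs, p, p1, p2):
    """E-step: E(z_i) = P(z_i = 1 | x_i) for every toss x_i in {0, 1}."""
    ez = []
    for x in xs:
        a = p * p1 ** x * (1 - p1) ** (1 - x)        # coin 1 produced x
        b = (1 - p) * p2 ** x * (1 - p2) ** (1 - x)  # coin 2 produced x
        ez.append(a / (a + b))
    return ez

def m_step(xs, ez):
    """M-step: closed-form updates for p, p1, p2 given the E(z_i)."""
    n, m = len(xs), sum(xs)            # N tosses, M observed heads
    s = sum(ez)                        # sum of E(z_i)
    sx = sum(e * x for e, x in zip(ez, xs))
    return s / n, sx / s, (m - sx) / (n - s)

def log_likelihood(xs, p, p1, p2):
    """Observed-data log likelihood: sum_i log(sum_z P(x_i, z; theta))."""
    return sum(
        math.log(p * p1 ** x * (1 - p1) ** (1 - x)
                 + (1 - p) * p2 ** x * (1 - p2) ** (1 - x))
        for x in xs
    )

def em(xs, theta, iterations=10):
    """Alternate E and M steps, recording the log likelihood before each step."""
    history = [log_likelihood(xs, *theta)]
    for _ in range(iterations):
        theta = m_step(xs, e_step(xs, *theta))
        history.append(log_likelihood(xs, *theta))
    return theta, history

xs = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # hypothetical tosses
theta0 = (0.4, 0.7, 0.3)              # hypothetical initial (p, p1, p2)
theta, history = em(xs, theta0)
print(theta)
print(history)
```

With a single toss per observation the three parameters are not separately identifiable, so the sketch only illustrates the mechanics of the updates and the fact that the log likelihood never decreases.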


How to find θ

How to choose the next θ? Take

$$\arg\max_{\theta} \big( LL(X, Z;\theta) - LL(X, Z;\theta_n) \big)$$

where
X: observed data
Z: unobserved data
θ: parameter
LL(X,Z;θn): log likelihood of complete data with parameter value at θn

This is in lieu of, for example, gradient ascent.

At every step LL(.) will increase, ultimately reaching a local/global maximum.

[Figure: θ axis with the current estimate θn marked]


Why expectation of log likelihood? (1/3)

P(X;θ) is the observation likelihood.

Deal with P(X,Z;θ), marginalized over Z.

log(ΣZ P(X,Z;θ)) is processed by multiplying and dividing by P(Z|X;θn), which for each Z lies between 0 and 1 and sums to 1 over Z.


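Why expectation of log likelihood? (2/3)

Multiply and divide by P(Z|X;θn), where P(Z|X;θn) is the probability at θn:

$$\log\Big(\sum_{Z} P(X, Z;\theta)\Big) = \log\Big(\sum_{Z} P(Z \mid X;\theta_n)\, \frac{P(X, Z;\theta)}{P(Z \mid X;\theta_n)}\Big)$$

Then Jensen's inequality will give

$$\log\Big(\sum_{Z} P(Z \mid X;\theta_n)\, \frac{P(X, Z;\theta)}{P(Z \mid X;\theta_n)}\Big) \ge \sum_{Z} P(Z \mid X;\theta_n) \log\Big(\frac{P(X, Z;\theta)}{P(Z \mid X;\theta_n)}\Big)$$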


Why expectation of log likelihood? (3/3)

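$$
\begin{aligned}
LL(X;\theta) - LL(X;\theta_n) &= \log\Big(\sum_{Z} P(X, Z;\theta)\Big) - \log(P(X;\theta_n)) \\
&= \log\Big(\sum_{Z} P(Z \mid X;\theta_n)\, \frac{P(X, Z;\theta)}{P(Z \mid X;\theta_n)}\Big) - \log(P(X;\theta_n)) \\
&\ge \sum_{Z} P(Z \mid X;\theta_n) \log\Big(\frac{P(X, Z;\theta)}{P(Z \mid X;\theta_n) \cdot P(X;\theta_n)}\Big), \quad \text{since } \sum_{Z} P(Z \mid X;\theta_n) = 1 \\
&= \sum_{Z} P(Z \mid X;\theta_n) \log\Big(\frac{P(X, Z;\theta)}{P(X, Z;\theta_n)}\Big)
\end{aligned}
$$

So,

$$
\begin{aligned}
\arg\max_{\theta} \big(LL(X;\theta) - LL(X;\theta_n)\big) &= \arg\max_{\theta} \sum_{Z} P(Z \mid X;\theta_n) \log(P(X, Z;\theta)) \\
&= \arg\max_{\theta} E_Z\big(\log(P(X, Z;\theta))\big)
\end{aligned}
$$

where E_Z(.) is the expectation of the log likelihood of the complete data w.r.t. Z.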
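The lower bound obtained from Jensen's inequality can be checked numerically on the two-coin model. The sketch below is only an illustration under assumptions: the toss sequence `xs`, the current estimate `theta_n` and the candidate values of θ are made up, and the function names are hypothetical. Because the tosses are independent, the sum over whole sequences Z factorises into a sum over each zi.

```python
import math

def joint(x, z, theta):
    """P(x, z; theta) for one toss x in {0, 1} and coin indicator z in {0, 1}."""
    p, p1, p2 = theta
    if z == 1:
        return p * p1 ** x * (1 - p1) ** (1 - x)
    return (1 - p) * p2 ** x * (1 - p2) ** (1 - x)

def log_lik(xs, theta):
    """LL(X; theta) = sum_i log(sum_z P(x_i, z; theta))."""
    return sum(math.log(joint(x, 0, theta) + joint(x, 1, theta)) for x in xs)

def lower_bound(xs, theta, theta_n):
    """sum_Z P(Z|X; theta_n) * log(P(X,Z; theta) / P(X,Z; theta_n))."""
    total = 0.0
    for x in xs:
        px = joint(x, 0, theta_n) + joint(x, 1, theta_n)
        for z in (0, 1):
            posterior = joint(x, z, theta_n) / px     # P(z | x; theta_n)
            total += posterior * math.log(joint(x, z, theta) / joint(x, z, theta_n))
    return total

xs = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]       # hypothetical tosses
theta_n = (0.4, 0.7, 0.3)                 # hypothetical current estimate
candidates = [(0.5, 0.8, 0.5), (0.3, 0.6, 0.2), (0.9, 0.1, 0.9)]
for theta in candidates:
    gain = log_lik(xs, theta) - log_lik(xs, theta_n)
    bound = lower_bound(xs, theta, theta_n)
    print(theta, round(gain, 4), round(bound, 4), gain >= bound)
```

At θ = θn both the gain and the bound are zero, so any θ that makes the bound positive is guaranteed to raise the log likelihood.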


Why expectation of Z?

If the log likelihood is a linear function of Z, then the expectation can be carried inside the log likelihood and E(Z) is computed.

The above is true when the hidden variables form a mixture of distributions (e.g., in tosses of two coins), and

each distribution is an exponential-family distribution like multinomial/normal/Poisson.


Application of EM: HMM Training

Baum-Welch or Forward-Backward Algorithm


A problem scenario

Unsupervised POS tagging

Convert the Brown corpus into a corpus with ONLY the following tags: N (noun), V (verb), J (adjective), R (adverb), F (function words like prepositions and conjunctions), A (articles ‘a’, ‘an’, ‘the’) and O (others)

Assume a raw corpus and then create a POS tagger


Key Intuition

Given: Training sequence
Initialization: Probability values
Compute: Pr (state seq | training seq)
    get expected count of transition
    compute rule probabilities

Approach: Initialize the probabilities and recompute them… EM like approach

[Figure: HMM with states q and r; each transition is labelled with the output symbols a and b]


Baum-Welch algorithm: counts

String = abb aaa bbb aaa

Sequence of states with respect to input symbols

State seq: q r q q r q r q q q r q r
o/p seq:   a b b a a a b b b a a a

[Figure: HMM with states q and r; every transition is labelled a,b]


Calculating probabilities from table

Table of counts

Src   Dest   O/P   Count
q     r      a     5
q     q      b     3
r     q      a     3
r     q      b     2

$$P(q \xrightarrow{a} r) = 5/8 \qquad P(q \xrightarrow{b} q) = 3/8$$

$$P(s_i \xrightarrow{w_k} s_j) = \frac{c(s_i \xrightarrow{w_k} s_j)}{\sum_{l=1}^{T} \sum_{m=1}^{A} c(s_i \xrightarrow{w_m} s_l)}$$

T = #states
A = #alphabet symbols

Now if we have non-deterministic transitions, then multiple state sequences are possible for the given o/p seq (refer to the previous slide). Our aim is to find the expected count through this.
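The normalisation in the formula above can be sketched in a few lines of Python. The counts are copied from the table of counts on this slide; the dictionary layout `(source, destination, output)` and the function name `transition_probs` are illustrative choices, not part of the lecture.

```python
from collections import defaultdict

# (source state, destination state, output symbol) -> count, as in the table
counts = {
    ("q", "r", "a"): 5,
    ("q", "q", "b"): 3,
    ("r", "q", "a"): 3,
    ("r", "q", "b"): 2,
}

def transition_probs(counts):
    """Divide each count by the total count of all arcs leaving the same source."""
    totals = defaultdict(int)
    for (src, _dst, _sym), c in counts.items():
        totals[src] += c          # sum over all destinations l and symbols m
    return {arc: c / totals[arc[0]] for arc, c in counts.items()}

probs = transition_probs(counts)
for arc, pr in probs.items():
    print(arc, pr)
```

For source state q this gives 5/8 and 3/8, the two values shown on the slide; the probabilities of all arcs leaving one state sum to 1.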


Interplay Between Two Equations

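$$P(s_i \xrightarrow{W_k} s_j) = \frac{c(s_i \xrightarrow{W_k} s_j)}{\sum_{l=0}^{T} \sum_{m=0}^{A} c(s_i \xrightarrow{W_m} s_l)}$$

$$C(s_i \xrightarrow{W_k} s_j) = \sum_{S_{0,n+1}} P(S_{0,n+1} \mid W_{0,n}) \times n(s_i \xrightarrow{W_k} s_j, S_{0,n+1}, W_{0,n})$$

where n(.) is the no. of times the transition si → sj occurs in the string.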


Illustration

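[Figure: two HMMs over states q and r]

Actual (Desired) HMM: q → r on a:0.67; r → q on b:1.0; q → q on b:0.17; q → q on a:0.16

Initial guess: q → r on a:0.04; r → q on b:1.0; q → q on b:0.48; q → q on a:0.48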


One run of Baum-Welch algorithm: string ababb

State sequence*        P(path)   q→r on a   r→q on b   q→q on a   q→q on b
q r q r q q            0.00077   0.00154    0.00154    0          0.00077
q r q q q q            0.00442   0.00442    0.00442    0.00442    0.00884
q q q r q q            0.00442   0.00442    0.00442    0.00442    0.00884
q q q q q q            0.02548   0.0        0.000      0.05096    0.07644
Rounded Total          0.035     0.01       0.01       0.06       0.095
New Probabilities (P)            0.06       1.0        0.36       0.581

where 0.06 = 0.01/(0.01+0.06+0.095)

* ε is considered as starting and ending symbol of the input sequence string.

Through multiple iterations the probability values will converge.
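One run of this re-estimation on the string ababb can be reproduced by brute-force enumeration of state sequences. The sketch below is illustrative only: the arc probabilities in `init` are the initial-guess values, with the assignment of labels to arcs inferred from the path probabilities (an assumption about the figure), and the function names are hypothetical. Counts are weighted by the joint probability of path and string; dividing by the string probability would not change the re-estimated values because it cancels in the normalisation.

```python
from collections import defaultdict
from itertools import product

# (source, output symbol, destination) -> probability; assumed initial guess
init = {
    ("q", "a", "r"): 0.04,
    ("r", "b", "q"): 1.0,
    ("q", "a", "q"): 0.48,
    ("q", "b", "q"): 0.48,
}

def expected_counts(arcs, string, states=("q", "r"), start="q"):
    """Sum, over every state sequence, P(path) times the arc usage counts."""
    counts = defaultdict(float)
    total = 0.0
    for tail in product(states, repeat=len(string)):
        path = (start,) + tail
        prob = 1.0
        for t, sym in enumerate(string):
            prob *= arcs.get((path[t], sym, path[t + 1]), 0.0)
        if prob == 0.0:
            continue                      # impossible path under these arcs
        total += prob
        for t, sym in enumerate(string):
            counts[(path[t], sym, path[t + 1])] += prob
    return counts, total

def reestimate(counts):
    """Normalise expected counts over all arcs leaving the same source state."""
    out_of = defaultdict(float)
    for (src, _sym, _dst), c in counts.items():
        out_of[src] += c
    return {arc: c / out_of[arc[0]] for arc, c in counts.items()}

counts, p_string = expected_counts(init, "ababb")
new = reestimate(counts)
print(round(p_string, 5))
for arc, pr in sorted(new.items()):
    print(arc, round(pr, 3))
```

The printed values are computed from unrounded totals, so they can differ in the last digits from figures obtained by normalising rounded totals. Enumeration is exponential in the string length, which is why the forward and backward probabilities are used in practice.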


Computational part (1/2)

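$$
\begin{aligned}
C(s_i \xrightarrow{w_k} s_j) &= \sum_{S_{0,n+1}} \big[ P(S_{0,n+1} \mid W_{0,n}) \times n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n}) \big] \\
&= \frac{1}{P(W_{0,n})} \sum_{S_{0,n+1}} \big[ P(S_{0,n+1}, W_{0,n}) \times n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n}) \big] \\
&= \frac{1}{P(W_{0,n})} \sum_{t=0}^{n} \sum_{S_{0,n+1}} \big[ P(S_t = s_i, W_t = w_k, S_{t+1} = s_j, S_{0,n+1}, W_{0,n}) \big] \\
&= \frac{1}{P(W_{0,n})} \sum_{t=0}^{n} \big[ P(S_t = s_i, W_t = w_k, S_{t+1} = s_j, W_{0,n}) \big]
\end{aligned}
$$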

w0 w1 w2 wk wn-1 wn

S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1


Computational part (2/2)

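$$
\begin{aligned}
& \sum_{t=0}^{n} P(S_t = s_i, S_{t+1} = s_j, W_t = w_k, W_{0,n}) \\
&= \sum_{t=0}^{n} P(W_{0,t-1}, S_t = s_i, S_{t+1} = s_j, W_t = w_k, W_{t+1,n}) \\
&= \sum_{t=0}^{n} P(W_{0,t-1}, S_t = s_i)\, P(S_{t+1} = s_j, W_t = w_k \mid W_{0,t-1}, S_t = s_i)\, P(W_{t+1,n} \mid S_{t+1} = s_j) \\
&= \sum_{t=0}^{n} F(t-1, i)\, P(S_{t+1} = s_j, W_t = w_k \mid S_t = s_i)\, B(t+1, j) \\
&= \sum_{t=0}^{n} F(t-1, i)\, P(s_i \xrightarrow{w_k} s_j)\, B(t+1, j)
\end{aligned}
$$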

w0 w1 w2 wk wn-1 wn

S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1
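The final expression, a sum over positions of forward probability times arc probability times backward probability, can be sketched as follows. This is an illustrative implementation under assumptions, not the lecture's code: `alpha[t][s]` plays the role of F(t-1, s), `beta[t][s]` the role of B(t, s), and the arc probabilities in `arcs` are the assumed initial-guess values used for the string ababb.

```python
from collections import defaultdict

# (source, output symbol, destination) -> probability; assumed example HMM
arcs = {
    ("q", "a", "r"): 0.04,
    ("r", "b", "q"): 1.0,
    ("q", "a", "q"): 0.48,
    ("q", "b", "q"): 0.48,
}

def forward_backward_counts(arcs, string, states=("q", "r"), start="q"):
    """Expected arc counts C(s_i -w_k-> s_j) given the string, via F and B."""
    length = len(string)
    # alpha[t][s] = P(first t symbols, state s reached after them)
    alpha = [{s: 0.0 for s in states} for _ in range(length + 1)]
    alpha[0][start] = 1.0
    for t, sym in enumerate(string):
        for (src, k, dst), pr in arcs.items():
            if k == sym:
                alpha[t + 1][dst] += alpha[t][src] * pr
    # beta[t][s] = P(symbols from position t onwards | state s at position t)
    beta = [{s: 0.0 for s in states} for _ in range(length + 1)]
    for s in states:
        beta[length][s] = 1.0
    for t in range(length - 1, -1, -1):
        for (src, k, dst), pr in arcs.items():
            if k == string[t]:
                beta[t][src] += pr * beta[t + 1][dst]
    p_w = sum(alpha[length].values())      # P(W), probability of the string
    counts = defaultdict(float)
    for t, sym in enumerate(string):
        for (src, k, dst), pr in arcs.items():
            if k == sym:
                counts[(src, k, dst)] += alpha[t][src] * pr * beta[t + 1][dst] / p_w
    return counts, p_w

counts, p_w = forward_backward_counts(arcs, "ababb")
for arc, c in sorted(counts.items()):
    print(arc, round(c, 4))
```

The work is linear in the string length and in the number of arcs, in contrast to enumerating every state sequence. The expected counts over all arcs add up to the number of symbols in the string.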


Discussions

1. Symmetry breaking: Example: with a symmetric initialization (no symmetry breaking), there is no change in the initial values
2. Stuck in local maxima
3. Label bias problem: Probabilities have to sum to 1. Values can rise only at the cost of a fall in values for others.

[Figure: Desired HMM over states s, with arc labels b:1.0, b:0.5, a:0.5, a:1.0; Initialized HMM over states s, with arc labels a:0.5, b:0.5, a:0.25, a:0.5, b:0.5, a:0.25, b:0.25, b:0.5]
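The symmetry problem can be seen on a much smaller model than the HMM in the figure. The sketch below uses the two-coin mixture as a stand-in (an assumption made for brevity); the toss sequence and starting values are made up, and `em_step` is a hypothetical helper. When both coins start with the same head probability, every E(zi) equals p, so the two coins receive identical updates and can never be told apart.

```python
def em_step(xs, p, p1, p2):
    """One EM iteration for the two-coin mixture; returns the new (p, p1, p2)."""
    ez = []
    for x in xs:
        a = p * p1 ** x * (1 - p1) ** (1 - x)
        b = (1 - p) * p2 ** x * (1 - p2) ** (1 - x)
        ez.append(a / (a + b))             # E(z_i)
    n, m = len(xs), sum(xs)
    s = sum(ez)
    sx = sum(e * x for e, x in zip(ez, xs))
    return s / n, sx / s, (m - sx) / (n - s)

xs = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]        # hypothetical tosses
theta = (0.5, 0.6, 0.6)                    # symmetric start: p1 == p2
trace = [theta]
for _ in range(10):
    theta = em_step(xs, *theta)
    trace.append(theta)
for step in trace[:3]:
    print(step)
```

Starting instead from unequal values such as p1 = 0.7 and p2 = 0.3 breaks the tie, which is why initial values are usually perturbed at random.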
