Generative Models Rong Jin

Post on 21-Dec-2015


Page 1: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model  Prediction p(x;  ) Female: Gaussian distribution N(

Generative Models

Rong Jin

Statistical Inference

Training examples: $\{x_1, x_2, \ldots, x_n\}$

Learning a statistical model: $p(x; \theta)$

Prediction

[Figure: histogram of number of people versus height (1.2 to 2.1 m)]

Female: Gaussian distribution $N(\mu_1, \sigma_1)$

Male: Gaussian distribution $N(\mu_2, \sigma_2)$

$\Pr(\text{male} \mid 1.67\,\text{m})$

$\Pr(\text{female} \mid 1.67\,\text{m})$

Statistical Inference

Training examples: $\{x_1, x_2, \ldots, x_n\}$

Learning a statistical model: $p(y \mid x; \theta)$

Prediction

[Figure: histogram of number of people versus height (1.2 to 2.1 m)]

Male: Gaussian distribution $N(\mu_1, \sigma_1)$

Female: Gaussian distribution $N(\mu_2, \sigma_2)$

$\Pr(\text{male} \mid 1.67\,\text{m})$

$\Pr(\text{female} \mid 1.67\,\text{m})$

Probabilistic Models for Classification Problems

Apply statistical inference methods:

• Given training examples $\{(x_i, y_i)\}_{i=1}^{n}$
• Assume a parametric model $p(y \mid x; \theta)$
• Learn the model parameters $\theta$ from the training examples using the maximum likelihood approach
• The class of a new instance $x$ is predicted by

$$y^* = \arg\max_{y \in \mathcal{Y}} p(y \mid x; \theta)$$

Maximum Likelihood Estimation (MLE)

• Given training examples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Compute the log-likelihood of the data

$$l(D_{\text{train}}) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$

• Find the parameters $\theta^*$ that maximize the log-likelihood

$$\theta^* = \arg\max_{\theta} l(D_{\text{train}}) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$

• In many cases, the maximizer of the log-likelihood has no closed-form expression, so MLE requires numerical optimization

Generative Models

• Most probability distributions are joint distributions (i.e., $p(x; \theta)$), not conditional distributions (i.e., $p(y \mid x; \theta)$)
• Using Bayes' rule, $p(y \mid x; \theta)$ is obtained from $\{p(x \mid y; \theta),\ p(y; \theta)\}$:

$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{p(x; \theta)}$$

Generative Models (cont’d)

Treatment of $p(x \mid y; \theta)$

• Let $y \in \mathcal{Y} = \{1, 2, \ldots, c\}$
• Allocate a separate set of parameters for each class: $\theta \rightarrow \{\theta_1, \theta_2, \ldots, \theta_c\}$, so that $p(x \mid y; \theta)$ becomes $p(x; \theta_y)$
• Data in different classes have different input patterns

Generative Models (cont’d)

Parameter space

• Parameters for the class distributions: $\{\theta_1, \theta_2, \ldots, \theta_c\}$
• Class priors: $\{p(y=1), p(y=2), \ldots, p(y=c)\}$

Learn the parameters from training examples using MLE

• Compute the log-likelihood, using Bayes' rule $p(y \mid x; \theta) = p(y; \theta)\, p(x \mid y; \theta) / p(x; \theta)$:

$$l(D_{\text{train}}) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta) = \sum_{i=1}^{n} \left[ \log p(x_i; \theta_{y_i}) + \log p(y_i) - \log p(x_i) \right]$$

• Search for the optimal parameters by maximizing the log-likelihood. Dropping the $\log p(x_i)$ term leaves the joint log-likelihood of the pairs $(x_i, y_i)$:

$$\max_{\theta} l(D_{\text{train}}) \approx \max_{\theta} \sum_{i=1}^{n} \log p(y_i)\, p(x_i; \theta_{y_i})$$

Example

• Task: predict the gender of individuals based on their heights
• Given
  • 100 height examples of women
  • 100 height examples of men
• Assume the heights of women and men follow different Gaussian distributions

[Figure: histograms of height (1.1 to 2.0 m); legend: empirical data for male, empirical data for female]

Example (cont’d)

Gaussian distribution

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Parameter space

• Gaussian distribution for male: $(\mu_m, \sigma_m)$
• Gaussian distribution for female: $(\mu_f, \sigma_f)$
• Class priors: $p_m = p(y = \text{male})$, $p_f = p(y = \text{female})$

Objective

$$\max_{\theta} l(D_{\text{train}}) \approx \max_{\theta} \sum_{i=1}^{n} \log p(y_i)\, p(x_i \mid \theta_{y_i})$$

Example (cont’d)

Given training examples $\{h_1^m, h_2^m, \ldots, h_{N_m}^m;\ h_1^f, h_2^f, \ldots, h_{N_f}^f\}$, with $N = N_m + N_f$:

$$l = \sum_{i=1}^{N} \left[ \log p(h_i \mid y_i) + \log p(y_i) \right]$$

$$= \sum_{i=1}^{N_m} \left[ \log p(h_i^m; \mu_m, \sigma_m) + \log p_m \right] + \sum_{i=1}^{N_f} \left[ \log p(h_i^f; \mu_f, \sigma_f) + \log p_f \right]$$

$$= \sum_{i=1}^{N_m} \log \frac{\exp\left(-\frac{(h_i^m - \mu_m)^2}{2\sigma_m^2}\right)}{\sqrt{2\pi\sigma_m^2}} + \sum_{i=1}^{N_f} \log \frac{\exp\left(-\frac{(h_i^f - \mu_f)^2}{2\sigma_f^2}\right)}{\sqrt{2\pi\sigma_f^2}} + N_m \log p_m + N_f \log p_f$$

Example (cont’d)

Learn a Gaussian generative model

$$\{\mu_m, \sigma_m, p_m;\ \mu_f, \sigma_f, p_f\}^* = \arg\max\, l$$

$$\mu_m = \frac{1}{N_m}\sum_{i=1}^{N_m} h_i^m, \quad \sigma_m^2 = \frac{1}{N_m}\sum_{i=1}^{N_m} (h_i^m - \mu_m)^2, \quad p_m = \frac{N_m}{N}$$

$$\mu_f = \frac{1}{N_f}\sum_{i=1}^{N_f} h_i^f, \quad \sigma_f^2 = \frac{1}{N_f}\sum_{i=1}^{N_f} (h_i^f - \mu_f)^2, \quad p_f = \frac{N_f}{N}$$

where $N = N_m + N_f$.
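
The closed-form estimates for the Gaussian generative model (per-class mean, per-class variance with 1/N normalization, and class prior as a class frequency) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names `fit_gaussian_mle` and `fit_generative_model` and the toy heights are assumptions made for the example, not part of the original slides.

```python
def fit_gaussian_mle(samples):
    """Return the MLE mean and variance (1/N normalization) of a 1-D sample."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((h - mu) ** 2 for h in samples) / n
    return mu, var


def fit_generative_model(male_heights, female_heights):
    """Fit (mean, variance, prior) for each class from labelled heights."""
    n_m, n_f = len(male_heights), len(female_heights)
    n = n_m + n_f
    mu_m, var_m = fit_gaussian_mle(male_heights)
    mu_f, var_f = fit_gaussian_mle(female_heights)
    return {
        "male": (mu_m, var_m, n_m / n),      # p_m = N_m / N
        "female": (mu_f, var_f, n_f / n),    # p_f = N_f / N
    }


# Hypothetical toy heights in metres (not data from the slides).
male = [1.70, 1.80, 1.90]
female = [1.55, 1.60, 1.65, 1.60]
model = fit_generative_model(male, female)
```

Each class is fitted independently of the other; only the priors couple the two classes, through the total count N.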

Example (cont’d)

[Figure: height histograms (1.1 to 2.0 m) with fitted Gaussians; legend: empirical data for male, fitted distribution for male, empirical data for female, fitted distribution for female]

Example (cont’d)

Predict the gender of an individual given his/her height $h$

$$p(\text{male} \mid h) \propto p_m\, p(h \mid \mu_m, \sigma_m) = \frac{p_m}{\sqrt{2\pi\sigma_m^2}} \exp\left(-\frac{(h - \mu_m)^2}{2\sigma_m^2}\right)$$

$$p(\text{female} \mid h) \propto p_f\, p(h \mid \mu_f, \sigma_f) = \frac{p_f}{\sqrt{2\pi\sigma_f^2}} \exp\left(-\frac{(h - \mu_f)^2}{2\sigma_f^2}\right)$$
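
The prediction step can be illustrated by normalizing the two unnormalized scores so that they sum to one. The sketch below assumes hypothetical fitted parameters (the numbers are made up for the example, not taken from the slides), and the helper names `gaussian_pdf` and `posterior_male` are illustrative.

```python
import math


def gaussian_pdf(h, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((h - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def posterior_male(h, mu_m, sigma_m, p_m, mu_f, sigma_f, p_f):
    """Pr(male | h) via Bayes' rule: prior times likelihood, then normalize."""
    score_m = p_m * gaussian_pdf(h, mu_m, sigma_m)
    score_f = p_f * gaussian_pdf(h, mu_f, sigma_f)
    return score_m / (score_m + score_f)


# Hypothetical parameters (assumed for illustration only).
params = dict(mu_m=1.78, sigma_m=0.07, p_m=0.5, mu_f=1.64, sigma_f=0.07, p_f=0.5)
prob = posterior_male(1.67, **params)
prediction = "male" if prob > 0.5 else "female"
```

With equal priors and equal standard deviations, as assumed here, the two posteriors are equal exactly at the midpoint of the two means.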

Example

[Figure: height histograms (1.1 to 2.0 m) with fitted Gaussians for male and female; the decision boundary $h^*$ is marked, with its position shown for the cases $p_f < p_m$ and $p_f > p_m$]

Decision boundary $h^*$

• Predict female when $h < h^*$
• Predict male when $h > h^*$
• Predict at random when $h = h^*$

Where is the decision boundary?

• It depends on the ratio $p_m / p_f$
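
To make the dependence on the prior ratio concrete, consider the simplified case where both classes share one standard deviation $\sigma$ (an assumption made here for illustration; the slides allow different variances). Setting $p_m\, p(h \mid \mu_m, \sigma) = p_f\, p(h \mid \mu_f, \sigma)$ and solving for $h$ gives $h^* = (\mu_m + \mu_f)/2 + \sigma^2 \log(p_f / p_m) / (\mu_m - \mu_f)$. The function name and parameter values below are hypothetical.

```python
import math


def decision_boundary(mu_m, mu_f, sigma, p_m, p_f):
    """Boundary h* for two Gaussians with a shared sigma (requires mu_m != mu_f)."""
    midpoint = (mu_m + mu_f) / 2
    shift = sigma ** 2 * math.log(p_f / p_m) / (mu_m - mu_f)
    return midpoint + shift


# Hypothetical parameters: the boundary moves as the priors change.
balanced = decision_boundary(1.78, 1.64, 0.07, 0.5, 0.5)
more_men = decision_boundary(1.78, 1.64, 0.07, 0.7, 0.3)
more_women = decision_boundary(1.78, 1.64, 0.07, 0.3, 0.7)
```

When $p_m > p_f$ the boundary moves toward the female mean, so more heights are classified as male; when $p_f > p_m$ it moves the other way.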

Gaussian Generative Model (II)

• Inputs contain multiple features: $x = (x_1, x_2, \ldots, x_d)$
• Example
  • Task: predict whether an individual is overweight based on his/her salary and the number of hours spent watching TV
  • Input: (s: salary, h: hours spent watching TV)
  • Output: +1 (overweight), -1 (normal)

Multi-variate Gaussian Distribution

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Input: $x = (x_1, x_2, \ldots, x_d)$

Mean: $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$, with $\mu_i = \frac{1}{N}\sum_{k=1}^{N} x_{k,i}$

Covariance matrix:

$$\Sigma = \begin{pmatrix} \sigma_{1,1} & \cdots & \sigma_{1,d} \\ \vdots & \sigma_{i,j} & \vdots \\ \sigma_{d,1} & \cdots & \sigma_{d,d} \end{pmatrix}_{d \times d}$$

$$\sigma_{i,j} = E\left[(x_i - \mu_i)(x_j - \mu_j)\right] = \frac{1}{N}\sum_{k=1}^{N} (x_{k,i} - \mu_i)(x_{k,j} - \mu_j)$$

$$\Sigma = \frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T$$
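
The sample mean and covariance estimates can be computed directly from their definitions. The following is a minimal standard-library sketch; the function names and the toy (salary, TV hours) observations are assumptions for illustration, and a numerical library would normally be used for larger data.

```python
def mean_vector(points):
    """Component-wise mean of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(d)]


def covariance_matrix(points):
    """Covariance estimate with 1/N normalization, as a d x d list of lists."""
    n, d = len(points), len(points[0])
    mu = mean_vector(points)
    return [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n for j in range(d)]
        for i in range(d)
    ]


# Hypothetical observations: (salary in thousands, TV hours per day).
data = [(30.0, 4.0), (50.0, 2.0), (70.0, 3.0), (90.0, 1.0)]
mu = mean_vector(data)
sigma = covariance_matrix(data)
```

The resulting matrix is symmetric, because entry (i, j) and entry (j, i) sum the same products.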

Properties of Covariance Matrix

For centered data (the mean has been subtracted from each $x_k$), with $x = (x_1, x_2, \ldots, x_d)$:

$$\Sigma = \frac{1}{N}\sum_{k=1}^{N} x_k x_k^T$$

• What if the number of data points $N < d$?
• How about $a^T \Sigma a$ for any vector $a$?
• Positive semi-definite matrix
• Number of different elements in $\Sigma$?
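
The questions on this slide can be answered with a short derivation. The block below is a sketch that assumes centered data, so that $\Sigma = \frac{1}{N}\sum_k x_k x_k^T$; the symbols are those of the slide.

```latex
% Positive semi-definiteness: for any vector a,
a^T \Sigma a
  = \frac{1}{N}\sum_{k=1}^{N} a^T x_k x_k^T a
  = \frac{1}{N}\sum_{k=1}^{N} \left(a^T x_k\right)^2 \;\ge\; 0 .

% Rank: each term x_k x_k^T has rank at most 1, so
\operatorname{rank}(\Sigma) \le N ,
% and when N < d the d x d matrix \Sigma is singular (\Sigma^{-1} does not exist).

% Symmetry: \sigma_{i,j} = \sigma_{j,i}, so the number of distinct elements is
d + \frac{d(d-1)}{2} = \frac{d(d+1)}{2} .
```

These three facts are what motivate the later simplifications of the covariance matrix.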

Gaussian Generative Model (II)

Joint distribution $p(s, h)$ for salary (s) and hours for watching TV (h)

$$p(x; \mu, \Sigma) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Input: $x = (x_s, x_h)$

Mean: $\mu = (\mu_s, \mu_h)$, with $\mu_s = \frac{1}{N}\sum_{k=1}^{N} x_{k,s}$ and $\mu_h = \frac{1}{N}\sum_{k=1}^{N} x_{k,h}$

Covariance matrix:

$$\Sigma = \begin{pmatrix} \sigma_{s,s} & \sigma_{s,h} \\ \sigma_{h,s} & \sigma_{h,h} \end{pmatrix}$$

$$\sigma_{s,s} = \frac{1}{N}\sum_{k=1}^{N} (x_{k,s} - \mu_s)^2, \quad \sigma_{h,h} = \frac{1}{N}\sum_{k=1}^{N} (x_{k,h} - \mu_h)^2$$

$$\sigma_{s,h} = \sigma_{h,s} = \frac{1}{N}\sum_{k=1}^{N} (x_{k,s} - \mu_s)(x_{k,h} - \mu_h)$$

Multi-variate Gaussian Generative Model

• Input with multiple input features: $x = (x_1, x_2, \ldots, x_d)$
• A multi-variate Gaussian distribution for each class

$$p(x \mid y; \theta) \sim N(\mu_y, \Sigma_y)$$

$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$

• Overweight: $(\mu_o, \Sigma_o, p_o = p(y = \text{overweight}))$
• Normal: $(\mu_n, \Sigma_n, p_n = p(y = \text{normal}))$

Improve Multivariate Gaussian Model

How could we improve the model's prediction of overweight?

• Multiple modes for each class
• Introduce more attributes of individuals
  • Location
  • Occupation
  • The number of children
  • House
  • Age
  • …

Problems with Using Multi-variate Gaussian Generative Model

$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$

• $\Sigma$ is a matrix of size $d \times d$ and contains $d(d+1)/2$ independent variables
  • $d = 100$: the number of variables in $\Sigma$ is 5,050
  • $d = 1000$: the number of variables in $\Sigma$ is 500,500
  • A large parameter space
• $\Sigma$ can be singular
  • If $N < d$
  • If two features are linearly correlated
  • Then $\Sigma^{-1}$ does not exist

Problems with Using Multi-variate Gaussian Generative Model

$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$

Diagonalize $\Sigma$

$$\Sigma = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_d^2 \end{pmatrix}, \quad \sigma_i^2 = \frac{1}{N}\sum_{k=1}^{N} (x_{k,i} - \mu_i)^2, \quad \Sigma^{-1} = \begin{pmatrix} \sigma_1^{-2} & & 0 \\ & \ddots & \\ 0 & & \sigma_d^{-2} \end{pmatrix}$$

Feature independence assumption (Naïve Bayes assumption)

$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} \left(\prod_{i=1}^{d} \sigma_i^2\right)^{1/2}} \exp\left(-\sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

Smooth the covariance matrix

$$\Sigma \leftarrow \Sigma + \epsilon I_d, \quad \epsilon > 0 \text{ is a smoothing parameter}$$

Overfitting Issue

• Complex model vs. insufficient training data
• Example: consider a classification problem with multiple inputs
  • 100 input features
  • 5 classes
  • 1000 training examples
• Total number of parameters for a full Gaussian model
  • 5 class priors: 5 parameters
  • 5 means: 500 parameters
  • 5 covariance matrices: 5 × 5,050 = 25,250 parameters
  • 25,755 parameters in total: insufficient training data
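
The parameter counts can be checked with a few lines of arithmetic. The sketch below counts $d(d+1)/2$ free entries per symmetric covariance matrix, as stated earlier in the deck, and counts one parameter per class prior as the slide does. The function names are hypothetical. For comparison it also counts a diagonal-covariance model.

```python
def covariance_params(d):
    """Independent entries of a symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def full_gaussian_params(d, c):
    """Priors + means + full covariance matrices for c classes and d features."""
    return c + c * d + c * covariance_params(d)


def diagonal_gaussian_params(d, c):
    """Priors + means + per-feature variances (diagonal covariance)."""
    return c + c * d + c * d


full_count = full_gaussian_params(100, 5)
diagonal_count = diagonal_gaussian_params(100, 5)
```

With 100 features and 5 classes, the full model needs tens of thousands of parameters while the diagonal model needs about a thousand, which is on the order of the 1000 training examples available.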

Model Complexity Vs. Data

[Figure: sequence of four plots; x-axis from -6 to 6 (from -8 to 8 in the last plot), y-axis within about -1 to 1]

Problems with Using Multi-variate Gaussian Generative Model

Diagonalize $\Sigma$

Feature independence assumption

$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} \left(\prod_{i=1}^{d} \sigma_i^2\right)^{1/2}} \exp\left(-\sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

$$p(x \mid y; \theta) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right) = \prod_{i=1}^{d} p(x_i \mid y; \theta)$$

$$p(x_i \mid y; \theta) \sim N(\mu_i, \sigma_i)$$

Naïve Bayes Model

• In general, for any generative model, we have to estimate $p(x \mid y; \theta)$ (or, $p(x \mid \theta_y)$)
• For $x$ in a high-dimensional space, this probability is hard to estimate
• In the Naïve Bayes model, we approximate $p(x \mid y; \theta)$ by

$$p(x \mid y; \theta) \approx \prod_{i=1}^{d} p(x_i \mid y; \theta) = p(x_1 \mid y; \theta)\, p(x_2 \mid y; \theta) \cdots p(x_d \mid y; \theta)$$

Text Categorization

• Learn to classify text into predefined categories
• Input $x$: a document
  • Represented by a vector of words
  • Example: {(president, 10), (bush, 2), (election, 5), …}
• Output $y$: whether the document is about politics or not
  • +1 for a political document, -1 for a non-political document

Text Classification

• A generative model for text classification (TC)

$$p(y \mid \text{doc}) \propto p(y)\, p(\text{doc} \mid y)$$

• Parameter space
  • $p(+)$ and $p(-)$
  • $p(\text{doc} \mid +; \theta)$, $p(\text{doc} \mid -; \theta)$
• It is difficult to estimate both $p(\text{doc} \mid +; \theta)$ and $p(\text{doc} \mid -; \theta)$
  • Typical vocabulary size ~ 100,000
  • Each document is a vector of 100,000 attributes!
  • Too many words in a document
• A Naïve Bayes approach

Text Classification

A Naïve Bayes approach

For a document $\text{doc} = \{(w_1, t_1), (w_2, t_2), \ldots, (w_n, t_n)\}$, where $t_i$ is the number of occurrences of word $w_i$:

$$p(\text{doc} \mid +) = p(w_1 \mid +)^{t_1}\, p(w_2 \mid +)^{t_2} \cdots p(w_n \mid +)^{t_n} = \prod_{i=1}^{n} p(w_i \mid +)^{t_i}$$

$$p(\text{doc} \mid -) = p(w_1 \mid -)^{t_1}\, p(w_2 \mid -)^{t_2} \cdots p(w_n \mid -)^{t_n} = \prod_{i=1}^{n} p(w_i \mid -)^{t_i}$$

Text Classification

• The original parameter space
  • $p(+)$ and $p(-)$
  • $p(\text{doc} \mid +; \theta)$, $p(\text{doc} \mid -; \theta)$
• Parameter space after the Naïve Bayes simplification
  • $p(+)$ and $p(-)$
  • $\{p(w_1 \mid +), p(w_2 \mid +), \ldots, p(w_n \mid +)\}$
  • $\{p(w_1 \mid -), p(w_2 \mid -), \ldots, p(w_n \mid -)\}$

Text Classification

Learning parameters from training examples

• Training examples: $\{d_1^+, d_2^+, \ldots, d_{n_+}^+;\ d_1^-, d_2^-, \ldots, d_{n_-}^-\}$, with $N = n_+ + n_-$
• Each document: $d_i = \{(w_1, t_{i,1}), (w_2, t_{i,2}), \ldots, (w_n, t_{i,n})\}$
• Learn the parameters using maximum likelihood estimation

Text Classification

Log-likelihood of the training documents, where $t_{i,j}^{\pm}$ is the count of word $w_j$ in document $d_i^{\pm}$:

$$l = \sum_{i=1}^{n_+} \log p(+ \mid d_i^+) + \sum_{i=1}^{n_-} \log p(- \mid d_i^-)$$

$$\approx \sum_{i=1}^{n_+} \log p(+) \prod_{j=1}^{n} p(w_j \mid +)^{t_{i,j}^+} + \sum_{i=1}^{n_-} \log p(-) \prod_{j=1}^{n} p(w_j \mid -)^{t_{i,j}^-}$$

$$= \sum_{i=1}^{n_+} \left[ \log p(+) + \sum_{j=1}^{n} t_{i,j}^+ \log p(w_j \mid +) \right] + \sum_{i=1}^{n_-} \left[ \log p(-) + \sum_{j=1}^{n} t_{i,j}^- \log p(w_j \mid -) \right]$$

Text Classification

The optimal solution that maximizes the likelihood of the training data:

$$p(+) = \frac{n_+}{N}, \quad p(-) = \frac{n_-}{N}$$

$$p(w_j \mid +) = \frac{\sum_{i=1}^{n_+} t_{i,j}^+}{\sum_{j'=1}^{n} \sum_{i=1}^{n_+} t_{i,j'}^+}, \quad p(w_j \mid -) = \frac{\sum_{i=1}^{n_-} t_{i,j}^-}{\sum_{j'=1}^{n} \sum_{i=1}^{n_-} t_{i,j'}^-}$$
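
These maximum likelihood estimates are simple frequency counts: class priors are document fractions, and each word probability is that word's total count in the class divided by the total word count of the class. The sketch below assumes documents are given as `collections.Counter` objects mapping words to counts; the function name `train_naive_bayes` and the toy corpus are hypothetical.

```python
from collections import Counter


def train_naive_bayes(docs_pos, docs_neg):
    """MLE for a two-class Naive Bayes text model from word-count documents."""
    n_pos, n_neg = len(docs_pos), len(docs_neg)
    total_docs = n_pos + n_neg
    model = {
        "prior": {"+": n_pos / total_docs, "-": n_neg / total_docs},
        "word": {},
    }
    for label, docs in (("+", docs_pos), ("-", docs_neg)):
        counts = Counter()
        for doc in docs:
            counts.update(doc)              # add this document's word counts
        total_words = sum(counts.values())
        model["word"][label] = {w: c / total_words for w, c in counts.items()}
    return model


# Hypothetical toy corpus (not from the slides).
pos = [Counter({"president": 2, "election": 1}), Counter({"election": 1})]
neg = [Counter({"game": 3, "president": 1})]
model = train_naive_bayes(pos, neg)
```

Note that a word never seen in a class simply has no entry here, which corresponds to an estimated probability of zero.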

Text Classification

Twenty Newsgroups: An Example

Text Classification

Any problems with the Naïve Bayes text classifier?

• Unseen words
  • Word ‘w’ is unseen in the training documents: what is the consequence?
  • Word ‘w’ is unseen only for documents of one class: what is the consequence?
• Related to the overfitting problem

Any suggestions?

• Solution: word class approach
  • Introduce word classes $T = \{t_1, t_2, \ldots, t_m\}$
  • Compute $p(t_i \mid +)$, $p(t_i \mid -)$
  • When $w$ is unseen before, replace $p(w \mid \pm)$ with $p(t_i \mid \pm)$
• Introduce a prior for the word probabilities
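
One common way to introduce a prior for word probabilities is add-alpha (Laplace) smoothing, which corresponds to placing a symmetric Dirichlet prior on each class's word distribution. The slides do not say which prior is intended, so this choice is an assumption; the function names and toy counts below are hypothetical.

```python
import math
from collections import Counter


def smoothed_word_probs(class_counts, vocabulary, alpha=1.0):
    """Add-alpha estimate of p(w | class) over a fixed vocabulary."""
    total = sum(class_counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (class_counts[w] + alpha) / total for w in vocabulary}


def log_score(doc, prior, word_probs):
    """log p(y) + sum_j t_j * log p(w_j | y) for a word-count document."""
    return math.log(prior) + sum(t * math.log(word_probs[w]) for w, t in doc.items())


vocab = ["president", "election", "game"]
pos_counts = Counter({"president": 2, "election": 2})   # "game" unseen in class +

# Unsmoothed MLE gives zero probability to the unseen word.
unsmoothed_game = pos_counts["game"] / sum(pos_counts.values())

probs = smoothed_word_probs(pos_counts, vocab, alpha=1.0)
score = log_score(Counter({"game": 1}), 0.5, probs)
```

Without smoothing, a single unseen word makes the whole class-conditional document probability zero (log of zero), no matter what the other words say; with smoothing the score stays finite.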

Naïve Bayes Model

$$p(x \mid y; \theta) \approx \prod_{i=1}^{d} p(x_i \mid y; \theta)$$

This is a terrible approximation. For example, a Gaussian with

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

is replaced by one with

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

Naïve Bayes Model

• Why use the Naïve Bayes model?
• We are essentially interested in $p(y \mid x; \theta)$, not $p(x \mid y; \theta)$

$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{p(x; \theta)} = \frac{p(y; \theta)\, p(x \mid y; \theta)}{\sum_{y'=1}^{c} p(y'; \theta)\, p(x \mid y'; \theta)}$$

$$= \frac{1}{\sum_{y'=1}^{c} \dfrac{p(y'; \theta)\, p(x \mid y'; \theta)}{p(y; \theta)\, p(x \mid y; \theta)}}$$

Naïve Bayes Model

• The key for the prediction model is not $p(x \mid y; \theta)$, but the ratio $p(x \mid y; \theta) / p(x \mid y'; \theta)$
• Although the Naïve Bayes model does a poor job of estimating $p(x \mid y; \theta)$, it does a reasonably good job of estimating the ratio

The Ratio of Likelihood for Binary Classes

Assume that both classes share the same variance

$$\log \frac{p(y = +1)\, p(x \mid y = +1)}{p(y = -1)\, p(x \mid y = -1)}$$

$$= \sum_{i=1}^{d} \left[ \frac{(x_i - \mu_{i,-})^2}{2\sigma_i^2} - \frac{(x_i - \mu_{i,+})^2}{2\sigma_i^2} \right] + \log \frac{p(y = +1)}{p(y = -1)}$$

$$= \sum_{i=1}^{d} \frac{\mu_{i,+} - \mu_{i,-}}{\sigma_i^2}\, x_i + \sum_{i=1}^{d} \frac{\mu_{i,-}^2 - \mu_{i,+}^2}{2\sigma_i^2} + \log \frac{p(y = +1)}{p(y = -1)}$$

Gaussian generative model is a linear model
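
The linearity claim can be checked numerically: with per-feature variances shared by both classes, the log-ratio equals $w^T x + b$ with $w_i = (\mu_{i,+} - \mu_{i,-}) / \sigma_i^2$ and $b = \sum_i (\mu_{i,-}^2 - \mu_{i,+}^2) / (2\sigma_i^2) + \log \frac{p(y=+1)}{p(y=-1)}$. The sketch below compares that linear form with a direct evaluation of the two Gaussian log-densities; all names and parameter values are hypothetical.

```python
import math


def linear_coefficients(mu_pos, mu_neg, var, p_pos, p_neg):
    """Weights and bias of the log-ratio under shared per-feature variances."""
    w = [(a - b) / v for a, b, v in zip(mu_pos, mu_neg, var)]
    bias = sum((b * b - a * a) / (2 * v) for a, b, v in zip(mu_pos, mu_neg, var))
    bias += math.log(p_pos / p_neg)
    return w, bias


def log_ratio_direct(x, mu_pos, mu_neg, var, p_pos, p_neg):
    """log[p(+1) p(x|+1)] - log[p(-1) p(x|-1)] from the Gaussian densities."""
    def log_gauss(xi, m, v):
        return -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)

    log_pos = math.log(p_pos) + sum(log_gauss(xi, m, v) for xi, m, v in zip(x, mu_pos, var))
    log_neg = math.log(p_neg) + sum(log_gauss(xi, m, v) for xi, m, v in zip(x, mu_neg, var))
    return log_pos - log_neg


# Hypothetical two-feature example.
mu_pos, mu_neg, var = [1.0, 2.0], [0.0, -1.0], [1.5, 0.5]
w, b = linear_coefficients(mu_pos, mu_neg, var, 0.4, 0.6)
x = [0.3, 0.7]
linear_value = sum(wi * xi for wi, xi in zip(w, x)) + b
direct_value = log_ratio_direct(x, mu_pos, mu_neg, var, 0.4, 0.6)
```

The quadratic terms in $x_i$ cancel only because the variances are shared; with class-specific variances the log-ratio would be quadratic in $x$.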

Linear Decision Boundary

• Gaussian generative models == finding a linear decision boundary
• Why not directly estimate the decision boundary?