Introduction to Graphical models - imagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04
TRANSCRIPT
Introduction to Graphical models
Guillaume Obozinski
Ecole des Ponts - ParisTech
INIT/AERFAI Summer school on Machine Learning
Benicassim, June 26th 2017
G. Obozinski Introduction to Graphical models 1/47
Structured problems in HD
Sequence modelling
How to model the distribution of DNA sequences of length n?
Naive model → 4^n − 1 parameters
Independent model → 3n parameters
[diagram: nodes x1 x2 x3 x4 with no edges]
First-order Markov chain:
[diagram: chain x1 → x2 → x3 → x4]
Second-order Markov chain:
[diagram: x1 x2 x3 x4, each node depending on its two predecessors]
Number of parameters O(n) for chains of length n.
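The O(n) count comes from estimating one transition table shared across positions. A minimal sketch of fitting a first-order Markov chain to a DNA string by normalized counts (the sequence and helper name are illustrative, not from the slides):

```python
from collections import Counter

def fit_first_order_markov(seq, alphabet="ACGT"):
    """Maximum-likelihood fit of a first-order Markov chain:
    initial distribution plus one 4x4 transition table."""
    init = {a: 0.0 for a in alphabet}
    init[seq[0]] = 1.0
    pair_counts = Counter(zip(seq, seq[1:]))  # adjacent-pair counts
    trans = {}
    for a in alphabet:
        total = sum(pair_counts[(a, b)] for b in alphabet)
        # Normalize counts; fall back to uniform if a letter never occurs.
        trans[a] = {b: (pair_counts[(a, b)] / total if total else 1 / len(alphabet))
                    for b in alphabet}
    return init, trans

init, trans = fit_first_order_markov("ACGTACGTAACC")
# Each row of the transition table is a probability distribution.
```

The parameter count is 3 (initial distribution) plus 4 × 3 (transition rows), independent of n, versus 4^n − 1 for the naive joint table.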
Models for speech processing
Speech modelled by a sequence of unobserved phonemes
For each phoneme, a random sound is produced following a distribution which characterizes the phoneme
Hidden Markov Model (HMM):
[diagram: hidden chain z1 → z2 → … → zn−1 → zn → zn+1, each zt emitting an observation xt]
→ Latent variable models
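The generative story above can be sketched directly: a hidden Markov chain over states, with each observation drawn from a state-specific emission distribution. The two-state "phoneme" model below is a made-up toy, not a model from the slides:

```python
import random

def sample_hmm(n, init, trans, emit, rng=random.Random(0)):
    """Sample (z_1..z_n, x_1..x_n) from an HMM: z follows a Markov
    chain; each x_t is drawn from the emission distribution of z_t."""
    def draw(dist):
        r, acc = rng.random(), 0.0
        for v, p in dist.items():
            acc += p
            if r < acc:
                return v
        return v  # guard against rounding at the boundary
    zs, xs = [], []
    z = draw(init)
    for _ in range(n):
        zs.append(z)
        xs.append(draw(emit[z]))
        z = draw(trans[z])
    return zs, xs

# Hypothetical two-phoneme model: hidden states "a"/"b" are sticky,
# and each emits sounds "lo"/"hi" with its own distribution.
init = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}
emit = {"a": {"lo": 0.8, "hi": 0.2}, "b": {"lo": 0.2, "hi": 0.8}}
zs, xs = sample_hmm(20, init, trans, emit)
```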
Modelling image structures
Markov Random Field
Segmentation
→ directed graphical models vs. undirected ones
Anaesthesia alarm (Beinlich et al., 1989)
“The ALARM Monitoring system”
http://www.bnlearn.com/documentation/networks/
CVP: central venous pressure
PCWP: pulmonary capillary wedge pressure
HIST: history
TPR: total peripheral resistance
BP: blood pressure
CO: cardiac output
HRBP: heart rate / blood pressure
HREK: heart rate measured by an EKG monitor
HRSA: heart rate / oxygen saturation
PAP: pulmonary artery pressure
SAO2: arterial oxygen saturation
FIO2: fraction of inspired oxygen
PRSS: breathing pressure
ECO2: expelled CO2
MINV: minimum volume
MVS: minimum volume set
HYP: hypovolemia
LVF: left ventricular failure
APL: anaphylaxis
ANES: insufficient anesthesia/analgesia
PMB: pulmonary embolus
INT: intubation
KINK: kinked tube
DISC: disconnection
LVV: left ventricular end-diastolic volume
STKV: stroke volume
CCHL: catecholamine
ERLO: error low output
HR: heart rate
ERCA: electrocauter
SHNT: shunt
PVS: pulmonary venous oxygen saturation
ACO2: arterial CO2
VALV: pulmonary alveoli ventilation
VLNG: lung ventilation
VTUB: ventilation tube
VMCH: ventilation machine
Probabilistic model
[factor graph over variables 1–9, with factors f12, f23, f34, f45, f56, f37, f678, f9]
p(x1, x2, . . . , x9) = f12(x1, x2) f23(x2, x3) f34(x3, x4) f45(x4, x5) f56(x5, x6) f37(x3, x7) f678(x6, x7, x8) f9(x9)
Abstract models vs concrete ones
Abstract models
Linear regression
Logistic regression
Mixture model
Principal Component Analysis
Canonical Correlation Analysis
Independent Component analysis
LDA (multinomial PCA)
Naive Bayes Classifier
Mixture of experts
Concrete Models
Markov chains
HMM
Tree-structured models
Double HMMs
Directed acyclic models
Markov Random Fields
Star models
Constellation Model
Poll...
... about some relevant concepts.
Markov Chain
Density of the multivariate Gaussian distribution
Maximum likelihood estimator
Logistic regression
Entropy
Exponential families of distributions (≠ the exponential distribution)
Bayesian inference
Kullback-Leibler divergence
Expectation maximisation algorithm
Sum-product algorithm
MCMC sampling
Outline
1 Preliminary concepts
2 Directed graphical models
3 Markov random fields
4 Operations on graphical models
Probability distributions
Joint probability distribution of r.v. (X1, . . . , Xn): p(x1, x2, . . . , xn). We assume either that
Xi takes values in {1, . . . , K} and
p(x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn),
or that (X1, . . . , Xn) admits a density on R^n.
Marginalization
p(x1) = Σ_{x2} p(x1, x2)
Factorization
p(x1, . . . , xn) = p(x1)p(x2|x1)p(x3|x1, x2) . . . p(xn|x1, . . . , xn−1)
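For discrete variables both operations are just array manipulations on the joint table. A minimal sketch with a made-up 3 × 3 joint (the numbers are illustrative):

```python
import numpy as np

# Joint distribution of two variables, each with 3 states,
# stored as a table p[x1, x2] (made-up values summing to 1).
p = np.array([[0.10, 0.05, 0.05],
              [0.20, 0.10, 0.10],
              [0.05, 0.15, 0.20]])

# Marginalization: p(x1) = sum over x2 of p(x1, x2).
p_x1 = p.sum(axis=1)

# Conditional: p(x2 | x1) = p(x1, x2) / p(x1).
p_x2_given_x1 = p / p_x1[:, None]

# Chain-rule factorization recovers the joint: p(x1, x2) = p(x1) p(x2 | x1).
assert np.allclose(p_x1[:, None] * p_x2_given_x1, p)
```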
Entropy and Kullback-Leibler divergence
Entropy
H(p) = −Σ_x p(x) log p(x) = E[− log p(X)]
→ Expectation of the negative log-likelihood
Kullback-Leibler divergence
KL(p‖q) = Σ_x p(x) log (p(x) / q(x)) = E_p[ log (p(X) / q(X)) ]
→ Expectation of the log-likelihood ratio
→ Property: KL(p‖q) ≥ 0
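Both quantities are one-liners on probability vectors. A small sketch, checking the nonnegativity property and the fact that a uniform distribution over K states has entropy log K (the function names are my own):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p)))

def kl(p, q):
    """KL(p||q) = sum_x p(x) log(p(x)/q(x)); nonnegative, 0 iff p == q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])
q = np.array([1 / 3, 1 / 3, 1 / 3])
```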
Independence concepts
Independence: X ⊥⊥Y
We say that X and Y are independent and write X ⊥⊥ Y iff:
∀x, y, P(X = x, Y = y) = P(X = x) P(Y = y)
Conditional independence: X ⊥⊥ Y | Z
We say that X and Y are independent conditionally on Z and write X ⊥⊥ Y | Z iff:
∀x, y, z,
P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
Conditional independence example
Example of "X-linked recessive inheritance":
Transmission of the gene responsible for hemophilia
Risk for sons of an unaffected father:
dependence between the situations of the two brothers;
conditionally independent given whether the mother is a carrier of the gene or not.
Indicator variable coding for multinomial variables
Let C be a r.v. taking values in {1, . . . , K}, with
P(C = k) = πk .
We will code C with a r.v. Y = (Y1, . . . , YK)^T with
Yk = 1{C=k}
For example, if K = 5 and c = 4 then y = (0, 0, 0, 1, 0)^T.
So y ∈ {0, 1}^K with Σ_{k=1}^K y_k = 1.
P(C = k) = P(Yk = 1) and P(Y = y) = Π_{k=1}^K π_k^{y_k}.
Bernoulli, Binomial, Multinomial
Y ∼ Ber(π)    (Y1, . . . , YK) ∼ M(1, π1, . . . , πK)
p(y) = π^y (1 − π)^{1−y}    p(y) = π1^{y1} · · · πK^{yK}
N1 ∼ Bin(n, π)    (N1, . . . , NK) ∼ M(n, π1, . . . , πK)
p(n1) = (n choose n1) π^{n1} (1 − π)^{n−n1}    p(n) = (n choose n1 . . . nK) π1^{n1} · · · πK^{nK}
with (n choose i) = n! / ((n − i)! i!) and (n choose n1 . . . nK) = n! / (n1! · · · nK!)
G. Obozinski Introduction to Graphical models 16/47
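These pmfs are easy to sketch with the standard library; the check below uses the fact that the binomial is the K = 2 case of the multinomial (parameter values are illustrative):

```python
from math import comb, factorial, prod

def binom_pmf(n1, n, pi):
    """P(N1 = n1) for N1 ~ Bin(n, pi)."""
    return comb(n, n1) * pi**n1 * (1 - pi)**(n - n1)

def multinom_pmf(counts, pis):
    """P(N = (n1, ..., nK)) for the multinomial M(n, pi_1, ..., pi_K)."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(p**c for p, c in zip(pis, counts))

# A binomial is the K = 2 multinomial with counts (n1, n - n1):
assert abs(binom_pmf(2, 5, 0.3) - multinom_pmf([2, 3], [0.3, 0.7])) < 1e-12
```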
Gaussian model
Univariate Gaussian: X ∼ N(µ, σ²)

X is a real-valued r.v., and θ = (µ, σ²) ∈ Θ = R × R*₊.

p_{µ,σ²}(x) = 1/√(2πσ²) · exp( −(x − µ)² / (2σ²) )

Multivariate Gaussian: X ∼ N(µ, Σ)

X takes values in R^d. If K_d is the set of d × d positive definite matrices, then θ = (µ, Σ) ∈ Θ = R^d × K_d.

p_{µ,Σ}(x) = 1/√((2π)^d det Σ) · exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )
G. Obozinski Introduction to Graphical models 17/47
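A minimal sketch of the multivariate density, evaluated in log space for numerical stability (solving against Σ and using `slogdet` rather than forming an explicit inverse):

```python
import numpy as np

def gaussian_logpdf(x, mu, Sigma):
    """log p(x) for x ~ N(mu, Sigma), Sigma a d x d positive definite matrix."""
    d = len(mu)
    diff = x - mu
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu) via a linear solve.
    maha = diff @ np.linalg.solve(Sigma, diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Standard normal at the origin: density (2*pi)^(-d/2), so logpdf = -d/2 log(2*pi).
d = 3
val = gaussian_logpdf(np.zeros(d), np.zeros(d), np.eye(d))
assert abs(val - (-0.5 * d * np.log(2 * np.pi))) < 1e-12
```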
Gaussian model
Univariate gaussian : X ∼ N (µ, σ2)
X is real valued r.v., et θ =(µ, σ2
)∈ Θ = R× R∗+.
pµ,σ2 (x) =1√
2πσ2exp
(−1
2
(x − µ)2
σ2
)
Multivariate gaussian: X ∼ N (µ,Σ)
X takes values in Rd . Si Kn is the set of n × n positive definitematrices, and θ = (µ,Σ) ∈ Θ = Rd ×Kn.
pµ,Σ (x) =1√
(2π)d det Σexp
(−1
2(x − µ)T Σ−1 (x − µ)
)
G. Obozinski Introduction to Graphical models 17/47
![Page 53: Introduction to Graphical models - IMAGINEimagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04 · Introduction to Graphical models Guillaume Obozinski Ecole des Ponts](https://reader030.vdocuments.us/reader030/viewer/2022041008/5eaea0bcde4f2a5a1c65f6dc/html5/thumbnails/53.jpg)
Gaussian densities
G. Obozinski Introduction to Graphical models 18/47
Maximum likelihood principle
Consider a model PΘ = {p(x; θ) | θ ∈ Θ} and an observation x.

Likelihood:
L : Θ → R₊, θ ↦ p(x; θ)

Maximum likelihood estimator:
θ̂_ML = argmax_{θ∈Θ} p(x; θ)

[portrait: Sir Ronald Fisher (1890-1962)]

Case of i.i.d. data

For (xi)_{1≤i≤n} a sample of i.i.d. data of size n:

θ̂_ML = argmax_{θ∈Θ} ∏_{i=1}^n p(xi; θ) = argmax_{θ∈Θ} ∑_{i=1}^n log p(xi; θ)
G. Obozinski Introduction to Graphical models 19/47
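For the univariate Gaussian, the ML estimator has a closed form (sample mean and biased sample variance); the sketch below checks numerically, on simulated data, that it maximizes the i.i.d. log-likelihood:

```python
import numpy as np

def gaussian_loglik(theta, xs):
    """Sum of log p(x_i; theta) for the univariate Gaussian, theta = (mu, sigma2)."""
    mu, sigma2 = theta
    return np.sum(-0.5 * (np.log(2 * np.pi * sigma2) + (xs - mu) ** 2 / sigma2))

rng = np.random.default_rng(0)
xs = rng.normal(2.0, 1.5, size=10_000)

# Closed-form maximizer: sample mean and (biased) sample variance.
mu_ml, sigma2_ml = xs.mean(), xs.var()

# Any perturbation of the ML estimate lowers the log-likelihood.
assert gaussian_loglik((mu_ml, sigma2_ml), xs) >= gaussian_loglik((mu_ml + 0.1, sigma2_ml), xs)
assert gaussian_loglik((mu_ml, sigma2_ml), xs) >= gaussian_loglik((mu_ml, sigma2_ml * 1.1), xs)
```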
Bayesian estimation
Parameters θ are modelled as a random variable.

A priori
We have a prior p(θ) on the model parameters.

A posteriori
The data contribute through the likelihood p(x|θ). The a posteriori (posterior) probability of the parameters is then

p(θ|x) = p(x|θ) p(θ) / p(x) ∝ p(x|θ) p(θ).

→ The Bayesian estimator is thus a probability distribution on the parameters. One talks about Bayesian inference.
G. Obozinski Introduction to Graphical models 20/47
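A minimal sketch of this computation for a Bernoulli parameter on a grid, with a flat prior (data and grid are illustrative):

```python
import numpy as np

# Grid approximation of the posterior p(theta | x) ∝ p(x | theta) p(theta).
thetas = np.linspace(1e-3, 1 - 1e-3, 999)
prior = np.ones_like(thetas) / len(thetas)   # flat prior over the grid
xs = np.array([1, 1, 0, 1, 1, 0, 1, 1])      # 6 successes, 2 failures

loglik = xs.sum() * np.log(thetas) + (len(xs) - xs.sum()) * np.log(1 - thetas)
unnorm = np.exp(loglik) * prior              # numerator p(x|theta) p(theta)
posterior = unnorm / unnorm.sum()            # dividing by p(x) = sum of the numerator

# With a flat prior the posterior mode coincides with the ML estimate 6/8.
assert abs(thetas[np.argmax(posterior)] - 6 / 8) < 1e-2
```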
Outline
1 Preliminary concepts
2 Directed graphical models
3 Markov random field
4 Operations on graphical models
G. Obozinski Introduction to Graphical models 21/47
Notations for graphical models
Graphs

G = (V, E) is a graph with vertex set V and edge set E. The graph will be
  either a directed acyclic graph (DAG): then (i, j) ∈ E ⊂ V × V means i → j,
  or an undirected graph: then {i, j} ∈ E means i and j are adjacent.

Variables of the graphical model

To each node i ∈ V, we associate a random variable Xi.
Observations/values of Xi are denoted xi.
If A ⊂ V is a set of nodes, we write XA = (Xi)_{i∈A} and xA = (xi)_{i∈A}.
G. Obozinski Introduction to Graphical models 22/47
Directed graphical model or Bayesian network
p(a, b, c) = p(a) p(b|a) p(c|b, a)        [fully connected DAG: a → b, a → c, b → c]

p(x1, x2) = p(x1) p(x2)                    [two nodes x1, x2 with no edge]

p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2)    [chain x1 → x2 → x3]

a ⊥⊥ b | c                                 [c a common parent of a and b]
G. Obozinski Introduction to Graphical models 23/47
Directed graphical model or Bayesian network
Factorization according to a directed graph

Definition: a distribution p factorizes according to a directed graph G if

p(x) = ∏_{j=1}^p p(xj | x_{Πj}),

where Πj is the set of parents of node j in G.

[figure: example DAG on x1, . . . , x7]

For a Markov chain x1 → x2 → · · · → xM this gives p(x) = p(x1) ∏_{j=2}^M p(xj | x_{j−1}).
G. Obozinski Introduction to Graphical models 24/47
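The factorization translates into code directly; the sketch below evaluates the joint of a binary Markov chain from its local conditionals (the CPT numbers are made up) and checks that the product is a valid distribution:

```python
import itertools

# Hypothetical CPTs for a binary chain x1 -> x2 -> ... -> xM.
p_x1 = {0: 0.6, 1: 0.4}
p_next = {0: {0: 0.9, 1: 0.1},   # p(x_j = . | x_{j-1} = 0)
          1: {0: 0.2, 1: 0.8}}   # p(x_j = . | x_{j-1} = 1)

def joint(xs):
    """p(x) = p(x1) * prod_j p(x_j | x_{j-1}), the chain factorization."""
    p = p_x1[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        p *= p_next[prev][cur]
    return p

# The factorized joint sums to 1 over all 2^M configurations.
M = 4
total = sum(joint(xs) for xs in itertools.product([0, 1], repeat=M))
assert abs(total - 1.0) < 1e-12
```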
How to parameterize a directed graphical model?

[DAG with edges x1 → x2, x1 → x3, x2 → x3, x2 → x4, x3 → x4, x3 → x5, as read off the factorization below]

Conditional probability tables

x1 ∈ {0, 1}, x2 ∈ {0, 1, 2}, x3 ∈ {0, 1, 2}

p(x3 = k | x1, x2):
x1  x2 |  k=0   k=1   k=2
 0   0 |  1     0     0
 0   1 |  1     0     0
 0   2 |  0.1   0     0.9
 1   0 |  1     0     0
 1   1 |  0.5   0.5   0
 1   2 |  0.2   0.3   0.5

p(x; θ) = p(x1; θ1) p(x2|x1; θ2) p(x3|x2, x1; θ3) p(x4|x3, x2; θ4) p(x5|x3; θ5)
G. Obozinski Introduction to Graphical models 25/47
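A sketch of how such a parameterization looks in code: dictionaries as CPTs, with the joint over (x1, x2, x3) obtained by multiplying local conditionals. Only the p(x3 | x1, x2) table comes from the slide; p(x1) and p(x2 | x1) are made-up placeholders:

```python
p1 = {0: 0.5, 1: 0.5}                              # hypothetical p(x1)
p2 = {0: {0: 0.3, 1: 0.3, 2: 0.4},                 # hypothetical p(x2 | x1 = 0)
      1: {0: 0.2, 1: 0.5, 2: 0.3}}                 # hypothetical p(x2 | x1 = 1)
# p(x3 = k | x1, x2), the table from the slide, indexed by (x1, x2).
p3 = {(0, 0): [1, 0, 0], (0, 1): [1, 0, 0], (0, 2): [0.1, 0, 0.9],
      (1, 0): [1, 0, 0], (1, 1): [0.5, 0.5, 0], (1, 2): [0.2, 0.3, 0.5]}

def joint123(x1, x2, x3):
    """p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2, x1)."""
    return p1[x1] * p2[x1][x2] * p3[(x1, x2)][x3]

assert abs(joint123(1, 2, 2) - 0.5 * 0.3 * 0.5) < 1e-12
```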
The Sprinkler
[DAG: R → G ← S]

R = 1: it has rained
S = 1: the sprinkler worked
G = 1: the grass is wet

P(S = 1) = 0.5
P(R = 1) = 0.2

P(G = 1 | S, R):
       R=0    R=1
S=0    0.01   0.8
S=1    0.8    0.95

Given that we observe that the grass is wet, are R and S independent?
G. Obozinski Introduction to Graphical models 26/47
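The question can be answered numerically: the sketch below enumerates the joint p(r, s, g) = p(r) p(s) p(g|s, r) with the tables from the slide and exhibits the "explaining away" effect, which shows R and S are not independent given G = 1:

```python
pS, pR = 0.5, 0.2
pG1 = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # (s, r) -> P(G=1|s,r)

def joint(r, s, g):
    """p(r, s, g) = p(r) p(s) p(g | s, r), the sprinkler factorization."""
    p_g = pG1[(s, r)] if g == 1 else 1 - pG1[(s, r)]
    return (pR if r else 1 - pR) * (pS if s else 1 - pS) * p_g

def cond_r1(evidence):
    """P(R = 1 | evidence) by brute-force enumeration; evidence fixes some vars."""
    num = den = 0.0
    for r in (0, 1):
        for s in (0, 1):
            for g in (0, 1):
                x = {"r": r, "s": s, "g": g}
                if all(x[k] == v for k, v in evidence.items()):
                    den += joint(r, s, g)
                    num += joint(r, s, g) if r == 1 else 0.0
    return num / den

# Wet grass makes rain more likely; learning the sprinkler ran
# "explains away" the wet grass and lowers the probability of rain.
assert cond_r1({"g": 1}) > pR
assert cond_r1({"g": 1, "s": 1}) < cond_r1({"g": 1})
```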
The Sprinkler II
[DAG: R → G ← S, G → P]

R = 1: it has rained
S = 1: the sprinkler worked
G = 1: the grass is wet
P = 1: the paws of the dog are wet

P(S = 1) = 0.5    P(R = 1) = 0.2

P(G = 1 | S, R):
       R=0    R=1
S=0    0.01   0.8
S=1    0.8    0.95

P(P = 1 | G):
G=0    G=1
0.2    0.7
G. Obozinski Introduction to Graphical models 27/47
Factorization and Independence
A factorization imposes independence statements:

∀x, p(x) = ∏_{j=1}^p p(xj | x_{Πj})   ⇔   ∀j, Xj ⊥⊥ X_{{1,...,j−1} \ Πj} | X_{Πj}

Is it possible to read from the graph the (conditional) independence statements that hold given the factorization?

For example, in a DAG on x1, . . . , x7: does X5 ⊥⊥ X2 | X4 hold?

[figure: example DAG on x1, . . . , x7]
G. Obozinski Introduction to Graphical models 28/47
Blocking nodes
diverging edges: a ← c → b      head-to-tail: a → c → b      converging edges: a → c ← b

With nothing observed:
  a ⊥⊥ b does not hold           a ⊥⊥ b does not hold          a ⊥⊥ b holds

Conditioning on c:
  a ⊥⊥ b | c holds               a ⊥⊥ b | c holds              a ⊥⊥ b | c does not hold

The configuration with converging edges is called a v-structure.
G. Obozinski Introduction to Graphical models 29/47
![Page 86: Introduction to Graphical models - IMAGINEimagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04 · Introduction to Graphical models Guillaume Obozinski Ecole des Ponts](https://reader030.vdocuments.us/reader030/viewer/2022041008/5eaea0bcde4f2a5a1c65f6dc/html5/thumbnails/86.jpg)
d-separation
[figure: example DAG on nodes a, b, c, e, f]

Theorem

If A, B and C are three disjoint sets of nodes, the statement XA ⊥⊥ XB | XC holds if all paths joining A to B go through at least one blocking node. A node j blocks a path
  if the edges of the path are diverging or head-to-tail at j and j ∈ C, or
  if the edges of the path are converging at j (i.e. form a v-structure) and neither j nor any of its descendants is in C.
G. Obozinski Introduction to Graphical models 30/47
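The theorem can be implemented naively for tiny graphs by enumerating all simple undirected paths and applying the two blocking rules; the DAG below (a → c ← b, c → e → f) is a hypothetical example:

```python
edges = {("a", "c"), ("b", "c"), ("c", "e"), ("e", "f")}  # hypothetical DAG

def parents(v): return {u for (u, w) in edges if w == v}
def children(v): return {w for (u, w) in edges if u == v}

def descendants(v):
    out, stack = set(), [v]
    while stack:
        for w in children(stack.pop()):
            if w not in out:
                out.add(w); stack.append(w)
    return out

def paths(a, b, seen=()):
    """All simple undirected paths from a to b."""
    if a == b:
        yield (a,)
        return
    for nxt in parents(a) | children(a):
        if nxt not in seen:
            for rest in paths(nxt, b, seen + (a,)):
                yield (a,) + rest

def blocked(path, C):
    """Does some interior node block this path, per the theorem's two rules?"""
    for i in range(1, len(path) - 1):
        prev, j, nxt = path[i - 1], path[i], path[i + 1]
        converging = prev in parents(j) and nxt in parents(j)
        if converging:
            if j not in C and not (descendants(j) & C):
                return True   # v-structure, neither j nor a descendant observed
        elif j in C:
            return True       # diverging or head-to-tail node in C
    return False

def d_separated(a, b, C):
    return all(blocked(p, C) for p in paths(a, b))

assert d_separated("a", "b", set())       # v-structure at c blocks marginally
assert not d_separated("a", "b", {"c"})   # conditioning on the collider unblocks
assert not d_separated("a", "b", {"f"})   # ... or on one of its descendants
```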
Factorization and Independence II

Several graphs can induce the same set of conditional independences.

[a ← c → b]      [a → c → b]

p(c) p(a|c) p(b|c) = p(a) p(c|a) p(b|c)

Some combinations of conditional independences cannot be faithfully represented by a graphical model.

Ex1: X ∼ Ber(1/2), Y ∼ Ber(1/2), Z = X ⊕ Y.
Ex2: X ⊥⊥ Y | Z = 1 but not X ⊥⊥ Y | Z = 0.
G. Obozinski Introduction to Graphical models 31/47
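Ex1 can be verified by enumeration: every pair among (X, Y, Z) is marginally independent, yet any two variables determine the third, the slide's point being that this combination of independences has no faithful graphical representation:

```python
from itertools import product

# Joint of (X, Y, Z) with X, Y ~ Ber(1/2) independent and Z = X XOR Y.
joint = {}
for x, y in product((0, 1), repeat=2):
    joint[(x, y, x ^ y)] = 0.25   # p(x) p(y); z is forced to x XOR y

def marg(keep):
    """Marginal over the named subset of variables, e.g. marg('xz')."""
    out = {}
    for (x, y, z), p in joint.items():
        k = tuple(v for v, name in zip((x, y, z), "xyz") if name in keep)
        out[k] = out.get(k, 0.0) + p
    return out

pxz, px, pz = marg("xz"), marg("x"), marg("z")
# X and Z are marginally independent ...
assert all(abs(pxz[(x, z)] - px[(x,)] * pz[(z,)]) < 1e-12
           for x, z in product((0, 1), repeat=2))
# ... yet X is a deterministic function of (Y, Z): X = Y XOR Z.
assert all(x == y ^ z for (x, y, z) in joint)
```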
Outline
1 Preliminary concepts
2 Directed graphical models
3 Markov random field
4 Operations on graphical models
G. Obozinski Introduction to Graphical models 32/47
Markov random field (MRF) or undirected graphical model

Is it possible to associate to each graph a family of distributions so that conditional independence coincides exactly with the notion of separation in the graph?

Global Markov Property

XA ⊥⊥ XB | XC  ⇔  C separates A and B

[figure: node sets A and B separated by C in an undirected graph]
G. Obozinski Introduction to Graphical models 33/47
![Page 97: Introduction to Graphical models - IMAGINEimagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04 · Introduction to Graphical models Guillaume Obozinski Ecole des Ponts](https://reader030.vdocuments.us/reader030/viewer/2022041008/5eaea0bcde4f2a5a1c65f6dc/html5/thumbnails/97.jpg)
Gibbs distribution
Clique: a set of nodes that are all connected to one another.
Potential function: a potential ψC(xC) ≥ 0 is associated to each clique C.

Gibbs distribution:
p(x) = (1/Z) ∏_C ψC(xC)

Partition function:
Z = ∑_x ∏_C ψC(xC)

[figure: undirected graph on x1, x2, x3, x4]

Writing the potential in exponential form ψC(xC) = exp{−E(xC)}, E(xC) is an energy. This is a Boltzmann distribution.
G. Obozinski Introduction to Graphical models 34/47
![Page 101: Introduction to Graphical models - IMAGINEimagine.enpc.fr/~obozinsg/benicassim/benicassim1.pdf · 2017-07-04 · Introduction to Graphical models Guillaume Obozinski Ecole des Ponts](https://reader030.vdocuments.us/reader030/viewer/2022041008/5eaea0bcde4f2a5a1c65f6dc/html5/thumbnails/101.jpg)
Example 1: Ising model

X = (X_1, …, X_d) is a collection of binary variables whose joint probability distribution is

p(x_1, …, x_d) = (1/Z(η)) exp( ∑_{i∈V} η_i x_i + ∑_{{i,j}∈E} η_ij x_i x_j )
             = (1/Z(η)) ∏_{i∈V} e^{η_i x_i} ∏_{{i,j}∈E} e^{η_ij x_i x_j}
             = (1/Z(η)) ∏_{i∈V} ψ_i(x_i) ∏_{{i,j}∈E} ψ_ij(x_i, x_j)

with ψ_i(x_i) = e^{η_i x_i} and ψ_ij(x_i, x_j) = e^{η_ij x_i x_j}.
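The identity between the exponential-of-sum form and the product-of-potentials form can be verified directly. This is a sketch on a hypothetical 3-node chain with made-up parameters η:

```python
import itertools
import math

# Illustrative Ising model on a 3-node chain V = {0, 1, 2}, E = {(0,1), (1,2)};
# the parameters eta are arbitrary values chosen for this sketch.
eta_node = {0: 0.5, 1: -0.3, 2: 0.2}
eta_edge = {(0, 1): 1.0, (1, 2): -0.7}

def unnorm_exp(x):
    """exp( sum_i eta_i x_i + sum_{ij} eta_ij x_i x_j )"""
    s = sum(eta_node[i] * x[i] for i in eta_node)
    s += sum(eta_edge[(i, j)] * x[i] * x[j] for (i, j) in eta_edge)
    return math.exp(s)

def unnorm_prod(x):
    """prod_i psi_i(x_i) prod_{ij} psi_ij(x_i, x_j)"""
    p = 1.0
    for i in eta_node:
        p *= math.exp(eta_node[i] * x[i])          # psi_i(x_i)
    for (i, j) in eta_edge:
        p *= math.exp(eta_edge[(i, j)] * x[i] * x[j])  # psi_ij(x_i, x_j)
    return p

# The two forms agree for every binary configuration, and Z(eta) normalizes them.
Z = sum(unnorm_exp(x) for x in itertools.product([0, 1], repeat=3))
```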
Example 2: Directed graphical model
Consider a distribution p that factorizes according to a directed graph G = (V, E); then

p(x_1, …, x_d) = ∏_{i=1}^d p(x_i | x_{π_i}) = ∏_{i=1}^d ψ_{C_i}(x_{C_i}) with C_i = {i} ∪ π_i

Consequence: a distribution that factorizes according to a directed model is a Gibbs distribution for the cliques C_i = {i} ∪ π_i. As a consequence, it factorizes according to an undirected graph in which the C_i are cliques.
Theorem of Hammersley and Clifford (1971)
A distribution p such that p(x) > 0 for all x satisfies the global Markov property for a graph G if and only if it is a Gibbs distribution associated with G.

Gibbs distribution (P_G): p(x) = (1/Z) ∏_{C∈C_G} ψ_C(x_C)

Global Markov property (P_M): X_A ⊥⊥ X_B | X_C if C separates A and B in G

Theorem

We have P_G ⇒ P_M, and (HC): if ∀x, p(x) > 0, then P_M ⇒ P_G.
Markov Blanket in an undirected graph
Definition
The Markov blanket of a node i is the smallest set of nodes B such that

X_i ⊥⊥ X_R | X_B, with R = V \ (B ∪ {i}),

or equivalently such that

p(X_i | X_{−i}) = p(X_i | X_B).
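In an undirected graph, the Markov blanket of a node is simply its set of neighbors: conditioning on them separates the node from the rest of the graph. A minimal sketch on a hypothetical 4-node adjacency list:

```python
# Hypothetical undirected graph as an adjacency list (a 4-cycle 1-2-4-3-1).
graph = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4},
    4: {2, 3},
}

def markov_blanket(graph, i):
    """In an undirected graph, the Markov blanket of i is its neighbor set:
    every path from i to the rest of the graph passes through a neighbor."""
    return set(graph[i])

blanket = markov_blanket(graph, 1)  # {2, 3}
```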
Markov Blanket for a directed graph?
What is the Markov blanket in a directed graph? By definition: the smallest set B of nodes such that, conditionally on X_B, X_i is independent of all the other variables in the graph.

(Figure: a DAG with the node x_i highlighted.)
Moralization
For a given directed graphical model:

is there an undirected graphical model which is equivalent?

is there a smallest undirected graphical model which contains the directed one?

p(x) = (1/Z) ∏_C ψ_C(x_C)   vs   ∏_{j=1}^M p(x_j | x_{Π_j})
Moralization
Given a directed graph G, its moralized graph G^M is obtained by:

1. for every node i, adding undirected edges between all of its parents;

2. removing the orientation of all the directed edges.

(Figure: a directed graph on x1, x2, x3, x4 and its moralized version.)

Proposition

If a probability distribution factorizes according to a directed graph G, then it factorizes according to the undirected graph G^M.
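The two moralization steps can be sketched in a few lines. This is a minimal illustration on a hypothetical 4-node DAG in which x4 has parents x2 and x3 (an assumption about the figure, chosen so that step 1 actually adds an edge):

```python
from itertools import combinations

# Hypothetical DAG given as parent lists: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
parents = {1: [], 2: [1], 3: [1], 4: [2, 3]}

def moralize(parents):
    """Return the undirected edge set of the moralized graph."""
    edges = set()
    for child, ps in parents.items():
        # Step 2 (applied edge by edge): keep each parent-child edge,
        # but without orientation.
        for p in ps:
            edges.add(frozenset((p, child)))
        # Step 1: "marry" the parents, i.e. connect every pair of
        # parents of the same node.
        for a, b in combinations(ps, 2):
            edges.add(frozenset((a, b)))
    return edges

moral_edges = moralize(parents)
# {2, 3} becomes an edge because x2 and x3 are both parents of x4.
```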
Directed vs undirected trees
Definition: directed tree
A directed tree is a DAG in which each node has at most one parent.

Remark: by definition, a directed tree has no v-structure.

Moralizing trees

What is the moralized graph of a directed tree? The corresponding undirected tree! (Since no node has two parents, moralization adds no edges.)

Proposition (equivalence between directed and undirected trees)

A distribution factorizes according to a directed tree if and only if it factorizes according to its undirected version.

Corollary

All orientations of the edges of a tree that do not create a v-structure are equivalent.
Outline
1 Preliminary concepts
2 Directed graphical models
3 Markov random field
4 Operations on graphical models
Operations on graphical models
Probabilistic inference
Compute a marginal distribution p(x_i) or a conditional marginal p(x_i | x_1 = 3, x_7 = 0).

Decoding (aka MAP inference)

Find the most probable configuration of the set of random variables:

argmax_z p(z | x)

(Figure: an HMM chain with hidden states z_1, z_2, …, z_{n−1}, z_n, z_{n+1} and observations x_1, x_2, …, x_{n−1}, x_n, x_{n+1}.)
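MAP decoding on a small chain can be illustrated by brute force. This is a sketch, not an efficient algorithm (Viterbi-style dynamic programming is the practical choice); the chain model, its transition/emission tables, and the observation sequence are all made up for illustration:

```python
import itertools

# Hypothetical 2-state chain model: p(z, x) = p(z_1) p(x_1|z_1)
#   * prod_n p(z_n | z_{n-1}) p(x_n | z_n). All tables are made up.
states = [0, 1]
init = {0: 0.5, 1: 0.5}
trans = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7}
emit = {(0, 'a'): 0.9, (0, 'b'): 0.1, (1, 'a'): 0.2, (1, 'b'): 0.8}

def joint(z, x):
    """Joint probability p(z, x) of a hidden sequence z and observations x."""
    p = init[z[0]] * emit[(z[0], x[0])]
    for n in range(1, len(x)):
        p *= trans[(z[n - 1], z[n])] * emit[(z[n], x[n])]
    return p

def decode(x):
    """argmax_z p(z | x), computed as argmax_z p(z, x) since p(x) is a
    constant in z; brute force over all |states|^n hidden sequences."""
    return max(itertools.product(states, repeat=len(x)),
               key=lambda z: joint(z, x))

z_hat = decode(['a', 'a', 'b', 'b'])  # -> (0, 0, 1, 1)
```

State 0 favors emitting 'a' and state 1 favors 'b', so the decoder switches states exactly where the observations switch.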
Learning/ estimation in graphical models
Frequentist learning
The main frequentist learning principle for graphical models is R. Fisher's maximum likelihood principle. Let p(x; θ) = (1/Z(θ)) ∏_C ψ(x_C, θ_C); we would like to find

argmax_θ ∏_{i=1}^n p(x^(i); θ) = argmax_θ (1/Z(θ)^n) ∏_{i=1}^n ∏_C ψ(x_C^(i), θ_C)

Bayesian learning

Graphical models can also be learned using Bayesian inference.
The “Naive Bayes” model for classification
Data

Class label C ∈ {1, …, K}, encoded as a class indicator vector Z ∈ {0, 1}^K.

Features X_j, j = 1, …, D (e.g. word presence).

Model

p(z) = ∏_k π_k^{z_k}

Which model for p(x_1, …, x_D | z_k = 1)?

(Figure: a star-shaped DAG with z pointing to x_1, …, x_D.)

"Naive" hypothesis

p(x_1, …, x_D | z_k = 1) = ∏_{j=1}^D p(x_j | z_k = 1; b_jk) = ∏_{j=1}^D b_jk^{x_j} (1 − b_jk)^{1−x_j}

with b_jk = P(x_j = 1 | z_k = 1).
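The model above fits in a few lines of code. This is a minimal Bernoulli naive Bayes sketch with maximum-likelihood estimates computed from counts; the tiny dataset is made up for illustration (a real implementation would add smoothing to avoid zero-probability estimates):

```python
# Made-up binary data: n = 4 samples, D = 3 features, K = 2 classes.
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]]
y = [0, 0, 1, 1]
K, D, n = 2, 3, len(X)

# ML estimates: pi_k = n_k / n,
# b_jk = (# class-k samples with x_j = 1) / n_k.
n_k = [sum(1 for c in y if c == k) for k in range(K)]
pi = [n_k[k] / n for k in range(K)]
b = [[sum(x[j] for x, c in zip(X, y) if c == k) / n_k[k]
      for j in range(D)] for k in range(K)]

def predict(x):
    """argmax_k pi_k * prod_j b_jk^{x_j} (1 - b_jk)^{1 - x_j}
    (Bayes' rule; the shared denominator p(x) can be dropped)."""
    scores = []
    for k in range(K):
        s = pi[k]
        for j in range(D):
            s *= b[k][j] if x[j] else (1.0 - b[k][j])
        scores.append(s)
    return max(range(K), key=lambda k: scores[k])
```

Note how the estimates for each (j, k) pair depend only on simple counts, which is what makes learning embarrassingly parallel and O(nD) overall.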
Naive Bayes (continued)
Learning (estimation) with the maximum likelihood principle:

π̂ = argmax_{π : π^⊤1 = 1} ∏_{k,i} π_k^{z_k^(i)}

b̂_jk = argmax_{b_jk} ∑_{i=1}^n log p(x_j^(i) | z^(i) = k; b_jk)

Prediction:

ẑ = argmax_z [ ∏_{j=1}^D p(x_j | z) p(z) ] / [ ∑_{z′} ∏_{j=1}^D p(x_j | z′) p(z′) ]

Properties

Ignores the correlations between features.

Prediction only requires applying Bayes' rule.

The model can be learned in parallel.

Complexity is O(nD).