Machine Learning for Computer Vision
Lecture 5: Introduction to generative models
Iasonas Kokkinos
Galen Group, INRIA-Saclay
Center for Visual Computing, Ecole Centrale Paris
MVA – ENS Cachan, 22 October 2012
Lecture outline
Bayes’ rule and generative models
Density estimation
Parametric deformable models
Decision Theory
• What is an optimal decision rule?
• Consider a loss matrix
• Consider the underlying joint distribution of data-label pairs
• Find the decision rule f that minimizes the expected loss
Optimal Classifier
• Consider the zero-one loss function
• Form the `Expected Prediction Error'
• Optimal decision at any point x: choose the most probable class, the `Bayes-optimal classifier'
Bayes' theorem
• P(X|Y): likelihood of observations X, given class Y
• P(Y): prior probability of class Y
• P(Y|X): posterior probability of class Y, given observations X
Why is this identity important?
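A minimal numeric sketch of the identity (a hypothetical two-class example; all numbers are made up):

```python
import numpy as np

# Hypothetical two-class example: priors P(Y) and likelihoods P(X=x | Y)
prior = np.array([0.7, 0.3])        # P(Y=0), P(Y=1)
likelihood = np.array([0.1, 0.6])   # P(X=x | Y=0), P(X=x | Y=1) at one observation x

evidence = np.sum(likelihood * prior)       # P(X=x), by marginalization
posterior = likelihood * prior / evidence   # Bayes' rule: P(Y | X=x)
print(posterior)                            # [0.28 0.72]
```

Even though class 0 is a priori more probable, the likelihood reverses the decision: this is exactly what the generative classifiers below exploit.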
Two approaches to pattern recognition
Training set: {(x_i, y_i)}, i = 1, …, N
• Discriminative: model the posterior P(Y|X) directly. Discriminative vision is function approximation, mapping observations X to labels Y.
• Generative: model the likelihood P(X|Y) and invert it with Bayes' rule. Bayesian vision is inverse graphics: from class Y to observations X, and back.
Generative or discriminative?
Training set: {(x_i, y_i)}, i = 1, …, N
– Discriminative models (Lectures 1-4): skip density estimation
• More robust to wrong distribution assumptions (e.g. outliers)
• V. Vapnik: `one should solve the classification problem directly and never solve a more general problem as an intermediate step'
– Generative models (Lectures 5-7): the core task is density estimation
Pipeline: training set → density estimation (e.g. ML) → class distributions → Bayes' rule → class posteriors
• If we know the distributions, smaller training sets suffice
• Dealing with missing/corrupt data
• Explicit modelling of sources of variation (e.g. translation)
• Conceptual clarity, ability for `visual debugging'
Lecture outline
Bayes’ rule and generative models
Density estimation
Gaussian distributions
Mixture-of-Gaussian models
Hidden Variables and Expectation-Maximization algorithm
Factor Analysis & PCA
Parametric deformable models
Mixed Discrete/Continuous hidden variable models
Density Estimation
• Training set: {(x_i, y_i)}, i = 1, …, N
• Examples corresponding to class k: the indices i with y_i = k
• Training data for class k: X_k = {x_i : y_i = k}
• One density estimate per class: P(X | Y = k), for short P_k(X)
Parametric Distributions: Gaussian
– 1-D: N(x; µ, σ²) = (2πσ²)^(−1/2) exp(−(x − µ)² / (2σ²))
– N-D: N(x; µ, Σ) = (2π)^(−N/2) |Σ|^(−1/2) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
• e.g. 2D: µ ∈ R², Σ a 2×2 covariance matrix
Covariance matrix reminder
• Covariance matrix: Σ_ij = E[(x_i − µ_i)(x_j − µ_j)]
• Uncorrelated coordinates: diagonal covariance
[Figure: example scatter plots of (Height, Income) and (Height, Weight)]
Density Estimation for a Gaussian distribution
• Given: training set {(x_i, y_i)}, i = 1, …, N
• Notation: X_k = {x_i : y_i = k}, with N_k elements
• Maximum Likelihood estimation for class k:
µ_k = (1/N_k) Σ_{x ∈ X_k} x
Σ_k = (1/N_k) Σ_{x ∈ X_k} (x − µ_k)(x − µ_k)ᵀ
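A short NumPy sketch of these ML estimates (function names are mine, not from the slides):

```python
import numpy as np

def fit_gaussian(X):
    """ML estimates for one Gaussian: sample mean and covariance (1/N, not 1/(N-1))."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]
    return mu, Sigma

def fit_class_conditionals(X, y):
    """One (mu_k, Sigma_k) pair per class k, fit on the examples with y_i = k."""
    return {k: fit_gaussian(X[y == k]) for k in np.unique(y)}
```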
Classification task: Ferrari or Fiat?
• Consider placing a personalized ad. Which car should you advertise?
• New client: observation x
• Classification problem: is the new client more likely to buy a Fiat or a Ferrari?
• Build class-specific probability distributions from the training set {(x_i, y_i)}, i = 1, …, N
Ferrari or Fiat, continued
• Class-specific Gaussian distributions: P(x | y = k) = N(x; µ_k, Σ_k)
• Parameter estimation: Maximum Likelihood (ML) estimates of µ_k, Σ_k from {x_i : y_i = k}
• Can we proceed to classification? We still need to estimate the priors P(y = k); ML estimate: N_k / N
• Bayes' rule: P(y = k | x) ∝ P(x | y = k) P(y = k)
Bayes' rule, 1D
Classifier form for Gaussian Distributions
• Choose the class with the largest posterior: argmax_k [log P(x | y = k) + log P(y = k)]
• Decision boundary for the binary case: the set of points where the two class scores are equal
• In general the log-likelihoods are quadratic in x: quadratic decision boundaries
• Special case Σ₁ = Σ₂: the quadratic terms cancel, giving linear decision boundaries
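Concretely, the class scores can be computed as log-posteriors up to a shared constant; a sketch (names are mine):

```python
import numpy as np

def gaussian_score(x, mu, Sigma, prior):
    """log P(y | x) up to a constant: log N(x; mu, Sigma) + log P(y).
    Quadratic in x in general; linear when all classes share one Sigma."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * d @ np.linalg.solve(Sigma, d) - 0.5 * logdet + np.log(prior)

# Decision rule: pick the class with the highest score, e.g. for classes 0 and 1:
# y_hat = int(gaussian_score(x, mu1, S1, p1) > gaussian_score(x, mu0, S0, p0))
```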
Lecture outline
Bayes’ rule and generative models
Density estimation
Gaussian distributions
Mixture-of-Gaussian models
Expectation-Maximization algorithm and hidden variables
Factor Analysis & PCA
Parametric deformable models
Mixed Discrete/Continuous hidden variable models
Mixture of Gaussians model
• P(x) = Σ_k π_k N(x; µ_k, Σ_k)
[Figure: data drawn from three Gaussian components, annotated with weights 0.2, 0.5, 0.1]
Main challenge: parameter estimation. Which points go with which cluster?
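A sketch of the generative process this density describes (the weights, means, and spreads below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
pi  = np.array([0.2, 0.5, 0.3])    # mixture weights (sum to 1)
mu  = np.array([-2.0, 0.0, 3.0])   # component means (1-D example)
sig = np.array([0.5, 1.0, 0.8])    # component standard deviations

z = rng.choice(len(pi), size=1000, p=pi)   # hidden: which component generated each point
x = rng.normal(mu[z], sig[z])              # observed: a draw from that component

def mog_pdf(t):
    """Mixture density at t: sum_k pi_k N(t; mu_k, sig_k^2)."""
    comp = np.exp(-0.5 * ((t - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return float(np.sum(pi * comp))
```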
K-Means algorithm
– Coordinate descent on the distortion cost Σ_i ‖x_i − µ_{z_i}‖²: alternate between assignments z_i and centers µ_k
– Local minima: use multiple initializations to find a better solution
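A minimal sketch of the two alternating steps (each one can only decrease the distortion):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its closest center
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist2.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels
```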
Lecture outline
Bayes’ rule and generative models
Density estimation
Gaussian distributions
Mixture-of-Gaussian models
Expectation-Maximization algorithm and hidden variables
Factor Analysis & PCA
Parametric deformable models
Mixed Discrete/Continuous hidden variable models
K-Means algorithm
Adaptation for Gaussian distributions
Expectation Maximization algorithm
• E-step: compute soft assignments of points to components, given the current parameters
• M-step: re-estimate the parameters, given the soft assignments
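A sketch of both steps for a K-component Gaussian mixture, assuming SciPy is available; the initialization here is crude (in practice one starts from k-means, as noted on the next slide):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_mog(X, K, iters=50, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(N, K, replace=False)]                 # crude init
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = P(h_i = k | x_i, current parameters)
        r = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (r[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, r
```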
K-means vs. EM

k-means                            | EM
Closest center's index             | Soft assignment R
Isotropic distance (Euclidean)     | Anisotropic likelihood (covariance-based, `Mahalanobis')
Fast (e.g. kd-trees)               | Accurate and more flexible
More robust to initialization      | Prone to local minima
Coordinate descent on distortion   | Coordinate descent on?

Typical usage: initialize EM with k-means results
Mixture of Gaussians
• P(x; θ) = Σ_{k=1}^{K} π_k N(x; µ_k, Σ_k), with parameters θ = {π_k, µ_k, Σ_k}
• Maximum Likelihood estimation: maximize Σ_i log P(x_i; θ)
Hidden Variables
• Criterion: the log-likelihood Σ_i log Σ_k π_k N(x_i; µ_k, Σ_k)
– Problem: summation inside the logarithm
– We do not know which component generated each point
– What if we knew?
Plato’s cave
Observations: B&W Images Models: 3D surfaces
Hidden variables: positions
Hidden Variables
• Criterion: the log-likelihood Σ_i log Σ_k π_k N(x_i; µ_k, Σ_k)
– Problem: summation inside the logarithm
– We do not know which component generated each point
– What if we knew?
• Hidden variable h
– Indicates which component is responsible for each point
– Multinomially distributed variable
Rewriting the MoG distribution
• Marginalization: P(x) = Σ_h P(x, h)
• Chain rule: P(x, h) = P(x | h) P(h)
• We have P(h = k) = π_k and P(x | h = k) = N(x; µ_k, Σ_k), hence P(x) = Σ_k π_k N(x; µ_k, Σ_k)
Complete Log-Likelihood
• Assume the hidden variables are given
• Data + hidden variables = complete observations
• Complete log-likelihood
• The summation falls outside the logarithm!
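Written out in standard MoG notation, with $z_{ik}$ the one-hot encoding of the hidden component $h_i$:

```latex
\log P(x, h) \;=\; \sum_{i=1}^{N} \sum_{k=1}^{K}
   z_{ik}\,\bigl[\log \pi_k + \log \mathcal{N}(x_i;\, \mu_k, \Sigma_k)\bigr],
\qquad z_{ik} = [h_i = k]
```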
Full Observation Log-Likelihood
• Given: hidden variables
• Maximize w.r.t. parameters
Expected Complete Log-Likelihood
• We do not know the hidden variables (`missing data')
• The complete log-likelihood is a random quantity
• Form its expectation, using a distribution q(h) on the hidden variables: the expected complete log-likelihood
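In the same notation, with responsibilities $r_{ik} = q(h_i = k)$ replacing the unknown indicators:

```latex
\mathbb{E}_{q}\bigl[\log P(x, h)\bigr] \;=\;
   \sum_{i=1}^{N} \sum_{k=1}^{K}
   r_{ik}\,\bigl[\log \pi_k + \log \mathcal{N}(x_i;\, \mu_k, \Sigma_k)\bigr]
```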
Full Observation Log-Likelihood
• Given: hidden variables
• Maximize w.r.t. parameters
Expected Log-Likelihood
• Given: probability of assignment, q(h)
• Maximize w.r.t. parameters: the M-step!
Lecture outline
Bayes’ rule and generative models
Density estimation
Gaussian distributions
Mixture-of-Gaussian models
Expectation-Maximization algorithm and hidden variables
Factor Analysis & PCA
Parametric deformable models
Mixed Discrete/Continuous hidden variable models
P(Grades | MVA)
• 10 students, 20 courses
– How can we model the distribution of the grades?
– Consider a Gaussian distribution: 20×19/2 parameters in the covariance, but only 10 measurements
– Could we `summarize' performance in a more compact way?
• 3 `hidden' causes: math skills, CS skills, effort
– Different skills per student
– Different effects of skills on grade per course
[Figure: observed grades = influence of skills on grade × skills per student]
Generative Model: Factor Analysis
• Hidden variables h (skills); observations x (grades)
– `Factor loading' matrix Λ: course-specific effect of skills on grade
– Noise covariance matrix Ψ: performance on the day of the exam
• Linear model: x = µ + Λh + ε, with h ~ N(0, I) and ε ~ N(0, Ψ)
• Distribution of x (see end of slides)
• Density estimation: recover the optimal µ, Λ, Ψ for a set of data X
Continuous Hidden Variables: Factor Analysis
• Find a low-dimensional subspace (`skills') explaining the data
• Hidden variables: coordinates on the subspace
– E-step: posterior on coordinates
– M-step: subspace
EM for Factor Analysis
• E-step: distribution on h (skills), conditioned on x (grades)
• M-step: plug in the distribution on h, and maximize w.r.t. the parameters
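A sketch of the E-step using the standard FA posterior (the M-step is omitted; names are mine):

```python
import numpy as np

def fa_e_step(X, mu, Lam, Psi):
    """Posterior over factors given observations, q(h | x) = N(m(x), G),
    with G = (I + Lam^T Psi^{-1} Lam)^{-1} and m(x) = G Lam^T Psi^{-1} (x - mu).
    Psi is assumed diagonal, as in the FA model."""
    k = Lam.shape[1]
    PsiInv = np.diag(1.0 / np.diag(Psi))
    G = np.linalg.inv(np.eye(k) + Lam.T @ PsiInv @ Lam)   # posterior covariance
    M = (X - mu) @ (PsiInv @ Lam @ G)                     # row i: posterior mean E[h | x_i]
    return M, G
```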
Principal Component Analysis (PCA)
• Find a low-dimensional subspace to reconstruct high-dimensional data
• Reconstruction on an orthogonal basis: approximation with K terms
Relation with Factor Analysis?
• PCA criterion: minimize the reconstruction error of the training set
• Regularize the solution
• Equivalently:
• Difference from FA: no explicit noise model (FA fits a diagonal noise covariance Ψ)
• What we gain: no EM; a factorization-based estimate of Λ, h
• What we lose: a proper probabilistic framework
Principal component analysis
• The k orthogonal directions that capture most of the data variance are the k leading (largest-eigenvalue) covariance eigenvectors

Factor Analysis    | PCA
Λ matrix           | Leading K eigenvectors of the covariance
Hidden variables   | Inner product of data with eigenvectors
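A compact sketch of this correspondence (eigendecomposition of the sample covariance; names are mine):

```python
import numpy as np

def pca(X, K):
    mu = X.mean(axis=0)
    Sigma = np.cov(X - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
    U = evecs[:, ::-1][:, :K]              # K leading eigenvectors (the 'Lambda' analogue)
    H = (X - mu) @ U                       # hidden variables: inner products with eigenvectors
    X_hat = mu + H @ U.T                   # K-term reconstruction
    return U, H, X_hat
```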
PCA: decorrelation / dimensionality reduction
• `Hidden variables': projection onto the eigenvectors of the covariance matrix
• Dimensionality reduction by keeping only the leading eigenvectors
• Grades in 60 courses → good in math, good in computer science
Lecture outline
Bayes’ rule and generative models
Density estimation
Gaussian distributions
Mixture-of-Gaussian models
Expectation-Maximization algorithm and hidden variables
Factor Analysis & PCA
Parametric deformable models
Mixed Discrete/Continuous hidden variable models
Continuous Hidden Variables: Factor Analysis
• Also known as Dimensionality Reduction
Discrete hidden variables: Mixture of Gaussians
• Also known as Clustering
Transformation-resilient image averaging
• Consider shift as a hidden variable, l
• Estimate the model with EM
[Figure: observed image = shifted deformation-free image; input sequence, plain mean and std, and mean and std estimated with transformation and EM]
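A hedged sketch of this idea for a single Gaussian template: each candidate shift is represented here by a hypothetical pixel-index array that re-aligns an image, and EM alternates between soft shift assignments and a weighted average:

```python
import numpy as np

def em_shift_average(images, shifts, sigma=1.0, iters=20):
    """images: (N, d) flattened images; shifts: list of index arrays, one per
    candidate shift, such that images[:, s] re-aligns the images under shift s."""
    mu = images.mean(axis=0)   # init: plain mean
    for _ in range(iters):
        # E-step: responsibility of each candidate shift for each image
        logp = np.stack([-np.sum((images[:, s] - mu) ** 2, axis=1) / (2 * sigma ** 2)
                         for s in shifts], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: deformation-free image = weighted average of re-aligned images
        acc = np.zeros_like(mu)
        for j, s in enumerate(shifts):
            acc += (r[:, j:j + 1] * images[:, s]).sum(axis=0)
        mu = acc / len(images)
    return mu
```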
Transformed Components Analysis
• Latent variables for synthesis (continuous)
• Latent variables for shift (discrete)
• Estimate the mean and basis using EM
[Figure: input images; plain mean and PCA; mean and basis estimated with offset; samples of the model]
Transformed Mixture of Gaussians
• Latent variables for cluster (discrete)
• Latent variables for shift (discrete)
[Figure: input; plain Mixture-of-Gaussians; with offset]
Transformed Mixture of Gaussians (continued)
[Figure: more examples: input; plain Mixture-of-Gaussians; with offset]
Mixture of Transformed Components
• Latent variables for cluster
• Latent variables for components
• Latent variables for shift
Lecture outline
Bayes’ rule and generative models
Density estimation
Parametric deformable models
Eigenfaces
Active appearance models
3D Morphable models
Statistical active shape models
Example: bone contours
Task: localize anatomical structures
Task: Analyze a hand radiograph
Task: Analyze a hand radiograph
Assume: we are looking for proximal phalanx 2 (PP2)
[Figure: hand radiograph with labelled bones: metacarpals MC2-MC5, proximal phalanges PP2-PP5, middle phalanges MP2-MP5]
Analyzing a hand radiograph
We have a priori knowledge about the typical appearance of PP2: e.g. bone shapes and texture.
How can we represent this knowledge? How can we exploit it?
Statistical Shape Models
Each example is represented by a vector containing the coordinates of the landmarks.
Learning: model acquisition. Inference: model fitting.
• Bone shapes: vectors in R^{2L}, stacking the x- and y-coordinates of L landmarks
• Goal: project the data onto a low-dimensional linear subspace that best explains their variation
[Figure: the space of all bone shapes]
New subspace: a `better' coordinate system
1. Active Shape Models
New coordinates reflect the distribution of the data. Few coordinates suffice to represent a high-dimensional vector. They can be viewed as parameters of a model.
Using PCA to model shape
[Figure: a shape expressed as the mean shape plus a weighted sum of PCA deformation modes]
Active shape models (ASM)
• A set of training examples (images)
• A set of landmarks that are present in all images
• Build a statistical model of shape variation (PCA)
• Build a statistical model of the local texture (PCA)
• Use the model to search a new image
ASM search
Initialize, then iterate: adjust landmarks to the local texture, then fit to the shape model (sketched below).
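A hedged sketch of this loop; `texture_model.best_local_match` stands in for the local texture search and is a hypothetical placeholder, not an API from the slides:

```python
import numpy as np

def asm_search(image, shape_model, texture_model, x0, iters=30):
    """shape_model: (mean, P, b_max) from PCA on landmark vectors; x0: initial landmarks."""
    mean, P, b_max = shape_model
    x = x0.copy()
    for _ in range(iters):
        # 1. Adjust each landmark to the best nearby texture match (hypothetical helper)
        x = texture_model.best_local_match(image, x)
        # 2. Fit to the shape model: project onto the subspace, clamp the coefficients
        b = P.T @ (x - mean)
        b = np.clip(b, -b_max, b_max)   # stay within plausible shapes
        x = mean + P @ b
    return x
```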
Lecture outline
Bayes’ rule and generative models
Density estimation
Parametric deformable models
Eigenfaces
Active appearance models
3D Morphable models
Statistical active shape models
Appearance modelling for faces
• When viewed as vectors of pixel values, face images are extremely high-dimensional
– a 100×100 image = 10,000 dimensions
• Very few vectors correspond to valid face images
• The original coordinates are not revealing about face properties
• We want to model the subspace (`manifold') of face images
Continuous Hidden Variables: Appearance Manifolds
[Figure: face images x_1, x_2, …, x_n as points on an appearance manifold parameterized by Lighting × Pose; Murase and Nayar 1993]
Eigenfaces (Murase & Nayar, 1991)
• Training images x_1, …, x_N
Eigenfaces
• Mean: µ
• Top eigenvectors: u_1, …, u_k
Eigenfaces
• Principal component (eigenvector) u_k, visualized as µ + 3σ_k u_k and µ − 3σ_k u_k
Eigenfaces example
• Face x in "face space" coordinates: w_k = u_kᵀ(x − µ)
• Reconstruction: x̂ = µ + w₁u₁ + w₂u₂ + w₃u₃ + w₄u₄ + …
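A small sketch of these two operations, assuming the eigenfaces are the columns of a matrix `U` and `mu` is the mean face (names are mine):

```python
import numpy as np

def face_space(x, mu, U, K):
    """x: flattened face image; U: (d, M) eigenfaces as columns; K: number of terms."""
    w = U[:, :K].T @ (x - mu)   # coordinates w_k = u_k^T (x - mu)
    x_hat = mu + U[:, :K] @ w   # reconstruction: mu + sum_k w_k u_k
    return w, x_hat
```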
Limitations
• Global appearance method: not robust to misalignment, background variation
Lecture outline
Bayes’ rule and generative models
Density estimation
Parametric deformable models
Eigenfaces
Active appearance models
3D Morphable models
Statistical active shape models
Active Appearance Models (AAMs)
Shape: x → S(x)
Appearance: template T(x)
Synthesis: I(S(x)) = T(x), warping the Template onto the Instance
Playing with the AAM parameters
First two modes of shape variation; first two modes of gray-level variation
First four modes of appearance variation
Active Appearance Model Search (Results)
AAM Search
Lecture outline
Bayes’ rule and generative models
Density estimation
Parametric deformable models
Eigenfaces
Active appearance models
3D Morphable models
Statistical active shape models
3-D surface acquisition
Laser range scanners; stereo cameras; structured light (Kinect); photometric stereo
What can we do with 3d shape models?
[Blanz and Vetter 1999, 2003]
Building a Morphable Face Model
[Blanz and Vetter 1999, 2003]
3-D Morphable Models
[Blanz and Vetter 1999, 2003]
3D Morphable models
Recover Shape
Synthesize new views
Synthesize new expressions
3-D Morphable Model fitting
• Rough manual initialization
• Gradient descent to minimize the reconstruction error functional
• And then…
3D AAM for face tracking
CMU group: I. Matthews, S. Baker, R. Gross (230 frames per second, 2004)
Playing with Facial Attributes
Several classes of attributes are modeled:
• Facial expressions (smile, frown)
• Individual characteristics (double chin, hooked nose, `maleness')
• Distinctiveness
Manipulating Facial Attributes via Deformations
• For each face in the database, two scans are recorded: S_neutral and S_expression.
• The difference vector ΔS = S_expression − S_neutral is saved and later simply added to the 3D reconstruction of the input image.
APPENDIX
Factor Analysis: Generative Model
• Hidden variables h ~ N(0, I); observations x
– Noise covariance matrix Ψ (diagonal)
• Linear model: x = µ + Λh + ε, ε ~ N(0, Ψ)
• Distribution of x
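For reference, the standard marginal implied by this model (derived in the lines that follow):

```latex
x = \mu + \Lambda h + \epsilon,\quad h \sim \mathcal{N}(0, I),\;
\epsilon \sim \mathcal{N}(0, \Psi)
\;\;\Longrightarrow\;\;
x \sim \mathcal{N}\!\bigl(\mu,\; \Lambda\Lambda^{\top} + \Psi\bigr)
```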
Full observation distribution
• Consider the covariance of the stacked vector (x, h): Cov[x] = ΛΛᵀ + Ψ, Cov[x, h] = Λ, Cov[h] = I
• Full observations: (x, h) are jointly Gaussian
• Distribution: N((µ, 0), joint covariance above)
• We will need to write its inverse
• Problem: non-diagonal (block) matrix
Block matrix diagonalization
• Schur complement
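The standard identity, with $M$ partitioned into blocks and $A - BD^{-1}C$ the Schur complement of $D$:

```latex
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
=
\begin{pmatrix} I & BD^{-1} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{pmatrix}
\begin{pmatrix} I & 0 \\ D^{-1}C & I \end{pmatrix}
```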
Factorizing a Gaussian distribution
PCA criterion
• Minimize reconstruction error of training set
Spectral Decomposition of a matrix
• A symmetric matrix factorizes as Σ = U D Uᵀ, with orthonormal eigenvectors as the columns of U and the eigenvalues on the diagonal of D
Principal Component Analysis
• Given: N data points x_1, …, x_N in R^d
• We want to find a new set of features that are linear combinations of the original ones: u(x_i) = uᵀ(x_i − µ), where µ is the mean of the data points
• What unit vector u in R^d captures the most variance of the data?
Principal Component Analysis
• Variance of the projection on a unit-norm direction u: Var[uᵀ(x − µ)] = uᵀ Σ u, where uᵀ(x_i − µ) is the projection of a data point and Σ is the covariance matrix of the data
• The direction that maximizes the variance: the eigenvector associated with the largest eigenvalue of Σ