Post on 22-Feb-2016
Generative Models for Image Understanding
Nebojsa Jojic and Thomas Huang
Beckman Institute and ECE Dept.
University of Illinois
Problem: Summarization of High Dimensional Data
• Pattern Analysis: for several classes c = 1, ..., C of the data, define probability distribution functions p(x|c)
• Compression: define a probabilistic model p(x) and devise an optimal coding approach
• Video Summary: drop most of the frames in a video sequence and keep interesting information that summarizes it
Generative density modeling
• Find a probability model that
– reflects desired structure
– randomly generates plausible images
– represents the data by parameters
• ML estimation
• p(image|class) used for recognition, detection, ...
Problems we attacked
• Transformation as a discrete variable in generative models of intensity images
• Tracking articulated objects in dense stereo maps
• Unsupervised learning for video summary
• Idea: the structure of the generative model reveals the interesting objects we want to extract
Mixture of Gaussians
[Graphical model: class c → pixel intensities z]
The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μc, Φc)
P(c) = πc
Mixture of Gaussians
P(c) = πc
p(z|c) = N(z; μc, Φc)
• Parameters πc, μc and Φc represent the data
• For input z, the cluster responsibilities are
P(c|z) = p(z|c)P(c) / Σc' p(z|c')P(c')
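The responsibility computation above can be sketched in code (an illustrative sketch, not from the talk; it assumes diagonal covariances Φc, and all names are made up for the example):

```python
import numpy as np

def responsibilities(z, pis, mus, phis):
    """P(c|z) for a diagonal-covariance mixture of Gaussians.

    z: (N,) image as a flat vector of pixel intensities
    pis: (C,) mixing priors pi_c; mus: (C, N) means mu_c
    phis: (C, N) per-pixel variances Phi_c (diagonal covariances)
    """
    log_p = np.empty(len(pis))
    for c in range(len(pis)):
        # log N(z; mu_c, diag(Phi_c)) + log pi_c, computed in log space
        log_p[c] = (np.log(pis[c])
                    - 0.5 * np.sum(np.log(2 * np.pi * phis[c]))
                    - 0.5 * np.sum((z - mus[c]) ** 2 / phis[c]))
    log_p -= log_p.max()            # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum()              # normalize: P(c|z)
```

Working in log space avoids the underflow that the product of many per-pixel densities would otherwise cause.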
Example: Simulation
c = 1, P(c) = πc
z = [image sampled from cluster 1]
p(z|c) = N(z; μc, Φc)
π1 = 0.6, π2 = 0.4
Example: Simulation
c = 2, P(c) = πc
z = [image sampled from cluster 2]
p(z|c) = N(z; μc, Φc)
π1 = 0.6, π2 = 0.4
Example: Learning - E step
Images from data set: z = [image]
P(c=1|z) = 0.52, P(c=2|z) = 0.48
π1 = 0.5, π2 = 0.5
Example: Learning - E step
Images from data set: z = [image]
P(c=1|z) = 0.48, P(c=2|z) = 0.52
π1 = 0.5, π2 = 0.5
Example: Learning - M step
π1 = 0.5, π2 = 0.5
Set μ1 to the average of z weighted by P(c=1|z)
Set μ2 to the average of z weighted by P(c=2|z)
Example: Learning - M step
π1 = 0.5, π2 = 0.5
Set Φ1 to the average of diag((z - μ1)(z - μ1)T) weighted by P(c=1|z)
Set Φ2 to the average of diag((z - μ2)(z - μ2)T) weighted by P(c=2|z)
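The E and M steps above combine into a short loop; here is an illustrative sketch (not the authors' MATLAB code; the function name, the optional mu0 initializer, and the small variance floor are assumptions):

```python
import numpy as np

def em_mog(Z, C, iters=15, mu0=None, seed=0):
    """EM for a diagonal-covariance mixture of Gaussians (sketch).

    Z: (T, N) data, one flattened image per row; C: number of clusters.
    Returns mixing priors pi, means mu, per-pixel variances Phi.
    """
    rng = np.random.default_rng(seed)
    T, N = Z.shape
    pi = np.full(C, 1.0 / C)
    mu = (np.array(mu0, dtype=float) if mu0 is not None
          else Z[rng.choice(T, C, replace=False)].astype(float))
    Phi = np.ones((C, N))
    for _ in range(iters):
        # E step: log responsibilities, then normalize per data point
        logR = np.stack([np.log(pi[c])
                         - 0.5 * np.sum(np.log(2 * np.pi * Phi[c]))
                         - 0.5 * np.sum((Z - mu[c]) ** 2 / Phi[c], axis=1)
                         for c in range(C)], axis=1)
        logR -= logR.max(axis=1, keepdims=True)
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
        # M step: responsibility-weighted averages, as on the slides
        Nc = R.sum(axis=0)
        pi = Nc / T
        mu = (R.T @ Z) / Nc[:, None]
        for c in range(C):
            # small floor keeps variances positive
            Phi[c] = (R[:, c] @ (Z - mu[c]) ** 2) / Nc[c] + 1e-6
    return pi, mu, Phi
```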
Transformation as a Discrete Latent Variable
with Brendan J. Frey
Computer Science, University of Waterloo, Canada
Beckman Institute & ECE, Univ of Illinois at Urbana
Kind of data we’re interested in
Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...
One approach:
Images → Normalization → Normalized images → Pattern Analysis
(normalization requires labor)
Our approach:
Images → Joint Normalization and Pattern Analysis
• A continuous transformation moves an image z along a continuous curve
• Our subspace model should assign images near this nonlinear manifold to the same point in the subspace
[Figure: what transforming an image does in the vector space of pixel intensities]
Tractable approaches to modeling the transformation manifold
• Linear approximation - good locally
• Discrete approximation - good globally
Adding “transformation” as a discrete latent variable
• Say there are N pixels
• We assume we are given a set of sparse N x N transformation generating matrices G1,…,Gl ,…,GL
• These generate points Gl z from point z
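A transformation-generating matrix Gl for a translation can be built as a permutation of pixel indices; a sketch follows (illustrative only; it assumes wraparound boundaries, which the slides do not specify, and builds the matrix dense rather than sparse for clarity):

```python
import numpy as np

def shift_matrix(h, w, dy, dx):
    """N x N permutation matrix that shifts an h x w image by (dy, dx)
    with wraparound, where N = h * w and images are flattened row-major.
    """
    N = h * w
    G = np.zeros((N, N))
    for y in range(h):
        for x in range(w):
            src = y * w + x
            dst = ((y + dy) % h) * w + (x + dx) % w
            G[dst, src] = 1.0   # pixel (y, x) moves to (y+dy, x+dx)
    return G
```

Multiplying a flattened image by G then performs the shift, e.g. `shift_matrix(h, w, 0, 1) @ z` moves every pixel one column to the right.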
Transformed Mixture of Gaussians
[Graphical model: class c → latent image z; z and transformation l → observed image x]
P(c) = πc
p(z|c) = N(z; μc, Φc)
P(l) = ρl
p(x|z,l) = N(x; Gl z, Ψ)
• ρl, πc, μc and Φc represent the data
• The cluster/transformation responsibilities, P(c,l|x), are quite easy to compute
Example: Simulation
l = 1, c = 1
G1 = shift left and up, G2 = I, G3 = shift right and up
z = [latent image], x = [z shifted left and up]
ML estimation of a Transformed Mixture of Gaussians using EM
[Graphical model: c → z; z and l → x]
• E step: Compute P(l|x), P(c|x) and p(z|c,x) for each x in data
• M step: Set
– πc = avg of P(c|x)
– ρl = avg of P(l|x)
– μc = avg mean of p(z|c,x)
– Φc = avg variance of p(z|c,x)
– Ψ = avg variance of p(x - Gl z|x)
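The joint responsibilities P(c,l|x) needed in the E step can be sketched as follows (illustrative only, not the authors' code; it assumes each Gl is a permutation matrix and isotropic observation noise psi, so every covariance stays diagonal):

```python
import numpy as np

def joint_responsibilities(x, Gs, rho, pi, mu, Phi, psi):
    """P(c, l | x) for a transformed mixture of Gaussians (sketch).

    With permutation Gl, p(x|c,l) = N(x; Gl mu_c, Gl diag(Phi_c) Gl^T + psi I),
    whose covariance is the diagonal vector Gl @ Phi_c + psi.
    """
    C, L = len(pi), len(Gs)
    logp = np.empty((C, L))
    for l, G in enumerate(Gs):
        for c in range(C):
            m = G @ mu[c]                 # transformed cluster mean
            v = G @ Phi[c] + psi          # transformed (diagonal) variance
            logp[c, l] = (np.log(pi[c]) + np.log(rho[l])
                          - 0.5 * np.sum(np.log(2 * np.pi * v))
                          - 0.5 * np.sum((x - m) ** 2 / v))
    logp -= logp.max()
    P = np.exp(logp)
    return P / P.sum()                    # joint posterior over (c, l)
```

Marginalizing P over l or c then gives the P(c|x) and P(l|x) used in the M step.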
Face Clustering
Examples of 400 outdoor images of 2 people (44 x 28 pixels)
Mixture of Gaussians: 15 iterations of EM (MATLAB takes 1 minute)
Cluster means: c = 1, c = 2, c = 3, c = 4
Transformed mixture of Gaussians: 30 iterations of EM
Cluster means: c = 1, c = 2, c = 3, c = 4
Video Analysis Using Generative Models
with Brendan Frey, Nemanja Petrovic and Thomas Huang
Idea
• Use generative models of video sequences to do unsupervised learning
• Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.
Transformed Hidden Markov Model
[Graphical model: at each frame t, class ct and transformation lt generate latent image zt and observed frame xt; (ct, lt) depend on frame t-1 through P(c,l|past)]
THMM Transition Models
• Independent probability distributions for class and transformation; relative motion:
P(ct, lt | past) = P(ct | ct-1) P(d(lt, lt-1))
• Relative motion dependent on the class:
P(ct, lt | past) = P(ct | ct-1) P(d(lt, lt-1) | ct)
• Autoregressive model for the transformation distribution
Inference in THMM
• Tasks:
– Find the most likely state at time t given the whole observed sequence {xt} and the model parameters (class means and variances, transition probabilities, etc.)
– Find the distribution over states for each time t
– Find the most likely state sequence
– Learn the parameters that maximize the likelihood of the observed data
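The most-likely-state-sequence task is ordinary Viterbi decoding once the THMM's class/transformation pairs are treated as composite states s = (c, l); a generic sketch (names and interfaces are assumptions, not from the talk):

```python
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """Most likely state sequence for an HMM.

    log_emit: (T, S) log p(x_t | s)
    log_trans: (S, S) log P(s_t = j | s_{t-1} = i), indexed [i, j]
    log_prior: (S,) log P(s_1)
    """
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]          # best log-score ending in s
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

For C classes and L transformations the composite state space has S = C·L states, which is why the talk's future-work slide mentions fast approximate inference.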
Video Summary and Filtering
[Graphical model: c → z; z and l → x]
p(z|c) = N(z; μc, Φc)
p(x|z,l) = N(x; Gl z, Ψ)
Applications: video summary, image segmentation, removal of sensor noise, image stabilization
Example: Learning
DATA: hand-held camera, moving subject, cluttered background
• 1 class, 121 translations (11 vertical and 11 horizontal shifts)
• 5 classes
Examples
• Normalized sequence
• Simulated sequence
• De-noising
• Seeing through distractions
Future work
• Fast approximate learning and inference
• Multiple layers
• Learning transformations from images
Nebojsa Jojic: www.ifp.uiuc.edu/~jojic
Subspace models of images
Example: an image z in R^1200 is generated as z = f(y) from a subspace point y in R^2
[Figure axes: "Frown" and "Shut eyes"]
[Graphical model: y → z]
The density of pixel intensities z given subspace point y is p(z|y) = N(z; μ + Λy, Ψ)
p(y) = N(y; 0, I)
Factor analysis (generative PCA)
[Graphical model: y → z]
p(z|y) = N(z; μ + Λy, Ψ)
p(y) = N(y; 0, I)
Manifold: f(y) = μ + Λy, linear
• Parameters μ, Λ represent the manifold
• Observing z induces a Gaussian p(y|z):
COV[y|z] = (I + ΛT Ψ^-1 Λ)^-1
E[y|z] = COV[y|z] ΛT Ψ^-1 (z - μ)
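The posterior over the subspace point can be computed directly from these formulas; a minimal sketch, assuming diagonal Ψ (function and variable names are illustrative):

```python
import numpy as np

def fa_posterior(z, mu, Lam, Psi):
    """Posterior p(y|z) for factor analysis:
    z = mu + Lam y + noise, noise ~ N(0, diag(Psi)), y ~ N(0, I).

    Returns E[y|z] and COV[y|z] = (I + Lam^T Psi^-1 Lam)^-1.
    """
    K = Lam.shape[1]
    LtPi = Lam.T / Psi                       # Lam^T Psi^{-1} (Psi diagonal)
    cov = np.linalg.inv(np.eye(K) + LtPi @ Lam)
    mean = cov @ (LtPi @ (z - mu))
    return mean, cov
```

When the noise Ψ is small, E[y|z] essentially inverts the linear map, recovering the subspace coordinates of the observed image.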
Example: Simulation
[Three points y = (Frn, SE) are sampled from p(y) = N(y; 0, I) and mapped through p(z|y) = N(z; μ + Λy, Ψ), generating face images z with varying amounts of frown and eye closure]
Transformed Component Analysis
[Graphical model: y → z; z and l → x]
p(y) = N(y; 0, I)
p(z|y) = N(z; μ + Λy, Φ)
P(l) = ρl
The probability of the observed image x is p(x|z,l) = N(x; Gl z, Ψ)
Example: Simulation
G1 = shift left & up, G2 = I, G3 = shift right & up
[A sampled y = (Frn, SE) generates latent image z; with l = 3, the observed x is z shifted right & up]
Example: Inference
G1 = shift left & up, G2 = I, G3 = shift right & up
[For an observed x, each hypothesis l = 1, 2, 3 implies a latent image z and a subspace point y = (Frn, SE); the wrong hypotheses reconstruct garbage, and the responsibilities P(l=1|x), P(l=2|x), P(l=3|x) concentrate on the correct shift]
EM algorithm for TCA
• Initialize μ, Λ, Φ, Ψ, ρ to random values
• E Step
– For each training case x(t), infer q(t)(l,z,y) = p(l,z,y | x(t))
• M Step
– Compute μnew, Λnew, Φnew, Ψnew, ρnew to maximize
Σt E[ log p(y) p(z|y) P(l) p(x(t)|z,l) ],
where E[·] is with respect to q(t)(l,z,y)
• Each iteration increases log p(Data)
A tough toy problem
• 144 9 x 9 images
• 1 shape (pyramid)
• 3-D lighting
• cluttered background
• 25 possible locations
1st 8 principal components: [images]
TCA:
• 3 components
• 81 transformations (9 horiz shifts x 9 vert shifts)
• 10 iters of EM
• Model generates realistic examples
[Learned mean μ and components 1, 2, 3 of Λ]
Expression modeling
• 100 16 x 24 training images
• variation in expression
• imperfect alignment
PCA: Mean + 1st 10 principal components
Factor Analysis: Mean + 10 factors after 70 its of EM
TCA: Mean + 10 factors after 70 its of EM
Fantasies from FA model Fantasies from TCA model
Modeling handwritten digits
• 200 8 x 8 images of each digit
• preprocessing normalizes vert/horiz translation and scale
• different writing angles (shearing) - see “7”
TCA: - 29 shearing + translation combinations - 10 components per digit - 30 iterations of EM per digit
Mean of each digit
Transformed means
FA: Mean + 10 components per digit
TCA: Mean + 10 components per digit
Classification Performance
• Training: 200 cases/digit, 20 components, 50 EM iters
• Testing: 1000 cases, p(x|class) used for classification
• Results:
Method: Error rate
k-nearest neighbors (optimized k): 7.6%
Factor analysis: 3.2%
Transformed component analysis: 2.7%
• Bonus: P(l|x) infers the writing angle!
Wrap-up
• Papers, MATLAB scripts:
www.ifp.uiuc.edu/~jojic
www.cs.uwaterloo.ca/~frey
• Other domains: audio, bioinformatics, …
• Other latent image models, p(z):
– mixtures of factor analyzers (NIPS99)
– layers, multiple objects, occlusions
– time series (in preparation)
Wrap-up
• Discrete + Linear Combination: set some components (columns of Λ) equal to derivatives of the mean μ with respect to the transformations
• Multiresolution approach
• Fast variational methods, belief propagation,...
Other generative models
• Modeling human appearance in stereo images: articulated, self-occluding Gaussians