Real-time on-line learning of transformed hidden Markov models
Nemanja Petrovic, Nebojsa Jojic, Brendan Frey and Thomas Huang
Microsoft, University of Toronto, University of Illinois
Six break points vs. six things in video
• Traditional video segmentation: find breakpoints. Example: MovieMaker (cut and paste)
• Our goal: Find possibly recurring scenes or objects
[Figure: timeline of shots labeled by recurring scene/object class, with representative frames]
Transformed hidden Markov model
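[Figure: graphical model with class c, latent image z, translation T and observed frame x]
• Class c with prior $P(c=k) = \pi_k$
• Latent image z: $P(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$
• Translation T with uniform prior
• Observed frame x: $x = Tz$, so $E[x] = T\mu_c$, $\mathrm{Var}\,x = T\Phi_c T^\top$ and $p(x \mid c, T) = \mathcal{N}(x; T\mu_c, T\Phi_c T^\top)$
Generation is repeated for each frame of the sequence, with the pair (T, c) being the state of a Markov chain.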
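The generative process above can be sketched in NumPy as follows. This is a minimal illustration under assumptions not stated on the slide: translations are cyclic 2-D shifts (`np.roll`), each Φ_c is diagonal, and after a uniform initial shift the shift moves by at most `max_step` pixels per frame (a hypothetical motion prior). The function name `sample_thmm` is illustrative.

```python
import numpy as np

def sample_thmm(mu, phi, pi, A, n_frames, max_step=1, rng=None):
    """Sample frames from a transformed HMM (sketch, cyclic shifts only).

    mu, phi: (K, H, W) class means and per-pixel variances (diagonal Phi_c).
    pi: (K,) class prior; A: (K, K) class transitions, A[i, j] = p(c'=j | c=i).
    T_1 is uniform over all shifts; afterwards the shift moves by at most
    `max_step` pixels per axis per frame (a hypothetical motion prior).
    """
    rng = np.random.default_rng() if rng is None else rng
    K, H, W = mu.shape
    c = rng.choice(K, p=pi)
    T = np.array([rng.integers(H), rng.integers(W)])
    frames, states = [], []
    for _ in range(n_frames):
        # z ~ N(mu_c, Phi_c) with diagonal Phi_c
        z = mu[c] + np.sqrt(phi[c]) * rng.standard_normal((H, W))
        # x = T z (noise-free observation, as on the slide)
        frames.append(np.roll(z, tuple(T), axis=(0, 1)))
        states.append((tuple(int(v) for v in T), int(c)))
        # Markov step for the state (T, c)
        c = rng.choice(K, p=A[c])
        T = (T + rng.integers(-max_step, max_step + 1, size=2)) % np.array([H, W])
    return np.stack(frames), states
```

With near-zero variances every frame is just the class mean shifted by the sampled T, which is a quick way to check the sketch.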
Goal: maximize the total likelihood of the dataset
$$\log p(X) = \log \sum_{\{T,c\}} \sum_Z p(X,\{T,c\},Z) = \log \sum_{\{T,c\}} \sum_Z q(\{T,c\},Z)\,\frac{p(X,\{T,c\},Z)}{q(\{T,c\},Z)}$$
$$\geq \sum_{\{T,c\}} \sum_Z q(\{T,c\},Z)\log p(X,\{T,c\},Z) - \sum_{\{T,c\}} \sum_Z q(\{T,c\},Z)\log q(\{T,c\},Z) = B$$
where the inequality is Jensen's inequality (log is concave).
We factorize $q(\{T,c\},Z) = q(\{T,c\})\,q(Z \mid \{T,c\})$.
{T,c} represents the values of transformation and class for all frames, i.e., the path that the video sequence takes through the state space of the model.
Instead of the likelihood, we optimize the bound B, which is tight ($B = \log p(X)$) for $q = p(\{T,c\},Z \mid X)$.
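A toy numerical check of the bound: here all hidden variables ({T,c} and Z) are lumped into one discrete variable s, which is a simplification for illustration only. For any distribution q, B(q) never exceeds log p(X), and it equals log p(X) when q is the exact posterior. The names `log_evidence` and `bound` are hypothetical.

```python
import numpy as np

def log_evidence(log_joint):
    """log p(X) = log sum_s p(X, s), computed stably (log-sum-exp)."""
    m = log_joint.max()
    return m + np.log(np.sum(np.exp(log_joint - m)))

def bound(q, log_joint):
    """B(q) = sum_s q(s) log p(X, s) - sum_s q(s) log q(s), skipping q(s) = 0."""
    nz = q > 0
    return np.sum(q[nz] * (log_joint[nz] - np.log(q[nz])))
```

The posterior is `np.exp(log_joint - log_evidence(log_joint))`; plugging it into `bound` closes the gap.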
Posterior approximation
We allow q({T,c}) to have non-zero probability only on the M most probable paths:
$$q(\{T,c\}) = \sum_{m=1}^{M} r_m\,\delta\big(\{T,c\} - \{T,c\}^*_m\big)$$
(Viterbi 1982)
This alleviates several problems with adaptive scaling in exact forward-backward inference.
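One way to find the M most probable paths is a list-Viterbi search that keeps the M best partial paths ending in each state, sketched below in the log domain (no scaling needed). In a THMM each discrete state would be a (T, c) pair, so K = classes × shifts. Setting r_m proportional to each path's joint probability (the posterior renormalized over the M paths) is an assumption; the slide does not say how r_m is chosen. The name `m_best_paths` is hypothetical.

```python
import heapq
import numpy as np

def m_best_paths(log_pi, log_A, log_lik, M):
    """Exact M most probable state paths of a discrete HMM (list Viterbi).

    log_pi: (K,) initial log-probs; log_A: (K, K) with
    log_A[i, j] = log p(s_{t+1}=j | s_t=i); log_lik: (n_frames, K) with
    log_lik[t, k] = log p(x_t | s_t=k).
    Returns the M best (log joint, path) pairs, best first, and weights r_m.
    """
    n_frames, K = log_lik.shape
    # beams[k]: up to M (score, path) pairs whose path ends in state k
    beams = [[(log_pi[k] + log_lik[0, k], (k,))] for k in range(K)]
    for t in range(1, n_frames):
        new_beams = []
        for j in range(K):
            cands = [(s + log_A[i, j] + log_lik[t, j], p + (j,))
                     for i in range(K) for s, p in beams[i]]
            new_beams.append(heapq.nlargest(M, cands, key=lambda c: c[0]))
        beams = new_beams
    final = heapq.nlargest(M, [c for b in beams for c in b], key=lambda c: c[0])
    scores = np.array([s for s, _ in final])
    r = np.exp(scores - scores.max())  # assumed: r_m proportional to p(X, path_m)
    return final, r / r.sum()
```

Keeping M candidates per end state is enough for exactness: any prefix of one of the M best paths ending in state j must itself be among the M best prefixes ending in its own state.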
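Expensive part of the E step
Find a quick way to calculate
$$\log p(x \mid c,T) = -\tfrac{N}{2}\log(2\pi) - \tfrac12\log\left|T\Phi_cT^\top\right| - \tfrac12 (x-T\mu_c)^\top (T\Phi_cT^\top)^{-1}(x-T\mu_c)$$
for all possible shifts T in the E step of the EM algorithm.
[Figure: frame x, shifted cluster mean Tμ, and log p as a function of the shift T]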
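A brute-force reference for the quantity above, assuming cyclic 2-D shifts and a diagonal Φ_c (stored as per-pixel variances). Each shift costs O(N) and there are N shifts, so this is the O(N²) computation that the E step needs to avoid. Normalizing over shifts gives q(T | x, c) under the uniform prior on T. Function names are hypothetical.

```python
import numpy as np

def log_lik_all_shifts_naive(x, mu, phi):
    """Brute-force log N(x; T mu, T Phi T^T) for every cyclic 2-D shift T.

    phi holds the diagonal of Phi (per-pixel variances). O(N) per shift,
    N shifts in total, hence O(N^2) overall.
    """
    H, W = x.shape
    out = np.empty((H, W))
    for dy in range(H):
        for dx in range(W):
            m = np.roll(mu, (dy, dx), axis=(0, 1))   # T mu
            v = np.roll(phi, (dy, dx), axis=(0, 1))  # diagonal of T Phi T^T
            out[dy, dx] = -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
    return out

def shift_posterior(log_lik):
    """q(T | x, c) under a uniform prior on T; subtracts the max for stability."""
    p = np.exp(log_lik - log_lik.max())
    return p / p.sum()
```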
Computing Mahalanobis distance using FFTs
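All terms that have to be evaluated for all T can be expressed as correlations, e.g.:
$$x^\top (T\Phi T^\top)^{-1} T\mu = x^\top T(\Phi^{-1}\mu) = x^\top T\left(\mathrm{diag}(\Phi^{-1}) \odot \mu\right) = \sum x \odot T\!\left(\frac{\mu}{\Phi}\right)$$
(where the summation is over pixels and μ/Φ divides μ elementwise by the diagonal of Φ), and for all T at once
$$\sum x \odot T\!\left(\frac{\mu}{\Phi}\right) = \mathrm{IFFT}\left(\mathrm{FFT}(x) \odot \mathrm{conj}\left(\mathrm{FFT}\left(\frac{\mu}{\Phi}\right)\right)\right)$$
$N \log N$ versus $N^2$!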
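A sketch of the full E-step log-likelihood for all shifts via FFTs, under the same assumptions as a direct evaluation: cyclic 2-D shifts and diagonal Φ. Because each T is then a permutation, log|TΦT^T| = log|Φ| for every shift, and the quadratic form splits into two correlations plus a constant. Helper names are hypothetical.

```python
import numpy as np

def corr_all_shifts(a, b):
    """c[dy, dx] = sum over pixels of a * roll(b, (dy, dx)), for every cyclic shift."""
    return np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real

def log_lik_all_shifts(x, mu, phi):
    """log N(x; T mu, T Phi T^T) for every cyclic 2-D shift T, Phi diagonal.

    x, mu, phi: (H, W) arrays; phi holds the per-pixel variances.
    Returns an (H, W) array indexed by the shift (dy, dx).
    """
    n = x.size
    inv_phi = 1.0 / phi
    # T is a permutation, so log|T Phi T^T| = log|Phi| for every shift.
    const = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(np.log(phi))
    quad = (corr_all_shifts(x ** 2, inv_phi)            # x^T T Phi^-1 T^T x
            - 2.0 * corr_all_shifts(x, mu * inv_phi)    # x^T T (Phi^-1 mu)
            + np.sum(mu ** 2 * inv_phi))                # mu^T Phi^-1 mu
    return const - 0.5 * quad
```

Each call to `corr_all_shifts` costs O(N log N), so all N shifts are scored for the price of a few FFTs instead of N separate O(N) sums.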
Parameter optimization
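$$F = \sum_{\{T,c\}} \sum_Z q(\{T,c\},Z)\log p(X,\{T,c\},Z)$$
$$= \sum_{\{T,c\}} \sum_Z q(\{T,c\})\,q(Z \mid \{T,c\}) \times \Big(\log \pi_{T_1 c_1} + \sum_t \log p(x_t, z_t \mid T_t, c_t) + \sum_t \big[\log p(c_{t+1} \mid c_t) + \log p(T_{t+1} \mid T_t, c_t)\big]\Big)$$
Solve ∂F/∂θ = 0 for each model parameter θ, with q fixed at its current estimate.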
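As one concrete instance of solving ∂F/∂θ = 0: with q({T,c}) concentrated on M paths with weights r_m, maximizing F over the class transition probabilities p(c_{t+1} | c_t) (with each row constrained to sum to one) gives the standard HMM result, normalized expected transition counts. The sketch below covers only this one parameter; the function name and the small `eps` smoothing for unvisited classes are assumptions.

```python
import numpy as np

def class_transition_mstep(paths, r, n_classes, eps=1e-12):
    """Maximize F w.r.t. A[i, j] = p(c_{t+1}=j | c_t=i) given q on M paths.

    paths: list of M class sequences (c_1..c_T); r: (M,) weights summing to 1.
    Setting dF/dA = 0 under the row-sum constraint makes A[i, j]
    proportional to the expected number of i -> j transitions.
    """
    counts = np.zeros((n_classes, n_classes))
    for path, w in zip(paths, r):
        for a, b in zip(path[:-1], path[1:]):
            counts[a, b] += w
    # eps keeps rows of never-visited classes well defined (uniform)
    return (counts + eps) / (counts + eps).sum(axis=1, keepdims=True)
```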
On-line vs. batch EM
Example: update equation for the class mean
$$\sum_t \sum_{T_t} q(T_t, c_t = k)\,E[z \mid x_t, c_t = k, T_t] = \sum_t q(c_t = k)\,\mu_k$$
Batch EM:
– Solve for μ using all frames.
– Inference and parameter optimization are iterated.
On-line EM:
– Rewrite the equation for one extra frame.
– Establish the relationship between μ(t+1) and μ(t).
– Parameters are updated after each frame; no need for iteration.
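Writing the mean equation as Σ_t a_t = (Σ_t s_t) μ, with a_t = Σ_T q(T_t, c_t=k) E[z | x_t, c_t=k, T_t] and s_t = q(c_t=k), adding one frame gives μ(t+1) = μ(t) + (a_{t+1} − s_{t+1} μ(t)) / S_{t+1}, where S_{t+1} is the accumulated responsibility. A minimal sketch of that recursion (class name hypothetical):

```python
import numpy as np

class OnlineClassMean:
    """Running solution of sum_t a_t = (sum_t s_t) * mu for one class k.

    a_t = sum_T q(T_t, c_t=k) E[z | x_t, c_t=k, T_t]   (an image)
    s_t = q(c_t = k)                                    (a scalar)
    """

    def __init__(self, shape):
        self.mu = np.zeros(shape)
        self.S = 0.0  # accumulated responsibility sum_t q(c_t = k)

    def update(self, a_t, s_t):
        self.S += s_t
        if self.S > 0:
            # mu(t+1) = mu(t) + (a_{t+1} - s_{t+1} mu(t)) / S_{t+1}
            self.mu = self.mu + (a_t - s_t * self.mu) / self.S
        return self.mu
```

With the a_t and s_t held fixed, the recursion reproduces the batch solution Σa_t / Σs_t exactly; the on-line approximation is that earlier frames' q values are not revisited once the parameters change.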
Reducing the complexity of the M step
ΣT q(Tt,ct)E[z|xt,ct,Tt] can be expressed as a sum of convolutions.
For example, when there is no observation noise, $E[z \mid x_t, c_t, T_t] = T_t^\top x_t$, and
$$\sum_T q(T_t, c_t)\,E[z \mid x_t, c_t, T_t] = \mathrm{IFFT}\left(\mathrm{FFT}(q) \odot \mathrm{FFT}(x)\right)$$
(a similar trick applies to the variance estimates).
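A sketch of the M-step sum in O(N log N), assuming cyclic 2-D shifts with q indexed so that q[dy, dx] is the posterior of T = roll by (dy, dx). Then E[z | x, T] = T^T x = roll(x, −(dy, dx)), and the sum over shifts is a circular correlation; indexing q by the inverse shift instead turns it into the convolution IFFT(FFT(q) ⊙ FFT(x)) on the slide. The function name is hypothetical.

```python
import numpy as np

def expected_latent_sum(q, x):
    """sum_s q[s] * T_s^T x over all cyclic 2-D shifts s, in O(N log N).

    q: (H, W) posterior over shifts, q[dy, dx] = q(T = roll by (dy, dx)).
    With no observation noise, E[z | x, T_s] = T_s^T x = roll(x, -s),
    so the sum is the circular correlation of x with q.
    """
    return np.fft.ifft2(np.conj(np.fft.fft2(q)) * np.fft.fft2(x)).real
```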
How to deal with scale and rotation?
Represent pixels on a log-polar grid! Shifts in the log-polar coordinates correspond to scale and rotation changes in the Cartesian coordinate system.
[Figure: log-polar grid; axes: rotation, scale]
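A minimal log-polar resampler illustrating why the translation machinery carries over. It assumes nearest-neighbour sampling, a square image, and radii growing geometrically by `base`; these choices and the function name are assumptions for illustration. Rotating the image by 2π/n_theta shifts the output by one column, and scaling by `base` shifts it by one row (up to sampling and border effects).

```python
import numpy as np

def to_log_polar(img, n_rho, n_theta, r_min=1.0, base=2.0):
    """Nearest-neighbour resampling of a square image onto a log-polar grid.

    Row i has radius r_min * base**i and column j has angle 2*pi*j/n_theta,
    both measured from the image centre. Rotation about the centre becomes
    a cyclic shift along columns; scaling by `base` becomes a shift by a row.
    """
    size = img.shape[0]
    c = (size - 1) / 2.0
    r = r_min * base ** np.arange(n_rho)
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    rows = np.rint(c + r[:, None] * np.sin(theta)[None, :]).astype(int)
    cols = np.rint(c + r[:, None] * np.cos(theta)[None, :]).astype(int)
    return img[np.clip(rows, 0, size - 1), np.clip(cols, 0, size - 1)]
```

For example, with `n_theta=8` a 90° rotation (`np.rot90`) of an odd-sized image moves the log-polar image by two columns.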
Estimating the number of classes
• The algorithm is initialized with a single class
• A new class is introduced whenever the frame likelihood drops below a threshold
• The classes can be merged in the end to achieve a more compact representation
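The class-growing rule above can be sketched as follows. For brevity this drops the transformations and uses a fixed isotropic variance with a running-average mean; both are simplifications, not the slide's model, and the function name and `threshold` value are hypothetical. The final merging step is not shown.

```python
import numpy as np

def cluster_frames(frames, threshold, var=1.0):
    """On-line clustering that adds a class whenever the best per-class
    log-likelihood of a frame falls below `threshold`.

    Simplifications (assumptions): no shifts, fixed isotropic variance `var`,
    running-average mean update.
    """
    means, counts, labels = [frames[0].astype(float)], [1], [0]
    for x in frames[1:]:
        ll = [-0.5 * np.sum(np.log(2 * np.pi * var) + (x - m) ** 2 / var)
              for m in means]
        k = int(np.argmax(ll))
        if ll[k] < threshold:
            # frame is poorly explained by every class: start a new one
            means.append(x.astype(float))
            counts.append(1)
            k = len(means) - 1
        else:
            counts[k] += 1
            means[k] += (x - means[k]) / counts[k]
        labels.append(k)
    return means, labels
```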
Clustering a 20-minute whale watching video
Clustering a 20-minute beach video
Shots from the first class
[Figure: timeline from 0 min to 9 min]
Discovering objects using motion priors
A different motion prior is predefined for each class.
Three characteristic frames from 240x320 input sequence
Learned means and variances
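A class-specific motion prior p(T_{t+1} | T_t, c) could take many forms; one hypothetical choice, sketched below for 1-D cyclic shifts, is a discretized Gaussian on the displacement centred on a per-class velocity. The functional form, the integer `velocity`, and the function name are all assumptions, not the slide's definition.

```python
import numpy as np

def motion_prior(n_shifts, velocity, sigma):
    """p(T_{t+1} = j | T_t = i) for one class, over n_shifts cyclic 1-D shifts.

    Hypothetical form: a discretized Gaussian on the wrapped displacement
    (j - i), centred on the class's expected `velocity` (pixels per frame).
    """
    i = np.arange(n_shifts)[:, None]
    j = np.arange(n_shifts)[None, :]
    d = (j - i - velocity) % n_shifts
    d = np.minimum(d, n_shifts - d)  # wrapped distance to the expected shift
    P = np.exp(-0.5 * (d / sigma) ** 2)
    return P / P.sum(axis=1, keepdims=True)
```

Giving each class its own `velocity` lets the model separate objects that move differently even when their appearance is similar.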
Tracking results
Summary
| Before (CVPR 99/00) | Now |
|---|---|
| 28x44 images | 120x160 images |
| Grayscale images | Full color images |
| 1 day of computation for 15 sec video | 5-10 frames/sec |
| Batch EM | On-line EM |
| Exact inference | Approximate inference |
| Fixed number of clusters | Variable number of clusters |
| Limited number of translations | All possible translations |
| Memory inefficient | Memory efficient |
Sneak preview: Panoramic THMMs
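[Figure: graphical model with class c, latent image z, transformation WT and observed frame x]
• Class c with prior $P(c=k) = \pi_k$
• $P(z \mid c) = \mathcal{N}(z; \mu_c, \Phi_c)$
• $x = WTz$, so $E[x] = WT\mu_c$, $\mathrm{Var}\,x = WT\Phi_c T^\top W^\top$ and $p(x \mid c, T) = \mathcal{N}(x; WT\mu_c, WT\Phi_c T^\top W^\top)$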
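A sketch of the panoramic observation model under explicit assumptions: T is a cyclic 2-D shift of a latent panorama larger than the frame, and W keeps a fixed top-left window of the frame's size. With diagonal Φ, the covariance WTΦT^TW^T is again diagonal, holding φ shifted and cropped exactly like μ. Function names and the concrete form of W are hypothetical.

```python
import numpy as np

def window_shift(img, shift, out_shape):
    """Apply T (cyclic 2-D shift of the panorama), then W (keep the top-left
    out_shape window). Both concrete forms are assumptions for illustration."""
    h, w = out_shape
    return np.roll(img, shift, axis=(0, 1))[:h, :w]

def panoramic_log_lik(x, mu, phi, shift):
    """log N(x; W T mu, W T Phi T^T W^T) with diagonal Phi (per-pixel phi)."""
    m = window_shift(mu, shift, x.shape)   # W T mu
    v = window_shift(phi, shift, x.shape)  # diagonal of W T Phi T^T W^T
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
```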
Video clustering - model
• Appearance: mean and variance
• Camera/object motion
• Temporal constraints
• Unsupervised learning – the only input is the video
Current implementation
• DirectShow filter for frame clustering (5-15 frames/sec!)
• Translation invariance
• On-line learning algorithm
• Classes repeating across the video
• Potential applications:
– Video segmentation
– Content-based search/retrieval
– Short video summary creation
– DVD chapter creation
Comparing with layered sprites
“Perfect” segmentation with layered sprites (Jojic, CVPR 01)
But the THMM is hundreds to thousands of times faster!
Example with more content