TRANSCRIPT
Machine Perception of Human Behavior with CA F. De la Torre/J. Cohn PAVIS school on CV and PR 7
Component Analysis for PR & HS
• Computer Vision & Image Processing
– Structure from motion.
– Spectral graph methods for segmentation.
– Appearance and shape models.
– Fundamental matrix estimation and calibration.
– Compression.
– Classification.
– Dimensionality reduction and visualization.
• Signal Processing
– Spectral estimation, system identification (e.g. Kalman filter), sensor array processing (e.g. cocktail-party problem, echo cancellation), blind source separation, etc.
• Computer Graphics
– Compression (BRDF), synthesis, etc.
• Speech, bioinformatics, combinatorial problems.
Examples from the list above highlighted on the following slides:
• Structure from motion
• Spectral graph methods for segmentation
• Appearance and shape models
• Dimensionality reduction and visualization
• The cocktail-party problem
Independent Component Analysis (ICA)
[Diagram: three sound sources are mixed into three observed mixtures; ICA recovers three separated outputs.]
Why CA for PR & HS?
• Learn from high-dimensional data and few samples.
– Useful for dimensionality reduction, especially when functions are smooth.
• Efficient methods, O(dn²) with d ≪ n. (Everitt, 1984)
• Easy to formulate, solve, and extend:
– Non-linearities (kernel methods) (Scholkopf & Smola, 2002; Shawe-Taylor & Cristianini, 2004)
– Probabilistic (latent variable models)
– Multi-factorial (tensors) (Paatero & Tapper, 1994; O'Leary & Peleg, 1983; Vasilescu & Terzopoulos, 2002; Vasilescu & Terzopoulos, 2003)
– Exponential family PCA (Gordon, 2002; Collins et al., 2001)
• Natural geometric interpretation
Are CA methods popular/useful/used?
• Still work to do:
– Results 1 - 10 of about 83,000 for "Spanish crisis"
– Results 1 - 10 of about 287,000,000 for "Britney Spears"
• About 28% of CVPR-07 papers use CA.
• Google:
– Results 1 - 10 of about 1,870,000 for "principal component analysis"
– Results 1 - 10 of about 506,000 for "independent component analysis"
– Results 1 - 10 of about 273,000 for "linear discriminant analysis"
– Results 1 - 10 of about 46,100 for "negative matrix factorization"
– Results 1 - 10 of about 491,000 for "kernel methods"
Outline
• Introduction (15 min)
• Generative models (40 min) – PCA, k-means, spectral clustering, NMF, ICA, MDS
• Discriminative models (40 min) – LDA, SVM, OCA, CCA
• Standard extensions of linear models (30 min) – kernel methods, latent variable models, tensor factorization
• Unified view (20 min)

Generative models (PCA, k-means, spectral clustering, NMF, ICA, MDS)
Generative models
• Principal Component Analysis/Singular Value Decomposition
• Non-Negative Matrix Factorization
• Independent Component Analysis
• K-means and spectral clustering
• Multi-dimensional Scaling
D ≈ BC
Principal Component Analysis (PCA)
• PCA finds the directions of maximum variation of the data
• PCA decorrelates the original variables
(Pearson, 1901; Hotelling, 1933;Mardia et al., 1979; Jolliffe, 1986; Diamantaras, 1996)
[Figure: 2-D data (x₁, x₂) with correlated covariance Σ = [0.8 0.75; 0.75 0.9] is decorrelated by PCA to Σ = [2.1 0; 0 1.1].]
PCA
d ≈ µ + c₁b₁ + c₂b₂ + … + c_k b_k
D = [d₁ d₂ … dₙ] ≈ BC + µ1ₙᵀ
D ∈ ℝ^{d×n} (d = pixels, n = images), B ∈ ℝ^{d×k}, C ∈ ℝ^{k×n}, µ ∈ ℝ^{d×1}
• Assuming zero-mean data, the basis B that preserves the maximum variation of the signal is given by the eigenvectors of DDᵀ:
DDᵀB = BΛ
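As a concrete sketch of this eigenvector view of PCA in numpy (synthetic random data standing in for images; the dimensions d, n, k are hypothetical choices of this example): B comes from the eigenvectors of DDᵀ, the coefficients are C = BᵀD, and D is reconstructed at rank k.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 5, 200, 2
# Synthetic data with most variance in the first two coordinates
D = rng.normal(size=(d, n)) * np.array([3.0, 2.0, 0.5, 0.1, 0.1])[:, None]

mu = D.mean(axis=1, keepdims=True)
Dc = D - mu                                   # zero-mean data

# Eigenvectors of D D^T give the basis B; eigenvalues give the variances
eigvals, eigvecs = np.linalg.eigh(Dc @ Dc.T)
order = np.argsort(eigvals)[::-1]             # sort by decreasing variance
B = eigvecs[:, order[:k]]                     # d x k basis
C = B.T @ Dc                                  # k x n coefficients

# Rank-k reconstruction D ≈ B C + mu 1^T
D_hat = B @ C + mu
```

Note that the rows of C are decorrelated: C Cᵀ is diagonal (the top-k eigenvalues), which is the "PCA decorrelates the original variables" property from the previous slide.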
Snap-shot method & SVD
• If d ≫ n (e.g., 100×100-pixel images vs. 300 samples), do not form DDᵀ.
• DDᵀ and DᵀD have the same non-zero eigenvalues (energy) and related eigenvectors (related through D).
• B is a linear combination of the data: B = Dα. (Sirovich, 1987)
• [α, Λ] = eig(DᵀD), then B = DαΛ^{−1/2}:
DᵀDα = αΛ ⇒ DDᵀ(Dα) = (Dα)Λ, i.e. DDᵀB = BΛ
• SVD factorizes the data matrix D as D = UΣVᵀ, with UᵀU = I, VᵀV = I and Σ diagonal. (Beltrami, 1873; Schmidt, 1907; Golub & Van Loan, 1989)
• PCA–SVD connection (D ≈ BC with BᵀB = I, CCᵀ = Λ):
DDᵀ = UΛUᵀ, DᵀD = VΛVᵀ
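A numpy sketch of the snapshot trick under the d ≫ n assumption (random data standing in for vectorized images): eigendecompose the small n×n matrix DᵀD and map back with B = DαΛ^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 30                      # many pixels, few images (d >> n)
D = rng.normal(size=(d, n))
D -= D.mean(axis=1, keepdims=True)   # zero-mean data (drops rank by one)

# Snapshot trick: eigendecompose the small n x n matrix D^T D
L, alpha = np.linalg.eigh(D.T @ D)
keep = L > 1e-8 * L.max()            # discard numerically zero eigenvalues
L, alpha = L[keep], alpha[:, keep]

# Map back: B = D alpha L^{-1/2} are unit-norm eigenvectors of D D^T
B = D @ alpha / np.sqrt(L)
```

The d×d matrix DDᵀ (here 1000×1000) is never formed; only the 30×30 matrix DᵀD is decomposed.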
Error function for PCA
• PCA minimizes the following function (Baldi & Hornik, 1989):
E(B, C) = Σᵢ₌₁ⁿ ||dᵢ − Bcᵢ||²₂ = ||D − BC||²_F
• The solution is not unique: BC = (BR)(R⁻¹C) for any invertible R ∈ ℝ^{k×k}. (De la Torre, 2012)
(Eckart & Young, 1936; Gabriel & Zamir, 1979; Baldi & Hornik, 1989; Shum et al., 1995; De la Torre & Black, 2003a)
PCA/SVD in Computer Vision
• PCA/SVD has been applied to:
– Recognition (eigenfaces:Turk & Pentland, 1991; Sirovich & Kirby, 1987; Leonardis & Bischof, 2000; Gong et al., 2000; McKenna et al., 1997a)
– Parameterized motion models (Yacoob & Black, 1999; Black et al., 2000; Black, 1999; Black & Jepson, 1998)
– Appearance/shape models (Cootes & Taylor, 2001; Cootes et al., 1998; Pentland et al., 1994; Jones & Poggio, 1998; La Cascia & Sclaroff, 1999; Black & Jepson, 1998; Blanz & Vetter, 1999; Cootes et al., 1995; McKenna et al., 1997; de la Torre et al., 1998b)
– Dynamic appearance models (Soatto et al., 2001; Rao, 1997; Orriols & Binefa, 2001; Gong et al., 2000)
– Structure from Motion (Tomasi & Kanade, 1992; Bregler et al., 2000; Sturm & Triggs, 1996; Brand, 2001)
– Illumination based reconstruction (Hayakawa, 1994)
– Visual servoing (Murase & Nayar, 1995; Murase & Nayar, 1994)
– Visual correspondence (Zhang et al., 1995; Jones & Malik, 1992)
– Camera motion estimation (Hartley, 1992; Hartley & Zisserman, 2000)
– Image watermarking (Liu & Tan, 2000)
– Signal processing (Moonen & de Moor, 1995)
– Neural approaches (Oja, 1982; Sanger, 1989; Xu, 1993)
– Bilinear models (Tenenbaum & Freeman, 2000; Marimont & Wandell, 1992)
– Direct extensions (Welling et al., 2003; Penev & Atick, 1996)
"Intercorrelations among variables are the bane of the multivariate researcher's struggle for meaning." (Cooley and Lohnes, 1971)
Part-based representation
• The firing rates of neurons are never negative.
• Independent representations.
NMF & ICA
Non-negative Matrix Factorization (NMF)
• Non-negative factorization.
• Leads to a part-based representation.
E(B, C) = ||D − BC||_F, subject to B ≥ 0, C ≥ 0
(Lee & Seung, 1999)
NMF
• The multiplicative algorithm can be interpreted as diagonally rescaled gradient descent:
min_{B≥0, C≥0} F = Σᵢⱼ (dᵢⱼ − (BC)ᵢⱼ)²
• Derivatives:
∂F/∂Cᵢⱼ = (BᵀBC)ᵢⱼ − (BᵀD)ᵢⱼ
∂F/∂Bᵢⱼ = (BCCᵀ)ᵢⱼ − (DCᵀ)ᵢⱼ
• Inference: Cᵢⱼ ← Cᵢⱼ (BᵀD)ᵢⱼ / (BᵀBC)ᵢⱼ
• Learning: Bᵢⱼ ← Bᵢⱼ (DCᵀ)ᵢⱼ / (BCCᵀ)ᵢⱼ
(Lee & Seung, 1999; Lee & Seung, 2000)
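A sketch of these multiplicative updates on synthetic non-negative data. The small epsilon and the choice of k one larger than the true rank are implementation details of this example, not part of the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, r, k = 20, 50, 4, 5
D = rng.random((d, r)) @ rng.random((r, n))   # non-negative data of rank r

B = rng.random((d, k))                        # k > r eases the fit (example choice)
C = rng.random((k, n))

# Lee & Seung multiplicative updates for min ||D - BC||_F^2 s.t. B, C >= 0
eps = 1e-9                                    # guard against division by zero
for _ in range(500):
    C *= (B.T @ D) / (B.T @ B @ C + eps)      # inference: update coefficients
    B *= (D @ C.T) / (B @ C @ C.T + eps)      # learning: update basis

err = np.linalg.norm(D - B @ C) / np.linalg.norm(D)
```

Because the updates are multiplicative, B and C stay non-negative if initialized non-negative, which is why no explicit projection step is needed.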
Independent Component Analysis (ICA)
• We need more than second-order statistics to represent the signal.
ICA
• Look for sources sᵢ that are independent.
• PCA finds uncorrelated variables.
• For Gaussian distributions, independence and uncorrelatedness are equivalent.
• Uncorrelated: E(sᵢsⱼ) = E(sᵢ)E(sⱼ).
• Independent: E(g(sᵢ)f(sⱼ)) = E(g(sᵢ))E(f(sⱼ)) for any non-linear f, g.
D = BC, S = WD with W ≈ B⁻¹
(Hyvärinen et al., 2001)
[Figure: PCA vs. ICA bases for data generated from sources S = (1,0), (0,1), (0.5,0.5), (−0.5,−0.5).]
ICA vs PCA (Hyvärinen et al., 2001)
Many optimization criteria
• Minimize high-order moments, e.g. the kurtosis of S = WD:
kurt(s) = E{s⁴} − 3(E{s²})²
• Many other information criteria.
• Also an error function with a sparseness penalty (e.g. S(·) = |·|) (Olshausen & Field, 1996):
min_{B,C} Σᵢ₌₁ⁿ ||dᵢ − Bcᵢ||² + Σᵢ₌₁ⁿ S(cᵢ)
• Other sparse PCA (Chennubhotla & Jepson, 2001b; Zou et al., 2005; d'Aspremont et al., 2004).
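A quick numerical check of this kurtosis measure on synthetic samples: it is near zero for a Gaussian, positive for a super-Gaussian (sparse) source, and negative for a sub-Gaussian source.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

def kurt(s):
    """Sample kurtosis: E{s^4} - 3 (E{s^2})^2 (zero for a Gaussian)."""
    s = s - s.mean()
    return np.mean(s**4) - 3 * np.mean(s**2) ** 2

gaussian = rng.normal(size=n)          # kurtosis ~ 0
laplacian = rng.laplace(size=n)        # super-Gaussian: kurtosis > 0
uniform = rng.uniform(-1, 1, size=n)   # sub-Gaussian: kurtosis < 0
```

Maximizing |kurt| of the projected signal is one classical way to drive the unmixing matrix W away from Gaussian (hence toward independent) outputs.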
Basis of natural images
Denoising
[Figure: original image; noisy image (30% noise); denoising with a Wiener filter vs. ICA.]
The clustering problem
• Partition the data set in c-disjoint “clusters” of data points.
• Number of possible partitions (Stirling number of the second kind):
S(n, c) = (1/c!) Σᵢ₌₁ᶜ (−1)^{c−i} C(c, i) iⁿ
• NP-hard; approximate algorithms (k-means, hierarchical clustering, mixtures of Gaussians, etc.).
K-means
E₀(M, G) = ||D − MGᵀ||²_F (Ding et al., 2002; De la Torre et al., 2006)
[Figure: data matrix D of ten 2-D points (x, y), indicator matrix Gᵀ, and cluster means M, with D ≈ MGᵀ.]
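A minimal Lloyd-iteration sketch of k-means in this matrix form, D ≈ MGᵀ with indicator matrix G (synthetic 2-D clusters; the deterministic initialization from one point per cluster is a choice of this example, not part of the slide):

```python
import numpy as np

rng = np.random.default_rng(4)
# Two well-separated 2-D clusters; columns of D are points
A_pts = rng.normal(loc=[0, 0], scale=0.3, size=(10, 2))
B_pts = rng.normal(loc=[5, 5], scale=0.3, size=(10, 2))
D = np.vstack([A_pts, B_pts]).T             # 2 x 20
n, c = D.shape[1], 2

# Lloyd iterations minimizing E0(M, G) = ||D - M G^T||_F^2
M = D[:, [0, n - 1]].copy()                 # init means from one point per cluster
for _ in range(10):
    dist = ((D[:, :, None] - M[:, None, :]) ** 2).sum(axis=0)   # n x c distances
    labels = dist.argmin(axis=1)            # assignment step (choose G)
    for j in range(c):
        M[:, j] = D[:, labels == j].mean(axis=1)                # update means M

G = np.eye(c)[labels]                       # n x c indicator matrix
err = np.linalg.norm(D - M @ G.T)
```

Each Lloyd step decreases ||D − MGᵀ||²_F: first over the indicator G with M fixed, then over M with G fixed.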
Spectral Clustering
(Dhillon et al., 2004; Zass & Shashua, 2005; Ding et al., 2005; De la Torre et al., 2006)
[Figure: affinity matrix and its eigenvectors.]
Multi-Dimensional Scaling (MDS)
• MDS takes a matrix of pair-wise distances and finds an embedding that preserves the inter-point distances.
[Figure: pair-wise distances of US cities and the spatial layout of the cities in an embedded space (North/South vs. East/West axes).]
MDS (II)
min_{y₁,…,yₙ} Σᵢ Σⱼ (f(δᵢⱼ) − dᵢⱼ)²
• δᵢⱼ: pair-wise distances in the original data space (given).
• dᵢⱼ = ||yᵢ − yⱼ||²₂: pair-wise distances in the embedded space (unknown).
[Figure: original data space vs. embedded space.]
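A sketch of classical (Torgerson) MDS, which assumes the given dissimilarities are squared Euclidean distances: double-center the distance matrix and eigendecompose. The ground-truth layout here is hypothetical synthetic data used only to generate the distances.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 12, 2
Y_true = rng.normal(size=(n, k))                  # hidden layout (to make distances)

# Given: squared pair-wise distances delta_ij in the original space
delta2 = ((Y_true[:, None, :] - Y_true[None, :, :]) ** 2).sum(-1)

# Classical MDS: Bmat = -1/2 J delta2 J with J = I - 11^T/n (double centering)
J = np.eye(n) - np.ones((n, n)) / n
Bmat = -0.5 * J @ delta2 @ J
w, V = np.linalg.eigh(Bmat)
order = np.argsort(w)[::-1][:k]                   # top-k eigenpairs
Y = V[:, order] * np.sqrt(w[order])               # embedded coordinates

# Distances in the embedded space
d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
```

For exactly Euclidean input distances the embedding reproduces them perfectly (up to rotation and translation of the layout).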
MDS (III)
Discriminative models
• Linear Discriminant Analysis (LDA)
• Support Vector Machines (SVM)
• Oriented Component Analysis (OCA)
• Canonical Correlation Analysis (CCA)
Linear Discriminant Analysis (LDA)
• Optimal linear dimensionality reduction if the classes are Gaussian with equal covariance matrices.
J(B) = |BᵀS_b B| / |BᵀS_t B|, maximized by the generalized eigenvalue problem S_b B = S_t BΛ
S_b = Σᵢ₌₁ᶜ Σⱼ₌₁ᶜ (µᵢ − µⱼ)(µᵢ − µⱼ)ᵀ (between-class scatter)
S_t = DDᵀ = Σᵢ₌₁ⁿ dᵢdᵢᵀ (total scatter)
S_w = Σᵢ₌₁ᶜ Σ_{j ∈ class i} (dⱼ − µᵢ)(dⱼ − µᵢ)ᵀ (within-class scatter)
(Fisher, 1938; Mardia et al., 1979; Bishop, 1995)
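A numpy sketch of the two-class case via the generalized eigenvalue problem S_b b = λ S_t b (hypothetical synthetic Gaussian classes with shared covariance; for two classes, S_b reduces to the outer product of the mean difference):

```python
import numpy as np

rng = np.random.default_rng(6)
# Two classes with shared isotropic covariance; columns are samples
D1 = rng.normal(size=(2, 100))
D2 = rng.normal(size=(2, 100)) + np.array([[4.0], [1.0]])
D = np.hstack([D1, D2])

mu1, mu2 = D1.mean(axis=1), D2.mean(axis=1)
Sb = np.outer(mu1 - mu2, mu1 - mu2)          # between-class scatter (2 classes)
Dc = D - D.mean(axis=1, keepdims=True)
St = Dc @ Dc.T                               # total scatter

# Generalized eigenvalue problem S_b b = lambda S_t b
w, V = np.linalg.eig(np.linalg.solve(St, Sb))
b = np.real(V[:, np.argmax(np.real(w))])     # leading discriminant direction
b /= np.linalg.norm(b)

p1, p2 = b @ D1, b @ D2                      # 1-D projections of each class
```

The projected class means end up well separated relative to the projected within-class spread, which is exactly what J(B) rewards.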
Support Vector Machines (SVM)
• Linear classifier: f(x) = sign(wᵀx + b)
[Figure: 2-D points labeled +1 and −1. How would you classify this data?]
• Infinitely many lines classify the data well, but which one is the best?
Large margin linear classifier
• The linear discriminant function with the maximum margin is the best.
• Why is it the best?
– Robust to outliers, and shown to improve generalization.
• The margin width is M = d₋ + d₊ = 2 / ||w||₂.
[Figure: two classes (+1, −1), the margin ("safe zone"), and the support vectors.]
• Maximizing the margin gives a quadratic optimization problem with linear constraints:
min_w ½||w||₂²
s.t. wᵀxᵢ + b ≥ +1 for yᵢ = +1
     wᵀxᵢ + b ≤ −1 for yᵢ = −1
SVM formulation
• But what if we have errors or non-linear decision boundaries?
• Slack variables ξᵢ can be added to allow misclassification of difficult or noisy data points:
min_w ½||w||₂² + C Σᵢ₌₁ⁿ ξᵢ
s.t. wᵀxᵢ + b ≥ +1 − ξᵢ for yᵢ = +1
     wᵀxᵢ + b ≤ −1 + ξᵢ for yᵢ = −1
     ξᵢ ≥ 0
• C balances the trade-off between the margin and the classification error, and controls over-fitting.
[Figure: slack variables ξ₁, ξ₂.]
SVM is a regularized network
• In the noiseless case, LDA on the support vectors is equivalent to SVM. (Shashua, 1999)
• The SVM classifier can also be optimized in the primal, without constraints:
min_w C Σᵢ₌₁ᴺ max(0, 1 − yᵢ(wᵀxᵢ + b)) + ||w||₂²
• The first term is the training error under the hinge loss; the second is the regularizer.
[Figure: hinge loss as a function of yᵢ f(xᵢ), for y = −1 and y = +1.]
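A subgradient-descent sketch of this unconstrained primal on synthetic separable data. Two simplifications of this example (not of the slide): the bias is folded into w through a constant feature, and it is regularized along with the weights.

```python
import numpy as np

rng = np.random.default_rng(7)
# Linearly separable 2-D data with labels y in {-1, +1}
X = np.vstack([rng.normal([-2, -2], 0.5, (50, 2)),
               rng.normal([2, 2], 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
Xb = np.hstack([X, np.ones((100, 1))])      # constant feature absorbs the bias

# Subgradient descent on C * sum_i max(0, 1 - y_i w^T x_i) + ||w||^2 / 2
w = np.zeros(3)
C, lr = 1.0, 0.01
for _ in range(2000):
    active = y * (Xb @ w) < 1               # points inside or violating the margin
    grad = w - C * (y[active, None] * Xb[active]).sum(axis=0)
    w -= lr * grad

acc = np.mean(np.sign(Xb @ w) == y)
```

Only the "active" points (those with margin below 1) contribute a subgradient, which is the optimization-side view of support vectors.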
Other classifiers
• Several other loss functions yield other classifiers (e.g., logistic regression, AdaBoost).
• Logistic loss: log(1 + e^{−yᵢ f(xᵢ)}). Unlike the 0/1 loss, it also penalizes examples that are well classified but close to the margin.
[Figure: hinge loss vs. 0/1 loss.]
Oriented Component Analysis (OCA)
• OCA maximizes a signal-to-noise ratio:
λ = (b_OCAᵀ Σ_signal b_OCA) / (b_OCAᵀ Σ_noise b_OCA)
• Generalized eigenvalue problem: Σ_signal b_k = λ Σ_noise b_k
• b_OCA is steered by the distribution of the noise.
OCA for face recognition
[Slide: OCA bases for face recognition, comparing generalized eigenvalue ratios of the form (bᵀΣb)/(bᵀΣₑb) for two different noise covariances Σₑ¹ and Σₑ².]
(De la Torre et al., 2005)
Canonical Correlation Analysis (CCA)
• Perform PCA independently on each data set and learn a mapping between the projections.
• Independent dimensionality reduction per set can lose signals that have small energy but are highly correlated across sets.
Canonical Correlation Analysis (CCA)
• Learn relations between multiple data sets (e.g., find features in one set related to another data set).
• Given two sets X ∈ ℝ^{d₁×n} and Y ∈ ℝ^{d₂×n}, CCA finds the pair of directions w_x and w_y that maximize the correlation between the projections (assuming zero-mean data):
ρ = (w_xᵀ X Yᵀ w_y) / sqrt((w_xᵀ X Xᵀ w_x)(w_yᵀ Y Yᵀ w_y))
• Several ways of optimizing it; a stationary point of ρ is the solution of the generalized eigenvalue problem A w = λ B w, with
A = [0, XYᵀ; YXᵀ, 0], B = [XXᵀ, 0; 0, YYᵀ], w = [w_x; w_y], A, B ∈ ℝ^{(d₁+d₂)×(d₁+d₂)}
(Mardia et al., 1979; Borga, 1998)
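A numpy sketch of CCA through this generalized eigenvalue problem, on hypothetical synthetic data sets that share one latent signal:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
s = rng.normal(size=n)                     # shared latent signal
X = np.vstack([s + 0.1 * rng.normal(size=n),
               rng.normal(size=n)])        # d1 x n
Y = np.vstack([rng.normal(size=n),
               -s + 0.1 * rng.normal(size=n),
               rng.normal(size=n)])        # d2 x n
X -= X.mean(axis=1, keepdims=True)         # zero-mean data, as assumed
Y -= Y.mean(axis=1, keepdims=True)

d1, d2 = X.shape[0], Y.shape[0]
# Block matrices A w = lambda B w from the slide
A = np.block([[np.zeros((d1, d1)), X @ Y.T],
              [Y @ X.T, np.zeros((d2, d2))]])
Bm = np.block([[X @ X.T, np.zeros((d1, d2))],
               [np.zeros((d2, d1)), Y @ Y.T]])
lam, V = np.linalg.eig(np.linalg.solve(Bm, A))
w = np.real(V[:, np.argmax(np.real(lam))])
wx, wy = w[:d1], w[d1:]

rho = (wx @ X @ Y.T @ wy) / np.sqrt((wx @ X @ X.T @ wx) * (wy @ Y @ Y.T @ wy))
```

The largest eigenvalue equals the first canonical correlation; here it is close to 1 because one coordinate of each set carries the same latent signal.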
Virtual avatars with CCA (De la Torre & Black, 2001)
Linear methods fail
• Data points lie on a non-linear manifold.
• There is no good linear mapping to a plane.
• Linear methods only rotate/translate/scale the data.
Linear methods fail
• Learning a non-linear representation for classification.
[Figure: examples where the cosine similarity is close to 0, −1, and 1.]
Linear methods not enough
• Linear methods:
– Unique optimal solutions
– Fast learning algorithms
– Better statistical analysis
• Problem:
– Insufficient capacity, as Minsky and Papert pointed out in their book Perceptrons.
– Neural networks add non-linear layers (e.g., MLP), which solves the capacity problem but is hard to train (local minima).
• Kernel methods:
– Use linear techniques but work in a high-dimensional feature space: x → Φ(x)
Kernel methods
• The kernel defines an implicit mapping (usually high-dimensional) from the input space to a feature space, so that the data becomes linearly separable.
• Computation in the feature space can be costly because it is (usually) high-dimensional.
– The feature space can be infinite-dimensional!
• Example mapping from input space to feature space: (x₁, x₂) → (x₁², √2·x₁x₂, x₂²) = (z₁, z₂, z₃)
Machine Perception of Human Behavior with CA F. De la Torre/J. Cohn PAVIS school on CV and PR 62
Kernel methods (II)
• Suppose φ(·) is given as above.
• An inner product in the feature space is ⟨φ(x), φ(z)⟩; for the mapping above, ⟨φ(x), φ(z)⟩ = (xᵀz)².
• So, if we define the kernel function k(x, z) = ⟨φ(x), φ(z)⟩, there is no need to carry out φ(·) explicitly.
• This use of the kernel function to avoid carrying out φ(·) explicitly is known as the kernel trick: any linear algorithm that can be expressed through inner products can be made nonlinear by going to the feature space.
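A two-line check of the kernel trick for this quadratic map: the explicit feature map and the kernel (xᵀz)² give the same inner product.

```python
import numpy as np

def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Kernel trick: the same inner product without computing phi."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
```

Here the feature space is only 3-dimensional, but the same identity is what makes infinite-dimensional feature spaces (e.g. the RBF kernel's) computable.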
Kernel PCA (Scholkopf et al., 1998)
Kernel PCA (II)
• Eigenvectors of the covariance matrix in feature space:
C = (1/n) Σᵢ₌₁ⁿ Φ(dᵢ)Φ(dᵢ)ᵀ, C b₁ = λ₁ b₁
• Eigenvectors lie in the span of the data in feature space:
b₁ = Σᵢ₌₁ⁿ αᵢ Φ(dᵢ)
• Substituting and projecting onto the data gives an eigenproblem on the kernel matrix Kᵢⱼ = Φ(dᵢ)ᵀΦ(dⱼ):
Kα = λα
(Scholkopf et al., 1998)
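A numpy sketch of kernel PCA with an RBF kernel (a common choice, assumed here), including the kernel-matrix centering that the zero-mean-in-feature-space assumption requires, and the normalization that makes each implicit basis vector b unit length:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 60
D = rng.normal(size=(2, n))                     # columns are data points

# RBF kernel matrix K_ij = exp(-||d_i - d_j||^2 / 2)
sq = ((D[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-sq / 2.0)

# Center the kernel matrix (data centered implicitly in feature space)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

lam, alpha = np.linalg.eigh(Kc)
lam, alpha = lam[::-1], alpha[:, ::-1]          # decreasing eigenvalue order
k = 3
# Scale alpha so that b = sum_i alpha_i phi(d_i) has unit norm: ||b||^2 = alpha^T K alpha
alpha_k = alpha[:, :k] / np.sqrt(np.maximum(lam[:k], 1e-12))

# Projections of the training data onto the first k kernel principal components
Z = Kc @ alpha_k                                # n x k embedding
```

Everything is expressed through K; the (possibly infinite-dimensional) Φ is never evaluated.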
Outline
• Introduction (5 min)
• Generative models (20 min) – PCA, k-means, spectral clustering, NMF, ICA, MDS
• Discriminative models (20 min) – LDA, SVM, OCA, CCA
• Standard extensions of linear models (15 min) – kernel methods, latent variable models, tensor factorization
• Unified view (15 min)
• Extended generative models (50 min) – RPCA, PaCA, ACA
• Extended discriminative models (1 hour) – MODA, Parda, CTW, seg-SVM
Factor Analysis
• Adding a Gaussian prior on the coefficients and Gaussian noise to PCA gives Factor Analysis:
d = µ + Bc + η
p(c) = N(c | 0, I_k), p(d | c) = N(d | µ + Bc, Ψ), Ψ = diag(η₁, η₂, …, η_d)
cov(d) = E((d − µ)(d − µ)ᵀ) = BBᵀ + Ψ
(Mardia et al., 1979)
• Inference (Roweis & Ghahramani, 1999; Tipping & Bishop, 1999a):
p(c | d) = N(c | m, V)
m = Bᵀ(BBᵀ + Ψ)⁻¹(d − µ)
V = (I + BᵀΨ⁻¹B)⁻¹
• p(c, d) is jointly Gaussian.
• [Figure: a sample can have a low PCA reconstruction error yet a high FA reconstruction error (low likelihood).]
PPCA
• If Ψ = E(ηηᵀ) = εI_d, Factor Analysis becomes PPCA.
• If ε → 0, PPCA is equivalent to PCA: Bᵀ(BBᵀ + εI_d)⁻¹ → (BᵀB)⁻¹Bᵀ, and inference reduces to cᵢ = Bᵀdᵢ for orthonormal B.
• Probabilistic visual learning (Moghaddam & Pentland, 1997) evaluates the marginal density p(d) = ∫ p(d | c) p(c) dc as the product of a distance in the k-dimensional principal subspace and a distance from that subspace:
p(d) = [exp(−½ Σᵢ₌₁ᵏ yᵢ²/λᵢ) / ((2π)^{k/2} Πᵢ₌₁ᵏ λᵢ^{1/2})] · [exp(−ε²(d)/(2ρ)) / (2πρ)^{(d−k)/2}]
where yᵢ are the projections onto the principal directions, λᵢ the eigenvalues, ε²(d) the residual reconstruction error, and ρ the average of the discarded eigenvalues.
More on PPCA
• Extension to mixtures of PPCA (mixtures of subspaces) (Tipping & Bishop, 1999b; Black et al., 1998; Jebara et al., 1998)
• Tracking (Yang et al., 1999; Yang et al., 2000a; Lee et al., 2005; de la Torre et al., 2000b)
• Recognition/detection (Moghaddam et al., 2000; Shakhnarovich & Moghaddam, 2004; Everingham & Zisserman, 2006)
• PCA for the exponential family (Collins et al., 2001)
Tensor faces (Vasilescu & Terzopoulos, 2002; Vasilescu & Terzopoulos, 2003)
[Figure: face images varying across views, illuminations, expressions, and people.]
Eigenfaces
• Facial images (identity change).
• Eigenface basis vectors capture the variability in facial appearance (they do not decouple pose, illumination, etc.).
Data Organization
• Linear/PCA: data matrix
– ℝ^{pixels × images}
– a matrix of image vectors
• Multilinear: data tensor
– ℝ^{people × views × illums × express × pixels}
– an N-dimensional matrix
– 28 people, 45 images/person: 5 views, 3 illuminations, 3 expressions per person
N-Mode SVD Algorithm
• For N = 3: d_ijk = Σ_l Σ_p Σ_s z_lps u_il u_jp u_ks
• TensorFaces (N = 5): D = Z ×₁ U_people ×₂ U_views ×₃ U_illums ×₄ U_express ×₅ U_pixels
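An einsum sketch of the N = 3 mode-product expansion above, with a random core tensor and orthonormal factor matrices standing in for the learned ones:

```python
import numpy as np

rng = np.random.default_rng(10)
L, P, S = 3, 4, 5
Z = rng.normal(size=(L, P, S))                    # core tensor
U1 = np.linalg.qr(rng.normal(size=(6, L)))[0]     # mode-1 factor (orthonormal cols)
U2 = np.linalg.qr(rng.normal(size=(7, P)))[0]     # mode-2 factor
U3 = np.linalg.qr(rng.normal(size=(8, S)))[0]     # mode-3 factor

# d_ijk = sum_{l,p,s} z_lps u1_il u2_jp u3_ks  (three mode products)
D = np.einsum('lps,il,jp,ks->ijk', Z, U1, U2, U3)

# With orthonormal factors, contracting D against them recovers the core
Z_rec = np.einsum('ijk,il,jp,ks->lps', D, U1, U2, U3)
```

In the full N-mode SVD each Uₙ would come from the SVD of the mode-n unfolding of the data tensor; here they are random orthonormal matrices used only to illustrate the contraction.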
[Figure: PCA basis vectors vs. TensorFaces basis vectors.]
TensorFaces
• Original: 176 basis vectors.
• PCA: mean sq. err. = 85.75; 33 parameters, 33 basis vectors.
• TensorFaces: mean sq. err. = 409.15; 3 illumination + 11 people parameters, 33 basis vectors.
• TensorFaces: 6 illumination + 11 people parameters, 66 basis vectors.
• Strategic data compression = perceptual quality:
– TensorFaces data reduction in illumination space primarily degrades illumination effects (cast shadows, highlights).
– PCA has a lower mean square error but a higher perceptual error.
The fundamental equation of CA (De la Torre & Kanade, 2006; De la Torre, 2012)
• Given two datasets D ∈ ℝ^{d×n} and X ∈ ℝ^{x×n}:
E₀(A, B) = ||W_r(Γ − ABᵀΨ)W_c||_F
where Γ = φ(D) and Ψ = θ(X), W_r and W_c are weights for rows and columns, and A, B are the regression matrices.
Properties of the cost function
• E₀(A, B) has a unique global minimum (Baldi & Hornik, 1989).
• Closed-form solutions for A and B exist: substituting the optimum of one matrix into E₀ leaves a trace expression in the other (in terms of Γ, Ψ, W_r, W_c) whose optimum is a generalized eigenvalue problem.
Principal Component Analysis (PCA)
• PCA finds the directions of maximum variation of the data based on linear correlation. (Pearson, 1901; Hotelling, 1933; Mardia et al., 1979; Jolliffe, 1986; Diamantaras, 1996)
• Kernel PCA finds the directions of maximum variation of the data in the feature space, e.g. (x₁, x₂) → (x₁², √2·x₁x₂, x₂²) = (z₁, z₂, z₃).
PCA – Kernel PCA
• The primal problem: E₀(A, B) = ||W_r(Γ − ABᵀΨ)W_c||_F
• Error function for KPCA: E_kpca(A, B) = ||φ(D) − ABᵀφ(D)||_F (Eckart & Young, 1936; Gabriel & Zamir, 1979; Baldi & Hornik, 1989; Shum et al., 1995; de la Torre & Black, 2003a)
• Eliminating one factor gives the dual problem, expressed through the kernel matrix φ(D)ᵀφ(D).
Error function for LDA
E_LDA(A, B) = ||(GᵀG)^{−1/2}(Gᵀ − ABᵀD)||_F
• G ∈ {0,1}^{n×c} is the label indicator matrix (G1_c = 1_n), with c = classes and n = samples; D = [d₁ d₂ … dₙ], d = pixels.
• This gives n×c equations in d×c unknowns: for d ≫ n, an UNDERDETERMINED system of equations (over-fitting).
(de la Torre & Kanade, 2006)
Canonical Correlation Analysis (CCA)
• CCA is the same as LDA, replacing the label matrix G by a new data set X:
E_CCA(A, B) = ||(DᵀD)^{−1/2}(Dᵀ − ABᵀX)||_F
K-means
E₀(A, B) = ||W_r(D − ABᵀΨ)W_c||_F (Ding et al., 2002; De la Torre et al., 2006)
[Figure: data matrix D of ten 2-D points, cluster means A, and indicator matrix Bᵀ, with D ≈ ABᵀ.]
Normalized cuts
E₀(A, B) = ||W_r(Γ − ABᵀΨ)W_c||_F
Γ = φ(D) = [φ(d₁) φ(d₂) … φ(dₙ)] (affinity matrix)
• Special cases include Normalized Cuts (Shi & Malik, 2000) and Ratio cuts (Hagen & Kahng, 1992).
(Dhillon et al., 2004; Zass & Shashua, 2005; Ding et al., 2005; De la Torre et al., 2006)
Other Connections
• The LS-KRRR (E₀) is also the generative model for:
– Laplacian eigenmaps, locality preserving projections, MDS, partial least-squares, etc.
• Benefits of the LS framework:
– A common framework to understand differences and commonalities between CA methods (e.g. KPCA, KLDA, KCCA, NCuts).
– Better understanding of normalization factors and generalizations.
– Efficient numerical optimization in less than O(n³) or O(d³), where n is the number of samples and d the number of dimensions.