Advanced Topics in Learning and Vision Ming-Hsuan Yang [email protected] Lecture 3 (draft)


Page 1

Advanced Topics in Learning and Vision

Ming-Hsuan Yang, [email protected]

Lecture 3 (draft)

Page 2

Overview

• Unsupervised Learning

• Multivariate Gaussian

• EM Algorithm

• Mixture of Gaussians

• Mixture of Factor Analyzers

• Mixture of Probabilistic Principal Component Analyzers

• Isometric Mapping

• Locally Linear Embedding

• Global coordination of local representation

Lecture 3 (draft) 1

Page 3

Announcements

• Required and supplementary material available on the course web page

• Send your critiques by Oct 18

• Term project: start tinkering with your ideas as early as possible

Lecture 3 (draft) 2

Page 4

Unsupervised Learning

• Goals:

- dimensionality reduction
- finding clusters from data
- finding hidden causes or sources of data (i.e., factors, principal components)
- model data density

• Applications:

- data compression
- denoising, outlier detection
- classification
- efficient computation
- explain human learning and perception
- ...

Lecture 3 (draft) 3

Page 5

PCA Application: Web Search

• PageRank: Suppose we have a set of four web pages, A, B, C, and D as depicted above. The PageRank (PR) of A is

PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}

or, in terms of L(p), the number of outbound links of page p,

PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} \quad (1)

• Random surfer: Markov process

PR(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in NE(p_i)} \frac{PR(p_j)}{L(p_j)} \quad (2)

where NE(p_i) is the set of pages that link to p_i.

Lecture 3 (draft) 4

Page 6

• The PR values are the entries of the dominant eigenvector of the modified adjacency matrix. The dominant eigenvector is

\mathbf{R} = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix} \quad (3)

and satisfies

\mathbf{R} = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1 - q) \begin{bmatrix} \ell(p_1, p_1) & \ell(p_1, p_2) & \cdots & \ell(p_1, p_N) \\ \ell(p_2, p_1) & & & \vdots \\ \vdots & & & \\ \ell(p_N, p_1) & \cdots & & \ell(p_N, p_N) \end{bmatrix} \mathbf{R} \quad (4)

where \ell(p_i, p_j) is an adjacency function.

• Related to random walk, Markov process and spectral clustering
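Equation (2) can be solved by simple fixed-point (power) iteration. The sketch below, assuming NumPy, uses a hypothetical four-page link structure loosely mirroring the A, B, C, D example above; the graph, the value of q, and the iteration count are my own choices, not from the lecture:

```python
import numpy as np

# Hypothetical link structure: B -> {A, C}, C -> {A}, D -> {A, B, C}.
# A has no out-links (a dangling page), which is left untreated here.
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
pages = sorted(links)
N = len(pages)
q = 0.15  # the constant q from Eq. (2)

# M[i, j] = 1 / L(p_j) if page p_j links to page p_i, else 0, so that
# (M @ R)[i] = sum over in-neighbors p_j of PR(p_j) / L(p_j).
M = np.zeros((N, N))
for j, pj in enumerate(pages):
    for pi in links[pj]:
        M[pages.index(pi), j] = 1.0 / len(links[pj])

# Fixed-point iteration of Eq. (2): R <- q/N + (1 - q) M R.
R = np.full(N, 1.0 / N)
for _ in range(100):
    R = q / N + (1 - q) * M @ R

print(dict(zip(pages, R.round(4))))
```

Since every other page links to A in this toy graph, A collects the largest PR value; with q = 1 the iteration would ignore the link structure entirely.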

Lecture 3 (draft) 5

Page 7

• L. Page and S. Brin, "PageRank: An eigenvector-based ranking approach for hypertext," in 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.

Lecture 3 (draft) 6

Page 8

PCA Application: Account for Illumination Change [Belhumeur and Kriegman 97]

• What is the set of images of an object under all possible illumination conditions?

• The illumination cone lies near a low-dimensional linear PCA subspace of the image space

• Can be used for object recognition

Lecture 3 (draft) 7

Page 9

Lecture 3 (draft) 8

Page 10

PCA Application: Appearance Compression and Synthesis [Nishino et al. 99]

• Given a 3D model (which can be obtained by various vision algorithms or range sensors), how do we capture the variation of object appearance under different viewing and illumination conditions?

• Take a sequence of the same image patch under different viewing conditions

• Under different view angles (left: input images, right: synthesized images)

Lecture 3 (draft) 9

Page 11

• Under different lighting conditions

Lecture 3 (draft) 10

Page 12

Review

p(x, y) = p(x)\,p(y|x) = p(y)\,p(x|y)

p(y|x) = \frac{p(x|y)\,p(y)}{p(x)} \quad (5)

• The joint probability of x and y is p(x, y)

• The marginal probability of x is p(x) = \sum_y p(x, y)

• The conditional probability of x given y is p(x|y)
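The product rule and Bayes rule in Eq. (5) can be checked numerically on a small discrete joint distribution; the 2×2 table below is a made-up example, assuming NumPy:

```python
import numpy as np

# A made-up joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.10, 0.30],
                 [0.40, 0.20]])

p_x = p_xy.sum(axis=1)               # p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)               # p(y) = sum_x p(x, y)
p_y_given_x = p_xy / p_x[:, None]    # p(y|x) = p(x, y) / p(x)
p_x_given_y = p_xy / p_y[None, :]    # p(x|y) = p(x, y) / p(y)

# Product rule: p(x, y) = p(x) p(y|x) = p(y) p(x|y)
assert np.allclose(p_xy, p_x[:, None] * p_y_given_x)
assert np.allclose(p_xy, p_y[None, :] * p_x_given_y)

# Bayes rule: p(y|x) = p(x|y) p(y) / p(x)
assert np.allclose(p_y_given_x, p_x_given_y * p_y[None, :] / p_x[:, None])
```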

Lecture 3 (draft) 11

Page 13

Bayesian Learning

• M are the models (or model parameters): unknown

• D is the data: known

p(M|D) = \frac{p(D|M)\,p(M)}{p(D)} \quad (6)

• p(D|M) is the likelihood.

• p(M) is the prior probability of M

• p(M|D) is the posterior probability of M.

• p(D) = \int p(D|M)\,p(M)\,dM is the marginal likelihood or evidence.

• Given D, we want to find M
- Maximum likelihood (ML): the M that gives the highest likelihood, p(D|M)
- Maximum a posteriori (MAP): the M that gives the highest posterior probability, p(M|D)
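The ML/MAP distinction can be sketched in one dimension: estimating the mean of a Gaussian with known variance, with a Gaussian prior on the mean. All numbers below (the data, prior mean mu0, and the variances) are toy choices, not from the lecture:

```python
import numpy as np

# Toy data: 5 samples from a Gaussian with known variance sigma2.
rng = np.random.default_rng(0)
sigma2 = 1.0
data = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=5)
n = len(data)

# ML: the mean maximizing the likelihood p(D|M) is the sample mean.
mu_ml = data.mean()

# MAP: with a Gaussian prior N(mu0, tau2) on the mean, the posterior is
# also Gaussian, and its mode has the standard closed form below.
mu0, tau2 = 0.0, 1.0
mu_map = (tau2 * data.sum() + sigma2 * mu0) / (n * tau2 + sigma2)

print(mu_ml, mu_map)  # MAP is pulled from the sample mean toward mu0
```

As n grows the prior's influence vanishes and the MAP estimate approaches the ML estimate.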

Lecture 3 (draft) 12

Page 14

Multivariate Gaussian

p(x|\mu, \Sigma) = (2\pi)^{-\frac{N}{2}}\,|\Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\} \quad (7)

where µ is the mean and Σ is the covariance matrix.
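As a sanity check of Eq. (7), the density can be evaluated directly and numerically integrated over a wide grid; the result should be close to 1. The µ, Σ, and grid below are my own toy choices, assuming NumPy:

```python
import numpy as np

# Toy parameters for a 2D Gaussian.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

N = len(mu)  # dimensionality, as in Eq. (7)
Sinv = np.linalg.inv(Sigma)
norm = (2 * np.pi) ** (-N / 2) * np.linalg.det(Sigma) ** (-0.5)

def pdf(X):
    """Evaluate Eq. (7) at each row of X."""
    d = X - mu
    return norm * np.exp(-0.5 * np.einsum("ij,jk,ik->i", d, Sinv, d))

# Riemann sum over a grid wide enough to hold nearly all the mass.
h = 0.05
g = np.arange(-8.0, 9.0, h)
xs, ys = np.meshgrid(g, g)
pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
print(pdf(pts).sum() * h * h)  # approximately 1.0
```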

• Given a data set X = \{x_1, \ldots, x_N\}, the likelihood is p(\text{data}|\text{model}) = \prod_{i=1}^{N} p(x_i|\mu, \Sigma)

• Goal: find \mu and \Sigma that maximize the log likelihood:

\mathcal{L} = \log \prod_{i=1}^{N} p(x_i|\mu, \Sigma) = -\frac{N}{2} \log |2\pi\Sigma| - \frac{1}{2} \sum_{i=1}^{N} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \quad (8)

• Maximum likelihood estimates:

\frac{\partial \mathcal{L}}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_i x_i \quad \text{(sample mean)}

\frac{\partial \mathcal{L}}{\partial \Sigma} = 0 \;\Rightarrow\; \hat{\Sigma} = \frac{1}{N} \sum_i (x_i - \hat{\mu})(x_i - \hat{\mu})^T \quad \text{(sample covariance)} \quad (9)
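The closed forms in Eq. (9) are easy to verify on synthetic data; the sketch below, assuming NumPy, draws samples from an arbitrary Gaussian and recovers its parameters. Note that the ML covariance divides by N, not the unbiased N − 1:

```python
import numpy as np

# Arbitrary "true" parameters, used only to generate synthetic data.
rng = np.random.default_rng(1)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=10000)  # rows are x_i

# Eq. (9): maximum likelihood estimates.
N = X.shape[0]
mu_hat = X.mean(axis=0)          # (1/N) sum_i x_i
D = X - mu_hat
Sigma_hat = D.T @ D / N          # (1/N) sum_i (x_i - mu_hat)(x_i - mu_hat)^T

print(mu_hat.round(2), Sigma_hat.round(2))
```

With 10,000 samples the estimates land close to the generating parameters; the 1/N covariance is slightly biased low, which matters only for small N.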

Lecture 3 (draft) 13

Page 15

Limitations of Gaussian, FA and PCA

• Linear methods: easy to understand and use in practice.

• Efficient way to find structure in high dimensional data, e.g., as apreprocessing step

• All based on Gaussian assumption: only the mean and variance of data aretaken into account

• Based on second order statistics

Lecture 3 (draft) 14