Statistical Methods for Learning Multimedia Semantics
Edward Chang, Associate Professor, Electrical Engineering, UC Santa Barbara; CTO, VIMA Technologies
ICME Tutorial, Baltimore, 7/6/2003




Outline

- Statistical Learning
- Multimedia Applications' Data Characteristics
- Classical Models
- Kernel Methods
  - Linear Model View
  - Nearest Neighbor View
  - Geometric View
- Dimension Reduction Methods


Statistical Learning

- Program the computers to learn!
- Computers improve performance with experience at some task
- Example:
  - Task: classify images
  - Performance: prediction accuracy
  - Experience: labeled images


Definition

- X: data pool
- U: unlabeled pool; L: labeled pool
- G: labels
  - Regression: G → R
  - Classification: G → {+1, -1}
- H: learning algorithm


Statistical Learning

- Experience: characterized by training data L
- Training: f = H(L)
- Task (e.g., prediction): ŷ = f(u), u ∈ U
- Performance: measured by some error function, e.g., maximizing y·f(u)


Learning Algorithms (H)

- Linear Regression
- K-NN
- Bayesian Analysis
- Neural Networks
- Decision Trees
- Kernel Methods
- Etc.


H

- Has a hypothesis space
- Finds the "best" hypothesis based on the training data (L) efficiently
- Best solution: fitting L well? Predicting U accurately!
- Efficiency: computational complexity and resource requirements


Classical Model [Donoho 2000]

- N: number of training instances, N = |L|
- N+, N-: numbers of positive and negative training instances
- D: dimensionality
- Classical assumptions: N >> D, N → ∞ (e.g., PAC learnability), N- ≈ N+


Emerging MM Applications

- N < D
- N+ << N-
- Examples:
  - Information retrieval with relevance feedback
  - K-class classification: image classification, gene profiling


Gene Profiling Example

N = 59 cases, D = 4026 genes


Image Retrieval Demo

- N < D: N < 50, D = 150
- N+ << N-
- ACM SIGMOD 01; ACM MM 01, 02; IEEE CVPR 03 (also see my Web site)


SVMactive





Ranking


Solution Summary

- N < D
  - ACM MM 2001 (SVM Active): make each u in U most informative
  - PCM 2002, ICIP 2003: increase N- through co-training
  - ACM MM 2002 (DPF): reduce D
- N+ << N-
  - ACM MM 2003, ICML 2003: conformal transformation; kernel boundary alignment


Outline

- Statistical Learning
- MM Applications' Data Characteristics
- Classical Models (Classification)
- Kernel Methods
  - Linear Model View
  - Nearest Neighbor View
  - Geometric View
- Dimension Reduction Methods


Classical Methods

- Linear Model
  - Least Square
  - Maximum Likelihood
  - Naïve Bayesian
  - LDA
  - Maximum Margin Hyperplane
- Nearest Neighbor


Linear Regression


Least Square

Y = β0 + Σ_{j=1..D} βj Xj, i.e., Y = X^T β

RSS(β) = (Y - Xβ)^T (Y - Xβ)   (RSS: residual sum of squares)

β = (X^T X)^{-1} X^T Y
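As a sanity check, the closed-form β = (X^T X)^{-1} X^T Y can be worked out by hand for one feature plus an intercept; this sketch (function name and data are made up) solves the 2x2 normal equations directly:

```python
def least_squares_1d(xs, ys):
    """Fit y = b0 + b1*x by solving the 2x2 normal equations (X^T X) b = X^T y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of X^T X
    b1 = (n * sxy - sx * sy) / det   # slope
    b0 = (sy - b1 * sx) / n          # intercept
    return b0, b1

b0, b1 = least_squares_1d([0, 1, 2, 3], [1, 3, 5, 7])  # data lie exactly on y = 1 + 2x
```

On noise-free data the fit recovers the generating line exactly, since RSS can be driven to zero.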


Maximum Likelihood

Y = β0 + Σ_{j=1..D} βj Xj, i.e., Y = X^T β
With noise: Y = X^T β + ε

- The noise terms ε are independent, ε ~ N(0, σ²)
- P(y | βx) is then normal with mean βx and variance σ²


P(y | βx) ~ N(βx, σ²)

Training:
- Given (x1, y1), (x2, y2), …, (xn, yn)
- Infer P(β | x1, …, xn, y1, …, yn)
- By Bayes rule, or by the maximum likelihood estimate


For what β is P(y1, …, yn | x1, …, xn, β) maximized?
- Π P(yi | βxi) maximized?
- Π exp(-½((yi - βxi)/σ)²) maximized?
- Σ -½((yi - βxi)/σ)² maximized?
- Σ (yi - βxi)² minimized?

So under Gaussian noise, maximum likelihood is exactly least squares.
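The chain above can be checked numerically: the log-likelihood is a monotone transform of the negative RSS, so the two criteria pick the same β. A small grid-search sketch (the data and the candidate grid are made up):

```python
import math

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]   # roughly y = 2x with a little noise

def rss(b):
    """Residual sum of squares for slope b (no intercept, for simplicity)."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))

def log_lik(b, sigma=1.0):
    """Gaussian log-likelihood up to an additive constant."""
    return sum(-0.5 * ((y - b * x) / sigma) ** 2 for x, y in zip(xs, ys))

grid = [i / 1000 for i in range(1000, 3001)]   # candidate slopes 1.000 .. 3.000
b_ls = min(grid, key=rss)                      # least-squares argmin
b_ml = max(grid, key=log_lik)                  # maximum-likelihood argmax
```

Both searches land on the same slope, the grid point nearest Σxy/Σx² ≈ 2.0357.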


Least Square Linear Model

Solution Method #1RSS(β) = (Y – Xβ)T(Y – Xβ)β = (XTX)-1 XTY

Solution Method #2 (for D > N)Gradient decentPerceptron


Other Linear Models

- LDA: find the projection direction that minimizes the overlap between the two Gaussian class distributions
- Separating Hyperplane


LDA


Separating Hyperplane



Maximum Margin Hyperplane

Only the support vectors are involved in class prediction!


Linear Models

N ≥ DLeast SquareLDA

D > NPerceptron (using gradient decent)Maximum Hyperplane

Generative vs. Discriminative Model


Linear Model Fits All Data?


How about Joining the Dots?

Y(x) = (1/k) Σ yi over xi ∈ Nk(x), with k = 1


Linear Model Fits All?


NN with k = 1


Nearest Neighbor

Four things make a memory-based learner:
- A distance function?
- K: the number of neighbors to consider?
- A weighting function (optional)?
- How to fit the local points?
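A minimal memory-based learner answering those four questions (distance = Euclidean, weighting = none, fit = majority vote; the training data are made up):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of ((features), label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1),
         ((5, 5), +1), ((5, 6), +1), ((6, 5), +1)]
```

A query near one cluster is labeled by that cluster's points; no model is fit ahead of time, hence "memory-based".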


Problems of K = 1

- Fitting noise
- Jagged boundaries


Solutions

- Fitting noise: pick a larger K?


NN with k = 15



Solutions

- Fitting noise: pick a larger K?
- Jagged boundaries: introduce a kernel as a weighting function


Nearest Neighbor → Kernel Method

Four things make a memory-based learner:
- A distance function
- K: the number of neighbors to consider? All of them
- A weighting function: RBF kernels
- How to fit the local points? Predict weights


Kernel Method

RBF Weighted FunctionKernel width holds the key⌧Implying KUse cross validation to find the “optimal” width

Fitting with the Local PointsWhere NN meets Linear Model


LM vs. NN

- Linear Model: f(x) is approximated by a global linear function; more stable, less flexible
- Nearest Neighbor: K-NN assumes f(x) is well approximated by a locally constant function; less stable, more flexible
- Between LM and NN: the other models…


Decision Theories

- Bias & Variance Tradeoff
- Bayes Prediction
- VC Dimensionality
- PAC Learnability


Variance vs. Bias

MSE(x) = E_T[f(x) - ŷ]²
       = E_T[ŷ - E_T(ŷ)]² + [E_T(ŷ) - f(x)]²

Error = Var_T(ŷ) + Bias²(ŷ)
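The decomposition can be verified numerically: for simulated estimates ŷ of a fixed target f(x), the identity MSE = Var + Bias² holds exactly for sample moments (the bias and noise level below are made-up values):

```python
import random

random.seed(0)
f_x = 2.0                                    # the true value f(x)
# Simulated training-set draws give a biased, noisy estimator ŷ:
estimates = [f_x + 0.5 + random.gauss(0, 1) for _ in range(10_000)]

n = len(estimates)
mean_est = sum(estimates) / n
mse   = sum((f_x - y) ** 2 for y in estimates) / n          # E_T[f(x) - ŷ]²
var   = sum((y - mean_est) ** 2 for y in estimates) / n     # Var_T(ŷ)
bias2 = (mean_est - f_x) ** 2                               # Bias²(ŷ)
```

The split is an algebraic identity (the cross term vanishes), so it holds to floating-point precision, not just approximately.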



Outline

- Statistical Learning
- Emerging Applications' Data Characteristics
- Classical Models (Classification)
- Kernel Methods
- Dimension Reduction Methods


Where Are We, and Where Am I Heading?

- LM and NN
- Kernel Method from Three Views:
  - LM view
  - NN view
  - Geometric view


Linear Model View

Y = β0 + Σ βj Xj

Separating hyperplane: max_{|β|=1} C, subject to yi f(xi) ≥ C, i.e., yi (β0 + β·xi) ≥ C


Classifier Margin

- Margin: the width the boundary can grow to before hitting a data object
- Maximum margin: tends to minimize classification variance (no formal theory for this yet)



M’s Mathematical Representation

- Plus-plane: {x : w·x + b = +1}
- Minus-plane: {x : w·x + b = -1}
- w ⊥ plus-plane: w·(u - v) = 0 if u and v are both on the plus-plane
- w ⊥ minus-plane



M

- Let x- be any point on the minus-plane
- Let x+ be the closest plus-plane point to x-
- x+ = x- + λw. Why? Because the line from x- to x+ is ⊥ to the minus-plane
- M = |x+ - x-|


1. w·x- + b = -1
2. w·x+ + b = +1
3. x+ = x- + λw
4. M = |x+ - x-|
5. w·(x- + λw) + b = 1   (from 2 and 3)
6. w·x- + b + λ(w·w) = 1
7. λ(w·w) = 2


1. λ(w·w) = 2
2. λ = 2 / (w·w)
3. M = |x+ - x-| = |λw| = λ|w| = 2/|w|
4. Maximize M: gradient descent, simulated annealing, EM, Newton's method…
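The result M = 2/|w| can be checked numerically for a concrete weight vector (the values of w and b are made up): construct a point on the minus-plane, step by λw to reach the plus-plane, and measure the distance:

```python
import math

w = (3.0, 4.0)                                   # hypothetical weights, |w| = 5
b = 0.0

norm_w = math.hypot(*w)
M = 2.0 / norm_w                                 # margin width from the derivation

x_minus = tuple(-wi / norm_w ** 2 for wi in w)   # satisfies w·x + b = -1
lam = 2.0 / (norm_w ** 2)                        # λ = 2/(w·w)
x_plus = tuple(xm + lam * wi for xm, wi in zip(x_minus, w))  # w·x + b = +1
plus_side = sum(wi * xi for wi, xi in zip(w, x_plus)) + b
```

Stepping λw from the minus-plane lands exactly on the plus-plane, and the step length equals 2/|w| = 0.4.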


Max M

Max M = 2/|w|  ⇔  min |w|/2  ⇔  min |w|²/2
subject to yi(xi·w + b) ≥ 1, i = 1, …, N

A quadratic criterion with linear inequality constraints.


Min |w|²/2 subject to yi(xi·w + b) ≥ 1, i = 1, …, N

Lp = min_{w,b} |w|²/2 - Σ_{i=1..N} αi [yi(xi·w + b) - 1]

Setting the derivatives to zero:
w = Σ_{i=1..N} αi yi xi
0 = Σ_{i=1..N} αi yi


Wolfe Dual

Ld = Σ_{i=1..N} αi - ½ Σ_{i,j=1..N} αi αj yi yj (xi·xj)

Subject to αi ≥ 0 and αi [yi(xi·w + b) - 1] = 0 (KKT conditions):
- αi > 0: yi(xi·w + b) = 1 (support vectors)
- αi = 0: yi(xi·w + b) > 1


Class Prediction

yq = w·xq + b, with w = Σ_{i=1..N} αi yi xi

yq = sign(Σ_{i=1..N} αi yi (xi·xq) + b)
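A sketch of the resulting decision rule: only the support vectors (αi > 0) contribute to the sum. The α's, labels, points, and b below are made-up values for illustration, not a trained model:

```python
# Each entry: (alpha_i, y_i, x_i) for a support vector.
svs = [(1.0, +1, (2.0, 2.0)),
       (1.0, -1, (0.0, 0.0))]
b = -2.0

def predict(q):
    """y_q = sign(sum_i alpha_i y_i (x_i . q) + b)."""
    s = sum(a * y * (x[0] * q[0] + x[1] * q[1]) for a, y, x in svs) + b
    return 1 if s > 0 else -1
```

Points with αi = 0 could be dropped from `svs` without changing any prediction, which is the practical payoff of sparsity.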


Non-separable Classes

- Soft Margin Hyperplane
- Basis Expansion


Non-separable Case


Soft Margin SVMs

Hard margin: min |w|²/2 subject to yi(xi·w + b) ≥ 1, i = 1, …, N

Soft margin: min |w|²/2 + C Σ εi, subject to
- xi·w + b ≥ +1 - εi if yi = +1
- xi·w + b ≤ -1 + εi if yi = -1
- εi ≥ 0
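At the optimum the slack variables have a closed form, εi = max(0, 1 - yi(xi·w + b)), i.e., the hinge loss. A tiny sketch with made-up w and b:

```python
w, b = (1.0, 0.0), 0.0   # hypothetical hyperplane

def slack(x, y):
    """Slack needed for (x, y): zero if the point clears its margin, positive otherwise."""
    return max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b))

# A point beyond its margin needs no slack; a misclassified one needs slack > 1.
```

For example, a positive point at (2, 0) clears the margin (slack 0), one at (0.5, 0) is inside the margin (slack 0.5), and one at (-1, 0) is misclassified (slack 2).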



Wolfe Dual

Ld = Σ_{i=1..N} αi - ½ Σ_{i,j=1..N} αi αj yi yj (xi·xj)

Subject to C ≥ αi ≥ 0, Σ αi yi = 0, and the KKT conditions.

yq = sign(Σ_{i=1..N} αi yi (xi·xq) + b)


Basis Function


Harder 1D Example


Basis Function

Φ(x) = (x, x²)
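A quick check that the lift works: 1D data with the negative class in the middle cannot be split by a single threshold on x, but after Φ(x) = (x, x²) a threshold on the second coordinate separates it (the data and the threshold are made-up values):

```python
pos = [-3.0, -2.0, 2.0, 3.0]   # positive class: far from the origin
neg = [-0.5, 0.0, 0.5]         # negative class: near the origin

phi = lambda x: (x, x * x)     # lift to 2D

threshold = 1.0                # separating line x2 = 1 in the lifted space
sep = (all(phi(x)[1] > threshold for x in pos) and
       all(phi(x)[1] < threshold for x in neg))
```

In the lifted space the classes sit above and below the line x² = 1, so a linear classifier there corresponds to the nonlinear rule |x| > 1 on the original line.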



Some Basis Functions

Φ(X) = Σ_m γm hm(X), where each hm: R^p → R

Common functions:
- Polynomial
- Radial basis functions
- Sigmoid functions


Kernel Function

Ld = Σ_{i=1..N} αi - ½ Σ_{i,j=1..N} αi αj yi yj Φ(xi)·Φ(xj)

Subject to C ≥ αi ≥ 0, Σ αi yi = 0, and the KKT conditions.

yq = sign(Σ_{i=1..N} αi yi (Φ(xi)·Φ(xq)) + b)

K(xi, xj) = Φ(xi)·Φ(xj): the kernel function!


Quadratic Basis Functions

- Φ(a) = {1, ai, ai aj}, i, j = 1..D: (D+1)(D+2)/2 ≈ D² terms, so O(D²) computational cost per dot product
- It is equivalent to (a·b + 1)², which costs only O(D)
- Total computational cost: O(N²D)
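The equivalence can be verified numerically: with √2 scalings on the linear and cross terms, the explicit quadratic feature map satisfies Φ(a)·Φ(b) = (a·b + 1)² exactly (the test vectors are made up):

```python
import math

def poly2_kernel(a, b):
    """(a·b + 1)^2, computed in O(D)."""
    return (sum(x * y for x, y in zip(a, b)) + 1) ** 2

def phi2(a):
    """Explicit quadratic features whose plain dot product reproduces (a·b + 1)^2.
    The sqrt(2) scalings make the linear and cross terms count correctly."""
    feats = [1.0]
    feats += [math.sqrt(2) * x for x in a]                 # linear terms
    feats += [x * x for x in a]                            # squared terms
    feats += [math.sqrt(2) * a[i] * a[j]                   # cross terms, i < j
              for i in range(len(a)) for j in range(i + 1, len(a))]
    return feats

a, b = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0)
lhs = poly2_kernel(a, b)                                   # O(D)
rhs = sum(x * y for x, y in zip(phi2(a), phi2(b)))         # O(D^2)
```

Both sides give the same number, but the kernel form never materializes the O(D²) feature vector.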


Dot Product Saves the Day

With the dot-product (kernel) form, each evaluation is O(D), so training stays O(N²D) for any degree. With explicit basis expansion:
- Quadratic: O(N²D²)
- Cubic: O(N²D³)
- Quartic: O(N²D⁴)


Quiz

What is the kernel function for a polynomial of degree d?
(a·b + 1)^d


Outline

- LM and NN
- Kernel Method of Three Views:
  - LM view
  - NN view
  - Geometric view


Nearest Neighbor View

- Z: a set of zero-mean, jointly Gaussian random variables
- Each Zi corresponds to one example xi
- Cov(zi, zj) = K(xi, xj)
- yi, the label of zi, is +1 or -1
- P(yi | zi) = σ(yi zi)


Training Data


General Kernel Classifier [Jaakkola et al. 99]

MAP classification for xt:
yt = sign(Σ αi yi K(xt, xi)), with K(xi, xj) = Cov(zi, zj) (some similarity function)

Supervised training: compute αi given X and y and an error function such as
J(α) = -½ Σ αi αj yi yj K(xi, xj) + Σ F(αi)


Leave One Out


SVMs

yt = sign(Σ αi yi K(xt, xi))
with (yi, xi) the training data, αi nonnegative, and the kernel K positive definite.

αi is obtained by maximizing
J(α) = -½ Σ αi αj yi yj K(xi, xj) + Σ F(αi), with F(αi) = αi,
subject to αi ≥ 0 and Σ yi αi = 0.


Important Insight

K(xi, xj) = Cov(zi, zj). To design a kernel is to design a similarity function that produces a positive definite covariance matrix on the training instances.


Basis Function Selection

Three general approaches:
- Restriction methods: limit the class of functions
- Selection methods: scan the dictionary adaptively (boosting)
- Regularization methods: use the entire dictionary but restrict the coefficients (ridge regression)


Overfitting?

Probably not, because:
- There are N free parameters (not D)
- The margin is maximized


Geometric View

- S = w·x + b, with |w| = 1 and b = 0
- Version space: V = {w : yi f(xi) > 0, i = 1..n, |w| = 1}
- The SVM solution is the center of the largest sphere contained in V


SVMs


BPMs

Bayes objective function:
ŝt = Bayes_Z(xt) = argmin_{si ∈ S} E_{H|Z=x}[l(H(x), si)]

BPMs [Herbrich et al. 2001]:
h_bp = argmin_{h ∈ H} E_x[E_{H|Z=x}[l(H(x), h(x))]]


- Linear classifier; the input X possesses a spherical Gaussian density
- The Bayes point (BP) is the center of mass of the version space


BPMs vs. SVMs


- Use SVMs to find a good h in H
- Find the BP:
  - Billiard algorithm [Herbrich et al. 2001]
  - Perceptron algorithm [Herbrich et al. 2001]


Billiard Ball Algorithm (R. Herbrich )


Outline

- Statistical Learning
- Emerging Applications' Data Characteristics
- Classical Models (Classification)
- Kernel Methods
- Dimension Reduction Methods


Similarity Measurement


Perceptual Distance Function

Two monumental challenges:
- Formulating a perceptual feature space
- Formulating a perceptual distance function


Dimensionality Curse

D: data dimension
When D increases:

Nearest neighbors are no longer local
All points become nearly equidistant
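This concentration effect is easy to see in a small, purely illustrative simulation (the function and parameter choices below are ours, not from the tutorial):

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min between a random query
    and n_points uniform points in the dim-dimensional unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist([rng.random() for _ in range(dim)], query)
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

# In low dimensions the nearest point is much closer than the farthest;
# in high dimensions the contrast collapses and all points look alike.
```

Running `distance_contrast` for growing `dim` shows the gap between nearest and farthest neighbor shrinking relative to the distances themselves.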


Sparse High-D Space [Aggarwal et al., ICDT 2001]

Hyper-cube Range Queries

P = s^d (the selectivity of a hypercube range query with side length s in d dimensions)
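A two-line sketch makes the point concrete (illustrative code, not from the slides):

```python
# Selectivity of a hypercube range query with side s in d dimensions: P = s**d.
def hypercube_selectivity(s, d):
    return s ** d

# Even a query spanning 95% of every axis selects almost nothing in high-D.
wide_low_d = hypercube_selectivity(0.95, 10)     # ~0.60
wide_high_d = hypercube_selectivity(0.95, 1000)  # ~5e-23
```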


Sparse High-D Space

Spherical Range Queries


P[p ∈ R(Q, 0.5)] = π^(d/2) · (0.5)^d / Γ(d/2 + 1)

i.e., the volume of a d-dimensional ball of radius 0.5, which vanishes as d grows.
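The formula can be evaluated directly with the standard gamma function; the helper below is an illustrative sketch (names are ours):

```python
import math

def sphere_query_prob(d, r=0.5):
    """Volume of a d-dimensional ball of radius r: pi**(d/2) * r**d / gamma(d/2 + 1).
    For r = 0.5 this is the probability that a uniform point in the unit cube
    lands inside the spherical range query inscribed in the cube."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

# d = 2 gives pi/4 ~ 0.785; by d = 20 the probability is already under 2.5e-8.
```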


Dimensionality Curse


So?

Is the nearest-neighbor estimate cursed in high-D spaces?

Yes! When D is large and N is relatively small, the estimate is unreliable.


Are We Doomed?

How does the curse affect classification?
Similar objects tend to cluster together
Classification makes a binary prediction


Distribution of Distances


Some Solutions to High-D

Restricted estimators: specify the nature of the local neighborhood

Adaptive feature reduction: PCA, LDA

Dynamic Partial Function (DPF)


Three Major Paradigms

Preserve data description in a lower-dimensional space: PCA

Maximize discriminability in a lower-dimensional space: LDA

Activate only similar channels: DPF


Minkowski Distance

For objects P and Q:

D = (Σ_{i=1}^{M} |p_i − q_i|^n)^{1/n}

Similar images are similar in all M features
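A minimal implementation of the (unweighted) Minkowski distance, for reference:

```python
def minkowski(p, q, n=2):
    """Minkowski distance D = (sum_i |p_i - q_i| ** n) ** (1 / n).
    n = 2 is Euclidean, n = 1 is Manhattan."""
    return sum(abs(a - b) ** n for a, b in zip(p, q)) ** (1 / n)

# minkowski([0, 0], [3, 4]) -> 5.0 (Euclidean)
```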


[Figure: two histograms of per-feature distance frequencies, log-scale y-axis ("Frequency", 10^-6 to 10^-1) versus feature distance (0.00 to 0.95).]


Weighted Minkowski Distance

D = (Σ_{i=1}^{M} w_i |p_i − q_i|^n)^{1/n}

Similar images are similar in the same subset of the M features
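The weighted variant differs only by the per-feature weights w_i; an illustrative sketch:

```python
def weighted_minkowski(p, q, w, n=2):
    """Weighted Minkowski distance D = (sum_i w_i * |p_i - q_i| ** n) ** (1 / n)."""
    return sum(wi * abs(a - b) ** n for wi, a, b in zip(w, p, q)) ** (1 / n)

# Zeroing a weight drops that feature from the comparison entirely.
```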


[Figure: average per-feature distance versus feature number (1 to ~144) for four image alterations, one panel each: GIF re-encoding, scale up/down, cropping, rotation.]


Similarity Theories

Objects are similar in all respects (Richardson, 1928)
Objects are similar in some respects (Tversky, 1977)
Similarity is a process of determining respects, rather than using predefined respects (Goldstone, 1994)


DPF

Which Place is Similar to DC?

Partial
Dynamic
Dynamic Partial Function (DPF)

See ACM MM 2002, ICIP 2002, ACM Multimedia Journal
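Going by the DPF papers cited on this slide (Li & Chang), the function aggregates only the m smallest per-feature differences, so the "respects" in which two objects are compared are chosen dynamically per pair. A hedged sketch under that reading (function name and defaults are ours):

```python
def dpf(p, q, m, r=2):
    """Dynamic Partial Function: keep only the m smallest per-feature
    differences (the channels where the pair is most alike) and aggregate
    them Minkowski-style. The active feature subset varies per pair."""
    diffs = sorted(abs(a - b) for a, b in zip(p, q))
    return sum(d ** r for d in diffs[:m]) ** (1 / r)

# With m = len(p), DPF reduces to the ordinary Minkowski distance.
```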


Precision/Recall
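For reference, the two retrieval metrics this slide plots can be computed as follows (illustrative helper, not from the slides):

```python
def precision_recall(retrieved, relevant):
    """precision = |retrieved ∩ relevant| / |retrieved|
    recall = |retrieved ∩ relevant| / |relevant|"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# Retrieving 4 images of which 2 are relevant, out of 3 relevant overall,
# gives precision 0.5 and recall 2/3.
```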


Summary

Statistical Learning
Emerging Applications' Data Characteristics
Classical Models (Classification)
Kernel Methods

Linear Model View
Nearest Neighbor View
Geometric View

Dimension Reduction Methods


Advanced Topics

Imbalanced Data Learning: N− >> N+

See our ICML 2003 papers:
Sequence-data Kernel
Kernel Alignment & Boosting


Useful Links

Related Publications
http://www-db.stanford.edu/~echang/

Online Demo
VIMA Technologies

Six deployments as of July 2003
www.vimatech.com


References
1. The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, Springer, N.Y., 2001
2. Machine Learning, T. Mitchell, 1997
3. High-dimensional Data Analysis, D. Donoho, American Math. Society Lecture, 2000
4. Support Vector Machine Active Learning for Image Retrieval, S. Tong and E. Chang, ACM MM, 2001
5. Dynamic Partial Function, B. Li and E. Chang, ACM Multimedia Journal, 2003
6. Pattern Discovery in Sequences under a Markov Assumption, D. Chudova and P. Smyth, ACM KDD, 2002
7. Bayes Point Machines, R. Herbrich, T. Graepel, and C. Campbell, Journal of Machine Learning Research, 2001
8. The Nature of Statistical Learning Theory, V. Vapnik, Springer, N.Y., 1995
9. Probabilistic Kernel Regression Models, T. Jaakkola and D. Haussler, Conference on AI and Statistics, 1999
10. Support Vector Machines, Lecture Notes, A. Moore, CMU
11. On the Surprising Behavior of Distance Metrics in High-dimensional Space, C. Aggarwal, A. Hinneburg, and D. Keim, ICDT, 2001
12. Adaptive Conformal Transformation for Learning Imbalanced Data, G. Wu and E. Chang, International Conference on Machine Learning, August 2003