TRANSCRIPT
6.S093 Visual Recognition through Machine Learning Competition
Image by kirkh.deviantart.com
Joseph Lim and Aditya Khosla
Acknowledgment: Many slides from David Sontag and the Machine Learning Group of UT Austin
What is Machine Learning???
What is Machine Learning??? According to Wikipedia, machine learning concerns the construction and study of systems that can learn from data.
What is ML?
• Classification – From data to discrete classes
• Regression – From data to a continuous value
• Ranking – Comparing items
• Clustering – Discovering structure in data
• Others
Classification: data to discrete classes
• Spam filtering
[Figure: example emails labeled Spam / Regular / Urgent]
Classification: data to discrete classes
• Object classification
[Figure: example images labeled Fries / Hamburger / None]
Regression: data to a value
• Stock market
Regression: data to a value
• Weather prediction
Clustering: Discovering structure
[Figure: a set of images grouped into clusters]
Clustering
• Unsupervised learning
• Requires data, but NO LABELS
• Detect patterns
– Group emails or search results
– Regions of images
• Useful when you don't know what you're looking for
Clustering
• Basic idea: group together similar instances
• Example: 2D point patterns
• What could "similar" mean?
– One option: small Euclidean distance
– Clustering is crucially dependent on the measure of similarity (or distance) between "points"
K-Means
• An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
• Assign data points to closest cluster center
• Change the cluster center to the average of its assigned points
– Stop when no points' assignments change
Animation is from Andrey A. Shabalin's website
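The loop above translates almost directly into code. Here is a minimal NumPy sketch (my own illustration, not the course's reference implementation; the function name and defaults are placeholders):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize: pick K random points as cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assignments = np.full(len(X), -1)
    for _ in range(max_iters):
        # Assign each data point to the closest cluster center (O(KN)).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Stop when no point's assignment changes.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Move each center to the average of its assigned points (O(N)).
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = X[assignments == j].mean(axis=0)
    return centers, assignments

# Usage: two well-separated 2D blobs should recover k=2 clusters.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
centers, labels = kmeans(X, k=2)
```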
Properties of the K-means algorithm
• Guaranteed to converge in a finite number of iterations
• Running time per iteration:
– Assign data points to closest cluster center: O(KN)
– Change the cluster center to the average of its assigned points: O(N)
Example: K-Means for Segmentation
• The goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.
[Figure: original image and K-means segmentations with K=2, K=3, K=10]
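A segmentation like this can be reproduced by clustering pixel colors. A sketch assuming scikit-learn and Pillow are available (the input file name is a placeholder):

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("input.jpg").convert("RGB"))  # placeholder file
pixels = img.reshape(-1, 3).astype(float)  # one row per pixel: (R, G, B)

for k in (2, 3, 10):  # the values shown on the slide
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Recolor every pixel with the color of its cluster center.
    seg = km.cluster_centers_[km.labels_].reshape(img.shape).astype(np.uint8)
    Image.fromarray(seg).save(f"segmented_k{k}.png")
```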
Pitfalls of K-Means
• K-means algorithm is heuristic
– Requires initial means
– It does matter what you pick!
• K-means can get stuck
[Figure: stuck solutions where K=1 should be better / K=2 should be better]
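One way to see this pitfall, and the usual mitigation of restarting from several random initializations, is sketched below (toy data of my own; scikit-learn's n_init parameter handles the restarts):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs; a bad initialization can still merge two of them.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 0], [2.5, 4])])

for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 2))  # the final cost may differ across seeds

# Mitigation: run several restarts and keep the lowest-cost solution.
best = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
```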
Classification
Typical Recognition System
[Diagram: image x → features → classify F(x) → y = +1 (bus) / −1 (not bus)]
• Extract features from an image
• A classifier will make a decision based on extracted features
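The x → F(x) → y pipeline might be sketched as follows. This is a toy stand-in, not the course's system: a color-histogram feature extractor and a linear SVM play the roles of "features" and "classify":

```python
import numpy as np
from sklearn.svm import LinearSVC

def extract_features(img):
    """Toy feature extractor: an 8-bin histogram per RGB channel."""
    return np.concatenate([
        np.histogram(img[..., c], bins=8, range=(0, 256), density=True)[0]
        for c in range(3)
    ])

def train(images, labels):
    """images: HxWx3 uint8 arrays; labels: +1 (bus) or -1 (not bus)."""
    X = np.array([extract_features(im) for im in images])
    return LinearSVC().fit(X, labels)  # learns F(x): features -> {+1, -1}

def classify(clf, img):
    return clf.predict([extract_features(img)])[0]  # y for a new image x
```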
Classification
• Supervised learning
• Requires data, AND LABELS
• Useful when you know what you're looking for
Linear classifier
• Which of these linear separators is optimal?
Maximum Margin Classification
• Maximizing the margin is good according to intuition and PAC theory.
• Implies that only support vectors matter; other training examples are ignorable.
Classification Margin
• Distance from example xi to the separator is r = (wTxi + b) / ||w||
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
Linear SVM Mathematically
• Let training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1} be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = 1
⇔ yi(wTxi + b) ≥ ρ/2
• For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is r = ys(wTxs + b) / ||w|| = 1 / ||w||
• Then the margin can be expressed through the (rescaled) w and b as: ρ = 2r = 2 / ||w||
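To sanity-check these formulas on a concrete instance (an illustrative example of my own, not from the slides): let w = (3, 4) and b = −5, so ||w|| = 5. The point xs = (2, 0) with ys = 1 gives ys(wTxs + b) = 3·2 + 4·0 − 5 = 1, so xs is a support vector; its distance to the hyperplane is r = 1/||w|| = 1/5, and the margin is ρ = 2/||w|| = 2/5.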
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:
Find w and b such that ρ = 2 / ||w|| is maximized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
• Which can be reformulated as:
Find w and b such that Φ(w) = ||w||² = wTw is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
Is this perfect?
Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξi can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
[Figure: separator with slack variables ξi marking margin violations]
Soft Margin Classification Mathematically
• The old formulation:
Find w and b such that Φ(w) = wTw is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
• Modified formulation incorporates slack variables:
Find w and b such that Φ(w) = wTw + CΣξi is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0
• Parameter C can be viewed as a way to control overfitting: it "trades off" the relative importance of maximizing the margin and fitting the training data.
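Exercise 2 below asks you to play with exactly this trade-off. A minimal sketch with scikit-learn, using toy data and C values of my own choosing:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes, so the data is not linearly separable.
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([2, 2], 1.0, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C: wide margin, lots of slack allowed (underfits if too small).
    # Large C: slack penalized heavily, narrow margin (can overfit noise).
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```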
Exercise
• Exercise 1. Implement K-means
• Exercise 2. Play with SVM's C-parameter