TRANSCRIPT
6.S093 Visual Recognition through Machine Learning Competition
Image by kirkh.deviantart.com
Joseph Lim and Aditya Khosla
Acknowledgment: Many slides from David Sontag and the Machine Learning Group of UT Austin
What is Machine Learning???
What is Machine Learning??? According to Wikipedia, machine learning concerns the construction and study of systems that can learn from data.
What is ML?
• Classification – From data to discrete classes
• Regression – From data to a continuous value
• Ranking – Comparing items
• Clustering – Discovering structure in data
• Others
Classification: data to discrete classes
• Spam filtering
[Figure: example emails labeled Spam / Regular / Urgent]
Classification: data to discrete classes
• Object classification
[Figure: example images labeled Fries / Hamburger / None]
Regression: data to a value
• Stock market
Regression: data to a value
• Weather prediction
Clustering: Discovering structure
[Figure: a set of images grouped into clusters]
Clustering
• Unsupervised learning
• Requires data, but NO LABELS
• Detect patterns
– Group emails or search results
– Regions of images
• Useful when you don't know what you're looking for
Clustering
• Basic idea: group together similar instances
• Example: 2D point patterns
• What could "similar" mean?
– One option: small Euclidean distance
– Clustering is crucially dependent on the measure of similarity (or distance) between "points"
K-Means
• An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
• Assign data points to closest cluster center
• Change the cluster center to the average of its assigned points
– Stop when no points' assignments change
Animation is from Andrey A. Shabalin's website
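The loop above translates almost directly into code. Here is a minimal NumPy sketch (my own illustration, not the course's reference implementation; the function name and defaults are placeholders):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize: pick K random points as cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assignments = np.full(len(X), -1)
    for _ in range(max_iters):
        # Assign each data point to the closest cluster center (O(KN)).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Stop when no point's assignment changes.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Move each center to the average of its assigned points (O(N)).
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = X[assignments == j].mean(axis=0)
    return centers, assignments

# Usage: two well-separated 2D blobs should recover k=2 clusters.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
centers, labels = kmeans(X, k=2)
```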
Properties of the K-means algorithm
• Guaranteed to converge in a finite number of iterations
• Running time per iteration:
– Assign data points to closest cluster center: O(KN)
– Change the cluster center to the average of its assigned points: O(N)
Example: K-Means for Segmentation
• The goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.
[Figure: original image and K-means segmentations with K=2, K=3, K=10]
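A segmentation like this can be reproduced by clustering pixel colors. A sketch assuming scikit-learn and Pillow are available (the input file name is a placeholder):

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("input.jpg").convert("RGB"))  # placeholder file
pixels = img.reshape(-1, 3).astype(float)  # one row per pixel: (R, G, B)

for k in (2, 3, 10):  # the values shown on the slide
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Recolor every pixel with the color of its cluster center.
    seg = km.cluster_centers_[km.labels_].reshape(img.shape).astype(np.uint8)
    Image.fromarray(seg).save(f"segmented_k{k}.png")
```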
Pitfalls of K-Means
• K-means algorithm is heuristic
– Requires initial means
– It does matter what you pick!
• K-means can get stuck
[Figure: stuck solutions where K=1 should be better / K=2 should be better]
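One way to see this pitfall, and the usual mitigation of restarting from several random initializations, is sketched below (toy data of my own; scikit-learn's n_init parameter handles the restarts):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs; a bad initialization can still merge two of them.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 0], [2.5, 4])])

for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 2))  # the final cost may differ across seeds

# Mitigation: run several restarts and keep the lowest-cost solution.
best = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
```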
Classification
Typical Recognition System
[Diagram: image x → features → classify F(x) → y = +1 (bus) / −1 (not bus)]
• Extract features from an image
• A classifier will make a decision based on extracted features
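The x → F(x) → y pipeline might be sketched as follows. This is a toy stand-in, not the course's system: a color-histogram feature extractor and a linear SVM play the roles of "features" and "classify":

```python
import numpy as np
from sklearn.svm import LinearSVC

def extract_features(img):
    """Toy feature extractor: an 8-bin histogram per RGB channel."""
    return np.concatenate([
        np.histogram(img[..., c], bins=8, range=(0, 256), density=True)[0]
        for c in range(3)
    ])

def train(images, labels):
    """images: HxWx3 uint8 arrays; labels: +1 (bus) or -1 (not bus)."""
    X = np.array([extract_features(im) for im in images])
    return LinearSVC().fit(X, labels)  # learns F(x): features -> {+1, -1}

def classify(clf, img):
    return clf.predict([extract_features(img)])[0]  # y for a new image x
```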
Classification
• Supervised learning
• Requires data, AND LABELS
• Useful when you know what you're looking for
Linear classifier
• Which of these linear separators is optimal?
Maximum Margin Classification
• Maximizing the margin is good according to intuition and PAC theory.
• Implies that only support vectors matter; other training examples are ignorable.
Classification Margin
• Distance from example xi to the separator is r = (wTxi + b) / ||w||
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
Linear SVM Mathematically
• Let training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1} be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = 1
⇔ yi(wTxi + b) ≥ ρ/2
• For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is r = ys(wTxs + b) / ||w|| = 1 / ||w||
• Then the margin can be expressed through the (rescaled) w and b as: ρ = 2r = 2 / ||w||
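To sanity-check these formulas on a concrete instance (an illustrative example of my own, not from the slides): let w = (3, 4) and b = −5, so ||w|| = 5. The point xs = (2, 0) with ys = 1 gives ys(wTxs + b) = 3·2 + 4·0 − 5 = 1, so xs is a support vector; its distance to the hyperplane is r = 1/||w|| = 1/5, and the margin is ρ = 2/||w|| = 2/5.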
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:
Find w and b such that ρ = 2 / ||w|| is maximized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
• Which can be reformulated as:
Find w and b such that Φ(w) = ||w||² = wTw is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
Is this perfect?
Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξi can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
[Figure: separator with slack variables ξi marking margin violations]
Soft Margin Classification Mathematically
• The old formulation:
Find w and b such that Φ(w) = wTw is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1
• Modified formulation incorporates slack variables:
Find w and b such that Φ(w) = wTw + CΣξi is minimized
and for all (xi, yi), i=1..n: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0
• Parameter C can be viewed as a way to control overfitting: it "trades off" the relative importance of maximizing the margin and fitting the training data.
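Exercise 2 below asks you to play with exactly this trade-off. A minimal sketch with scikit-learn, using toy data and C values of my own choosing:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes, so the data is not linearly separable.
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([2, 2], 1.0, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C: wide margin, lots of slack allowed (underfits if too small).
    # Large C: slack penalized heavily, narrow margin (can overfit noise).
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```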
Exercise
• Exercise 1. Implement K-means
• Exercise 2. Play with SVM's C-parameter