Nov 23rd, 2001. Copyright © 2001, 2003, Andrew W. Moore
Linear Document Classifier
Support Vector Machines: Slide 2
Linear Classifiers
• Binary classification
  • y=+1 for positive class, y=-1 for negative class
• Vector representation for documents
[Figure: documents plotted on axes a and b; one marker denotes +1, the other denotes -1.]
How would you classify this data?
Support Vector Machines: Slide 3
Linear Classifiers
[Figure: the same data, now with a separating line labeled f(d).]
Support Vector Machines: Slide 4
Decision Boundary
[Figure: documents d1, d2, d3, d4 on axes a and b, separated by the line f(d).]
1. How to classify documents using f(d)?
2. How to find the line f(d)?
• w_a and w_b are the weights for words a and b
Support Vector Machines: Slide 5
How to Classify Documents?
[Figure: documents d1, d2, d3, d4 on axes a and b, separated by the line f(d).]
• w_a and w_b are the weights for words a and b
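A minimal sketch of classifying with f(d). The formula for f(d) did not survive in this transcript, so the code assumes the usual linear form f(d) = w_a*a + w_b*b + c, where the constant c is a hypothetical offset, and assumes the rule "positive if f(d) > 0, negative otherwise". The weights and documents are made up for illustration.

```python
def f(d, w_a, w_b, c=0.0):
    """Linear score for a document d = (a, b), the counts of words a and b.

    Assumed form (not taken from the slides): f(d) = w_a*a + w_b*b + c.
    """
    a, b = d
    return w_a * a + w_b * b + c


def classify(d, w_a, w_b, c=0.0):
    """Return +1 if d lies on the positive side of the line f(d) = 0, else -1."""
    return 1 if f(d, w_a, w_b, c) > 0 else -1


# Illustrative weights and documents (assumptions, not from the slides).
w_a, w_b, c = 1.0, -1.0, 0.0
doc_pos = (3, 1)  # more occurrences of word a than of word b
doc_neg = (1, 4)  # more occurrences of word b than of word a

label_pos = classify(doc_pos, w_a, w_b, c)
label_neg = classify(doc_neg, w_a, w_b, c)
print(label_pos, label_neg)
```

With these weights the line f(d) = 0 is the diagonal a = b, so documents dominated by word a fall on the positive side.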
Support Vector Machines: Slide 7
Perceptron Algorithm
• Initialize the weight vector w
• Repeat
  • Receive a labeled document (d, y) (y=+1 or -1)
  • Check if doc d is classified correctly: y f(d) > 0 ?
    • Yes: do nothing
    • No: w ← w + y d
[Figure: documents d1, d2, d3, d4 on axes a and b, with the line f(d).]
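The loop above can be sketched as follows. This is a minimal illustration, assuming f(d) = w·d with no offset term, a zero initial weight vector, and a fixed cap on the number of passes (`epochs` is a hypothetical parameter); the four documents are made up.

```python
def dot(w, d):
    """Inner product of the weight vector w and the document vector d."""
    return sum(wi * di for wi, di in zip(w, d))


def perceptron(data, epochs=100):
    """Train on a list of (d, y) pairs, with y in {+1, -1}.

    Assumptions: w starts at zero and f(d) = w . d (no offset term).
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for d, y in data:
            if y * dot(w, d) <= 0:  # d is not classified correctly
                w = [wi + y * di for wi, di in zip(w, d)]  # w <- w + y d
                mistakes += 1
        if mistakes == 0:  # a full pass with no update: stop
            break
    return w


# Illustrative, linearly separable documents (counts of words a and b).
data = [((3, 1), +1), ((4, 2), +1), ((1, 3), -1), ((2, 4), -1)]
w = perceptron(data)
print(w)
```

On linearly separable data the loop stops after a pass with no mistakes; on data that is not separable it simply runs out of passes, which is why the cap is there.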
Support Vector Machines: Slide 10
Geometrical Interpretation
Support Vector Machines: Slide 11
Linear Classifiers
[Figure: two classes of points, one marker denoting +1 and the other -1, with a candidate separating line f(d).]
How would you classify this data?
Support Vector Machines: Slide 13
Linear Classifiers
[Figure: two classes of points, denoted +1 and -1.]
Any of these would be fine, but which is best?
Support Vector Machines: Slide 14
Classifier Margin
[Figure: two classes of points, denoted +1 and -1.]
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Support Vector Machines: Slide 16
Maximum Margin
[Figure: two classes of points, denoted +1 and -1.]
The maximum margin linear classifier is the linear classifier with the maximum margin.
This classifier is called the linear Support Vector Machine (SVM).
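To make the margin definition concrete, here is a small sketch that measures the margin of a given boundary w·x + b = 0. It assumes the boundary is widened symmetrically, so the margin is twice the distance to the closest datapoint; the points and the two candidate boundaries are made up. It only compares candidates and does not search for the maximum-margin classifier.

```python
import math


def margin(w, b, points):
    """Width the boundary w.x + b = 0 can reach before touching a datapoint.

    Assumes the boundary grows symmetrically on both sides, so the
    width is twice the distance to the nearest point.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    closest = min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in points
    )
    return 2 * closest


# Illustrative data: two points near x=0 (one class), two near x=3 (the other).
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]

# Two candidate vertical boundaries that both separate the classes.
m_center = margin((1.0, 0.0), -1.5, points)  # the line x = 1.5
m_off = margin((1.0, 0.0), -1.0, points)     # the line x = 1.0
print(m_center, m_off)
```

Both candidates classify every point correctly, but the centered line has the wider margin, which is the criterion the maximum margin classifier optimizes.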
Support Vector Machines: Slide 17
Empirical Studies with Text Categorization
• 10 categories from Reuters-21578
• For a few categories (grain, wheat, corn), the SVM method significantly outperforms the KNN approach; KNN is ahead on money-fx and trade
Classification accuracy:
Category    KNN    SVM
earn        97.3   98.0
acq         92.0   93.6
money-fx    78.2   74.5
grain       82.2   94.6
crude       85.7   88.9
trade       77.4   75.9
interest    74.0   77.7
ship        79.2   85.6
wheat       76.6   91.8
corn        77.9   90.3
Support Vector Machines: Slide 19
Doing multi-class classification
• SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2).
• How to handle multiple classes?
  • E.g., classify documents into three categories: sports, business, politics
• Answer: one-vs-all, learn N SVMs
  • SVM 1 learns “Output==1” vs “Output != 1”
  • SVM 2 learns “Output==2” vs “Output != 2”
  • :
  • SVM N learns “Output==N” vs “Output != N”
Support Vector Machines: Slide 24
One-vs-All
• Red vs the other classes: red(d)
• Yellow vs the other classes: yellow(d)
• Cyan vs the other classes: cyan(d)
• Given a test document d, how to decide its color?
• Assign d to the color whose function gives the largest score
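The decision rule above can be sketched as an argmax over the per-class score functions. This is a minimal illustration, assuming each of red(d), yellow(d), cyan(d) is a linear scorer given by a hypothetical (w, c) pair; the weights are made up rather than trained.

```python
def dot(w, d):
    """Inner product of the weight vector w and the document vector d."""
    return sum(wi * di for wi, di in zip(w, d))


def one_vs_all_predict(d, scorers):
    """Return the class whose one-vs-all scorer gives d the largest score.

    scorers maps a class name to (w, c), with score(d) = w . d + c.
    """
    def score(name):
        w, c = scorers[name]
        return dot(w, d) + c

    return max(scorers, key=score)


# Illustrative, hand-picked scorers (assumptions, not trained SVMs).
scorers = {
    "red": ((1.0, 0.0), 0.0),
    "yellow": ((0.0, 1.0), 0.0),
    "cyan": ((-1.0, -1.0), 0.0),
}

color = one_vs_all_predict((0.2, 0.9), scorers)
print(color)
```

Taking the largest score also resolves the cases where several scorers, or none of them, put d on their positive side.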
Support Vector Machines: Slide 25
Suppose we’re in 1-dimension
What would SVMs do with this data?
[Figure: a 1-dimensional dataset, with x=0 marked.]
Support Vector Machines: Slide 26
Suppose we’re in 1-dimension
Not a big surprise
[Figure: the same 1-dimensional dataset, with x=0 marked.]
Support Vector Machines: Slide 27
Harder 1-dimensional dataset
What can be done about this?
[Figure: a 1-dimensional dataset, with x=0 marked.]
Support Vector Machines: Slide 29
Harder 1-dimensional dataset
Expand from a one-dimensional space to a two-dimensional space
[Figure: the data plotted on axes x and x^2, with x=0 marked.]
Kernel trick: expand the dimensionality by a kernel function
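A small sketch of the expansion, assuming the mapping x → (x, x^2) suggested by the axis labels. The dataset, the hand-picked line in the expanded space, and the function names `phi` and `kernel` are all illustrative. The last function shows the kernel idea: the inner product in the expanded space can be computed directly from the original 1-dimensional inputs.

```python
def phi(x):
    """Explicit feature map from 1 dimension to 2 dimensions: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, z):
    """Inner product phi(x) . phi(z), computed without building phi."""
    return x * z + (x * z) ** 2


# Illustrative data: negatives near x=0, positives on both sides.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [+1, +1, -1, -1, -1, +1, +1]

# No single threshold on x separates these labels, but in the expanded
# space the hand-picked line x^2 = 2.5 does.
w, c = (0.0, 1.0), -2.5
preds = [1 if w[0] * p[0] + w[1] * p[1] + c > 0 else -1 for p in map(phi, xs)]
print(preds == ys)
```

An SVM that uses a kernel only ever needs values such as `kernel(x, z)`, so the expanded vectors never have to be stored.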
Support Vector Machines: Slide 30
Nonlinear Kernel (I)
Support Vector Machines: Slide 31
Nonlinear Kernel (II)
Support Vector Machines: Slide 32
Software for SVM
• SVMlight (http://svmlight.joachims.org/)
• Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)
  • It is faster than SVMlight
• Sparse data representation
  • The occurrences of most words in a document are zero
  • <label> <index1>:<value1> <index2>:<value2>
  • Here <label> is the class label and each <index>:<value> pair is word-id:word-occurrence
Support Vector Machines: Slide 33
Software for SVM
• Sparse data representation
  • Example
    • D = (‘hello’: 2, ‘world’: 3), negative document
    • Word-id for ‘hello’ is 100, word-id for ‘world’ is 54
    • -1 54:3 100:2 (both tools expect the indices in increasing order)
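A minimal sketch of producing one line in the sparse format for the example document. The helper name `to_sparse_line` and the `word_ids` dictionary are hypothetical. The pairs are sorted by word-id because SVMlight and LIBSVM expect indices in increasing order, so the line comes out as -1 54:3 100:2.

```python
def to_sparse_line(label, counts, word_ids):
    """Format one document as '<label> <index>:<value> ...'.

    counts maps word -> number of occurrences; word_ids maps word -> word-id.
    Words with zero occurrences are left out, and the pairs are sorted
    by word-id.
    """
    pairs = sorted((word_ids[word], n) for word, n in counts.items() if n != 0)
    return " ".join([str(label)] + [f"{index}:{n}" for index, n in pairs])


# Word-ids and counts taken from the example document D.
word_ids = {"hello": 100, "world": 54}
line = to_sparse_line(-1, {"hello": 2, "world": 3}, word_ids)
print(line)
```

Only the two words that occur are written out; every other word in the vocabulary is implicitly zero.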