Some Common Nonlinear Learning Methods
Learning methods we've seen so far:
• Linear classifiers:
  – naïve Bayes, logistic regression, perceptron, support vector machines
• Linear regression:
  – ridge regression, lasso (regularizers)
• Nonlinear classifiers:
  – neural networks
On beyond linearity
• Decision tree learning
  – What decision trees are
  – How to learn them
  – How to prune them
• Rule learning
• Nearest neighbor learning
DECISION TREE LEARNING: OVERVIEW
Decision tree learning
A decision tree
[Figure: a decision tree for the play/don't-play weather data, splitting on Outlook, Humidity, and Windy.]
Another format: a set of rules

One rule per leaf in the tree:
if O=sunny and H<=70 then PLAY
else if O=sunny and H>70 then DON'T_PLAY
else if O=overcast then PLAY
else if O=rain and windy then DON'T_PLAY
else if O=rain and !windy then PLAY

A simpler rule set:
if O=sunny and H>70 then DON'T_PLAY
else if O=rain and windy then DON'T_PLAY
else PLAY
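Viewed as code, an ordered rule set is just an if/elif chain evaluated top to bottom; here is a minimal Python sketch of the simpler rule set (the function and argument names are mine, not the slides'):

```python
def play_decision(outlook, humidity, windy):
    """Ordered rules: the first matching condition wins."""
    if outlook == "sunny" and humidity > 70:
        return "DON'T_PLAY"
    elif outlook == "rain" and windy:
        return "DON'T_PLAY"
    else:
        return "PLAY"

print(play_decision("sunny", 65, False))  # PLAY
print(play_decision("rain", 80, True))    # DON'T_PLAY
```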
A regression tree
[Figure: a regression tree predicting minutes of play. Each leaf stores the training values that reach it (e.g. Play = 45m, 45m, 60m, 40m) and predicts their mean (here Play ≈ 48).]
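To make the leaf predictions concrete: the leaf holding Play = 45m, 45m, 60m, 40m predicts the mean of those values, (45+45+60+40)/4 = 47.5, i.e. Play ≈ 48 after rounding.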
Motivations for trees and rules
• Often you can find a fairly accurate classifier that is small and easy to understand.
  – Sometimes this gives you useful insight into a problem, or helps you debug a feature set.
• Sometimes features interact in complicated ways.
  – Trees can find interactions (e.g., "sunny and humid") that linear classifiers can't.
  – Again, sometimes this gives you some insight into the problem.
• Trees are very inexpensive at test time.
  – You don't always even need to compute all the features of an example.
  – You can even build classifiers that take this into account.
  – Sometimes that's important (e.g., "bloodPressure<100" and "MRIScan=normal" might have very different costs to compute).
An example: using rules
• On the Onion data…
Translation:

if "enlarge" is in the set-valued attribute wordsArticle
then class = fromOnion
(this rule is correct 173 times, and never wrong)

if "added" is in the set-valued attribute wordsArticle
and "play" is in the set-valued attribute wordsArticle
then class = fromOnion
(this rule is correct 6 times, and wrong once)
After cleaning the "Enlarge Image" lines, the 10-fold cross-validation error rate increases from 1.4% to 6%.
Different Subcategories of Economist Articles
Motivations for trees and rules (recap): small interpretable classifiers, feature interactions that linear models miss, and very cheap test-time evaluation.
DECISION TREE LEARNING: THE ALGORITHM(S)
Most decision tree learning algorithms
1. Given dataset D:
   – return leaf(y) if all examples are in the same class y (or nearly so)
   – otherwise pick the best split, on the best attribute a:
     • a=c1 or a=c2 or …
     • a<θ or a≥θ
     • a or not(a)
     • a in {c1,…,ck} or not
   – split the data into D1, D2, …, Dk and recursively build trees for each subset
2. "Prune" the tree
A popular splitting criterion: choose the split that most lowers the entropy of the y labels on the resulting partition, i.e., prefer splits whose subsets contain fewer distinct labels, or have very skewed label distributions.
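For concreteness, a standard worked example (the counts are the textbook weather data, not taken from these slides): suppose D contains 9 PLAY and 5 DON'T_PLAY examples.

entropy(D) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940
split on humidity: high → (3 PLAY, 4 DON'T_PLAY), entropy ≈ 0.985
                   normal → (6 PLAY, 1 DON'T_PLAY), entropy ≈ 0.592
gain = 0.940 - (7/14)·0.985 - (7/14)·0.592 ≈ 0.151

The split is preferred exactly because the two subsets have more skewed label distributions than D itself.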
Most decision tree learning algorithms
• "Pruning" a tree
  – avoid overfitting by removing subtrees somehow
DECISION TREE LEARNING: BREAKING IT DOWN
Breaking down decision tree learning
• First: how to classify (assume everything is binary)

function prPos = classifyTree(T, x)
  if T is a leaf node with counts n, p then
    prPos = (p + 1) / (p + n + 2)   -- Laplace smoothing
  else
    j = T.splitAttribute
    if x(j) == 0 then prPos = classifyTree(T.leftSon, x)
    else prPos = classifyTree(T.rightSon, x)
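For example, a leaf that saw p = 3 positive and n = 1 negative training examples returns (3+1)/(3+1+2) = 4/6 ≈ 0.67 rather than the raw estimate 3/4; Laplace smoothing pulls small-sample leaf probabilities toward 1/2.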
Breaking down decision tree learning
• Reduced error pruning with information gain:
  – Split the data D (2/3, 1/3) into Dgrow and Dprune
  – Build the tree recursively with Dgrow: T = growTree(Dgrow)
  – Prune the tree with Dprune: T' = pruneTree(Dprune, T)
  – Return T'
Breaking down decision tree learning
• First: divide & conquer to build the tree with Dgrow

function T = growTree(X, Y)
  if |X| < 10 or allOneLabel(Y) then
    T = leafNode(|Y==0|, |Y==1|)                 -- counts for n, p
  else
    for j = 1,…,n                                -- for each attribute j
      aj = X(:, j)                               -- column j of X
      gain(j) = infoGain( Y, Y(aj==0), Y(aj==1) )
    j = argmax(gain)                             -- the best attribute
    aj = X(:, j)
    T = splitNode( growTree(X(aj==0), Y(aj==0)), -- left son
                   growTree(X(aj==1), Y(aj==1)), -- right son
                   j )
Breaking down decision tree learning
• The splitting criterion used while building the tree with Dgrow (a Python sketch follows):

function g = infoGain(Y, leftY, rightY)
  n = |Y|; nLeft = |leftY|; nRight = |rightY|
  g = entropy(Y) - (nLeft/n)*entropy(leftY) - (nRight/n)*entropy(rightY)

function e = entropy(Y)
  n = |Y|; p0 = |Y==0|/n; p1 = |Y==1|/n
  e = - p0*log(p0) - p1*log(p1)
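Below is a minimal runnable Python/NumPy sketch of the grow phase, assuming a binary feature matrix X and a 0/1 label vector Y; the dictionary-based node layout and all function names are my own choices, not the slides':

```python
import numpy as np

def entropy(y):
    """Entropy of a 0/1 label vector; empty or pure sets get 0."""
    if len(y) == 0:
        return 0.0
    p1 = y.mean()
    e = 0.0
    for p in (1.0 - p1, p1):
        if p > 0:
            e -= p * np.log2(p)
    return e

def info_gain(y, left_y, right_y):
    n = len(y)
    return (entropy(y)
            - len(left_y) / n * entropy(left_y)
            - len(right_y) / n * entropy(right_y))

def grow_tree(X, Y, min_size=10):
    """Recursively split on the binary attribute with the highest gain."""
    # leaf if the node is small or pure (the slides' |X|<10 / allOneLabel)
    if len(Y) < min_size or len(np.unique(Y)) <= 1:
        return {"leaf": True, "n": int((Y == 0).sum()), "p": int((Y == 1).sum())}
    gains = [info_gain(Y, Y[X[:, j] == 0], Y[X[:, j] == 1])
             for j in range(X.shape[1])]
    j = int(np.argmax(gains))
    mask = X[:, j] == 1
    if mask.all() or not mask.any():   # no attribute separates the data
        return {"leaf": True, "n": int((Y == 0).sum()), "p": int((Y == 1).sum())}
    return {"leaf": False, "attr": j,
            "left":  grow_tree(X[~mask], Y[~mask], min_size),
            "right": grow_tree(X[mask],  Y[mask],  min_size)}

def classify_tree(T, x):
    """Return P(y=1 | x), with Laplace smoothing at the leaves."""
    if T["leaf"]:
        return (T["p"] + 1) / (T["p"] + T["n"] + 2)
    return classify_tree(T["right"] if x[T["attr"]] == 1 else T["left"], x)
```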
Recall the plan: grow T on Dgrow, then prune it with Dprune (T' = pruneTree(Dprune, T)) and return T'.
Breaking down decision tree learning
• Next: how to prune the tree with Dprune
  – Estimate the error rate of every subtree on Dprune
  – Recursively traverse the tree:
    • Prune the left and right subtrees of T
    • If T would have lower error on Dprune as a leaf, convert T to a leaf
A decision tree
(We're using the fact that the examples for sibling subtrees are disjoint.)
Breaking down decision tree learning
• To estimate error rates, classify the whole pruning set and keep some counts:

function classifyPruneSet(T, X, Y)
  T.pruneN = |Y==0|; T.pruneP = |Y==1|
  if T is not a leaf then
    j = T.splitAttribute
    aj = X(:, j)
    classifyPruneSet( T.leftSon,  X(aj==0), Y(aj==0) )
    classifyPruneSet( T.rightSon, X(aj==1), Y(aj==1) )

function e = errorsOnPruneSetAsLeaf(T)
  e = min(T.pruneN, T.pruneP)
Breaking down decision tree learning
• Then prune bottom-up: convert a subtree to a leaf whenever the leaf would make no more errors on Dprune.

function T1 = pruned(T)
  if T is a leaf then
    T1 = leaf(T, errorsOnPruneSetAsLeaf(T))   -- copy T, adding an error estimate T.minErrors
  else
    e1 = errorsOnPruneSetAsLeaf(T)
    TLeft = pruned(T.leftSon); TRight = pruned(T.rightSon)
    e2 = TLeft.minErrors + TRight.minErrors
    if e1 <= e2 then T1 = leaf(T, e1)          -- copy + add error estimate
    else T1 = splitNode(TLeft, TRight, e2)     -- copy + add error estimate
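Continuing the Python sketch above (same dictionary-based node layout; names are mine, and reusing the pruning-set counts for collapsed leaves is a simplification):

```python
def classify_prune_set(T, X, Y):
    """Record, at every node, the pruning-set label counts that reach it."""
    T["pruneN"] = int((Y == 0).sum())
    T["pruneP"] = int((Y == 1).sum())
    if not T["leaf"]:
        mask = X[:, T["attr"]] == 1
        classify_prune_set(T["left"],  X[~mask], Y[~mask])
        classify_prune_set(T["right"], X[mask],  Y[mask])

def errors_as_leaf(T):
    """Errors this node would make on Dprune if it predicted its majority label."""
    return min(T["pruneN"], T["pruneP"])

def pruned(T):
    """Bottom-up reduced-error pruning; annotates nodes with minErrors."""
    if T["leaf"]:
        T["minErrors"] = errors_as_leaf(T)
        return T
    e1 = errors_as_leaf(T)
    left, right = pruned(T["left"]), pruned(T["right"])
    e2 = left["minErrors"] + right["minErrors"]
    if e1 <= e2:
        # collapse the subtree into a leaf
        return {"leaf": True, "n": T["pruneN"], "p": T["pruneP"], "minErrors": e1}
    return {"leaf": False, "attr": T["attr"], "left": left, "right": right,
            "minErrors": e2}

# usage: T = grow_tree(Xgrow, Ygrow)
#        classify_prune_set(T, Xprune, Yprune)
#        T = pruned(T)
```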
DECISION TREES VS LINEAR CLASSIFIERS
Decision trees: plus and minus
• Simple and fast to learn
• Arguably easy to understand (if compact)
• Very fast to use:
  – often you don't even need to compute all attribute values
• Can find interactions between variables (play if it's cool and sunny, or …) and hence non-linear decision boundaries
• Don't need to worry about how numeric values are scaled
Decision trees: plus and minus
• Hard to prove things about
• Not well-suited to probabilistic extensions
• Don't (typically) improve over linear classifiers when you have lots of features
• Sometimes fail badly on problems that linear classifiers perform well on
  – here's an example
Another view of a decision tree
[Figure sequence: the axis-parallel decision boundaries of a tree on the iris data, built from splits such as Sepal_length < 5.7 and Sepal_width > 2.8, which carve the feature space into rectangular regions.]
Fixing decision trees….
• Hard to prove things about
• Don't (typically) improve over linear classifiers when you have lots of features
• Sometimes fail badly on problems that linear classifiers perform well on
  – Solution: build ensembles of decision trees
Example: bagging (…more later)
• Repeat T times:
  – Draw a bootstrap sample S (n examples taken with replacement) from the dataset D
  – Build a tree on S (don't prune, in the experiments below)
• Vote the classifications of all the trees at the end
• Ok, how well does this work? (A minimal code sketch follows.)
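A sketch of the loop, using scikit-learn's DecisionTreeClassifier (fully grown, i.e. unpruned, by default) on arbitrary 0/1-labeled data; the helper names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, Y, T=50, seed=0):
    """Train T unpruned trees, each on a bootstrap sample of (X, Y)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # n draws with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], Y[idx]))
    return trees

def bagged_predict(trees, X):
    """Majority vote of the ensemble (assumes 0/1 labels)."""
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```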
Generally, bagged decision trees eventually outperform the linear classifier, if the data is large enough and clean enough.
[Figure: learning curves comparing bagged trees and a linear classifier, on clean data and on noisy data.]
RULE LEARNING: OVERVIEW
Rules for Subcategories of Economist Articles
Trees vs Rules
• For every tree with L leaves, there is a corresponding rule set with L rules
  – So one way to learn rules is to extract them from trees
• But:
  – Sometimes the extracted rules can be drastically simplified
  – For some rule sets, there is no tree that is nearly the same size
  – So rules are more expressive, given a size constraint
• This motivated learning rules directly
Separate and conquer rule-learning
• Start with an empty rule set
• Iteratively:
  – Find a rule that works well on the data
  – Remove the examples "covered by" the rule (those satisfying the "if" part) from the data
• Stop when all data is covered by some rule
• Possibly prune the rule set

Note: on later iterations, the data is different.
Separate and conquer rule-learning
• Start with an empty rule set
• Iteratively:
  – Find a rule that works well on the data:
    • Start with an empty rule
    • Iteratively:
      – Add a condition that is true for many positive and few negative examples
      – Stop when the rule covers no (or almost no) negative examples
  – Remove the examples "covered by" the rule
• Stop when all data is covered by some rule
Separate and conquer rule-learning

function Rules = separateAndConquer(X, Y)
  Rules = empty rule set
  while there are positive examples in X, Y not covered by any rule do
    R = empty list of conditions
    CoveredX = X; CoveredY = Y
    -- specialize R until it covers only positive examples
    while CoveredY contains some negative examples do
      -- compute the "gain" for each condition x(j)==1
      j = argmax(gain); aj = CoveredX(:, j)
      R = R conjoined with condition "x(j)==1"   -- add the best condition
      -- remove examples not covered by R from CoveredX, CoveredY
      CoveredX = CoveredX(aj); CoveredY = CoveredY(aj)
    Rules = Rules plus new rule R
    -- remove examples covered by R from X, Y
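A hedged Python rendering of the same loop over binary features; the per-condition score here is a plain purity, a simple stand-in for whatever "gain" the slides intend:

```python
import numpy as np

def separate_and_conquer(X, Y):
    """Greedy rule learning: each rule is a list of feature indices
    that must all equal 1. A simplified sketch, not CN2/RIPPER."""
    rules = []
    X, Y = X.copy(), Y.copy()
    while (Y == 1).any():                      # uncovered positives remain
        rule, cx, cy = [], X, Y
        while (cy == 0).any():                 # specialize until pure
            best_j, best_purity = -1, 0.0
            for j in range(cx.shape[1]):
                if j in rule or not (cx[:, j] == 1).any():
                    continue
                purity = (cy[cx[:, j] == 1] == 1).mean()
                if purity > best_purity:
                    best_j, best_purity = j, purity
            if best_j < 0:                     # nothing improves; stop early
                break
            rule.append(best_j)                # conjoin the best condition
            keep = cx[:, best_j] == 1
            cx, cy = cx[keep], cy[keep]
        rules.append(rule)
        covered = np.ones(len(Y), dtype=bool)
        for j in rule:
            covered &= X[:, j] == 1
        X, Y = X[~covered], Y[~covered]        # "separate": drop covered rows
        if not rule:                           # degenerate case: avoid looping
            break
    return rules
```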
NEAREST NEIGHBOR LEARNING: OVERVIEW
k-nearest neighbor learning
Given a test example x:
1. Find the k training-set examples (x1,y1), …, (xk,yk) that are closest to x.
2. Predict the most frequent label in that set.
Breaking it down:
• To train: just save the data. Very fast! (…though you might build some indices)
• To test, for each test example x:
  1. Find the k training-set examples (x1,y1), …, (xk,yk) that are closest to x.
  2. Predict the most frequent label in that set.

Prediction is relatively slow (compared to a linear classifier or decision tree).
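Training really is just storing the arrays; a minimal NumPy sketch of the test step (names are mine), assuming small non-negative integer labels:

```python
import numpy as np

def knn_predict(train_X, train_Y, x, k=3):
    """Plain k-NN: Euclidean distance plus a majority vote."""
    d = np.linalg.norm(train_X - x, axis=1)   # distance to every stored example
    nearest = np.argsort(d)[:k]               # indices of the k closest
    return int(np.bincount(train_Y[nearest]).argmax())  # most frequent label
```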
What is the decision boundary for 1-NN?
[Figure: a Voronoi diagram over the training examples (some labeled +, some -). Each cell Ci is the set of all points that are closest to a particular example xi, so the 1-NN decision boundary follows the cell edges between oppositely labeled examples.]
Effect of k on decision boundary
Overfitting and k-NN
• When k is small, is the classifier complex, or simple?
• How about when k is large?
[Figure: error/loss on the training set D and on an unseen test set Dtest, plotted from large k to small k.]
Some common variants
• Distance metrics:
  – Euclidean distance: ||x1 - x2||
  – Cosine distance: 1 - <x1,x2> / (||x1||·||x2||)   (this is in [0,1])
• Weighted nearest neighbor (def 1):
  – Instead of the most frequent y among the k nearest neighbors, predict the label with the greatest total weight, where closer neighbors get more weight (e.g. weight 1/d(x,xi) is one common choice).
Some common variants
• Weighted nearest neighbor (def 2):
  – Instead of Euclidean distance, use a weighted version, e.g. d(x1,x2) = sqrt( Σj wj (x1j - x2j)² )
  – Weights might be based on information gain, …
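A sketch combining both variants, with the caveat that the 1/d vote and the per-feature weight vector are common choices rather than anything the slides specify:

```python
import numpy as np

def weighted_knn_predict(train_X, train_Y, x, k=3, feature_w=None):
    """k-NN with optional per-feature distance weights and a 1/d vote."""
    diff = train_X - x
    if feature_w is not None:
        diff = diff * np.sqrt(feature_w)   # gives sqrt(sum_j w_j (x1j-x2j)^2)
    d = np.linalg.norm(diff, axis=1)
    nearest = np.argsort(d)[:k]
    # vote weighted by inverse distance instead of a plain majority
    scores = np.zeros(int(train_Y.max()) + 1)
    for i in nearest:
        scores[train_Y[i]] += 1.0 / (d[i] + 1e-9)
    return int(scores.argmax())
```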