![Page 1: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/1.jpg)
k-Nearest Neighbors+
Model Selection
1
10-601 Introduction to Machine Learning
Matt GormleyLecture 5
Sep. 11, 2019
Machine Learning DepartmentSchool of Computer ScienceCarnegie Mellon University
![Page 2: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/2.jpg)
Reminders
• Homework 2: Decision Trees– Out: Wed, Sep. 04– Due: Wed, Sep. 18 at 11:59pm
• Today’s Poll: – http://p5.mlcourse.org
4
![Page 3: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/3.jpg)
Moss Cheat Checker
![Page 4: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/4.jpg)
What is Moss?• Moss (Measure Of Software Similarity): is an
automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism in programming classes.
• Moss reports:– The Andrew IDs associated with the file submissions– The number of lines matched– The percent lines matched– Color coded submissions where similarities are
found
![Page 5: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/5.jpg)
What is Moss?
At first glance, the submissions may look different
![Page 6: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/6.jpg)
What is Moss?Moss can quickly find the similarities
![Page 7: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/7.jpg)
K-NEAREST NEIGHBORS
10
![Page 8: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/8.jpg)
11
![Page 9: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/9.jpg)
k-Nearest Neighbors
Chalkboard:– Nearest Neighbor classifier– KNN for binary classification
12
![Page 10: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/10.jpg)
KNN: RemarksDistance Functions:• KNN requires a distance function
• The most common choice is Euclidean distance
• But other choices are just fine (e.g. Manhattan distance)
14
![Page 11: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/11.jpg)
KNN: Remarks
15
In-Class Exercises1. How can we handle ties
for even values of k?
2. What is the inductive bias of KNN?
Answer(s) Here:
![Page 12: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/12.jpg)
KNN: RemarksComputational Efficiency:
• Suppose we have N training examples, and each one has M features
• Computational complexity for the special case where k=1:
18
Task Naive k-d Tree
Train O(1) ~ O(M N log N)
Predict (one test example)
O(MN) ~ O(2M log N) on average
Problem: Very fast for small M, but very slow for large M
In practice: use stochastic approximations (very fast, and empirically often as good)
![Page 13: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/13.jpg)
KNN: Remarks
Theoretical Guarantees:
19
Cover & Hart (1967)
Let h(x) be a Nearest Neighbor (k=1) binary classifier. As the number of training examples N goes to infinity…
errortrue(h) < 2 x Bayes Error Rate
“In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.”
very informally, Bayes Error Rate can be thought of as:‘the best you could possibly do’
![Page 14: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/14.jpg)
Decision Boundary Example
21
In-Class Exercise
Dataset: Outputs {+,-}; Features x1 and x2
Question 1:a) Can a k-Nearest Neighbor classifier
with k=1 achieve zero training error on this dataset?
b) If ‘Yes’, draw the learned decision boundary. If ‘No’, why not?
Question 2:a) Can a Decision Tree classifier achieve
zero training error on this dataset?b) If ‘Yes’, draw the learned decision
boundary for k-Nearest Neighbors with k=1. If ‘No’, why not?
x1
x2
x1
x2
![Page 15: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/15.jpg)
KNN ON FISHER IRIS DATA
22
![Page 16: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/16.jpg)
Fisher Iris DatasetFisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2) collected by Anderson (1936)
23Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
Species Sepal Length
Sepal Width
Petal Length
Petal Width
0 4.3 3.0 1.1 0.1
0 4.9 3.6 1.4 0.1
0 5.3 3.7 1.5 0.2
1 4.9 2.4 3.3 1.0
1 5.7 2.8 4.1 1.3
1 6.3 3.3 4.7 1.6
1 6.7 3.0 5.0 1.7
![Page 17: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/17.jpg)
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris
virginica (1), Iris versicolor (2) collected by
Anderson (1936)
24
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
Species Sepal Length
Sepal Width
0 4.3 3.0
0 4.9 3.6
0 5.3 3.7
1 4.9 2.4
1 5.7 2.8
1 6.3 3.3
1 6.7 3.0
Deleted two of the
four features, so that
input space is 2D
![Page 18: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/18.jpg)
KNN on Fisher Iris Data
25
![Page 19: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/19.jpg)
KNN on Fisher Iris Data
29
Special Case: Nearest Neighbor
![Page 20: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/20.jpg)
KNN on Fisher Iris Data
30
![Page 21: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/21.jpg)
KNN on Fisher Iris Data
31
![Page 22: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/22.jpg)
KNN on Fisher Iris Data
32
![Page 23: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/23.jpg)
KNN on Fisher Iris Data
33
![Page 24: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/24.jpg)
KNN on Fisher Iris Data
34
![Page 25: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/25.jpg)
KNN on Fisher Iris Data
35
![Page 26: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/26.jpg)
KNN on Fisher Iris Data
36
![Page 27: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/27.jpg)
KNN on Fisher Iris Data
37
![Page 28: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/28.jpg)
KNN on Fisher Iris Data
38
![Page 29: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/29.jpg)
KNN on Fisher Iris Data
39
![Page 30: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/30.jpg)
KNN on Fisher Iris Data
40
![Page 31: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/31.jpg)
KNN on Fisher Iris Data
41
![Page 32: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/32.jpg)
KNN on Fisher Iris Data
42
![Page 33: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/33.jpg)
KNN on Fisher Iris Data
43
![Page 34: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/34.jpg)
KNN on Fisher Iris Data
44
![Page 35: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/35.jpg)
KNN on Fisher Iris Data
45
![Page 36: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/36.jpg)
KNN on Fisher Iris Data
46
![Page 37: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/37.jpg)
KNN on Fisher Iris Data
47
![Page 38: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/38.jpg)
KNN on Fisher Iris Data
48
![Page 39: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/39.jpg)
KNN on Fisher Iris Data
49
Special Case: Majority Vote
![Page 40: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/40.jpg)
KNN ON GAUSSIAN DATA
50
![Page 41: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/41.jpg)
KNN on Gaussian Data
51
![Page 42: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/42.jpg)
KNN on Gaussian Data
52
![Page 43: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/43.jpg)
KNN on Gaussian Data
53
![Page 44: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/44.jpg)
KNN on Gaussian Data
54
![Page 45: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/45.jpg)
KNN on Gaussian Data
55
![Page 46: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/46.jpg)
KNN on Gaussian Data
56
![Page 47: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/47.jpg)
KNN on Gaussian Data
57
![Page 48: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/48.jpg)
KNN on Gaussian Data
58
![Page 49: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/49.jpg)
KNN on Gaussian Data
59
![Page 50: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/50.jpg)
KNN on Gaussian Data
60
![Page 51: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/51.jpg)
KNN on Gaussian Data
61
![Page 52: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/52.jpg)
KNN on Gaussian Data
62
![Page 53: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/53.jpg)
KNN on Gaussian Data
63
![Page 54: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/54.jpg)
KNN on Gaussian Data
64
![Page 55: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/55.jpg)
KNN on Gaussian Data
65
![Page 56: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/56.jpg)
KNN on Gaussian Data
66
![Page 57: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/57.jpg)
KNN on Gaussian Data
67
![Page 58: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/58.jpg)
KNN on Gaussian Data
68
![Page 59: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/59.jpg)
KNN on Gaussian Data
69
![Page 60: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/60.jpg)
KNN on Gaussian Data
70
![Page 61: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/61.jpg)
KNN on Gaussian Data
71
![Page 62: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/62.jpg)
KNN on Gaussian Data
72
![Page 63: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/63.jpg)
KNN on Gaussian Data
73
![Page 64: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/64.jpg)
KNN on Gaussian Data
74
![Page 65: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/65.jpg)
K-NEAREST NEIGHBORS
76
![Page 66: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/66.jpg)
Questions
• How could k-Nearest Neighbors (KNN) be applied to regression?
• Can we do better than majority vote? (e.g. distance-weighted KNN)
• Where does the Cover & Hart (1967) Bayes error rate bound come from?
77
![Page 67: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/67.jpg)
KNN Learning Objectives
You should be able to…• Describe a dataset as points in a high dimensional space
[CIML]• Implement k-Nearest Neighbors with O(N) prediction• Describe the inductive bias of a k-NN classifier and relate
it to feature scale [a la. CIML]• Sketch the decision boundary for a learning algorithm
(compare k-NN and DT)• State Cover & Hart (1967)'s large sample analysis of a
nearest neighbor classifier• Invent "new" k-NN learning algorithms capable of dealing
with even k• Explain computational and geometric examples of the
curse of dimensionality
78
![Page 68: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/68.jpg)
MODEL SELECTION
79
![Page 69: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/69.jpg)
Model Selection
WARNING: • In some sense, our discussion of model
selection is premature. • The models we have considered thus far are
fairly simple.• The models and the many decisions available
to the data scientist wielding them will grow to be much more complex than what we’ve seen so far.
80
![Page 70: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/70.jpg)
Model Selection
Statistics• Def: a model defines the data
generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
81
![Page 71: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/71.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
82
• model = set of all possible trees, possibly restricted by some hyperparameters (e.g. max depth)
• parameters = structure of a specific decision tree
• learning algorithm = ID3, CART, etc.
• hyperparameters = max-depth, threshold for splitting criterion, etc.
Example: Decision Tree
![Page 72: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/72.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
83
• model = set of all possible nearest neighbors classifiers
• parameters = none (KNN is an instance-based or non-parametric method)
• learning algorithm = for naïve setting, just storing the data
• hyperparameters = k, the number of neighbors to consider
Example: k-Nearest Neighbors
![Page 73: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/73.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
84
• model = set of all linear separators
• parameters = vector of weights (one for each feature)
• learning algorithm = mistake based updates to the parameters
• hyperparameters = none (unless using some variant such as averaged perceptron)
Example: Perceptron
![Page 74: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/74.jpg)
Model Selection
Statistics• Def: a model defines the data
generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
85
If “learning” is all about picking the best
parameters how do we pick the best
hyperparameters?
![Page 75: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/75.jpg)
Model Selection• Two very similar definitions:– Def: model selection is the process by which we choose
the “best” model from among a set of candidates– Def: hyperparameter optimization is the process by
which we choose the “best” hyperparameters from among a set of candidates (could be called a special case of model selection)
• Both assume access to a function capable of measuring the quality of a model
• Both are typically done “outside” the main training algorithm --- typically training is treated as a black box
86
![Page 76: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/76.jpg)
Example of Hyperparameter Opt.
Chalkboard:– Special cases of k-Nearest Neighbors– Choosing k with validation data– Choosing k with cross-validation
87
![Page 77: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/77.jpg)
Cross-ValidationCross validation is a method of estimating loss on held out data
Input: training data, learning algorithm, loss function (e.g. 0/1 error)Output: an estimate of loss function on held-out data
Key idea: rather than just a single “validation” set, use many! (Error is more stable. Slower computation.)
88
D = y(1)
y(2)
y(N)
x(1)
x(2)
x(N)
Fold 1
Fold 2
Fold 3
Fold 4
Algorithm: Divide data into folds (e.g. 4)1. Train on folds {1,2,3} and
predict on {4}2. Train on folds {1,2,4} and
predict on {3}3. Train on folds {1,3,4} and
predict on {2}4. Train on folds {2,3,4} and
predict on {1}Concatenate all the predictions and evaluate loss (almostequivalent to averaging loss over the folds)
![Page 78: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/78.jpg)
Model SelectionWARNING (again):– This section is only scratching the surface!– Lots of methods for hyperparameter
optimization: (to talk about later)• Grid search• Random search• Bayesian optimization• Graduate-student descent• …
Main Takeaway: – Model selection / hyperparameter optimization
is just another form of learning
89
![Page 79: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/79.jpg)
Model Selection Learning ObjectivesYou should be able to…• Plan an experiment that uses training, validation, and
test datasets to predict the performance of a classifier on unseen data (without cheating)
• Explain the difference between (1) training error, (2) validation error, (3) cross-validation error, (4) test error, and (5) true error
• For a given learning technique, identify the model, learning algorithm, parameters, and hyperparamters
• Define "instance-based learning" or "nonparametric methods"
• Select an appropriate algorithm for optimizing (aka. learning) hyperparameters
90
![Page 80: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/80.jpg)
THE PERCEPTRON ALGORITHM
91
![Page 81: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/81.jpg)
Perceptron: HistoryImagine you are trying to build a new machine learning technique… your name is Frank Rosenblatt…and the year is 1957
92
![Page 82: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/82.jpg)
Perceptron: HistoryImagine you are trying to build a new machine learning technique… your name is Frank Rosenblatt…and the year is 1957
93
![Page 83: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/83.jpg)
Key idea: Try to learn this hyperplane directly
Linear Models for Classification
Directly modeling the hyperplane would use a decision function:
for:
h( ) = sign(�T )
y � {�1, +1}
Looking ahead: • We’ll see a number of
commonly used Linear Classifiers
• These include:– Perceptron– Logistic Regression– Naïve Bayes (under
certain conditions)– Support Vector
Machines
![Page 84: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/84.jpg)
Geometry
In-Class ExerciseDraw a picture of the region corresponding to:
Draw the vectorw = [w1, w2]
95
Answer Here:
![Page 85: Machine Learning Department School of Computer Science ...mgormley/courses/10601-f19/slides/... · Machine Learning • Def: (loosely) a modeldefines the hypothesis space over which](https://reader033.vdocuments.us/reader033/viewer/2022060516/5f9061e2d74fca652171c69e/html5/thumbnails/85.jpg)
Visualizing Dot-Products
Chalkboard:– vector in 2D– line in 2D– adding an intercept term– definition of orthogonality– hyperplane definition– half-space definitions– vector projection
96