![Page 1: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/1.jpg)
UVA CS 4501: Machine Learning
Lecture 10: K-Nearest-Neighbor Classifier / Bias-Variance Tradeoff
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
10/6/18
Dr. Yanjun Qi / UVA CS
![Page 2: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/2.jpg)
Where are we? → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
![Page 3: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/3.jpg)
Three major sections for classification
• We can divide the large variety of classification approaches into roughly three major types
1. Discriminative
– directly estimate a decision rule/boundary
– e.g., support vector machine, decision tree, logistic regression
– e.g., neural networks (NN), deep NN
2. Generative
– build a generative statistical model
– e.g., Bayesian networks, Naïve Bayes classifier
3. Instance-based classifiers
– use observations directly (no models)
– e.g., K nearest neighbors
![Page 4: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/4.jpg)
Today:
✓ K-nearest neighbor
✓ Model Selection / Bias Variance Tradeoff
✓ Bias-Variance tradeoff
✓ High bias? High variance? How to respond?
![Page 5: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/5.jpg)
Nearest neighbor classifiers
Requires three inputs:
1. The set of stored training samples
2. Distance metric to compute distance between samples
3. The value of k, i.e., the number of nearest neighbors to retrieve
[Figure: an unknown record among the stored training samples]
![Page 6: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/6.jpg)
Nearest neighbor classifiers
To classify an unknown sample:
1. Compute distance to training records
2. Identify the k nearest neighbors
3. Use class labels of nearest neighbors to determine the class label of the unknown record (e.g., by taking majority vote)
[Figure: an unknown record among the stored training samples]
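The three classification steps can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: the function names `euclidean` and `knn_classify` and the toy data are assumptions made for the example, and ties in the vote are broken by whichever label was counted first.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training records."""
    # Step 1: compute the distance from the query to every training record.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 2: identify the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_X, train_y, (1.1, 1.0), k=3))
print(knn_classify(train_X, train_y, (3.9, 4.1), k=3))
```

Every query scans the whole training set, which is why classification is the expensive step for this kind of classifier.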
![Page 7: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/7.jpg)
Definition of nearest neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a sample X]
The k-nearest neighbors of a sample x are the data points that have the k smallest distances to x.
![Page 8: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/8.jpg)
![Page 9: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/9.jpg)
![Page 10: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/10.jpg)
1-nearest neighbor for classification
Voronoi diagram: partitioning of a plane into regions based on distance to points in a specific subset of the plane.
![Page 11: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/11.jpg)
Model Selection for Nearest neighbor classification
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include (many) points from other classes
![Page 12: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/12.jpg)
![Page 13: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/13.jpg)
Challenge in Nearest neighbor classification
• Scaling issues
– Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
– Example:
• height of a person may vary from 1.5 m to 1.8 m
• weight of a person may vary from 90 lb to 300 lb
• income of a person may vary from $10K to $1M
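One common way to keep a single attribute from dominating the distance is to rescale every attribute to the same range before computing distances. The sketch below uses min-max scaling, which is only one option (standardizing to zero mean and unit variance is another); the function name `min_max_scale` and the three example people are assumptions for illustration.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append(tuple(
            # A constant column carries no information; map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ))
    return scaled

# Hypothetical people: (height in m, weight in lb, income in $).
people = [(1.5, 90.0, 10_000.0), (1.8, 300.0, 1_000_000.0), (1.65, 195.0, 505_000.0)]
scaled = min_max_scale(people)
for row in scaled:
    print(row)
```

Without scaling, the income column (spread of about a million) would swamp the height column (spread of 0.3) in any Euclidean distance.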
![Page 14: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/14.jpg)
Nearest neighbor classification
• k-Nearest neighbor classifier is a lazy learner
– Does not build model explicitly.
– Classifying unknown samples is relatively expensive.
• k-Nearest neighbor classifier is a local model, vs. global model like linear classifiers.
![Page 15: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/15.jpg)
K-Nearest Neighbor for Regression?
• For KNN
– Decision is delayed till a new instance arrives
– Target variable may be discrete or real-valued
• When target is continuous, the naïve prediction is the mean value of the k nearest training examples
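The naïve regression rule, the mean target of the k nearest training examples, can be sketched for a 1D input as follows. The function name `knn_regress` and the toy data are assumptions for illustration.

```python
def knn_regress(train_x, train_y, query, k):
    """Predict a real-valued target as the mean of the k nearest targets (1D input)."""
    # Rank training examples by absolute distance to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest_targets = [y for _, y in ranked[:k]]
    return sum(nearest_targets) / k

# Hypothetical 1D training data.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print(knn_regress(train_x, train_y, 3.4, k=1))
print(knn_regress(train_x, train_y, 3.4, k=2))
```

With k = 1 the prediction copies the single nearest target; with k = 2 it averages the targets at x = 3.0 and x = 4.0.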
![Page 16: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/16.jpg)
K=5 Nearest Neighbor (1D input) for Regression
![Page 17: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/17.jpg)
Nearest Neighbor (1D input) for Regression
![Page 18: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/18.jpg)
Nearest Neighbor (1D input) for Regression
When K=2
![Page 19: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/19.jpg)
Nearest neighbor Variations
• Compute distance between two points:
– For instance, Euclidean distance
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_i (x_i - y_i)^2}$$
• Options for determining the class from nearest neighbor list
– Take majority vote of class labels among the k-nearest neighbors
– Weight the votes according to distance
• example: weight factor $w = 1 / d^2$
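A distance-weighted vote with weight factor w = 1 / d^2 might look like the sketch below. The function name `weighted_knn_classify`, the small constant `eps` (added only to avoid division by zero when a neighbor coincides with the query), and the toy data are assumptions of this example.

```python
import math
from collections import defaultdict

def weighted_knn_classify(train_X, train_y, query, k, eps=1e-12):
    """Vote among the k nearest neighbors with weight w = 1 / d^2."""
    scored = []
    for x, label in zip(train_X, train_y):
        d = math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, query)))
        scored.append((d, label))
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    totals = defaultdict(float)
    for d, label in nearest:
        # eps (an assumption of this sketch) guards against d == 0.
        totals[label] += 1.0 / (d * d + eps)
    return max(totals, key=totals.get)

# Hypothetical data: one very close "near" point, two distant "far" points.
train_X = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0)]
train_y = ["near", "far", "far"]
query = (0.5, 0.0)
print(weighted_knn_classify(train_X, train_y, query, k=3))
```

With k = 3 a plain majority vote would pick "far" (two votes against one), while the weighted vote lets the single close neighbor win.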
![Page 20: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/20.jpg)
Variants: Distance-Weighted k-Nearest Neighbor Algorithm
• Assign weights to the neighbors based on their “distance” from the query point
– Weight “may” be inverse square of the distances
• Extreme option: all training points may influence a particular instance
– E.g., Shepard’s method / Modified Shepard, … by Geospatial Analysis
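The extreme option, where every training point contributes, can be sketched as an inverse-distance-weighted average in the spirit of Shepard's method. This is a simplified 1D illustration; the function name `shepard_predict` and the `power` parameter (2 gives inverse-square weights) are assumptions of this sketch.

```python
def shepard_predict(train_x, train_y, query, power=2):
    """Inverse-distance-weighted average over ALL training points (1D input)."""
    numerator = 0.0
    denominator = 0.0
    for x, y in zip(train_x, train_y):
        d = abs(x - query)
        if d == 0.0:
            # The query coincides with a stored point: return its target directly.
            return y
        w = 1.0 / d ** power
        numerator += w * y
        denominator += w
    return numerator / denominator

# Hypothetical data: the query sits exactly midway between two points.
print(shepard_predict([0.0, 2.0], [1.0, 3.0], 1.0))
```

Far-away points still vote, but their weights shrink quickly with distance, so nearby points dominate the prediction.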
![Page 21: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/21.jpg)
Instance-based Regression vs. Linear Regression
• Linear Regression Learning
– Explicit description of target function on the whole training set
• Instance-based Learning
– Learning = storing all training instances
– Referred to as “Lazy” learning
![Page 22: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/22.jpg)
Computational Time Cost
![Page 23: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/23.jpg)
K-Nearest Neighbor
Task: Regression / Classification
Representation: Local Smoothness
Score Function: NA
Search/Optimization: NA
Models, Parameters: Training Samples
![Page 24: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/24.jpg)
Decision boundaries in global vs. local models
Linear classification
• global
• stable
• can be inaccurate
15-nearest neighbor (K=15) and 1-nearest neighbor (K=1)
• local
• accurate
• unstable
• K acts as a smoother
What ultimately matters: GENERALIZATION
![Page 25: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/25.jpg)
![Page 26: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/26.jpg)
e.g. Training Error from KNN, Lesson Learned
• When k = 1,
– No misclassifications (on training): Overfit
• Minimizing training error is not always good (e.g., 1-NN)
[Figure: 1-nearest neighbor averaging; axes X1 and X2; regions $\hat{Y} < 0.5$ and $\hat{Y} > 0.5$ separated by the boundary $\hat{Y} = 0.5$]
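A small experiment can make this lesson concrete: with k = 1 every training point is its own nearest neighbor, so the training error is zero, yet holding each point out in turn reveals the mistakes. The leave-one-out check, the helper names, and the toy data (which contain one deliberately noisy label) are assumptions of this sketch.

```python
def nearest_label(train, query, skip=None):
    """Label of the training point nearest to `query` (1D input), optionally skipping one index."""
    best = None
    for i, (x, label) in enumerate(train):
        if i == skip:
            continue
        d = abs(x - query)
        if best is None or d < best[0]:
            best = (d, label)
    return best[1]

def training_error(train):
    """Fraction of training points misclassified by 1-NN fit on the same points."""
    wrong = sum(nearest_label(train, x) != y for x, y in train)
    return wrong / len(train)

def leave_one_out_error(train):
    """Fraction misclassified when each point is predicted from all the others."""
    wrong = sum(nearest_label(train, x, skip=i) != y for i, (x, y) in enumerate(train))
    return wrong / len(train)

# Hypothetical 1D data: class 0 on the left, class 1 on the right,
# plus one noisy label (class 1) at x = 2.5.
train = [(1.0, 0), (2.0, 0), (2.5, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
print(training_error(train))
print(leave_one_out_error(train))
```

The noisy point is memorized perfectly on the training set, but it also misleads the predictions for both of its neighbors once they are held out.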
![Page 27: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/27.jpg)
Review: Mean and Variance of Random Variable (RV)
X: random variables written in capital letter
• Mean (Expectation): $\mu = E(X)$
– Discrete RVs: $E(X) = \sum_{v_i} v_i \, P(X = v_i)$
– Continuous RVs: $E(X) = \int_{-\infty}^{+\infty} x \, p(x) \, dx$
Adapted from Carols’ prob tutorial
![Page 28: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/28.jpg)
Review: Mean and Variance of Random Variable (RV)
• Mean (Expectation): $\mu = E(X)$
– Discrete RVs: $E(X) = \sum_{v_i} v_i \, P(X = v_i)$ and $E(g(X)) = \sum_{v_i} g(v_i) \, P(X = v_i)$
– Continuous RVs: $E(X) = \int_{-\infty}^{+\infty} x \, p(x) \, dx$ and $E(g(X)) = \int_{-\infty}^{+\infty} g(x) \, p(x) \, dx$
Adapted from Carols’ prob tutorial
![Page 29: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/29.jpg)
Review: Mean and Variance of RV
• Variance: $\mathrm{Var}(X) = E((X - \mu)^2)$
– Discrete RVs: $V(X) = \sum_{v_i} (v_i - \mu)^2 \, P(X = v_i)$
– Continuous RVs: $V(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 \, p(x) \, dx$
Adapted from Carols’ prob tutorial
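For a discrete RV, the mean and variance formulas translate directly into sums over the probability mass function. The sketch below uses exact fractions and a fair six-sided die as a hypothetical example; the function names `expectation` and `variance` are assumptions of this illustration.

```python
from fractions import Fraction

def expectation(pmf, g=lambda v: v):
    """E[g(X)] = sum over values v of g(v) * P(X = v)."""
    return sum(g(v) * p for v, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], with mu = E[X]."""
    mu = expectation(pmf)
    return expectation(pmf, lambda v: (v - mu) ** 2)

# A fair six-sided die (hypothetical example): P(X = v) = 1/6 for v = 1..6.
die = {v: Fraction(1, 6) for v in range(1, 7)}
print(expectation(die))  # 7/2
print(variance(die))     # 35/12
```

Passing a function `g` reuses the same sum for $E(g(X))$, which is exactly how the variance is computed here.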
![Page 30: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/30.jpg)
BIAS AND VARIANCE TRADE-OFF
• Bias
– Measures accuracy or quality of the estimator
– Low bias implies on average we will accurately estimate the true parameter from training data
• Variance
– Measures precision or specificity of the estimator
– Low variance implies the estimator does not change much as the training set varies
![Page 31: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/31.jpg)
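BIAS AND VARIANCE TRADE-OFF for Mean Squared Error of parameter estimation
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})$$
• Bias term: error due to incorrect assumptions
• Variance term: error due to variance of training samples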
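For the mean squared error of a parameter estimate, the split into a bias term and a variance term can be sketched in a few lines. The notation is assumed here: $\theta$ is the true parameter and $\hat{\theta}$ its estimate, which is random because the training sample is random. Adding and subtracting $E[\hat{\theta}]$ gives:

```latex
\begin{aligned}
E\big[(\hat{\theta} - \theta)^2\big]
 &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
 &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
    + \big(E[\hat{\theta}] - \theta\big)^2
    + 2\,\big(E[\hat{\theta}] - \theta\big)\,E\big[\hat{\theta} - E[\hat{\theta}]\big] \\
 &= \underbrace{E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]}_{\text{variance}}
    + \underbrace{\big(E[\hat{\theta}] - \theta\big)^2}_{\text{bias}^2}
\end{aligned}
```

The cross term vanishes because $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$.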
![Page 32: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/32.jpg)
Bias-Variance Tradeoff / Model Selection
[Figure: underfit region and overfit region]
![Page 33: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/33.jpg)
(1) Training vs Test Error
[Figure: Expected Test Error and Expected Training Error curves]
![Page 34: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/34.jpg)
(1) Training vs Test Error
• Training error can always be reduced when increasing model complexity
[Figure: Expected Test Error and Expected Training Error curves]
![Page 35: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/35.jpg)
(1) Training vs Test Error
[Figure: Expected Test Error and Expected Training Error curves]
• Training error can always be reduced when increasing model complexity
• Expected Test error and CV error → good approximation of generalization (see extra slides about EPE)
![Page 36: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/36.jpg)
Test Error to EPE: (Extra)
• Random input vector: X
• Random output variable: Y
• Joint distribution: Pr(X, Y)
• Loss function L(Y, f(X))
• Expected prediction error (EPE):
$$\mathrm{EPE}(f) = E(L(Y, f(X))) = \int L(y, f(x)) \, \Pr(dx, dy)$$
$$\text{e.g.} = \int (y - f(x))^2 \, \Pr(dx, dy)$$
e.g. Squared error loss (also called L2 loss)
One way to consider generalization: by considering the joint population distribution
![Page 37: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/37.jpg)
(2) Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample randomness).
Slide credit: D. Hoiem
![Page 38: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/38.jpg)
(2.1) Regression: Complexity versus Goodness of Fit
[Figure: three fits of increasing model complexity, ranging from Low Variance / High Bias to Low Bias / High Variance]
• Model complexity = low: highest bias, lowest variance
• Model complexity = medium: medium bias, medium variance
• Model complexity = high: smallest bias, highest variance
![Page 39: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/39.jpg)
(2.2) Classification, Decision boundaries in global vs. local models
Linear regression (Low Variance / High Bias)
• global
• stable
• can be inaccurate
KNN: 15-nearest neighbor and 1-nearest neighbor (Low Bias / High Variance)
• local
• accurate
• unstable
What ultimately matters: GENERALIZATION
![Page 40: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/40.jpg)
![Page 41: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/41.jpg)
Model “bias” & Model “variance”
• Middle RED:
– TRUE function
• Error due to bias:
– How far off in general from the middle red
• Error due to variance:
– How wildly the blue points spread
![Page 42: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/42.jpg)
![Page 43: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/43.jpg)
Randomness of Train Set => Variance of Models, e.g.,
![Page 44: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/44.jpg)
Randomness of Train Set => Variance of Models, e.g.,
![Page 45: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/45.jpg)
Need to make assumptions that are able to generalize
• Components
– Bias: how much does the average model over all training sets differ from the true model?
• Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much do models estimated from different training sets differ from each other?
Slide credit: L. Lazebnik
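These two components can be estimated by simulation: draw many training sets, fit the model on each, and look at how the predictions at one query point are centered (bias) and spread (variance). The sketch below does this for kNN regression; the true function `sin`, the noise level, the query point, the seed, and all function names are assumptions chosen for illustration.

```python
import math
import random

def true_f(x):
    """Assumed true function for the simulation."""
    return math.sin(x)

def knn_predict(xs, ys, query, k):
    """Mean target of the k training inputs nearest to `query` (1D input)."""
    ranked = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))
    return sum(y for _, y in ranked[:k]) / k

def bias_and_variance(k, query=1.0, n_train=50, n_sets=300, noise=0.5, seed=0):
    """Estimate squared bias and variance of kNN regression at one query point."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_sets):
        # Each iteration is a fresh random training set.
        xs = [rng.uniform(0.0, math.pi) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
        predictions.append(knn_predict(xs, ys, query, k))
    mean_pred = sum(predictions) / n_sets
    # Bias: how far the average model is from the true function.
    bias_sq = (mean_pred - true_f(query)) ** 2
    # Variance: how much models from different training sets differ.
    var = sum((p - mean_pred) ** 2 for p in predictions) / n_sets
    return bias_sq, var

for k in (1, 5, 25):
    print(k, bias_and_variance(k))
```

Averaging over more neighbors reduces the spread of the predictions across training sets, at the cost of smoothing over a wider region of the true function.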
![Page 46: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/46.jpg)
Today Extra:
✓ K-nearest neighbor
✓ Model Selection / Bias Variance Tradeoff
✓ Bias-Variance tradeoff
✓ High bias? High variance? How to respond? (EXTRA)
![Page 47: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/47.jpg)
Credit: Stanford Machine Learning
![Page 48: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/48.jpg)
![Page 49: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/49.jpg)
Today Extra:
✓ K-nearest neighbor
✓ Model Selection / Bias Variance Tradeoff
✓ Bias-Variance tradeoff
✓ High bias? High variance? How to respond?
✓ EPE
![Page 50: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/50.jpg)
Need to make assumptions that are able to generalize
• Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
Slide credit: L. Lazebnik
![Page 51: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/51.jpg)
(1) High variance
• Test error still decreasing as m increases. Suggests larger training set will help.
• Large gap between training and test error.
• Low training error and high test error
Could also use CrossV Error
Slide credit: A. Ng
![Page 52: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/52.jpg)
How to reduce variance?
• Choose a simpler classifier
– More Bias
• Regularize the parameters
– More Bias
• Get more training data
• Try smaller set of features
– More Bias
Slide credit: D. Hoiem
![Page 53: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/53.jpg)
(2) High bias
• Even training error is unacceptably high.
• Small gap between training and test error.
• High training error and high test error
Could also use CrossV Error
Slide credit: A. Ng
![Page 54: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/54.jpg)
How to reduce Bias?
• E.g.
– Get additional features
– Try adding basis expansions, e.g. polynomial
– Try more complex learner
![Page 55: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/55.jpg)
(3) For instance, if trying to solve “spam detection” using (Extra)
L2-
If performance is not as desired
Slide credit: A. Ng
![Page 56: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/56.jpg)
(4) Model Selection and Assessment
• Model Selection
– Estimating performances of different models to choose the best one
• Model Assessment
– Having chosen a model, estimating the prediction error on new data
![Page 57: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/57.jpg)
Model Selection and Assessment (Extra)
• When Data Rich Scenario: Split the dataset
– Train, Validation: Model Selection
– Test: Model assessment
• When Insufficient data to split into 3 parts
– Approximate validation step analytically
• AIC, BIC, MDL, SRM
– Efficient reuse of samples
• Cross validation, bootstrap
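When there is too little data for a fixed three-way split, cross validation reuses every sample for both fitting and validation. A minimal sketch of the index bookkeeping follows; the function name `k_fold_splits` is an assumption, and the sketch assumes the samples have already been shuffled.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

splits = list(k_fold_splits(10, 3))
for train_idx, val_idx in splits:
    print(len(train_idx), val_idx)
```

Each candidate model (for example, each value of k in KNN) would be fit on every `train` part and scored on the matching `validation` part; the averaged score drives model selection, while a separate test set is still needed for model assessment.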
![Page 58: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/58.jpg)
![Page 59: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/59.jpg)
Statistical Decision Theory (Extra)
• Random input vector: X
• Random output variable: Y
• Joint distribution: Pr(X, Y)
• Loss function L(Y, f(X))
• Expected prediction error (EPE):
$$\mathrm{EPE}(f) = E(L(Y, f(X))) = \int L(y, f(x)) \, \Pr(dx, dy)$$
$$\text{e.g.} = \int (y - f(x))^2 \, \Pr(dx, dy)$$
e.g. Squared error loss (also called L2 loss)
Consider population distribution
![Page 60: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/60.jpg)
Expected prediction error (EPE)
$$\mathrm{EPE}(f) = E(L(Y, f(X))) = \int L(y, f(x)) \, \Pr(dx, dy)$$
Consider joint distribution
• For L2 loss: e.g. $= \int (y - f(x))^2 \, \Pr(dx, dy)$
– Under L2 loss, the best estimator for EPE (theoretically) is the conditional mean: $\hat{f}(x) = E(Y \mid X = x)$
– e.g. KNN: NN methods are the direct implementation (approximation)
• For 0-1 loss: $L(k, \ell) = 1 - \delta_{k\ell}$
– Bayes classifier: $\hat{f}(X) = C_k$ if $\Pr(C_k \mid X = x) = \max_{g \in C} \Pr(g \mid X = x)$
![Page 61: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/61.jpg)
EXPECTED PREDICTION ERROR for L2 Loss
• Expected prediction error (EPE) for L2 Loss:
$$\mathrm{EPE}(f) = E(Y - f(X))^2 = \int (y - f(x))^2 \, \Pr(dx, dy)$$
• Since Pr(X, Y) = Pr(Y | X) Pr(X), EPE can also be written as
$$\mathrm{EPE}(f) = E_X \, E_{Y|X}\big([Y - f(X)]^2 \mid X\big)$$
• Thus it suffices to minimize EPE pointwise
$$f(x) = \arg\min_c \, E_{Y|X}\big([Y - c]^2 \mid X = x\big)$$
Solution for Regression: $f(x) = E(Y \mid X = x)$ (conditional mean)
Best estimator under L2 loss: conditional expectation
Solution for kNN:
![Page 62: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/62.jpg)
KNN FOR MINIMIZING EPE
• We know under L2 loss, best estimator for EPE (theoretically) is the conditional mean: $f(x) = E(Y \mid X = x)$
– Nearest neighbors assumes that f(x) is well approximated by a locally constant function.
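Why the conditional mean is the pointwise minimizer, and how kNN approximates it, can be sketched as follows. The shorthand $m(x)$ and the neighborhood notation $N_k(x)$ (the k training inputs nearest to $x$) are assumptions introduced for this sketch.

```latex
\begin{aligned}
E\big[(Y - c)^2 \mid X = x\big]
 &= E\big[(Y - m(x))^2 \mid X = x\big] + \big(m(x) - c\big)^2,
 \qquad m(x) = E(Y \mid X = x), \\
\Rightarrow\quad f(x) &= \arg\min_c \, E\big[(Y - c)^2 \mid X = x\big] = m(x), \\
\hat{f}(x) &= \frac{1}{k} \sum_{x_i \in N_k(x)} y_i
 \quad \text{(kNN: average the targets of the } k \text{ nearest training inputs)}
\end{aligned}
```

The first line holds because the cross term $2\,(m(x) - c)\,E\big[Y - m(x) \mid X = x\big]$ is zero; only the second term depends on $c$, and it is smallest at $c = m(x)$. kNN replaces the expectation with a sample average and the conditioning on $X = x$ with conditioning on a small neighborhood of $x$, which is where the locally constant assumption enters.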
![Page 63: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/63.jpg)
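Review: WHEN EPE USES DIFFERENT LOSS

| Loss Function | Estimator |
| --- | --- |
| L2 | conditional mean, $\hat{f}(x) = E(Y \mid X = x)$ |
| L1 | conditional median, $\hat{f}(x) = \mathrm{median}(Y \mid X = x)$ |
| 0-1 | Bayes classifier / MAP, $\hat{f}(x) = \arg\max_{g} \Pr(g \mid X = x)$ |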
![Page 64: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/64.jpg)
Decomposition of EPE
– When additive error model: $Y = f(X) + \epsilon$
– Notations
• Output random variable: $Y$
• Prediction function: $f$
• Prediction estimator: $\hat{f}$
– EPE splits into an irreducible / Bayes error term and the MSE component of $\hat{f}$ in estimating $f$
![Page 65: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/65.jpg)
Bias-Variance Trade-off for EPE:
$$\mathrm{EPE}(x_0) = \text{noise}^2 + \text{bias}^2 + \text{variance}$$
• noise²: unavoidable error
• bias²: error due to incorrect assumptions
• variance: error due to variance of training samples
Slide credit: D. Hoiem
![Page 66: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/66.jpg)
http://alex.smola.org/teaching/10-701-15/homework/hw3_sol.pdf
![Page 67: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/67.jpg)
http://alex.smola.org/teaching/10-701-15/homework/hw3_sol.pdf
![Page 68: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/68.jpg)
MSE of Model, aka, Risk
http://www.stat.cmu.edu/~ryantibs/statml/review/modelbasics.pdf
![Page 69: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/69.jpg)
Cross Validation and Variance Estimation
http://www.stat.cmu.edu/~ryantibs/statml/review/modelbasics.pdf
![Page 70: UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown samples](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f517956796c0b044740b89d/html5/thumbnails/70.jpg)
http://www.stat.cmu.edu/~ryantibs/statml/review/modelbasics.pdf