![Page 1: Last lecture summary. Basic terminology tasks – classification – regression learner, algorithm – each has one or several parameters influencing its behavior](https://reader030.vdocuments.us/reader030/viewer/2022032607/56649ec65503460f94bd25d3/html5/thumbnails/1.jpg)
Last lecture summary
Basic terminology
• tasks
  – classification
  – regression
• learner, algorithm
  – each has one or several parameters influencing its behavior
• model
  – one concrete combination of learner and parameters
  – tune the parameters using the training set
  – the generalization is assessed using the test set (previously unseen data)
• learning (training)
  – supervised
    • a target vector t is known; parameters are tuned to achieve the best match between prediction and the target vector
  – unsupervised
    • training data consist of a set of input vectors x without any corresponding target values
    • clustering, visualization
• for most applications, the original input variables must be preprocessed
  – feature selection
  – feature extraction

(Diagram: selection keeps a subset of the original features x1 … x784, e.g. x1, x5, x103, x456; extraction transforms all of them into new features x*1 … x*784, e.g. x*18, x*152, x*309, x*666.)
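The difference can be sketched in a few lines of Python (all feature names and weights below are illustrative, not taken from the lecture):

```python
# Hypothetical 4-feature instance; names and numbers are made up.
row = {"x1": 2.0, "x2": 0.5, "x3": -1.0, "x4": 3.0}

# Feature selection: keep a SUBSET of the original variables.
selected = {k: row[k] for k in ("x1", "x3")}

# Feature extraction: derive NEW variables from all the originals
# (a toy linear combination standing in for e.g. PCA components).
extracted = {"z1": 0.6 * row["x1"] + 0.8 * row["x2"],
             "z2": row["x3"] - row["x4"]}

print(selected)    # subset of the original features
print(extracted)   # new, derived features
```

Either way, the learner afterwards sees fewer (or more informative) dimensions than the raw input.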
• feature selection/extraction = dimensionality reduction
  – generally a good thing
  – curse of dimensionality
• example:
  – learner: regression (polynomial, $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + \dots$)
  – parameters: weights (coefficients) w, order of the polynomial
• weights
  – adjusted so that the sum of squared errors SSE (error function) between the predicted values $y(x_n, \mathbf{w})$ and the known targets $t_n$ is as small as possible

$$SSE = \frac{1}{2}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$$
New stuff
Model selection

overfitting

$$M = 0:\quad y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = w_0 = \text{const}$$

$$M = 1:\quad y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = w_0 + w_1 x$$
$$SSE = \frac{1}{2}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$$

$$RMS = \sqrt{\frac{2 \cdot SSE}{N}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2}$$

$$MSE = \frac{1}{N}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$$

RMS – root mean squared error
MSE – mean squared error

For comparing errors on data sets of different sizes, use the root mean squared error RMS.
Summary of errors

sum of squared errors: $SSE = \frac{1}{2}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$

mean squared error: $MSE = \frac{1}{N}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$

root mean squared error: $RMS = \sqrt{\frac{2 \cdot SSE}{N}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2} = \sqrt{MSE}$
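The three error measures and the identities relating them, as a minimal Python sketch (the prediction and target values are illustrative):

```python
import math

def sse(pred, target):
    """Sum of squared errors, with the conventional 1/2 factor."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(pred, target))

def mse(pred, target):
    """Mean squared error."""
    return sum((y - t) ** 2 for y, t in zip(pred, target)) / len(target)

def rms(pred, target):
    """Root mean squared error."""
    return math.sqrt(mse(pred, target))

pred   = [1.0, 2.0, 3.0]   # y(x_n, w), illustrative values
target = [1.5, 2.0, 2.0]   # t_n, illustrative values
n = len(target)

# The identities above: RMS = sqrt(2*SSE/N) = sqrt(MSE)
assert abs(rms(pred, target) - math.sqrt(2 * sse(pred, target) / n)) < 1e-12
assert abs(rms(pred, target) - math.sqrt(mse(pred, target))) < 1e-12
```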
Training set
Test set
• the bad result for M = 9 may seem paradoxical because
  – a polynomial of a given order contains all lower-order polynomials as special cases (the M = 9 polynomial should be at least as good as the M = 3 polynomial)
• OK, let's examine the values of the coefficients w* for polynomials of various orders
|      | M = 0 | M = 1 | M = 3  | M = 9       |
|------|-------|-------|--------|-------------|
| w0*  | 0.19  | 0.82  | 0.31   | 0.35        |
| w1*  |       | -1.27 | 7.99   | 232.37      |
| w2*  |       |       | -25.43 | -5321.83    |
| w3*  |       |       | 17.37  | 48568.31    |
| w4*  |       |       |        | -231639.30  |
| w5*  |       |       |        | 640042.26   |
| w6*  |       |       |        | -1061800.52 |
| w7*  |       |       |        | 1042400.18  |
| w8*  |       |       |        | -557682.99  |
| w9*  |       |       |        | 125201.43   |
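The coefficient blow-up is easy to reproduce (a sketch assuming NumPy is available; the synthetic sin-curve data stand in for the lecture's example and are not the original values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

w3 = np.polyfit(x, t, deg=3)   # moderate coefficients
w9 = np.polyfit(x, t, deg=9)   # as many parameters as points -> interpolation

# The overfitted polynomial needs huge, mutually cancelling coefficients
# to thread the curve exactly through every noisy point.
print(np.abs(w3).max())
print(np.abs(w9).max())
```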
For a given model complexity, the overfitting problem becomes less severe as the size of the data set increases.

(Figures: the M = 9 polynomial fitted to N = 15 and to N = 100 data points.)

Or, in other words: the larger the data set, the more complex (flexible) a model can be fitted.
Overfitting in classification
Bias-variance tradeoff

• low-flexibility models (low degree of polynomial) have large bias and low variance
  – large bias means a large quadratic error of the model
  – low variance means that the predictions of the model depend only little on the particular sample that was used for building the model
    • i.e. there is little change in the model if the training data set is changed
    • thus there is little change between the predictions for a given x made by different models
• high-flexibility models have low bias and large variance
  – a large degree makes the polynomial very sensitive to the details of the sample
  – thus the polynomial changes dramatically upon a change of the data set
  – however, bias is low, as the quadratic error is low
• A polynomial with too few parameters (too low degree) will make large errors because of a large bias.
• A polynomial with too many parameters (too high degree) will make large errors because of a large variance.
• The degree of the "best" polynomial must be somewhere in between – the bias-variance tradeoff.

MSE = variance + bias²
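The decomposition can be derived by adding and subtracting the mean prediction $\mathbb{E}[\hat y]$ (noise-free targets assumed here for brevity; with noisy targets an irreducible noise term is added):

$$\mathbb{E}\big[(\hat y - t)^2\big]
= \underbrace{\mathbb{E}\big[(\hat y - \mathbb{E}[\hat y])^2\big]}_{\text{variance}}
+ \underbrace{\big(\mathbb{E}[\hat y] - t\big)^2}_{\text{bias}^2}$$

The cross term $2\,\mathbb{E}\big[\hat y - \mathbb{E}[\hat y]\big]\big(\mathbb{E}[\hat y] - t\big)$ vanishes because $\mathbb{E}\big[\hat y - \mathbb{E}[\hat y]\big] = 0$.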
• This phenomenon is not specific to polynomial regression!
• In fact, it shows up in any kind of model.
• Generally, the bias-variance tradeoff principle can be stated as:
  – Models with too few parameters are inaccurate because they are not flexible enough (large bias, large error of the model).
  – Models with too many parameters are inaccurate because they overfit the data (large variance, too much sensitivity to the data).
  – Identifying the best model requires identifying the proper "model complexity" (number of parameters).
Test-data and Cross Validation
| Tid | Refund | Marital Status | Taxable Income | Cheat |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

Refund, Marital Status and Taxable Income are the attributes (input/independent variables, features); each row is an object (instance, sample); Cheat is the class.
Attribute types
• discrete
  – has only a finite or countably infinite set of values
  – nominal (also categorical)
    • the values are just different labels (e.g. ID number, eye color)
    • central tendency given by mode (median, mean not defined)
  – ordinal
    • the values reflect an order (e.g. ranking, height in {tall, medium, short})
    • central tendency given by median, mode (mean not defined)
  – binary attributes – a special case of discrete attributes
• continuous (also quantitative)
  – has real numbers as attribute values
  – central tendency given by mean, plus stdev, …
A regression problem

y = f(x) + noise. Can we learn f from this data?

Consider three methods.

(Figure: scatter plot of y against x.)

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Linear regression

What will the regression model look like?

y = ax + b

Univariate linear regression with a constant term.

(Figure: the fitted line over the scatter plot.)

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Quadratic regression

What will the regression model look like?

y = ax² + bx + c

(Figure: the fitted parabola over the scatter plot.)

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Join-the-dots

Also known as piecewise linear nonparametric regression, if that makes you feel better.

(Figure: neighbouring data points joined by line segments.)

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Which is best?

Why not choose the method with the best fit to the data?

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
What do we really want?

Why not choose the method with the best fit to the data?

Because what we really care about is: how well are you going to predict future data?

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
The test set method
1. Randomly choose 30% of the data to be in the test set.
2. The remainder is the training set.
3. Perform regression on the training set.
4. Estimate future performance with the test set.

linear regression: MSE = 2.4

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
The test set method
1. Randomly choose 30% of the data to be in the test set.
2. The remainder is the training set.
3. Perform regression on the training set.
4. Estimate future performance with the test set.

quadratic regression: MSE = 0.9

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
The test set method
1. Randomly choose 30% of the data to be in the test set.
2. The remainder is the training set.
3. Perform regression on the training set.
4. Estimate future performance with the test set.

join-the-dots: MSE = 2.2

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Test set method
• good news
  – very simple
  – model selection: choose the method with the best score
• bad news
  – wastes data (we got an estimate of the best method by using 30% less data)
  – if you don't have enough data, the test set may be just lucky/unlucky
    → the test set estimator of performance has high variance

(Diagram: data split into Train | Test.)

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
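The four steps above can be sketched with the standard library alone (the data here are integer stand-ins for (x, t) pairs; the fixed seed only makes the sketch repeatable):

```python
import random

data = list(range(100))            # stand-in for (x, t) pairs
random.seed(42)
random.shuffle(data)

n_test = len(data) * 30 // 100     # 1. randomly choose 30% for the test set
test_set  = data[:n_test]
train_set = data[n_test:]          # 2. the remainder is the training set

# 3. perform regression on train_set only
# 4. estimate future performance on test_set only
print(len(train_set), len(test_set))   # 70 30
```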
(Figure: training error and testing error as functions of model complexity.)

The above examples compared different algorithms; this figure is about model complexity (for a given algorithm).
• stratified division
  – the same class proportions in the training and test sets
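A stratified split can be sketched by splitting each class separately, so that both sets keep the original proportions (the 2:1 class ratio and the 30% test fraction below are illustrative):

```python
import random
from collections import defaultdict

def stratified_split(items, labels, test_frac, seed=0):
    """Split items so each class contributes test_frac of itself to the test set."""
    by_class = defaultdict(list)
    for item, lab in zip(items, labels):
        by_class[lab].append(item)
    train, test = [], []
    rng = random.Random(seed)
    for group in by_class.values():      # split every class separately
        rng.shuffle(group)
        cut = int(round(test_frac * len(group)))
        test += group[:cut]
        train += group[cut:]
    return train, test

labels = ["a"] * 60 + ["b"] * 30         # 2:1 class ratio, illustrative
items = list(range(90))
train, test = stratified_split(items, labels, 0.3)
print(len(train), len(test))             # 63 27 -- both keep the 2:1 ratio
```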
LOOCV (Leave-one-out Cross Validation)

1. Choose one data point.
2. Remove it from the set.
3. Fit the remaining data points.
4. Note your error.

Repeat these steps for all points. When you are done, report the mean squared error.

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
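The loop above, with a deliberately trivial learner (predict the mean of the training targets) so the LOOCV structure stays visible:

```python
def loocv_mse(targets):
    errors = []
    for i in range(len(targets)):               # 1. choose one data point
        train = targets[:i] + targets[i + 1:]   # 2. remove it from the set
        pred = sum(train) / len(train)          # 3. fit the remaining points
        errors.append((pred - targets[i]) ** 2) # 4. note your error
    return sum(errors) / len(errors)            # report the mean squared error

print(loocv_mse([1.0, 2.0, 3.0]))   # → 1.5
```

Any real learner replaces the mean predictor; the point is that every data point is used for testing exactly once.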
MSE_LOOCV = 2.12

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
MSE_LOOCV = 0.962

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
MSE_LOOCV = 3.33

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Which kind of Cross Validation?

|          | Good               | Bad                   |
|----------|--------------------|-----------------------|
| Test set | Cheap              | Variance; wastes data |
| LOOCV    | Doesn't waste data | Expensive             |

Can we get the best of both worlds?

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
k-fold Cross Validation

Randomly break the data set into k partitions (in our case k = 3).

Red partition: train on all points not in the red partition; find the test set sum of errors on the red points.
Blue partition: train on all points not in the blue partition; find the test set sum of errors on the blue points.
Green partition: train on all points not in the green partition; find the test set sum of errors on the green points.

Then report the mean error.

linear regression: MSE_3fold = 2.05

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
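A k-fold skeleton (k = 3 as on the slide); the learner is left abstract since any model can be plugged into the partition / train / score / average loop:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly break indices 0..n-1 into k disjoint partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(9, 3)
fold_errors = []
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    assert sorted(train + held_out) == list(range(9))   # nothing lost
    # ... fit on `train`, sum the squared errors on `held_out` ...
    fold_errors.append(0.0)                 # placeholder for the fold error
mean_error = sum(fold_errors) / len(fold_errors)   # the reported score
```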
Results of 3-fold Cross Validation

| method        | MSE_3fold |
|---------------|-----------|
| linear        | 2.05      |
| quadratic     | 1.11      |
| join-the-dots | 2.93      |

taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Which kind of Cross Validation?

|          | Good                | Bad                  |
|----------|---------------------|----------------------|
| Test set | Cheap.              | Variance. Wastes data. |
| LOOCV    | Doesn't waste data. | Expensive.           |
| 3-fold   | Slightly better than test set. | Wastier than LOOCV. More expensive than test set. |
| 10-fold  | Only wastes 10%. Only 10 times more expensive instead of R times. | Wastes 10%. 10 times more expensive instead of R times (as LOOCV is). |

R-fold is identical to LOOCV (R = the number of data points).
Model selection via CV
• We are trying to decide which model to use. For polynomial regression, decide the degree of the polynomial.
• Train each machine and make a table.
• Whichever model gave the best CV score: train it with all the data. That's the predictive model you'll use.

| degree | MSE_train | MSE_10-fold | Choice |
|--------|-----------|-------------|--------|
| 1      |           |             |        |
| 2      |           |             |        |
| 3      |           |             |        |
| 4      |           |             |        |
| 5      |           |             |        |
| 6      |           |             |        |
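The selection rule itself is one line: pick the candidate with the lowest CV error, then retrain it on all the data. The scores below are illustrative placeholders (the linear and quadratic values echo the 3-fold results shown earlier):

```python
# degree -> CV MSE; made-up numbers standing in for the table's CV column
cv_scores = {1: 2.05, 2: 1.11, 3: 1.30}

best_degree = min(cv_scores, key=cv_scores.get)
print(best_degree)   # → 2: now refit the degree-2 polynomial on ALL the data
```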
taken from Cross Validation tutorial by Andrew Moore, http://www.autonlab.org/tutorials/overfit.html
Selection and testing
• Complete procedure for algorithm selection and estimation of its quality:
1. Divide the data into Train/Test.
2. By Cross Validation on the Train part, choose the algorithm.
3. Use this algorithm to construct a classifier using the whole Train part.
4. Estimate its quality on the Test part.

(Diagram: data split into Train | Test; during CV the Train part is further split into Train | Validation.)
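The four steps as a code skeleton (the CV and fitting bodies are left as comments, since any learner can be plugged in; the data and split sizes are illustrative):

```python
import random

data = list(range(50))             # stand-in for labelled instances
random.seed(1)
random.shuffle(data)

test  = data[:15]                  # 1. divide the data into Train/Test
train = data[15:]

# 2. by cross-validation on `train` ONLY, choose the algorithm
# 3. construct the classifier from the whole `train` with that algorithm
# 4. estimate its quality ONCE on `test`
assert not set(test) & set(train)  # Test never leaks into the selection
```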
• The training error cannot be used as an indicator of the model's performance due to overfitting.
• Training set – train a range of models, or a given model with a range of values for its parameters.
• Compare them on independent data – the validation set.
  – If the model design is iterated many times, some overfitting to the validation data can occur, so it may be necessary to keep aside a third set.
• Test set – the set on which the performance of the selected model is finally evaluated.
Finally comes our first machine learning algorithm.
• Which class (blue or orange) would you predict for this point?
• And why?
• classification boundary

(Figure: scatter plot in the x–y plane with a query point marked "?".)
• And now?
• The classification boundary is quadratic.

(Figure: the same scatter plot with a query point marked "?".)
• And now?
• And why?

(Figure: the same scatter plot with a query point marked "?".)
Nearest Neighbors Classification
(Figure: training instances.)
• But what does "similar" mean?

(Figure: four cases, A, B, C, D, illustrating different notions of similarity.)

source: Kardi Teknomo's Tutorials, http://people.revoledu.com/kardi/tutorial/index.html
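The slides leave the classifier informal; here is a minimal 1-nearest-neighbour sketch where "similar" is made concrete by a distance function, with Euclidean distance as the usual default (the training points and labels are illustrative):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nn_classify(query, instances):
    """Return the label of the training instance nearest to `query`."""
    _, label = min(instances, key=lambda pl: euclidean(query, pl[0]))
    return label

train = [((0.0, 0.0), "blue"),
         ((1.0, 1.0), "blue"),
         ((5.0, 5.0), "orange")]

print(nn_classify((0.5, 0.5), train))   # → blue
print(nn_classify((4.5, 5.0), train))   # → orange
```

Swapping in another distance (Manhattan, cosine, …) changes what "similar" means and therefore what the classifier predicts.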