introduction to machine learning - rutgers universityelgammal/classes/cs536/... · introduction to...
TRANSCRIPT
![Page 1: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/1.jpg)
INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN© The MIT Press, 2004
[email protected]://www.cmpe.boun.edu.tr/~ethem/i2ml
Lecture Slides for
![Page 2: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/2.jpg)
CHAPTER 10:
Linear Discrimination
![Page 3: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/3.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
3
Likelihood- vs. Discriminant-based Classification
Likelihood-based: Assume a model for p(x|Ci), use Bayes’ rule to calculate P(Ci|x)
gi(x) = log P(Ci|x)
Discriminant-based: Assume a model for gi(x|Φi); no density estimation
Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries
![Page 4: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/4.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
4
Linear Discriminant
Linear discriminant:
Advantages:Simple: O(d) space/computation
Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring)Optimal when p(x|Ci) are Gaussian with shared cov matrix; useful when classes are (almost) linearly separable
![Page 5: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/5.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
5
Generalized Linear Model
Quadratic discriminant:
Higher-order (product) terms:
Map from x to z using nonlinear basis functions and use a linear discriminant in z-space
![Page 6: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/6.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
6
Two Classes
![Page 7: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/7.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
7
Geometry
![Page 8: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/8.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
8
Multiple Classes
Classes arelinearly separable
![Page 9: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/9.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
9
Pairwise Separation
![Page 10: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/10.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
10
From Discriminants to Posteriors
When p (x | Ci ) ~ N ( µi , ∑)
![Page 11: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/11.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
11
![Page 12: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/12.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
12
Sigmoid (Logistic) Function
![Page 13: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/13.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
13
Gradient-Descent
E(w|X) is error with parameters w on sample X
Gradient
Gradient-descent:
Starts from random w and updates w iteratively in the negative direction of gradient
![Page 14: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/14.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
14
Gradient-Descent
wt wt+1
η
E (wt)
E (wt+1)
![Page 15: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/15.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
15
Logistic Discrimination
Two classes: Assume log likelihood ratio is linear
![Page 16: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/16.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
16
Training: Two Classes
![Page 17: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/17.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
17
Training: Gradient-Descent
![Page 18: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/18.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
18
![Page 19: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/19.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
19
10
100 1000
![Page 20: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/20.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
20
K>2 Classessoftmax
![Page 21: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/21.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
21
![Page 22: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/22.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
22
Example
![Page 23: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/23.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
23
Generalizing the Linear Model
Quadratic:
Sum of basis functions:
where φ(x) are basis functions Kernels in SVMHidden units in neural networks
![Page 24: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/24.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
24
Discrimination by Regression
Classes are NOT mutually exclusive and exhaustive
![Page 25: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/25.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
25
Optimal Separating Hyperplane
(Cortes and Vapnik, 1995; Vapnik, 1995)
![Page 26: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/26.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
26
Margin
Distance from the discriminant to the closest instances on either side
Distance of x to the hyperplane is
We require
For a unique sol’n, fix and to max margin
![Page 27: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/27.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
27
![Page 28: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/28.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
28
![Page 29: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/29.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
29
Most αt are 0 and only a small number have αt>0; they are the support vectors
![Page 30: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/30.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
30
Soft Margin Hyperplane
Not linearly separable
Soft error
New primal is
![Page 31: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/31.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
31
Kernel Machines
Preprocess input x by basis functions
The SVM solution
![Page 32: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/32.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
32
Kernel Functions
Polynomials of degree q:
Radial-basis functions:
Sigmoidal functions:
(Cherkassky and Mulier, 1998)
![Page 33: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml](https://reader030.vdocuments.us/reader030/viewer/2022040306/5ec4cea1c620f72ddb6d38af/html5/thumbnails/33.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)
33
SVM for Regression
Use a linear model (possibly kernelized)
Use the є-sensitive error function