Regression: Linear Regression

Jeff Howbert, Introduction to Machine Learning, Winter 2012
[Slides 2–6: figures. Slide credit: Greg Shakhnarovich (CS195-5, Brown Univ., 2006)]
Loss function

Suppose target labels come from a set Y:
– Binary classification: Y = { 0, 1 }
– Regression: Y = ℝ (the real numbers)

A loss function $L(y, \hat{y})$ maps decisions to costs:
– it defines the penalty for predicting $\hat{y}$ when the true value is $y$.

Standard choice for classification: 0/1 loss (same as misclassification error):

$$L_{0/1}(y, \hat{y}) = \begin{cases} 0 & \text{if } y = \hat{y} \\ 1 & \text{otherwise} \end{cases}$$

Standard choice for regression: squared loss:

$$L(y, \hat{y}) = (y - \hat{y})^2$$
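As a quick numeric illustration, a minimal MATLAB sketch of the two losses (the variable names are our own, not from the slides):

```matlab
% Evaluate the two standard loss functions on one (y, yhat) pair.
y    = 3.0;               % true value
yhat = 2.5;               % prediction

L01 = double(y ~= yhat);  % 0/1 loss: 0 if prediction matches exactly, else 1
Lsq = (y - yhat)^2;       % squared loss

fprintf('0/1 loss = %g, squared loss = %g\n', L01, Lsq);
```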
Least squares linear fit to data

The most popular estimation method is least squares:
– Determine the linear coefficients α, β that minimize the sum of squared loss (SSL).
– Use standard (multivariate) differential calculus: differentiate SSL with respect to α and β, find the zeros of each partial derivative, and solve for α, β.

One dimension:

$$\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - (\alpha + \beta x_j) \right)^2, \qquad N = \text{number of samples}$$

$$\beta = \frac{\operatorname{cov}[x, y]}{\operatorname{var}[x]}, \qquad \alpha = \bar{y} - \beta \bar{x}, \qquad \bar{x}, \bar{y} = \text{means of training } x, y$$

$$\hat{y}_t = \alpha + \beta x_t \quad \text{for test sample } x_t$$
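A minimal MATLAB sketch of the one-dimensional closed form (the data and names here are synthetic, for illustration only):

```matlab
% One-dimensional least squares via the closed-form solution above.
x = [1; 2; 3; 4; 5];                % training inputs (synthetic)
y = [1.9; 4.1; 6.2; 7.8; 10.1];     % training responses (synthetic)

C     = cov(x, y);                  % 2x2 sample covariance matrix
beta  = C(1, 2) / var(x);           % slope = cov[x,y] / var[x]
alpha = mean(y) - beta * mean(x);   % intercept = ybar - beta * xbar

xt   = 6;                           % test sample
yhat = alpha + beta * xt            % predicted response
```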
Least squares linear fit to data

Multiple dimensions:
– To simplify notation and derivation, change α to β0, and add a new feature x0 = 1 to the feature vector x:

$$\hat{y} = \beta_0 \cdot 1 + \sum_{i=1}^{d} \beta_i x_i = \boldsymbol{\beta}^{\mathrm{T}} \mathbf{x}$$

– Calculate SSL and determine β:

$$\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - \boldsymbol{\beta}^{\mathrm{T}} \mathbf{x}_j \right)^2 = (\mathbf{y} - X\boldsymbol{\beta})^{\mathrm{T}} (\mathbf{y} - X\boldsymbol{\beta})$$

where y = vector of all training responses, X = matrix of all training samples.

$$\boldsymbol{\beta} = (X^{\mathrm{T}} X)^{-1} X^{\mathrm{T}} \mathbf{y}$$

$$\hat{y}_t = \boldsymbol{\beta}^{\mathrm{T}} \mathbf{x}_t \quad \text{for test sample } \mathbf{x}_t$$
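A minimal MATLAB sketch of the multivariate solution (synthetic data; the backslash operator stands in for the explicit inverse):

```matlab
% Multivariate least squares via the normal equations.
N = 100; d = 3;
X = [ones(N, 1), randn(N, d)];          % prepend x0 = 1 for the intercept
betaTrue = [2; -1; 0.5; 3];             % synthetic ground-truth coefficients
y = X * betaTrue + 0.1 * randn(N, 1);   % noisy responses

beta = (X' * X) \ (X' * y);             % beta = (X'X)^{-1} X'y via backslash
% In practice, X \ y solves the same least squares problem more stably.

xt   = [1; randn(d, 1)];                % test sample (with leading 1)
yhat = beta' * xt;                      % predicted response
```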
[Slides 10–11: figures, least squares linear fit to data]
Extending the application of linear regression

The inputs X for linear regression can be:
– Original quantitative inputs
– Transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
– Polynomial transformations, example: y = β0 + β1x + β2x² + β3x³ (see the sketch after this list)
– Basis expansions
– Dummy coding of categorical inputs
– Interactions between variables, example: x3 = x1 · x2

This allows the use of linear regression techniques to fit much more complicated, non-linear datasets.
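A minimal MATLAB sketch of the polynomial case (synthetic data): the model stays linear in the coefficients β even though the fitted curve is non-linear in x.

```matlab
% Fit a cubic polynomial with ordinary linear regression.
x = linspace(-2, 2, 50)';                            % inputs
y = 1 + 2*x - 0.5*x.^2 + 0.3*x.^3 ...
    + 0.2 * randn(size(x));                          % noisy cubic responses

X    = [ones(size(x)), x, x.^2, x.^3];               % polynomial design matrix
beta = X \ y;                                        % least squares fit
yfit = X * beta;                                     % fitted curve
```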
[Slide 13: figure, example of fitting a polynomial curve with a linear model]
Prostate cancer dataset

97 samples, partitioned into:
– 67 training samples
– 30 test samples

Eight predictors (features):
– 6 continuous (4 log transforms)
– 1 binary
– 1 ordinal

Continuous outcome variable:
– lpsa: log( prostate specific antigen level )
[Slide 15: figure, correlations of predictors in the prostate cancer dataset]

[Slide 16: figure, fit of linear model to the prostate cancer dataset]
Regularization

Complex models (lots of parameters) are often prone to overfitting. Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.

Two common types of regularization in linear regression:
– L2 regularization (a.k.a. ridge regression). Find the β which minimizes (see the closed-form sketch after this list):

$$\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} \beta_i^2$$

λ is the regularization parameter: bigger λ imposes more constraint.

– L1 regularization (a.k.a. the lasso). Find the β which minimizes:

$$\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} |\beta_i|$$
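A minimal MATLAB sketch of ridge regression (synthetic data): differentiating the penalized SSL gives the closed form β = (XᵀX + λI)⁻¹Xᵀy, with the intercept left unpenalized to match the penalty sums above, which start at i = 1.

```matlab
% Ridge regression via the regularized normal equations.
N = 100; d = 5; lambda = 1.0;
X = [ones(N, 1), randn(N, d)];                % x0 = 1 column for the intercept
y = X * [1; 2; -1; 0; 0.5; -2] + 0.1 * randn(N, 1);

P = lambda * eye(d + 1);
P(1, 1) = 0;                                  % do not penalize the intercept
betaRidge = (X' * X + P) \ (X' * y);          % closed-form ridge solution
```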
Example of L2 regularization

L2 regularization shrinks coefficients towards (but not to) zero, and towards each other.
Example of L1 regularization

L1 regularization shrinks coefficients to zero at different rates; different values of λ give models with different subsets of features.
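If the Statistics and Machine Learning Toolbox is available, MATLAB's built-in lasso function traces this path of coefficient subsets directly (a sketch with synthetic data):

```matlab
% Lasso path: one column of B per lambda value
% (requires the Statistics and Machine Learning Toolbox).
X = randn(100, 5);
y = X * [3; -2; 0; 0; 1] + 0.1 * randn(100, 1);

[B, FitInfo] = lasso(X, y);   % B is d-by-numLambda; the intercept is
                              % handled separately in FitInfo.Intercept
% Larger entries of FitInfo.Lambda correspond to sparser columns of B,
% i.e. models using smaller subsets of the features.
```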
[Slide 20: figure, example of subset selection]

[Slide 21: figure, comparison of various selection and shrinkage methods]

[Slide 22: figure, L1 regularization gives sparse models, L2 does not]
Other types of regression

In addition to linear regression, there are:
– many types of non-linear regression
  - decision trees
  - nearest neighbor
  - neural networks
  - support vector machines
– locally linear regression
– etc.
MATLAB interlude

matlab_demo_07.m, Part B