INTRODUCTION TO MACHINE LEARNING
Regression: Simple and Linear
Regression Principle
Predictors → Regression → Response
Example
Shop data: sales, competition, district size, ...
● Data analyst: relationship?
● Predictors: competition, advertisement, …
● Response: sales
● Shopkeeper: predictions!
Simple Linear Regression
● Simple: one predictor to model the response
● Linear: approximately linear relationship
Linearity plausible? Scatterplot!
Example
● Relationship: advertisement → sales
● Expectation: positively correlated
Example
● Observation: upward linear trend
● First Step: simple linear regression
[Scatterplot: sales vs. advertisement]
Model
Fitting a line:

y = β₀ + β₁x + ε

● Response: y
● Predictor: x
● Intercept: β₀
● Slope: β₁
● Statistical error: ε
Estimating Coefficients
[Scatterplot: sales vs. advertisement, with the fitted line and residuals marked]
● Residuals: true response − fitted response
● Minimize the sum of squared residuals over all observations!
Estimating Coefficients
> my_lm <- lm(sales ~ ads, data = shop_data)
Response: sales; predictor: ads
[Scatterplot: sales vs. advertisement, with the fitted line]
> my_lm$coefficients   # returns the estimated coefficients
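What `lm()` computes can be mirrored with the closed-form least-squares estimates. A minimal Python sketch, with made-up ads/sales numbers (not the course's `shop_data`):

```python
# Closed-form least-squares estimates for simple linear regression:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_simple_lm(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Illustrative data, exactly on the line sales = 10 + 40 * ads.
ads   = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [50.0, 90.0, 130.0, 170.0, 210.0]

b0, b1 = fit_simple_lm(ads, sales)
print(b0, b1)  # 10.0 40.0
```

These two numbers play the same role as the intercept and slope in `my_lm$coefficients`.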
Prediction with Regression
Predicting new outcomes:
> y_new <- predict(my_lm, x_new, interval = "confidence")
● interval = "confidence": provides a confidence interval
● x_new: new predictor instance, must be a data frame
● y_new: estimated response, based on the estimated coefficients
Example: ads = $11,000 → predicted sales ≈ $380,000
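The point prediction behind `predict()` is just the fitted line evaluated at the new predictor value. A sketch with illustrative coefficients (R's `interval = "confidence"` would additionally return lower/upper bounds, which are omitted here):

```python
# Point prediction from estimated coefficients: y_hat = b0 + b1 * x_new.
# Coefficients below are illustrative, not the course's estimates.
def predict_simple(b0, b1, x_new):
    return b0 + b1 * x_new

print(predict_simple(10.0, 40.0, 11.0))  # 450.0
```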
Accuracy: RMSE
Measure of accuracy:

RMSE = √( (1/n) Σᵢ (yᵢ − ŷᵢ)² )

● yᵢ: true response, ŷᵢ: estimated response, n: number of observations
● RMSE has the unit and scale of the response → difficult to interpret!
● Example: RMSE = $76,000. Meaning?
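The RMSE formula translates directly into code; a short sketch with illustrative data:

```python
import math

# RMSE = sqrt(mean of squared residuals). It keeps the unit and scale
# of the response, which is why a bare value is hard to judge.
def rmse(y_true, y_pred):
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

print(rmse([100.0, 200.0, 300.0], [110.0, 190.0, 310.0]))  # 10.0
```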
Accuracy: R-squared
● Interpretation: percentage of explained variance
● R² = 1 − SS_res / SS_tot, where the total SS is taken around the sample mean response
● Close to 1 → good fit!

> summary(my_lm)$r.squared   # returns R-squared

Example: 0.84
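The same quantity computed by hand, as a sketch (illustrative data):

```python
# R-squared = 1 - SS_res / SS_tot: the fraction of response variance
# around its sample mean that the model explains; near 1 = good fit.
def r_squared(y_true, y_pred):
    my = sum(y_true) / len(y_true)
    ss_tot = sum((y - my) ** 2 for y in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9]))
```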
Let’s practice!
Multivariable Linear Regression
[Scatterplots: sales vs. advertisement and sales vs. nearby competition]
Example
Simple Linear Regression:
> lm(sales ~ ads, data = shop_data)
[Scatterplot: sales vs. nearby competition]
> lm(sales ~ comp, data = shop_data)
Loss of information!
Multi-Linear Model
Solution: combine them in a multi-linear model!
● Each coefficient: individual effect of its predictor
● Higher predictive power
● Higher accuracy
Multi-Linear Regression Model
y = β₀ + β₁x₁ + β₂x₂ + ε

● Predictors: x₁, x₂
● Response: y
● Statistical error: ε
● Coefficients: β₀, β₁, β₂
Estimating Coefficients
● Residuals: true response − fitted response
● Minimize the sum of squared residuals over all observations!
Extending!
> my_lm <- lm(sales ~ ads + comp + ..., data = shop_data)
More predictors: total inventory, district size, …
Response: sales; predictors: ads, comp, …
Extend methodology to p predictors:
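Under the hood, extending least squares to several predictors means solving the normal equations XᵀXb = Xᵀy. A self-contained sketch with two illustrative predictors (stand-ins for ads and comp; not the course's data):

```python
# Multi-linear least squares via the normal equations X^T X b = X^T y,
# solved with Gaussian elimination. A column of ones gives the intercept.

def solve(A, b):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_multi_lm(X, y):
    # Prepend an intercept column, then solve the normal equations.
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

# Illustrative data lying exactly on the plane y = 2 + 3*x1 - 1*x2.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 1.0], [5.0, 0.0]]
y = [2.0 + 3.0 * a - 1.0 * c for a, c in X]

print([round(c, 6) for c in fit_multi_lm(X, y)])
```

The returned list corresponds to intercept, first slope, second slope, i.e. what R reports in the coefficients of `lm(sales ~ ads + comp, ...)`.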
RMSE & Adjusted R-Squared
More predictors:
● Lower RMSE and higher R-squared
● Higher complexity and cost

Solution: adjusted R-squared
● Penalizes additional predictors
● Used to compare models

> summary(my_lm)$adj.r.squared

In the example: 0.819 → 0.906
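The penalty for extra predictors is explicit in the usual adjusted R-squared formula; a sketch (the n and p values below are illustrative):

```python
# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n = number of observations and p = number of predictors.
# Unlike plain R^2, it can drop when a useless predictor is added.
def adj_r_squared(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adj_r_squared(0.84, 30, 1))   # one predictor
print(adj_r_squared(0.91, 30, 2))   # two predictors
```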
Influence of Predictors
● p-value: indicator of a parameter's influence
● Low p-value → more likely the parameter has a significant influence
> summary(my_lm)

Call:
lm(formula = sales ~ ads + comp, data = shop_data)

Residuals:
     Min       1Q   Median       3Q      Max
-131.920  -23.009   -4.448   33.978  146.486

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   228.740     80.592   2.838 0.009084 **
ads            25.521      5.900   4.325 0.000231 ***
comp          -19.234      4.549  -4.228 0.000296 ***

P-values: the Pr(>|t|) column
Example
● Want 95% confidence → p-value ≤ 0.05
● Want 99% confidence → p-value ≤ 0.01
Note: Do not mix up R-squared with p-values!
Assumptions
● Just fit a model, print the summary, and look at the p-values?
● Not that simple!
● We made some implicit assumptions
Verifying Assumptions
Residuals should be:
● Independent: residual plot shows no pattern?
● Identically normally distributed: normal Q-Q plot approximately a line?

> plot(lm_shop$fitted.values, lm_shop$residuals)   # draws the residual plot
> qqnorm(lm_shop$residuals)                        # draws the normal Q-Q plot

[Residual plot: residuals vs. estimated sales. Normal Q-Q plot: residual quantiles vs. theoretical quantiles]
Verifying Assumptions
[Normal Q-Q plot and residual plot, as on the previous slide]
● Important to avoid mistakes!
● Alternative tests exist
Let’s practice!
k-Nearest Neighbors and Generalization
Non-Parametric Regression
[Scatterplot: y vs. x, showing a curved pattern]
Problem: visible pattern, but not linear
Non-Parametric Regression
Problem: visible pattern, but not linear
Solutions:
● Transformation → tedious
● Multi-linear regression → advanced
● Non-parametric regression → doable
Non-Parametric Regression
Problem: visible pattern, but not linear
Techniques:
● k-Nearest Neighbors
● Kernel Regression
● Regression Trees
● …
No parameter estimation required!
k-NN: Algorithm
[Scatterplot: y vs. x, training set with a new observation]
Given a training set and a new observation:
1. Calculate the distance in the predictors
[Scatterplot: the k = 4 nearest training points highlighted]
2. Select the k nearest
3. Aggregate the responses of the k nearest (here: the mean of the 4 responses)
4. The outcome is your prediction
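The four steps above can be sketched directly for one predictor (the x/y training data below is made up for illustration):

```python
# k-NN regression: compute distances in the predictor space, take the
# k nearest training points, and use the mean of their responses.
def knn_predict(train_x, train_y, x_new, k):
    # 1. Distance of x_new to every training predictor value.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x_new))
    # 2. Indices of the k nearest training points.
    nearest = order[:k]
    # 3.-4. The mean of their responses is the prediction.
    return sum(train_y[i] for i in nearest) / k

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 12.0, 20.0, 22.0, 40.0, 42.0]
print(knn_predict(x, y, 2.4, 4))  # mean of the 4 responses nearest to 2.4
```

No coefficients are estimated: the training set itself is the model, which is what makes the method non-parametric.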
Choosing k
● k = 1: perfect fit on the training set, but poor predictions
● k = number of training observations: predicts the overall mean, also poor predictions
● Reasonable default: k ≈ 20% of the number of training observations
● Bias-variance trade-off!
Generalization in Regression
● Built your own regression model
● Worked on the training set
● Does it generalize well?
● Two techniques:
● Hold out: simply split the dataset
● k-fold cross-validation
Hold Out Method for Regression
Split the dataset into a training set and a test set, then:
1. Build the regression model on the training set
2. Calculate the RMSE within the training set
3. Predict the outcome of the test set
4. Calculate the RMSE within the test set
5. Compare test RMSE and training RMSE
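The five steps can be sketched end to end; the data and the split point below are illustrative:

```python
import math

# Hold-out sketch: fit a simple linear model on a training split, then
# compare training RMSE to test RMSE. A test RMSE much larger than the
# training RMSE suggests the model does not generalize well.

def fit_simple_lm(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative data, roughly y = 2x with small noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]

train_x, test_x = x[:6], x[6:]          # 1. split the dataset
train_y, test_y = y[:6], y[6:]

b0, b1 = fit_simple_lm(train_x, train_y)                       # 2. fit on training set
train_rmse = rmse(train_y, [b0 + b1 * xi for xi in train_x])   # 3. training RMSE
test_rmse = rmse(test_y, [b0 + b1 * xi for xi in test_x])      # 4. predict + test RMSE
print(round(train_rmse, 3), round(test_rmse, 3))               # 5. compare
```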
Under- and Overfitting
[Three fits on the same data: underfit, good fit, overfit]

             Underfit  Good fit  Overfit
● Fit:          ✘         ✔         ✔
● Generalize:   ✔         ✔         ✘
● Prediction:   ✘         ✔         ✘
Let’s practice!