![Page 1: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/1.jpg)
CSE217 INTRODUCTION TO DATA SCIENCE
Spring 2019
Marion Neumann
LECTURE 4: REGRESSION
![Page 2: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/2.jpg)
RECAP: DATA SCIENCE
…solving problems with data…
[Diagram labels: scientific or business problem; data problem; collect & understand data; clean & format data; use data to create solution]
…which step is most exciting?
Machine Learning
![Page 3: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/3.jpg)
RECAP: ML
• data: anything you can measure or record
• model: specification of a (mathematical) relationship between different variables
• evaluation: how well does the model work?
…creating and using models that learn from data…
![Page 4: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/4.jpg)
RECAP: ML WORKFLOW
• Training phase, test phase, and evaluation phase
→ turn to your neighbor
• by taking turns, explain what happens in the
  • training phase
  • test phase
  • evaluation phase
• carefully define what kinds of data are used in each phase
[Diagram labels: data, output, program; data, output; ground truth, performance measure]
![Page 5: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/5.jpg)
PROPERTY SALES DATA
Goal: predict how much my house is worth
• features (input variables)
  size (in sq. ft): ○ numeric ○ categorical ○ binary
  neighborhood: ○ numeric ○ categorical ○ binary
  # bed rooms: ○ numeric ○ categorical ○ binary
  # bath rooms: ○ numeric ○ categorical ○ binary
  pool: ○ numeric ○ categorical ○ binary
  age (in years): ○ numeric ○ categorical ○ binary
  renovated: ○ numeric ○ categorical ○ binary
• house price = target variable
  ○ numeric ○ categorical ○ binary
How can this data help?
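One plausible way to answer the quiz above is to write the record down as a typed structure. The sketch below is only an illustration: the class `House`, the helper `encode_features`, the neighborhood labels "A", "B", "C", and all values are made up and do not come from the slides. It shows numeric features kept as numbers, binary features mapped to 0/1, and the categorical feature turned into a one-hot vector.

```python
from dataclasses import dataclass


@dataclass
class House:
    size_sqft: float    # numeric
    neighborhood: str   # categorical
    bedrooms: int       # numeric (a count)
    bathrooms: float    # numeric (a count; 1.5 = one full and one half bath)
    pool: bool          # binary
    age_years: float    # numeric
    renovated: bool     # binary
    price: float        # target variable: numeric


def encode_features(house, neighborhoods):
    """Turn one House into a list of numbers (price is the target, so it is left out)."""
    # One-hot encoding: one 0/1 entry per known neighborhood.
    one_hot = [1.0 if house.neighborhood == name else 0.0 for name in neighborhoods]
    return [
        house.size_sqft,
        float(house.bedrooms),
        house.bathrooms,
        1.0 if house.pool else 0.0,
        house.age_years,
        1.0 if house.renovated else 0.0,
    ] + one_hot


# Made-up example record and hypothetical neighborhood labels.
example = House(1800.0, "B", 3, 2.0, False, 25.0, True, 250000.0)
features = encode_features(example, ["A", "B", "C"])
print(features)
```

Because the target (house price) is numeric, predicting it is a regression problem.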
![Page 6: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/6.jpg)
PREDICTING HOUSE PRICES
• target (house price) is a real number
How much is my house worth?
Look at Zillow!
![Page 7: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/7.jpg)
LINEAR REGRESSION MODEL
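The formula on this slide did not survive extraction. As a reminder, the standard textbook form of a linear regression model is given below; the symbols $w_0, \dots, w_d$ (weights) and $x_1, \dots, x_d$ (features) are notation chosen here, not necessarily the notation used in the lecture.

```latex
% Simple linear regression (one feature, e.g. house size):
\hat{y} = f(x) = w_0 + w_1 x

% Linear regression with d features:
\hat{y} = f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d
```

Here $\hat{y}$ is the predicted target (the house price), $w_0$ is the intercept, and each $w_j$ says how much the prediction changes when feature $x_j$ increases by one unit.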
![Page 8: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/8.jpg)
TRAINING: MINIMIZE ERROR
PDSH p. 391: Linear Regression
math & statistics
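A minimal sketch of what "minimize error" can mean for a line with one feature, assuming the error being minimized is the sum of squared differences between predictions and observed targets (ordinary least squares). The function names and the tiny data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Training error: sum of squared residuals for a given line."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))


# Made-up training data: size in 1000 sq. ft, price in $1000s.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [150.0, 250.0, 310.0, 450.0]

w0, w1 = fit_line(sizes, prices)
best_error = sum_squared_error(sizes, prices, w0, w1)
print(w0, w1, best_error)
```

Any other choice of intercept and slope gives a larger sum of squared errors on this training data, which is what training by error minimization means here.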
![Page 9: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/9.jpg)
PREDICTION: USE MODEL
PDSH p. 391: Linear Regression
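Once the parameters are learned, prediction just plugs new inputs into the model. The sketch below assumes a one-feature line whose intercept and slope were already obtained in training; the numbers are made up for illustration.

```python
def predict(x, intercept, slope):
    """Apply the learned line to a new input."""
    return intercept + slope * x


# Assumed to come from a previous training step (illustrative values).
intercept, slope = 50.0, 96.0

# New houses the model has not seen: size in 1000 sq. ft.
new_sizes = [1.5, 2.5]
predictions = [predict(x, intercept, slope) for x in new_sizes]
print(predictions)  # predicted prices in $1000s
```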
![Page 10: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/10.jpg)
HOW ABOUT MORE COMPLEX MODELS?
PDSH p. 393: Linear Regression
Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero!
Is the model with zero (training) error the best?
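The question can be explored with a small experiment. The sketch below uses made-up, roughly linear data with 7 training points. Instead of a least-squares polynomial fit it uses Lagrange interpolation, which yields the unique polynomial of degree at most 6 through all 7 points, so its training error is zero. All names and numbers are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interpolate(x, xs, ys):
    """Evaluate the Lagrange polynomial through all (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mean_squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Made-up data: roughly y = 2x + 1 plus noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.3, 2.6, 5.4, 6.8, 9.5, 10.7, 13.2]
test_x = [0.5, 5.5, 7.0]
test_y = [2.0, 12.0, 15.0]

w0, w1 = fit_line(train_x, train_y)
line_train = mean_squared_error([w0 + w1 * x for x in train_x], train_y)
line_test = mean_squared_error([w0 + w1 * x for x in test_x], test_y)
poly_train = mean_squared_error([interpolate(x, train_x, train_y) for x in train_x], train_y)
poly_test = mean_squared_error([interpolate(x, train_x, train_y) for x in test_x], test_y)

print("line:       train", line_train, "test", line_test)
print("polynomial: train", poly_train, "test", poly_test)
```

On this data the polynomial wins on the training set but loses badly on the held-out points, which is why zero training error alone does not make a model the best one.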
![Page 11: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/11.jpg)
EVALUATION FOR REGRESSION
• Training Error vs. Test Error
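• Error measures:
  • RMSE: root mean squared error
  • MAE: mean absolute error
$\mathrm{RMSE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \sqrt{\frac{1}{n}\sum_{i}(\hat{y}_i - y_i)^2}$
$\mathrm{MAE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \frac{1}{n}\sum_{i}|\hat{y}_i - y_i|$
$\hat{\mathbf{y}} = f(X_{\text{test}})$: predictions for test data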
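Both error measures are short to compute by hand. The sketch below follows the standard definitions (average the squared or absolute differences between predictions and ground truth; take the square root for RMSE). The function names and the example values are made up.

```python
import math


def rmse(y_pred, y_true):
    """Root mean squared error: penalizes large errors more strongly."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def mae(y_pred, y_true):
    """Mean absolute error: average size of the error, in the target's units."""
    n = len(y_true)
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n


# Made-up predictions and ground-truth prices for three test houses ($1000s).
y_pred = [200.0, 310.0, 405.0]
y_true = [210.0, 300.0, 400.0]

print(rmse(y_pred, y_true))  # sqrt(75), about 8.66
print(mae(y_pred, y_true))   # 25 / 3, about 8.33
```

RMSE is never smaller than MAE on the same data; the gap grows when a few predictions are far off.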
![Page 12: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/12.jpg)
MACHINE LEARNING WORKFLOW
• Training Phase, Test Phase, Evaluation Phase
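The three phases can be sketched end to end in a few lines. This is an illustration under simple assumptions: a one-feature least-squares line as the model, every fourth data point held out as the test set, and RMSE as the performance measure. The data and all names are made up.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: size in 1000 sq. ft, price in $1000s.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
prices = [150.0, 240.0, 340.0, 430.0, 535.0, 620.0, 725.0, 815.0]

# Hold out every fourth point as test data; the rest is training data.
test_idx = [i for i in range(len(sizes)) if i % 4 == 3]
train_idx = [i for i in range(len(sizes)) if i % 4 != 3]
x_train = [sizes[i] for i in train_idx]
y_train = [prices[i] for i in train_idx]
x_test = [sizes[i] for i in test_idx]
y_test = [prices[i] for i in test_idx]  # ground truth, used only for evaluation

# Training phase: training inputs and outputs go in, a model comes out.
intercept, slope = fit_line(x_train, y_train)

# Test phase: test inputs go in, predicted outputs come out.
y_pred = [intercept + slope * x for x in x_test]

# Evaluation phase: compare predictions with ground truth.
test_rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_test)) / len(y_test))
print(y_pred, test_rmse)
```

The test targets never touch the training step; they enter only in the evaluation phase.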
![Page 13: CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: …m.neumann/sp2019/cse217/... · CSE217 INTRODUCTION TO DATA SCIENCE Spring 2019 Marion Neumann ... •carefully define what kinds](https://reader034.vdocuments.us/reader034/viewer/2022042220/5ec5fdb74f8ce2596d27b5be/html5/thumbnails/13.jpg)
SUMMARY & READING
• Learning from Data requires a lot of math!
• Regression models are used to predict real-valued targets.
• We need a test set to evaluate how well our model generalizes.
• DSFS
  • Ch11: ML (p142-144)
  • Ch14: Simple Linear Regression (p173-176)
• PDSH Ch5: ML – Linear Regression (p390-394)
• LINEAR REGRESSION BY HAND
  https://www.wired.com/2011/01/linear-regression-by-hand/
understand the model
SciKitLearn
use the model in practice