regression intro annotated
TRANSCRIPT
-
8/19/2019 Regression Intro Annotated
1/42
Regression
Predicting House Prices
Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
1 23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
2/42
Predicting house prices
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
3/42
B
How much is my house worth?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
I want to listmy house
for sale
-
8/19/2019 Regression Intro Annotated
4/42
C
How much is my house worth?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
$$ ????
-
8/19/2019 Regression Intro Annotated
5/42
6
Look at recent sales in my neighborh
• How much did they sell for?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
6/42
D
Plot recent house sales(Past years
)
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Termix – fea
covapred
y – obsresp
-
8/19/2019 Regression Intro Annotated
7/42
E
Predict your house bysimilar houses
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
No houserecently the same
-
8/19/2019 Regression Intro Annotated
8/42
F 23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
• Look at a
price in • Still only
• Throwinfrom all
Predict your house bysimilar houses
-
8/19/2019 Regression Intro Annotated
9/42
Linear regression
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
10/4254 23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y Fit a line through the data
f(x) = w0+w1
param
of m
Use a linear regression model
-
8/19/2019 Regression Intro Annotated
11/4255
Use a linear regression model
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
fun
parame
w = (w
Fit a line through the data
fw (x) = w0+w
-
8/19/2019 Regression Intro Annotated
12/4253
Which line?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
different parameters w
fw (x) = w0+w
-
8/19/2019 Regression Intro Annotated
13/425B
“Cost” of using a given line
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y Residual sum of squares (RSS)
RSS(w0,w1) =($house 1-[w0+w1s
+ ($house 2-[w0+w1+ ($house 3-[w0+w1+ … [include all ho
-
8/19/2019 Regression Intro Annotated
14/425C
Find “best” line
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y Minimize cost over allpossible w0,w1
ŵ
RSS(w0,w1) =($house 1-[w0+w1s
+ ($house 2-[w0+w1+ ($house 3-[w0+w1+ … [include all ho
-
8/19/2019 Regression Intro Annotated
15/4256
Predicting your house price
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Best guess
house
fw*(x) = ŵ0 + ŵ1
x
ŷ = ŵ0 + ŵ1
s
-
8/19/2019 Regression Intro Annotated
16/42
Adding higher order effects
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
17/425E
Fit data with a line or … ?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Youyouyou
-
8/19/2019 Regression Intro Annotated
18/425F
Fit data with a line or … ?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Dnorel
-
8/19/2019 Regression Intro Annotated
19/425G
What about a quadratic function?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Dnorel
-
8/19/2019 Regression Intro Annotated
20/4234 23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
fw(x) = w0 + w1 x+ w
What about a quadratic function?
-
8/19/2019 Regression Intro Annotated
21/4235
Even higher order polynomial
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
m y
-
8/19/2019 Regression Intro Annotated
22/4233
Do you believe this fit?
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
My hisn’t w
so li
-
8/19/2019 Regression Intro Annotated
23/42
Evaluating overfitting viatraining/test split
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
24/42
3C
Do you believe this fit?
Minimize
but bad p
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
-
8/19/2019 Regression Intro Annotated
25/42
36 23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
fw(x) = w0 + w1 x+ w
What about a quadratic function?
-
8/19/2019 Regression Intro Annotated
26/42
3D
How to choose modelorder/complexity
• Want good predictiocan’t observe future
• Simulate predictions1. Remove some houses
2.
Fit model on remaining
3.
Predict heldout houses23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
27/42
3E
Training/test split
23456 78%.9 :1; < ="*.1> ?@'>A*%&
Terminology: – training set
– test set
-
8/19/2019 Regression Intro Annotated
28/42
3F
Training error
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Training erro($train 1-fw(
+ ($train 2-fw+ ($train 3-fw+ … [includ
train
Minimize to
find ŵ
-
8/19/2019 Regression Intro Annotated
29/42
3G
Test error
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Test error (ŵ($test 1-fŵ(s
+ ($test 2-fŵ(+ ($test 3-fŵ(+ … [includ
test h
Assesspredictions
using ŵ
-
8/19/2019 Regression Intro Annotated
30/42
B4
Training/Test Curves
23456 78%.9 :1; < ="*.1> ?@'>A*%&
Model complexity
E r r o r
-
8/19/2019 Regression Intro Annotated
31/42
Adding other features
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
32/42
B3
Predictions just based onhouse size
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x
y
Only 1 bathro
Not same as 3 bathroom
-
8/19/2019 Regression Intro Annotated
33/42
BB
Add more features
23456 78%.9 :1; < ="*.1> ?@'>A*%&
square feet (sq.ft.)
p r i c e
( $ )
x1
y
x2
fw(x) = w0 ++ w2
-
8/19/2019 Regression Intro Annotated
34/42
BC
How many features to use?
• Possible choices:
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
- …
• See Regression Course!
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
35/42
Other regression examples
35 23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
36/42
BD
Salary after ML specialization
• How much will your salary be? (y = $$)
• Depends on x = performance in courses, quality ofcapstone project, # of forum responses, …
23456 78%.9 :1; < ="*.1> ?@'>A*%&
hard work
-
8/19/2019 Regression Intro Annotated
37/42
BE
Salary after ML specialization
ŷ = ŵ0 + ŵ1
performance +ŵ2
capstone + ŵ3 forum
23456 78%.9 :1; < ="*.1> ?@'>A*%&
informed by other students who
completed specialization
hard work
-
8/19/2019 Regression Intro Annotated
38/42
BF
Stock prediction
23456 78%.9 :1; < ="*.1> ?@'>A*%&
• Predict the price of a stock
•
Depends on
- Recent history of stock price
- News events
- Related commodities
-
8/19/2019 Regression Intro Annotated
39/42
BG
Tweet popularity
• How many people will retweet your tweet?
•
Depends on # followers,# of followers of followers,
features of text tweeted,
popularity of hashtag,# of past retweets,…
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
40/42
C4
Smart houses
23456 78%.9 :1; < ="*.1> ?@'>A*%&
• Smart houses have many distributed sensors
•
What’s the temperature at your desk? (no sensor)
- Learn spatial function to predict temp
• Also depends on
- Thermostat setting
-
Blinds open/closedor window tint
- Vents
- Temperature outside
- Time of day
-
8/19/2019 Regression Intro Annotated
41/42
Summary for regression
23456 78%.9 :1; < ="*.1> ?@'>A*%&
-
8/19/2019 Regression Intro Annotated
42/42
What you can do now…• Describe the input (features) and output (real-valued
predictions) of a regression model
•
Calculate a goodness-of-fit metric (e.g., RSS)• Estimate model parameters by minimizing RSS
(algorithms to come…)
• Exploit the estimated model to form predictions
• Perform a training/test split of the data
• Analyze performance of various regression models in
terms of test error
• Use test error to avoid overfitting when selecting amongst
candidate models
• Describe a regression model using multiple features
• Describe other applications where regression is useful