day 4 - linear regression

19
Day 4 - Linear Regression Agenda: Review: Supervised vs Unsupervised Learning Regression vs Classification Parametric Regression, Linear Regression Least Squares Formulation Solving Least Squares Formulation Issues to Pay Attention To with Linear Regression it yo É where yo fix find f cohtinuously valued

Upload: others

Post on 30-Dec-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Day 4 - Linear Regression

Day 4 - Linear Regression Agenda:

Review: Supervised vs Unsupervised Learning•Regression vs Classification•Parametric Regression, Linear Regression•Least Squares Formulation•Solving Least Squares Formulation•Issues to Pay Attention To with Linear Regression•

it yo É where yo fix

find f cohtinuously valued

Page 2: Day 4 - Linear Regression

Supervised LearningSupervised vs Unsupervised LearningSupervised Learning PipelineExample of features and labels from the context of home prices:

Page 3: Day 4 - Linear Regression

Regression and Classification Regression: Predict a CONTINUOUS value/label/output based on features/inputsClassification: Predict a DISCRETE/CATEGORICAL label based on featuresExamples:Regression:

Determine the energy efficiency of a data center given cooling operational data•

Determine the value a home will sell for given features of the home•Classification:

Given an image of a telephone phone, determine if there is rust on it•

Given an image of a dog, determine what breed of dog it is.•

Page 4: Day 4 - Linear Regression

Activity: Decide if these are regression problems or classification problems

You are have a conveyer belt of cucumbers. A photograph is taken and the cucumber is •identified as either high, medium, or low quality.

You are given a resume of a person, and you predict the salary they will be offered for a job if •hired.

Activity: Would you approach the following task as a regression or a classification problem? The company Square has access to the financial transactions (on its platform) for a given company. The company asks for a $2000 loan. Square needs to decide if they will approve or deny the loan request.

Classification:Features: Account balances, recent financial transactions, other market conditions Training Labels: Did they repay loan (requested in the past) Predict:

Regression:Features: Account balances, recent financial transactions, other market conditions Training Labels: Market valuation OR account balance in two weeks

Page 5: Day 4 - Linear Regression

Mathematically

Let F IRI IRy fix noise

Find 8 f

Classification predict membership in a category

wtf no Io.fiy frxlt noise

Given It Yi in nFind F

boundarydecision

Terminology

X input variables predictors independent vars features

Y response dependent variable outputvariable

f model predictor hypothesis

Page 6: Day 4 - Linear Regression

The parametric approach to regression With a parametric approach to regression, you assume the relationship of the input to output is given by a function, f, with unknown parameters. This requires making modeling assumptions on f.Example of data modelMathematically,

Iunknown parameters

Consider XE IR y f X EIR of form

Y Bot B X Bat t Error

for unknown PoPi B2

Page 7: Day 4 - Linear Regression

Or perhaps we want a model that is not linear in its input:

Y BootBiot Bati t Bo Kat Boz kit B Kika terror

for unknown BooBio BaoBon Boz B

Parametric

Regression predict a continuous value

Model Fo IR't IRy fort noise

Given visit in

Find 8 O

Page 8: Day 4 - Linear Regression

What is linear regression?A linear regression problem is one where the response is linear in the model parameters.

Examplesfo IR SIR

g EP

Icar in PoB

F IR TIR

y Rotax taxitlinear inBoBiB2

f IR SIR

Not y BittyLINEARREGRESSION not linear

in BlBz

Page 9: Day 4 - Linear Regression

Are the following models linear in their parameters?

F IR SIR YESy B E t Basin X

F IR s IR

y Sin B X Bat P

No

Higher dimensional Examples

F IR s IR

y I 1kt BI

YES

f IR S IR

y Bot Bit t Barat B X PyxI

Page 10: Day 4 - Linear Regression

Can you use linear regression to solve for the parameters of the following models?

yes

f IR SIR

Y Bot X X

No

f IR SIR

y Botox y PotB X

yes

F IR SIRPz logy lad

y Pok t laxlyk

Page 11: Day 4 - Linear Regression

Least Squares Formulation for Linear Regression (for models that are linear wrt input)

Given D X Yii n

X'VERd Y EIR

model Y

fightterror

Want 0 such that Yee fork

Idea minimize square deviation of Yi from fort

idata point

font fipoohminimize

The

set to

Page 12: Day 4 - Linear Regression

Rewrite using vectors and matrices

Lot 9 1tix ta

Ma EEE His

Want y Q

min I 119 1011

Recall if a GIR belk

a b E a b

Hall Éa

da yoa a

Page 13: Day 4 - Linear Regression

Least squares formulation for linear regression (with models linear in their input) Solving the least squares formulation using vector calculus

mint 119 11011Q

where y P E f in

Let YE IR EIR O endknown known unknown

min I 119 11011Consider

If I has rank d

the solution is given by Ot

Ity

Page 14: Day 4 - Linear Regression

Whyyo Illy ON It y 0

set this gradient equal too

try Xo O

normal Equis Xt yo TtyAtx tty if X'X

is invertible

If X has rank d Ktx has rank d

Note Ktx is ded so Ktx is invertible

why is 7 Illy 11011 X ly Xo

Recall Taylors than for multivariate functions

If 53Rd S R

f Ath fix Jfk h 01114114

You can use this to read off a gradient after

applying a perturbation h

Let FCO Elly XO IF II XO yipXO y Xo y

sofroth totth y Korth y

L XO y th XO y th

XO y XO y th XO y

Page 15: Day 4 - Linear Regression

XO Y XD IC th th

fro t XO y th t Ohl

Frost key hot ornhli

TFGso Tfrol Xtra y try Xo

When is X of rank d

If nd X is of ranken

Need more data points

than parameters to get a unique0

If any features are duplicates

for linear combinations of each

other X is of rank d

Need to remove dependent

features to use formula above

Page 16: Day 4 - Linear Regression

Other ways to solve least squares formulation for linear regression Use a computer package such as TensorFlow or PyTorch to run Gradient Descent down the objective.

fmin FG

One on e8w

É

eIGrad Desc

Page 17: Day 4 - Linear Regression

Least Squares Formulation for Linear Regression (for a general model)

Given D X Yii n

X GIRD Y EIR

model Y ftp.grflterrorWant y Q

EEEi n

ginny

so do same process as above but w features

9 X1 9210 gurl

instead of X

Ex Fi IR IR

Y PotB Kt 122

É

min É Yi BotBWitz x'way

madly tofu of xp

Page 18: Day 4 - Linear Regression

Things that can go wrong: Underfitting and OverfittingOther topics: What happens when there is fewer data than features?

Underfitting

Ei

X

y overfitting

Ei

x

Page 19: Day 4 - Linear Regression

How do you deal with categorical features? Be careful about whether you want to view your problem as a prediction task