deconstructing data science - coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... ·...

39
Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture 14: Linear regression Mar 7, 2017

Upload: others

Post on 26-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Deconstructing Data ScienceDavid Bamman, UC Berkeley

Info 290

Lecture 14: Linear regression

Mar 7, 2017

Page 2: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Regression

x = the empire state building y = 17444.5625”

A mapping from input data x (drawn from instance space 𝓧) to a point y in ℝ

(ℝ = the set of real numbers)

Page 3: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Regression problems

task 𝓧 𝒴

predicting box office revenue movie ℝ

Page 4: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

4

Page 5: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Feature Value

follow clinton 0

follow trump 0

“benghazi” 0

negative sentiment + “benghazi” 0

“illegal immigrants” 0

“republican” in profile 0

“democrat” in profile 0

self-reported location = Berkeley 1

x = feature vector

5

Feature β

follow clinton -3.1

follow trump 6.8

“benghazi” 1.4

negative sentiment + “benghazi” 3.2

“illegal immigrants” 8.7

“republican” in profile 7.9

“democrat” in profile -3.0

self-reported location = Berkeley -1.7

β = coefficients

Page 6: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Linear regression

y =F�

i=1xiβi + ε

y =F�

i=1xiβi

true value y

prediction ŷ

ε = y � y 𝜀 is the difference between the prediction and true value

Page 7: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

-10

0

10

-5.0 -2.5 0.0 2.5 5.0x

y

Page 8: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

-1

0

1

2

4 8 12x

y

Page 9: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

y =F�

i=1fi(x)βi

f1(x) =

�1 if x < 6 or x > 100 otherwise

Linear regression is linear in the parameters β

Page 10: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

10

Feature β

follow clinton -3.1

follow trump 6.8

“benghazi” 1.4

negative sentiment + “benghazi” 3.2

“illegal immigrants” 8.7

“republican” in profile 7.9

“democrat” in profile -3.0

self-reported location = Berkeley -1.7

β = coefficients

How do we get good values for β?

Page 11: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Least squares

β = minβ

N�

i=1ε2 we want to minimize the

errors we make

β = minβ

N�

i=1

�y �F�

j=1xjβj

�2

β = minβ

N�

i=1(y � y)2

Page 12: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

• We can solve this in two ways:

• Closed form (normal equations) • Iteratively (gradient descent)

Least squares

β = minβ

N�

i=1

�y �F�

j=1xjβj

�2

Page 13: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture
Page 14: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture
Page 15: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Code

Page 16: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

16

Feature β

follow clinton -3.1

follow trump + follow NFL + follow bieber 7299302

“benghazi” 1.4

negative sentiment + “benghazi” 3.2

“illegal immigrants” 8.7

“republican” in profile 7.9

“democrat” in profile -3.0

self-reported location = Berkeley -1.7

β = coefficients

Many features that show up rarely may likely only appear (by

chance) with one label

More generally, may appear so few times that the noise of

randomness dominates

Page 17: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Ridge regression

We want both of these to be small!

This corresponds to a prior belief that β should be 0

β = minβ

N�

i=1(y � y)2

� �� �error

+ηF�

i=1β2i

� �� �coefficient size

Page 18: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Ridge regression

A.K.A.

L2 regularization Penalized least squares

β = minβ

N�

i=1(y � y)2

� �� �error

+ηF�

i=1β2i

� �� �coefficient size

Page 19: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

19

Matt Gerald $295,619,605

Peter Mensah $294,475,429

Lewis Abernathy $188,093,808

Sam Worthington $186,193,754

CCH Pounder $184,946,303

… …

Steve Bacic -$65,334,914

Jim Ward -$66,096,435

Karley Scott Collins -$66,612,154

Dee Bradley Baker -$73,571,884

Animals -$110,349,541

low L2

Computer Animation

$68,629,803

Hugo Weaving $39,769,171

John Ratzenberger $36,342,438

Tom Cruise $36,137,757

Tom Hanks $34,757,574

… …

Western -$13,223,795

World cinema -$13,278,965

Crime Thriller -$14,138,326

Anime -$14,750,932

Indie -$21,081,924

med L2

BIAS: $13,394,465

Adventure $6,349,781

Action $5,512,359

Fantasy $5,079,546

Family Film $4,024,701

Thriller $3,479,196

… …

Western -$752,683

Black-and-white -$1,389,215

World cinema -$1,534,435

Drama -$2,432,272

Indie -$3,040,457

high L2

BIAS: $45,044,525 BIAS: $5,913,648

Page 20: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Assumptions

Anscombe’s quartet

Page 21: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Probabilistic Interpretation

P(yi | x, β) = Norm(yi | yi, σ2)

“the errors are normally distributed”

Page 22: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

4 6 8 10 12 14 16 18

4

6

8

10

12

x1

y 1

Page 23: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

0.0

0.1

0.2

0.3

0.4

0 5 10 15y

P(y)

Probabilistic InterpretationP(yi | x, β) = Norm(yi | yi, σ2)

ŷ = 9

y = 7

Page 24: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Conditional likelihood

24

N�

iP(yi | xi, β)

For all training data, we want probability of the true value y for

each data point x to high

This principle gives us a way to pick the values of the parameters β that maximize the probability of

the training data <x, y>

Page 25: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Outliers

4 6 8 10 12 14 16 18

4

6

8

10

12

x3

y 3

Page 26: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

0.0

0.1

0.2

0.3

0.4

0 5 10 15y

P(y)

Probabilistic InterpretationP(yi | x, β) = Norm(yi | yi, σ2)

ŷ = 9

y = 13

Page 27: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Robust regression

• Rather than modeling the errors as normally distributed, pick some heavier-tailed distribution instead

• This will assign higher likelihood to the outliers without having to move the best fit for the coefficients.

Page 28: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Heavy tailed distributions

0.0

0.1

0.2

0.3

0.4

-5.0 -2.5 0.0 2.5 5.0y

P(y)

0.00

0.25

0.50

0.75

1.00

-5.0 -2.5 0.0 2.5 5.0y

P(y)

Normal vs Laplace

Page 29: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Homoscedasticity

5 10 15 20 25

5

10

15

20

25

x1

y 1

Assumption that the variance in y is constant for all values of x; this data is heteroscedastic

Page 30: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

EvaluationGoodness of fit (to training data)

R2 = 1 ��

i(yi � yi)2�i(yi � y)2

sum of square errors

total sum of squares

For most models, R2 ranges from 0 (no fit) to 1 (perfect fit)

Page 31: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Experiment designtraining development testing

size 80% 10% 10%

purpose training models model selectionevaluation;

never look at it until the very

end

Page 32: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Metrics• Measure difference between the prediction ŷ and

the true y

1N

N�

i=1(yi � yi)2Mean squared error

(MSE)

1N

N�

i=1|yi � yi|Mean absolute error

(MAE)

Page 33: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Interpretationy = x0β0 + x1β1

x0β0 + x1β1 + β1

x0β0 + (x1 + 1)β1 Let’s increase the value of x1 by 1 (e.g., from 0 → 1)

= y + β1β represents the degree to

which y changes with a 1-unit increase in x

Page 34: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Independencebenedict cumberbatch stars

movie good 1

terrible acting benedict cumberbatch 0

benedict cumberbatch script excellent 1

excellent script movie good 1

benedict cumberbatch good excellent 1

• benedict • cumberbatch • stars • movie • good • acting • script • excellent • terrible

Page 35: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Code

Page 36: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Independencebenedict_cumberbatch stars

movie good 1

terrible acting benedict_cumberbatch 0

benedict_cumberbatch script excellent 1

excellent script movie good 1

benedict_cumberbatch good excellent 1

• benedict_cumberbatch

• stars • movie • good • acting • script • excellent • terrible

Page 37: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Significance

Page 38: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

I ngrams

II POS ngrams

III Dependency relations

Joshi et al. (2010)

Page 39: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s17/slides/14... · 2017. 3. 8. · Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture

Joshi et al. (2010)