Linear Models Stat 430


Post on 31-Jul-2020


Page 1: Linear ModelsInterpretation • We get values for parameters a and b as a = -18.85 b = 0.013 • a is the intercept - i.e. the value for Y if X=0 in this data the interpretation is



Outline

• Parameter Estimation

• Goodness of Fit

• Matrix Notation: Multivariate Models

• Distributional Assumptions

• Dummy Variables


Olympic Gold Medallists

[Figure: scatterplots of the winning jump distance against the year of the Olympic Games; axis tick labels not recoverable]


Interpretation

• We get values for parameters a and b as a = -18.85 and b = 0.013

• a is the intercept, i.e. the value of Y when X = 0. In this data the interpretation is a bit obscure: for the year 0 we would expect the winner to jump 18.85 m backwards (quite a feat!)

• b is the average increase we expect in Y when X increases by 1 unit: each year we expect the winning jump to be 1.3 cm longer, so from one Olympics to the next (four years later) we’d expect an increase of 4 × 1.3 cm = 5.2 cm
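The two interpretations can be checked with a few lines of arithmetic. This is a minimal sketch using the rounded estimates a = -18.85 and b = 0.013; the names `A`, `B` and `predicted_distance` are hypothetical. Because b is rounded to three decimals, absolute predictions for recent years are sensitive to that rounding (an error of 0.0005 in b shifts the prediction for the year 2000 by up to 1 m), so the sketch only checks the intercept and the per-year and per-Olympiad changes.

```python
# Fitted line from the slide: winning distance (m) = a + b * year.
# A and B are the rounded estimates quoted above.
A = -18.85  # intercept (m): the line's value at year 0
B = 0.013   # slope (m per year)


def predicted_distance(year: float) -> float:
    """Winning distance (m) predicted by the fitted line for a given year."""
    return A + B * year


# Expected change between consecutive Games, four years apart, in cm.
per_olympiad_cm = 4 * B * 100

print(f"line at year 0: {predicted_distance(0):.2f} m")
print(f"change per year: {B * 100:.1f} cm")
print(f"change per Olympiad: {per_olympiad_cm:.1f} cm")
```

Differences between two predictions depend only on b, which is why the per-Olympiad change is robust even though absolute predictions are not.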


Simple Linear Regression

• How do we get a and b?

• How good is the model?

• What are confidence intervals for the parameters a and b, and for predicted values?


Ordinary Least Squares (OLS) Estimation

• Model: $y = a + b x$

• Data $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$

• OLS: find a and b that minimize $\sum_{i=1}^{n} (a + b x_i - y_i)^2$

• $a_{OLS} = -18.85$, $b_{OLS} = 0.013$
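Setting the two partial derivatives of the OLS criterion $S(a, b)$ to zero gives a pair of linear equations in a and b, the normal equations. A short worked derivation, in the same notation as the objective above:

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{n} (a + b x_i - y_i)^2 \\
\frac{\partial S}{\partial a} &= 2 \sum_{i=1}^{n} (a + b x_i - y_i) = 0
  &&\Longrightarrow\quad n\,a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\
\frac{\partial S}{\partial b} &= 2 \sum_{i=1}^{n} x_i (a + b x_i - y_i) = 0
  &&\Longrightarrow\quad a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
\end{align*}
```

Dividing the first equation by n gives $a = \bar{y} - b\,\bar{x}$; substituting this into the second and solving for b gives the closed-form slope. Because S is a convex quadratic in (a, b), this stationary point is the minimum as long as the $x_i$ are not all equal.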


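Derivation of Estimates

• $b = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $a = \bar{y} - b\,\bar{x}$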
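The estimates can be computed directly from these sums: $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\,\bar{x}$. A minimal stdlib-only Python sketch; `ols_fit` is a hypothetical helper name, and the toy numbers below are made up rather than taken from the Olympic data.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Return (a, b) minimizing sum((a + b*x - y)**2), i.e. OLS for y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Centred cross-product and centred sum of squares of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


print(ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For this toy data the call returns approximately `(2.2, 0.6)`: the means are 3 and 4, the centred cross-product is 6 and the centred sum of squares of x is 10.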


Goodness of Fit

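• Coefficient of Determination $R^2$

• compare the overall variability with the amount explained by the model:

• $R^2 = (TSS - SSE)/TSS$

• $TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$

• $SSE = \sum_{i=1}^{n} e_i^2$, where $e_i = y_i - (a + b x_i)$ are the residuals

• For an OLS fit with an intercept, $R^2$ takes values in [0, 1], with $R^2 = 1$ indicating a perfect fit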
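A minimal Python sketch of the $R^2$ formula, assuming the fitted values $a + b x_i$ have already been computed; `r_squared` is a hypothetical helper name and the numbers in the usage line are made up.

```python
from statistics import fmean


def r_squared(ys, fitted):
    """Coefficient of determination R^2 = (TSS - SSE) / TSS."""
    y_bar = fmean(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of squared residuals e_i
    if tss == 0:
        raise ValueError("TSS is zero: all y values are identical")
    return (tss - sse) / tss


# Toy data with fitted line a = 2.2, b = 0.6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_squared(ys, [2.2 + 0.6 * x for x in xs]))
```

Here TSS = 6 and SSE = 2.4, so the result is approximately 0.6: the line explains about 60% of the variability in y.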


Extending the Model

• Explanatory variable can be discrete


Example: Running in OZ

• http://www.statsci.org/data/oz/ms212.html

• Students in an introductory statistics class participated in a simple experiment: the students took their own pulse rate. They were then asked to flip a coin. If the coin came up heads, they ran in place for one minute; otherwise they sat for one minute. Then everyone took their pulse again. The data contain these pulse rates together with other physiological and lifestyle variables.


Pulse 1 vs Pulse 2

[Figure: scatterplot of Pulse2 against Pulse1]

• We should look at the difference in pulse rates instead


Pulse difference

[Figure: dPulse for each level of factor(Ran) (levels 1 and 2)]


Linear Model with Categorical Variables

• $y = a + b_{ran}$

• Use the value of $b_{ran}$ only for those students who did run

• This is identical to: $y = a + b\,x_{ran}$

where $x_{ran} = 0$ for students who did not run, and 1 otherwise (a dummy variable)
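A small check of this equivalence on made-up numbers (not the class data): fitting $y = a + b\,x_{ran}$ by ordinary least squares with a 0/1 dummy gives a equal to the mean of the students who did not run, and b equal to the difference between the two group means. `ols_fit`, `ran` and `d_pulse` are hypothetical names used only for this sketch.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Closed-form simple linear regression y = a + b*x (hypothetical helper)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical toy data: pulse differences and whether each student ran.
ran = [True, True, True, False, False, False, False]
d_pulse = [50, 55, 47, -2, 0, 1, -3]

# Dummy variable: x_ran = 1 for students who ran, 0 for those who sat.
x_ran = [1 if r else 0 for r in ran]
a, b = ols_fit(x_ran, d_pulse)

mean_sat = fmean([y for y, r in zip(d_pulse, ran) if not r])
mean_ran = fmean([y for y, r in zip(d_pulse, ran) if r])
print(f"a = {a:.3f} (mean of non-runners {mean_sat:.3f})")
print(f"b = {b:.3f} (difference in means {mean_ran - mean_sat:.3f})")
```

Here a is approximately -1 and b approximately 51.667: the dummy coefficient is exactly the extra average pulse increase of the runners.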


lm(formula = dPulse ~ Ran, data = fitness)

Residuals:
    Min      1Q  Median      3Q     Max
-41.391  -3.000   1.000   4.609  42.609

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  103.783      4.489   23.12   <2e-16 ***
Ran          -52.391      2.715  -19.30   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14 on 107 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.7768, Adjusted R-squared: 0.7747
F-statistic: 372.4 on 1 and 107 DF, p-value: < 2.2e-16


Interpretation of Parameters

• Ran has values 1 (= ran) and 2 (= sat)

• In the model lm(formula = dPulse ~ Ran, data = fitness), Ran is interpreted as a numeric variable

• Somebody who ran therefore has an average difference in pulse of 103.783 + 1 × (-52.391) = 51.392; somebody who sat has an average difference in pulse of 103.783 + 2 × (-52.391) = -0.999 ≈ -1.
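The same arithmetic as a short Python sketch, using the two estimates from the output above; `mean_d_pulse_numeric` is a hypothetical name. With only two codes, a straight line through them still reproduces both group means, but the intercept 103.783 is the fitted value at Ran = 0, a code that no student has, so on its own it has no direct interpretation.

```python
# Estimates from lm(dPulse ~ Ran), where Ran is treated as a number.
INTERCEPT = 103.783
SLOPE = -52.391


def mean_d_pulse_numeric(ran_code: int) -> float:
    """Fitted mean pulse difference when Ran enters the model as a number."""
    return INTERCEPT + SLOPE * ran_code


for code, label in [(1, "ran"), (2, "sat"), (0, "Ran = 0 (no such group)")]:
    print(f"{label}: {mean_d_pulse_numeric(code):.3f}")
```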


Better, use factor

lm(formula = dPulse ~ factor(Ran), data = fitness)

Residuals:
    Min      1Q  Median      3Q     Max
-41.391  -3.000   1.000   4.609  42.609

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    51.391      2.064    24.9   <2e-16 ***
factor(Ran)2  -52.391      2.715   -19.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14 on 107 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.7768, Adjusted R-squared: 0.7747
F-statistic: 372.4 on 1 and 107 DF, p-value: < 2.2e-16


Parameter Estimates

• By default, R uses baseline (treatment) coding for all factor variables, i.e. the effect of the first level is always set to zero:

(Intercept)    51.391  2.064   24.9 <2e-16 ***
factor(Ran)2  -52.391  2.715  -19.3 <2e-16 ***

• This is interpreted as an average difference in pulse of 51.391 + 0 = 51.391 when Ran is equal to 1, and 51.391 - 52.391 = -1 when Ran is equal to 2.
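A minimal Python sketch of baseline (treatment) coding, assuming the levels are ordered by sorting their codes (which matches R's default level order for a numeric variable such as Ran). `treatment_columns` and the four Ran codes are hypothetical; the estimates are the ones from the output above. The baseline level gets no column, every other level gets a 0/1 indicator, and each fitted group mean is the intercept plus the effect of that group's indicator.

```python
def treatment_columns(values):
    """Baseline (treatment) coding of a categorical variable.

    The first level in sorted order is the baseline and gets no column;
    every other level gets a 0/1 indicator column.
    """
    levels = sorted(set(values))
    baseline, others = levels[0], levels[1:]
    rows = [[1 if v == lev else 0 for lev in others] for v in values]
    return baseline, others, rows


# Estimates from lm(dPulse ~ factor(Ran)).
INTERCEPT = 51.391
EFFECTS = {2: -52.391}  # effect of level 2 relative to baseline level 1

ran = [1, 2, 2, 1]  # hypothetical Ran codes for four students
baseline, others, rows = treatment_columns(ran)
fitted = [INTERCEPT + sum(EFFECTS[lev] * ind for lev, ind in zip(others, row))
          for row in rows]
print(baseline, others, rows)
print([round(f, 3) for f in fitted])
```

Setting the baseline effect to zero is what keeps the indicator columns from being collinear with the intercept column.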


Further Extension

• Multiple explanatory variables
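With several explanatory variables the model becomes $y = b_0 + b_1 x_1 + \dots + b_p x_p$, and the OLS estimates solve the normal equations $X^\top X\,b = X^\top y$, where X has a leading column of ones. A stdlib-only Python sketch of that idea; `solve`, `ols_multi` and the toy data are hypothetical and chosen only for illustration.

```python
def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v_i] for row, v_i in zip(A, v)]  # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: columns of X are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_multi(x_cols, y):
    """OLS for y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]  # intercept column first
    p = len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(XtX, Xty)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
x1 = [0, 1, 2, 3, 4, 1]
x2 = [1, 0, 2, 1, 3, 5]
y = [1 + 2 * u - 3 * w for u, w in zip(x1, x2)]
coef = ols_multi([x1, x2], y)
print([round(c, 6) for c in coef])
```

Forming $X^\top X$ explicitly is the simplest route to show the idea but squares the condition number of the problem; R's `lm` instead works from a QR decomposition of X.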


[Figure: four panels of dPulse against factor(Gender) (levels 1, 2), interaction(factor(Gender), factor(Ran)) (levels 1.1, 2.1, 1.2, 2.2), Age, and Weight]