Simple Linear Regression AMS 572 11/29/2010

Upload: morgan-chambers

Post on 26-Dec-2015


Page 1: Simple Linear Regression AMS 572 11/29/2010. Outline 1.Brief History and Motivation – Zhen Gong 2.Simple Linear Regression Model – Wenxiang Liu 3.Ordinary

Simple Linear Regression

AMS 572

11/29/2010


Outline

1. Brief History and Motivation – Zhen Gong
2. Simple Linear Regression Model – Wenxiang Liu
3. Ordinary Least Squares Method – Ziyan Lou
4. Goodness of Fit of LS Line – Yixing Feng
5. OLS Example – Lingbin Jin
6. Statistical Inference on Parameters – Letan Lin
7. Statistical Inference Example – Emily Vo
8. Regression Diagnostics – Yang Liu
9. Correlation Analysis – Andrew Candela
10. Implementation in SAS – Joseph Chisari


Brief History and Introduction

Legendre published the earliest form of regression, the method of least squares, in 1805.

In 1809, Gauss published the same method.

Francis Galton extended the method in the 19th century to describe a biological phenomenon.

Karl Pearson and Udny Yule extended it to a more general statistical context around the turn of the 20th century.


Motivation for Regression Analysis

• Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.

• When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.

• Goal: given a newly observed predictor value X, predict the corresponding value of the response variable Y.


Motivation for Regression Analysis

2010 Camry: horsepower at 6000 rpm: 169; highway gasoline consumption: 0.03125 gallons per mile

2010 Milan: horsepower at 6000 rpm: 175; highway gasoline consumption: 0.0326 gallons per mile

2010 Fusion: horsepower at 6000 rpm: 263; highway gasoline consumption: ?

Response variable (Y): highway gasoline consumption

Predictor variable (X): horsepower at 6000 rpm


Simple Linear Regression Model

• A summary of the relationship between a dependent variable (or response variable) Y and an independent variable (or covariate) X.

• Y is treated as a random variable. Even if X is also a random variable, we condition on it, that is, we treat it as fixed. Essentially, we are interested in the behavior of Y given that we know X = x.


Good Model

• Regression models attempt to minimize the distance, measured vertically, between each observation point and the model line (or curve).

• The length of this line segment is called the residual, the modeling error, or simply the error.

• The negative and positive errors should cancel out ⇒ zero overall error. Many lines satisfy this criterion, so it cannot by itself identify the best line.


Probabilistic Model

• In simple linear regression, the population regression line is given by

$$E(Y) = \beta_0 + \beta_1 x$$

• The actual values of Y are assumed to be the sum of the mean value, E(Y), and a random error term, ε:

$$Y = E(Y) + \epsilon = \beta_0 + \beta_1 x + \epsilon$$

• At any given value of x, the dependent variable is distributed as $Y \sim N(\beta_0 + \beta_1 x, \sigma^2)$.


Least Squares (LS) Fit

Boiling Point of Water in the Alps

Pressure   Boiling Pt     Pressure   Boiling Pt
20.79      194.5          24.01      201.3
20.79      194.3          25.14      203.6
22.40      197.9          26.57      204.6
22.67      198.4          28.49      209.5
23.15      199.4          27.76      208.6
23.35      199.9          29.04      210.7
23.89      200.9          29.88      211.9
23.99      201.1          30.06      212.2
24.02      201.4


Least Squares (LS) Fit

Find a line that represents the "best" linear relationship.


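Least Squares (LS) Fit

• Problem: the data do not fall exactly on a line. Each observation satisfies

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, 2, \ldots, n$$

• Find the line that minimizes the sum of the squared vertical deviations

$$e_i^2 = [y_i - (\beta_0 + \beta_1 x_i)]^2$$

• That is, we are looking for the line that minimizes

$$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$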


Least Squares (LS) Fit

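• To find the parameter values that minimize the sum of squared differences, take the partial derivative of Q with respect to each parameter and set it equal to zero.

$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)] = 0 \;\Rightarrow\; n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)] = 0 \;\Rightarrow\; \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$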


Least Squares (LS) Fit

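• Solve the equations and we get

$$\hat\beta_0 = \frac{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$\hat\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$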


Least Squares (LS) Fit

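• To simplify, we introduce

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$$

so that

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

• The resulting equation, $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, is known as the least squares line, which is an estimate of the true regression line.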
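The closed-form estimates can be sketched in a few lines of Python. This is only an illustration: the function name `least_squares_line` and the toy data are assumptions, not part of the slides.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx as defined on the slide
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

Because the toy points lie exactly on a line, the estimates recover its intercept and slope; with noisy data they give the line that minimizes Q.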


Goodness of Fit of the LS Line

The fitted values are

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$$

The residuals

$$e_i = y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$$

are used to evaluate the goodness of fit of the LS line.


Goodness of Fit of the LS Line

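The error sum of squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

The total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

The regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

SST = SSR + SSE, because the cross-product term vanishes:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE} + \underbrace{2\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)}_{=\,0}$$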
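The decomposition SST = SSR + SSE relies on the cross-product term being zero. A short derivation, offered as a sketch that uses only the normal equations (which imply $\sum e_i = 0$ and $\sum x_i e_i = 0$ for the residuals $e_i = y_i - \hat{y}_i$):

```latex
\begin{align*}
\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
  &= \sum_{i=1}^{n} \hat\beta_1 (x_i - \bar{x})\, e_i
     && \text{since } \hat{y}_i - \bar{y} = \hat\beta_1 (x_i - \bar{x}) \\
  &= \hat\beta_1 \Big( \sum_{i=1}^{n} x_i e_i - \bar{x} \sum_{i=1}^{n} e_i \Big) \\
  &= \hat\beta_1 \,(0 - \bar{x} \cdot 0) = 0 .
\end{align*}
```

The first step uses $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, so that $\hat{y}_i - \bar{y} = \hat\beta_0 + \hat\beta_1 x_i - \bar{y} = \hat\beta_1 (x_i - \bar{x})$.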


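Goodness of Fit of the LS Line

• The coefficient of determination

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

is always between 0 and 1.

• The sample correlation coefficient between X and Y is

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

• For simple linear regression, the square of the sample correlation coefficient equals the coefficient of determination, and r has the same sign as $\hat\beta_1$.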


Estimation of the variance

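The variance $\sigma^2$ measures the scatter of the $Y_i$ around their means $\mu_i = \beta_0 + \beta_1 x_i$.

An unbiased estimate of $\sigma^2$ is given by

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2}$$

This estimate of $\sigma^2$ has n-2 degrees of freedom.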
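The goodness-of-fit quantities can be computed together, as in the sketch below. The function name `sums_of_squares` and the toy data are hypothetical, chosen only to illustrate the identities SST = SSR + SSE, r² = SSR/SST, and s² = SSE/(n-2).

```python
def sums_of_squares(x, y):
    """Fit the LS line and return SST, SSR, SSE, r^2 and s^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error
    return {
        "SST": sst,
        "SSR": ssr,
        "SSE": sse,
        "r2": ssr / sst,        # coefficient of determination
        "s2": sse / (n - 2),    # unbiased estimate of sigma^2
    }


# Hypothetical toy data (not from the slides)
result = sums_of_squares([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(result)
```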


Implementing OLS method to Problem 10.4

OLS method: minimize

$$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$

The time between eruptions of Old Faithful geyser in Yellowstone National Park is random but is related to the duration of the last eruption. The table below shows these times for 21 consecutive eruptions (LAST = duration of the last eruption, NEXT = time to the next eruption).

Obs No.  Last  Next     Obs No.  Last  Next     Obs No.  Last  Next
1        2.0   50       8        2.8   57       15       4.0   77
2        1.8   57       9        3.3   72       16       4.0   70
3        3.7   55       10       3.5   62       17       1.7   43
4        2.2   47       11       3.7   63       18       1.8   48
5        2.1   53       12       3.8   70       19       4.9   70
6        2.4   50       13       4.5   85       20       4.2   79
7        2.6   62       14       4.7   75       21       4.3   72


Implementing OLS method to Problem 10.4

[Figure: scatter plot of NEXT vs. LAST]


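Implementing OLS method to Problem 10.4

$$\bar{x} = 3.238, \qquad \bar{y} = 62.714$$

$$S_{xx} = \sum_{i=1}^{21} (x_i - \bar{x})^2 = 22.230$$

$$S_{yy} = \sum_{i=1}^{21} (y_i - \bar{y})^2 = 2844.286$$

$$S_{xy} = \sum_{i=1}^{21} (x_i - \bar{x})(y_i - \bar{y}) = 217.629$$

$$\hat\beta_1 = S_{xy} / S_{xx} = 9.790, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 31.013$$

$$SSE = \sum_{i=1}^{21} (y_i - \hat{y}_i)^2 = 713.687$$

$$SSR = \sum_{i=1}^{21} (\hat{y}_i - \bar{y})^2 = 2130.599$$

$$SST = S_{yy} = 2844.286$$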
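The summary statistics for Problem 10.4 can be reproduced from the 21 observations in the Old Faithful table. The sketch below uses plain Python; the variable names are illustrative, and SSR is obtained through the identity SSR = β̂1·S_xy rather than by summing over fitted values.

```python
# LAST (duration of last eruption) and NEXT (time to next eruption),
# observations 1 to 21 from the Problem 10.4 table
last = [2.0, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7,
        3.8, 4.5, 4.7, 4.0, 4.0, 1.7, 1.8, 4.9, 4.2, 4.3]
nxt = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63,
       70, 85, 75, 77, 70, 43, 48, 70, 79, 72]

n = len(last)
x_bar = sum(last) / n
y_bar = sum(nxt) / n
s_xx = sum((x - x_bar) ** 2 for x in last)
s_yy = sum((y - y_bar) ** 2 for y in nxt)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(last, nxt))

b1 = s_xy / s_xx            # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate
ssr = b1 * s_xy             # regression sum of squares, equals S_xy^2 / S_xx
sse = s_yy - ssr            # error sum of squares, since SST = S_yy

print(f"x_bar={x_bar:.3f} y_bar={y_bar:.3f}")
print(f"Sxx={s_xx:.3f} Syy={s_yy:.3f} Sxy={s_xy:.3f}")
print(f"b1={b1:.3f} b0={b0:.3f} SSR={ssr:.3f} SSE={sse:.3f}")
```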


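Implementing OLS method to Problem 10.4

The least squares line is $\hat{y} = \hat\beta_0 + \hat\beta_1 x = 31.013 + 9.790x$.

When x = 3, $\hat{y} \approx 60$.

$$r = \sqrt{SSR / SST} = 0.865$$

Since $r^2 = SSR/SST \approx 0.749$, LAST explains about 75% of the variation in NEXT, so we could say that LAST is a good predictor of NEXT.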


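Statistical Inference on β0 and β1

Final Result

$\hat\beta_0$ and $\hat\beta_1$ are normally distributed, with

$$E(\hat\beta_0) = \beta_0, \qquad SD(\hat\beta_0) = \sigma\sqrt{\frac{\sum x_i^2}{n S_{xx}}}$$

$$E(\hat\beta_1) = \beta_1, \qquad SD(\hat\beta_1) = \frac{\sigma}{\sqrt{S_{xx}}}$$

so that

$$\frac{\hat\beta_0 - \beta_0}{SD(\hat\beta_0)} \sim N(0,1), \qquad \frac{\hat\beta_1 - \beta_1}{SD(\hat\beta_1)} \sim N(0,1)$$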


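Statistical Inference on β0 and β1

Derivation

Treat the $x_i$'s as fixed and use $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0$.

$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})(Y_i - \bar{Y})}{S_{xx}} = \frac{\sum (x_i - \bar{x}) Y_i - \bar{Y} \sum (x_i - \bar{x})}{S_{xx}} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_{xx}}\, Y_i$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x}$$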


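Statistical Inference on β0 and β1

Derivation

$$E(\hat\beta_1) = \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_{xx}} E(Y_i) = \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_{xx}} (\beta_0 + \beta_1 x_i) = \frac{\beta_0}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) + \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) x_i$$

$$= \frac{\beta_1}{S_{xx}} \left[ \sum_{i=1}^{n} (x_i - \bar{x}) x_i - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) \right] = \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \beta_1$$

$$Var(\hat\beta_1) = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{S_{xx}^2} Var(Y_i) = \frac{\sigma^2}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$$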


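Statistical Inference on β0 and β1

Derivation

$$E(\hat\beta_0) = E(\bar{Y} - \hat\beta_1 \bar{x}) = \frac{1}{n} \sum E(Y_i) - \bar{x} E(\hat\beta_1) = \frac{1}{n} \sum (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x} = \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0$$

Since $\bar{Y}$ and $\hat\beta_1$ are uncorrelated,

$$Var(\hat\beta_0) = Var(\bar{Y} - \hat\beta_1 \bar{x}) = Var(\bar{Y}) + \bar{x}^2 Var(\hat\beta_1) = \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}} = \frac{\sigma^2 \left( \sum x_i^2 - n\bar{x}^2 + n\bar{x}^2 \right)}{n S_{xx}} = \frac{\sigma^2 \sum x_i^2}{n S_{xx}}$$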


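Statistical Inference on β0 and β1

Since

$$\frac{(n-2) S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-2},$$

replacing σ by s in the standard deviations gives the standard errors

$$SE(\hat\beta_0) = s\sqrt{\frac{\sum x_i^2}{n S_{xx}}}, \qquad SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}}$$

Pivotal Quantities (P.Q.):

$$\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2}, \qquad \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2}$$

Confidence Intervals (CI's):

$$\hat\beta_0 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_0), \qquad \hat\beta_1 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_1)$$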


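Statistical Inference on β0 and β1

Hypothesis tests:

$$H_0: \beta_1 = \beta_1^0 \quad \text{vs.} \quad H_1: \beta_1 \neq \beta_1^0$$

Reject $H_0$ at level α if

$$|t_0| = \frac{|\hat\beta_1 - \beta_1^0|}{SE(\hat\beta_1)} > t_{n-2,\alpha/2}$$

A useful application is to show whether there is a linear relationship between x and y:

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0$$

Reject $H_0$ at level α if

$$|t_0| = \frac{|\hat\beta_1|}{SE(\hat\beta_1)} > t_{n-2,\alpha/2}$$

One-sided alternative hypotheses can be tested using a one-sided t-test.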


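Analysis of Variance (ANOVA)

Mean Square: A sum of squares divided by its degrees of freedom.

$$MSR = \frac{SSR}{1} \quad \text{and} \quad MSE = \frac{SSE}{n-2}$$

$$F_0 = \frac{MSR}{MSE} = \frac{SSR}{s^2} = \frac{\hat\beta_1^2 S_{xx}}{s^2} = \left( \frac{\hat\beta_1}{s/\sqrt{S_{xx}}} \right)^2 = \left( \frac{\hat\beta_1}{SE(\hat\beta_1)} \right)^2 = t_0^2$$

$$f_{1,n-2,\alpha} = t^2_{n-2,\alpha/2}$$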


Analysis of Variance (ANOVA)

ANOVA Table:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
(Source)              (SS)             (d.f.)               (MS)
Regression            SSR              1                    MSR = SSR/1         F = MSR/MSE
Error                 SSE              n - 2                MSE = SSE/(n-2)
Total                 SST              n - 1


Statistical Inference Example – Testing for Linear Relationship

• Problem 10.4: At α = 0.05, is there a linear trend between the time to the NEXT eruption and the duration of the LAST eruption?

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0$$

Reject $H_0$ if $|t| > t_{n-2,\alpha/2}$, where

$$t = \frac{\hat\beta_1}{SE(\hat\beta_1)}$$


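Statistical Inference – Hypothesis Testing

Solution:

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{217.629}{22.230} = 9.790$$

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 713.687$$

$$s = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{713.687}{19}} = 6.129$$

$$SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}} = \frac{6.129}{\sqrt{22.230}} = 1.2999$$

$$t = \frac{\hat\beta_1}{SE(\hat\beta_1)} = \frac{9.790}{1.2999} = 7.531$$

$$t_{n-2,\alpha/2} = t_{19,0.025} = 2.093, \qquad |t| = 7.531 > 2.093$$

We reject H0 and therefore conclude that there is a linear relationship between NEXT and LAST.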
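The test can be checked numerically from the summary statistics already computed on the slides. The sketch below is illustrative only: the variable names are assumptions, and the critical value 2.093 is copied from the slide rather than computed. It also confirms the ANOVA identity F = t² up to rounding of the inputs.

```python
import math

# Summary statistics for Problem 10.4, as reported on the slides
n = 21
b1 = 9.790        # slope estimate
s_xx = 22.230
sse = 713.687
ssr = 2130.599
t_crit = 2.093    # t_{19, 0.025}, taken from the slide (not computed here)

s = math.sqrt(sse / (n - 2))      # estimate of sigma
se_b1 = s / math.sqrt(s_xx)       # standard error of the slope
t_stat = b1 / se_b1               # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit

# ANOVA F statistic; equal to t^2 apart from rounding in the inputs
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"s={s:.3f} SE(b1)={se_b1:.4f} t={t_stat:.3f} reject H0: {reject_h0}")
print(f"F={f_stat:.2f} t^2={t_stat ** 2:.2f}")
```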


Statistical Inference Example - Confidence and Prediction Intervals

• Problem 10.11 from Tamhane & Dunlop, Statistics and Data Analysis

10.11 (a) Calculate a 95% PI for the time to the next eruption if the last eruption lasted 3 minutes.


Problem 10.11 – Prediction Interval

Solution: The formula for a 100(1-α)% PI for a future observation $Y^*$ at $x = x^*$ is given by

$$\hat{Y}^* \pm t_{n-2,\alpha/2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$


Problem 10.11 - Prediction Interval

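$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = 9.790, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 31.013$$

$$s = \sqrt{\frac{SSE}{n-2}} = 6.129$$

$$\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x^* = 31.013 + 9.790(3) = 60.383$$

$$t_{n-2,\alpha/2} = t_{19,0.025} = 2.093$$

$$\hat{Y}^* \pm t_{n-2,\alpha/2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} = 60.383 \pm (2.093)(6.129) \sqrt{1 + \frac{1}{21} + \frac{(3 - 3.238)^2}{22.230}}$$

$$= [47.238,\ 73.529]$$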


Problem 10.11 - Confidence Interval

10.11(b) Calculate a 95% CI for the mean time to the next eruption for a last eruption lasting 3 minutes. Compare this confidence interval with the PI obtained in (a)


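Problem 10.11 - Confidence Interval

Solution: The formula for a 100(1-α)% CI for $\mu^* = \beta_0 + \beta_1 x^*$ is given by

$$\hat\mu^* \pm t_{n-2,\alpha/2}\; s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

where $\hat\mu^* = \hat\beta_0 + \hat\beta_1 x^*$.

The 95% CI is [57.510, 63.257].

The CI is shorter than the PI, because the PI must also account for the variability of a single new observation around its mean (the extra 1 under the square root).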
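Both intervals for Problem 10.11 can be reproduced from the summary statistics on the slides. This is a sketch under stated assumptions: the helper name `interval_at` is hypothetical, and the critical value 2.093 is taken from the slide rather than computed.

```python
import math

# Summary statistics for the Old Faithful data, as reported on the slides
n, x_bar, s_xx = 21, 3.238, 22.230
b0, b1 = 31.013, 9.790
s = 6.129
t_crit = 2.093   # t_{19, 0.025}, taken from the slide


def interval_at(x_star, prediction):
    """Return (lower, upper) for a PI (prediction=True) or a CI for the mean."""
    centre = b0 + b1 * x_star
    extra = 1.0 if prediction else 0.0   # the PI adds 1 under the square root
    half_width = t_crit * s * math.sqrt(extra + 1 / n + (x_star - x_bar) ** 2 / s_xx)
    return centre - half_width, centre + half_width


pi = interval_at(3.0, prediction=True)    # 95% PI for one future NEXT value
ci = interval_at(3.0, prediction=False)   # 95% CI for the mean NEXT value
print("PI:", [round(v, 3) for v in pi])
print("CI:", [round(v, 3) for v in ci])
```

The two intervals share the same centre; only the term under the square root differs, which is why the PI is always the wider one.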


Regression Diagnostics

Checking the Model Assumptions

1. $E(Y_i)$ is a linear function of $x_i$
2. $Var(Y_i) = \sigma^2$ is the same for all $x_i$
3. The errors $\epsilon_i$ are normally distributed
4. The errors $\epsilon_i$ are independent (for time series data)

Checking for Outliers and Influential Observations


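Checking the Model Assumptions

• Residuals: $e_i = y_i - \hat{y}_i$

• The $e_i$'s can be viewed as the "estimates" of the random errors $\epsilon_i$'s

$$e_i \sim N\left(0,\ \sigma^2\left[1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}\right]\right)$$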


Checking for Linearity

• If the regression of $y$ on $x$ is linear, then the plot of $e_i$ vs. $x_i$ should exhibit random scatter around zero.


Checking for Linearity

Tire Wear Data

i   x_i   y_i      ŷ_i      e_i
1   0     394.33   360.64   33.69
2   4     329.50   331.51   -2.01
3   8     291.00   302.39   -11.39
4   12    255.17   273.27   -18.10
5   16    229.33   244.15   -14.82
6   20    204.83   215.02   -10.19
7   24    179.00   185.90   -6.90
8   28    163.83   156.78   7.05
9   32    150.33   127.66   22.67

[Figure: scatter plot of y vs. x for the tire wear data]


Checking for Linearity

Tire Wear Data

[Figure: plot of the residuals e_i vs. x for the tire wear data]

The residuals are positive at both ends of the x range and negative in the middle. This systematic pattern, rather than random scatter around zero, indicates that a straight line does not fit these data well.
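The residual pattern for the tire wear data can be checked with a short sketch. The y values below are an assumption: they are taken as fitted value plus residual (y_i = ŷ_i + e_i) from the tire wear table, and the variable names are illustrative.

```python
# Tire wear data; y is reconstructed as fitted value + residual from the table
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals from the straight-line fit, and their signs in order of x
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if e > 0 else "-" for e in resid)

print(f"fitted line: y = {b0:.2f} + ({b1:.4f}) x")
print("residual signs in order of x:", signs)
```

A run of same-signed residuals in the middle, flanked by opposite signs at the ends, is the kind of systematic pattern that suggests trying a transformation.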


Checking for Linearity

• Data Transformation

[Figure: four typical curve shapes, each labeled with candidate linearizing transformations of x or y, drawn from x², x³, log x, 1/x, y², y³, log y, and 1/y]


Checking for Constant Variance

• If the constant variance assumption is correct, the dispersion of the $e_i$'s is approximately constant with respect to the $\hat{y}_i$'s.


Checking for Constant Variance

[Figure: plot of the residuals e vs. the fitted values ŷ, Example 10.21 from the textbook]


Checking for Normality

•We can use residuals to make a normal plot

[Figure: normal probability plot of the residuals, Example 10.21 from the textbook]


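Checking for Outliers

Definition: An outlier is an observation that does not follow the general pattern of the relationship between $y$ and $x$.

• A large residual indicates an outlier. Standardized residuals:

$$e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s\sqrt{1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{S_{xx}}}} \approx \frac{e_i}{s}$$

• A standardized residual is considered large if $|e_i^*| > 2$.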


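Checking for Influential Observations

An observation can be influential because it has an extreme x-value, an extreme y-value, or both.

Each fitted value is a linear combination of the observed responses:

$$\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j, \qquad h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$$

• A large $h_{ii}$ indicates an influential observation. An observation is flagged when

$$h_{ii} > \frac{2(k+1)}{n}$$

k: number of predictors (k = 1 in simple linear regression)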
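The outlier and leverage rules can be applied together, as in the sketch below for the Old Faithful data of Problem 10.4. Variable names are illustrative, and observation numbers are 1-based to match the data table.

```python
import math

# Old Faithful data (Problem 10.4): LAST and NEXT for observations 1 to 21
last = [2.0, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7,
        3.8, 4.5, 4.7, 4.0, 4.0, 1.7, 1.8, 4.9, 4.2, 4.3]
nxt = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63,
       70, 85, 75, 77, 70, 43, 48, 70, 79, 72]

n, k = len(last), 1   # k = number of predictors
x_bar = sum(last) / n
y_bar = sum(nxt) / n
s_xx = sum((x - x_bar) ** 2 for x in last)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(last, nxt))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

resid = [y - (b0 + b1 * x) for x, y in zip(last, nxt)]
s = math.sqrt(sum(e * e for e in resid) / (n - 2))

# Leverage h_ii and standardized residuals e_i / (s * sqrt(1 - h_ii))
leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in last]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

high_leverage = [i + 1 for i, h in enumerate(leverage) if h > 2 * (k + 1) / n]
outliers = [i + 1 for i, e in enumerate(std_resid) if abs(e) > 2]

print("high-leverage observations:", high_leverage)
print("observations with |standardized residual| > 2:", outliers)
```

Note that $1 - h_{ii}$ is exactly the factor $1 - 1/n - (x_i - \bar{x})^2 / S_{xx}$ in the standard error of a residual, so one leverage computation serves both checks.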


Checking for Influential Observations

[Figure: scatter plot illustrating an influential observation]


Why use Correlation analysis?

•If the nature of the relationship between X and Y is not known, we can investigate the correlation between them without making any assumptions of causality.

•In order to do this, assume (X,Y) follows the bivariate normal distribution.


The Bivariate Normal Distribution

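• (X, Y) has the following joint density, with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$, and correlation $\rho$:

$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \right\}$$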


Why can we do this?

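• This assumption reduces to the probabilistic model for linear regression, since the conditional distribution of Y given X = x is normal with the following parameters:

$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \qquad \beta_1 = \rho \frac{\sigma_Y}{\sigma_X}, \qquad \beta_0 = \mu_Y - \beta_1 \mu_X$$

$$Var(Y \mid X = x) = \sigma_Y^2 (1 - \rho^2)$$

where $\mu_X, \mu_Y, \sigma_X, \sigma_Y$, and $\rho$ are the means, standard deviations, and correlation of (X, Y).

• So when X = x, the mean of Y is a linear function of x and the variance is constant w.r.t. x.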


So what?

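• Under these assumptions we can use the data available to make inferences about ρ.

• First we have to estimate ρ from the data. Define the sample correlation coefficient R:

$$R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$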


How can we use this?

•The exact distribution of R is very complicated, but we do have some options.

• Under the null hypothesis H0: ρ = 0, the distribution of R is simplified. An exact test exists in this case.

•For arbitrary values of ρ0 we can approximate a function of R with a normal distribution thanks to R.A. Fisher.


Testing H0: ρ = 0

• Under H0, the distribution of

$$T = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}$$

is t(n-2). This is kind of surprising, but think about it: the test statistic we used to test H0: β1 = 0 is distributed as t(n-2), and ρ = 0 if and only if β1 = 0. That the two test statistics are equivalent is shown on pages 382-383 of the text.


Approximation of R

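• Fisher showed that, for n even as small as 10,

$$\hat\psi = \frac{1}{2} \ln\left(\frac{1 + R}{1 - R}\right) \approx N\left( \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right),\ \frac{1}{n-3} \right)$$

• Now we can test H0: ρ = ρ0 vs. H1: ρ ≠ ρ0 for arbitrary ρ0. We just compute:

$$Z = \sqrt{n-3}\,\left( \hat\psi - \psi_0 \right), \qquad \psi_0 = \frac{1}{2} \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)$$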


Almost Finished!

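• We now have the tools necessary for inference on ρ. For a confidence interval for ρ, with $\hat\psi = \frac{1}{2} \ln\left(\frac{1+R}{1-R}\right)$, compute:

$$[l,\ u] = \hat\psi \pm \frac{z_{\alpha/2}}{\sqrt{n-3}}$$

and solve for ρ:

$$\frac{e^{2l} - 1}{e^{2l} + 1} \le \rho \le \frac{e^{2u} - 1}{e^{2u} + 1}$$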


Correlation - Conclusion

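• When we are not sure of the relationship between X and Y, assume (Xi, Yi) is an observation from a bivariate normal distribution. To test H0: ρ = ρ0 vs. H1: ρ ≠ ρ0 at significance level α, just compare

$$|Z| = \sqrt{n-3}\,\left| \frac{1}{2} \ln\left(\frac{1+R}{1-R}\right) - \frac{1}{2} \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) \right|$$

to $z_{\alpha/2}$.

But if ρ0 = 0, compare $|T| = \dfrac{|R|\sqrt{n-2}}{\sqrt{1-R^2}}$ to t(n-2, α/2).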
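The correlation tests and the Fisher interval can be sketched with the standard library alone, since `math.atanh(r)` equals ½ ln((1+r)/(1−r)) and `math.tanh` inverts it. The function name `correlation_inference`, the hypothetical null value ρ0 = 0.5, and the use of the usual normal critical value 1.96 are assumptions for illustration; r = 0.865 and n = 21 come from the Old Faithful example.

```python
import math


def correlation_inference(r, n, rho0=0.0, z_crit=1.96):
    """Return (t statistic for rho = 0, Fisher Z for rho = rho0, CI for rho)."""
    # Exact test of H0: rho = 0, referred to t(n - 2)
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Fisher transformation: atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    psi_hat = math.atanh(r)
    z_stat = math.sqrt(n - 3) * (psi_hat - math.atanh(rho0))
    # Interval on the transformed scale, mapped back with tanh
    half_width = z_crit / math.sqrt(n - 3)
    ci = (math.tanh(psi_hat - half_width), math.tanh(psi_hat + half_width))
    return t_stat, z_stat, ci


# r and n from the Old Faithful example; rho0 = 0.5 is a hypothetical null value
t_stat, z_stat, ci = correlation_inference(r=0.865, n=21, rho0=0.5)
print(f"t = {t_stat:.3f}")
print(f"Z = {z_stat:.3f}")
print(f"95% CI for rho: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The t statistic here differs slightly from the regression value 7.531 only because r was rounded to three decimals before being plugged in.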


SAS - Reg Procedure

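Proc Reg Data=Regression_Example;
    Title "Regression Example";
    Model Next = Last;                /* regress NEXT on LAST */
    Plot Next*Last;                   /* scatter plot of NEXT vs. LAST */
    Plot Residual.*Predicted.;        /* residuals vs. predicted values */
    Output Out=Data_From_Regression Residual=R Predicted=PV;   /* save residuals and predicted values */
Run;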


Proc Reg Output


Plot Next*Last


SAS - Plotting Regression Line

Symbol1 Value=Dot C=blue I=R;            /* data points with the regression line */
Symbol2 Value=None C=red I=RLCLM95;      /* 95% confidence limits for the mean */
Proc Gplot Data=Regression_Example;
    Title "Regression Line and CIs";
    Plot Next*Last=1 Next*Last=2 / Overlay;
Run;


Plotting Regression Line


SAS - Checking Homoscedasticity

Proc Reg Data=Regression_Example;
    Title "Regression Example";
    Model Next = Last;
    Plot Next*Last;
    Plot Residual.*Predicted.;        /* look for constant spread around zero */
    Output Out=Data_From_Regression Residual=R Predicted=PV;
Run;


Predicted.*Residual.


SAS - Checking Normality of Residuals

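Proc Reg Data=Regression_Example;
    Model Next = Last;                /* fit the model so that residuals can be saved */
    Output Out=Data_From_Regression Residual=R Predicted=PV;
Run;

Proc Univariate Data=Data_From_Regression Normal;   /* Normal requests tests for normality */
    Var R;
    QQplot R / Normal(Mu=est Sigma=est);             /* Q-Q plot against a fitted normal */
Run;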


Checking for Normality


Questions?