
Page 1: Simple Linear Regression

Simple Linear Regression

Lecture for Statistics 509

November-December 2000

Page 2: Simple Linear Regression


Correlation and Regression

• Study of association and/or relationship between variables.

• Useful for determining the effect of changes in one variable (called the independent or control variable) on another variable (called the dependent or response variable).

• Regression models can be used to determine optimal operating conditions (specified through the control variables) needed to achieve a specified target value or yield of the response variable.

• Regression models can also be used to predict the value of the response given a value of the independent variable, or to "calibrate" the value of the independent variable needed to achieve a certain response.

Page 3: Simple Linear Regression


Some Examples

• Control variable is X = Average Speed of a Car, and response variable is Y = Fuel Efficiency of the Car. The goal is to determine the speed that optimizes the efficiency of the car.

• Control variable is X = Temperature, while the response variable is Y = Yield in a chemical reaction.

• Control variable is X = amount of fertilizer applied to a plant, while the response variable is Y = yield of this plant.

• Control variable is X = thickness of a stack of bond paper, while the response variable is Y = number of sheets in this stack.

• Control variable is X = average time of studying, while the response variable is Y = GPA.

Page 4: Simple Linear Regression


Population Model

• Each member of the population will have a value for the independent variable X and the response variable Y, usually represented by the vector (X, Y).

• For a given value X = x, the variable Y has a certain distribution whose conditional mean is μ(x) and whose conditional variance is σ²(x).

• This could be visualized as follows: when you consider the subpopulation consisting of units whose values of X equal x, their Y-values have a certain distribution whose mean is μ(x) and whose variance is σ²(x). When you pick a unit from this subpopulation, the Y-value that you observe is governed by this particular distribution. In particular, this observation can be expressed via

• Y = μ(x) + ε, where ε is some "error term."

Page 5: Simple Linear Regression


Assumptions for Simple Linear Regression

• μ(x) = E(Y|X = x) = α + βx. This means that the mean of Y, given X = x, is a linear function of x. β is called the regression coefficient or the slope of the regression line; α is the y-intercept.

• σ²(x) = σ² does not depend on x. This is the assumption of "equal variances" or homoscedasticity.

• Furthermore, for the sample data (x1, Y1), (x2, Y2), …, (xn, Yn): Y1, Y2, …, Yn are independent observations, and their conditional distributions are all normal.

• In shorthand notation: Yi = μ(xi) + εi = α + βxi + εi, i = 1, 2, …, n, where ε1, ε2, …, εn are independent and identically distributed (IID) N(0, σ²).
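The model can be illustrated with a short simulation. The sketch below is not part of the original lecture; Python with NumPy is assumed, and the parameter values alpha = 6, beta = 10, sigma = 2.5 are arbitrary choices made only for illustration.

import numpy as np

# Arbitrary illustrative parameters (not from the lecture)
alpha, beta, sigma = 6.0, 10.0, 2.5
rng = np.random.default_rng(509)

x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])  # fixed design points
eps = rng.normal(0.0, sigma, size=x.size)                    # IID N(0, sigma^2) errors
y = alpha + beta * x + eps                                   # Yi = alpha + beta*xi + eps_i

print(np.column_stack([x, np.round(y, 2)]))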

Page 6: Simple Linear Regression


Regression Problem

• Given the sample (bivariate) data (x1, Y1), (x2, Y2), …, (xn, Yn), satisfying the linear regression model

Yi = α + βxi + εi, with ε1, ε2, …, εn IID N(0, σ²),

we would like to address the following questions:

• How should the data be summarized graphically?
• What are the estimators of the parameters α, β, and σ²?
• What will be an estimate of the prediction line?
• What are the properties of the estimators of the model parameters?
• How do we test whether the fitted regression model is a significant model?
• How do we construct CIs or test hypotheses concerning parameters?
• How do we perform prediction using the prediction model?

Page 7: Simple Linear Regression


Illustrative Example: On Plasma Etching

• Plasma etching is essential to the fine-line pattern transfer in current semiconductor processes. The paper "Ion Beam-Assisted Etching of Aluminum with Chlorine" in J. Electrochem. Soc. (1985) gives the data below on chlorine flow (x, in SCCM) through a nozzle used in the etching mechanism, and etch rate (y, in 100 Å/min).

x:  1.5   1.5   2.0   2.5   2.5   3.0   3.5   3.5   4.0
y:  23.0  24.5  25.0  30.0  33.5  40.0  40.5  47.0  49.0
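For the computational sketches interspersed below, the data can be entered as a pair of arrays; Python with NumPy and matplotlib is assumed, and this setup is illustrative rather than part of the lecture.

import numpy as np
import matplotlib.pyplot as plt

# Chlorine flow (SCCM) and etch rate (in 100 Å/min) from the table above
x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])
y = np.array([23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0])

plt.scatter(x, y)
plt.xlabel("ChlorineFlow")
plt.ylabel("EtchRate")
plt.title("Scatterplot of Chlorine Flow and Etch Rate")
plt.show()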

Page 8: Simple Linear Regression


The Scatterplot

[Figure: Scatterplot of Chlorine Flow and Etch Rate. x-axis: ChlorineFlow (2 to 4); y-axis: EtchRate (20 to 50).]

Page 9: Simple Linear Regression


Least-Squares Prediction Line

The least-squares (LS) principle for fitting the regression line to the scatterplot states that the best-fitting line

Ŷ = a + bx

is such that the coefficients a and b provide the smallest possible value of the sum of squared deviations between the observed Y-values and their associated predicted values. The predicted values are

Ŷi = a + b·xi, i = 1, 2, …, n,

so the quantity that needs to be minimized is given by:

Q(a, b) = Σ(Yi - Ŷi)² = Σ(Yi - a - b·xi)², with the sums running over i = 1, …, n.

Using minimization techniques from calculus, the coefficients that provide the minimum value for Q(a, b) are given on the next slide.

Page 10: Simple Linear Regression


Formulas for Simple Linear Regression

(All sums run over i = 1, 2, …, n; x̄ and Ȳ denote the sample means.)

SXX = Σ(xi - x̄)² = Σxi² - n·x̄²

SYY = Σ(Yi - Ȳ)² = ΣYi² - n·Ȳ²

SXY = Σ(xi - x̄)(Yi - Ȳ) = ΣxiYi - n·x̄·Ȳ

r = SXY / √(SXX · SYY)    (sample correlation coefficient)

b = SXY / SXX    (estimator of β)

a = Ȳ - b·x̄    (estimator of α)

Prediction line: Ŷ = a + bX

Residuals: Ri = Yi - Ŷi = Yi - (a + b·xi), i = 1, 2, …, n
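As a sketch of these formulas applied to the etching data (Python with NumPy assumed; the variable names simply mirror the slide's notation):

import numpy as np

x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])
y = np.array([23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0])
n = x.size
xbar, ybar = x.mean(), y.mean()

SXX = np.sum((x - xbar) ** 2)                 # = sum(xi^2) - n*xbar^2
SYY = np.sum((y - ybar) ** 2)
SXY = np.sum((x - xbar) * (y - ybar))

b = SXY / SXX                                 # estimator of beta (slope)
a = ybar - b * xbar                           # estimator of alpha (intercept)
r = SXY / np.sqrt(SXX * SYY)                  # sample correlation coefficient

print(SXX, SYY, SXY)    # 6.5, 776.0556, 68.9167 (matches the worksheet slide)
print(b, a, r)          # 10.6026, 6.4487, about 0.97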

Page 11: Simple Linear Regression


SSE = Σ(Yi - Ŷi)² = ΣRi²    (error sum of squares; sums over i = 1, …, n)

SSR = Σ(Ŷi - Ȳ)²    (regression sum of squares)

SYY = SSR + SSE

MSE = S² = SSE / (n - 2)    (an unbiased estimator of σ²)

F = MSR / MSE, where MSR = SSR / 1

R² = SSR / SYY    (coefficient of determination)
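A minimal sketch of these quantities for the etching data (Python with NumPy assumed; a and b are computed from the previous slide's formulas):

import numpy as np

x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])
y = np.array([23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0])
n = x.size

SXX = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / SXX
a = y.mean() - b * x.mean()
yhat = a + b * x                        # fitted (predicted) values

SSE = np.sum((y - yhat) ** 2)           # error sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)    # regression sum of squares
SYY = SSR + SSE                         # total sum of squares
MSE = SSE / (n - 2)                     # unbiased estimator of sigma^2
R2 = SSR / SYY                          # coefficient of determination

print(SSE, SSR, MSE, R2)                # about 45.36, 730.69, 6.48, 0.942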

Page 12: Simple Linear Regression


Analysis of Variance Table

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   F-Value
Regression            1                    SSR              MSR            MSR/MSE
Error                 n - 2                SSE              MSE
Total                 n - 1                SYY

To test the null hypothesis H0: β = 0, compare the F-value (MSR/MSE) to the tabular value obtained from the F-distribution with degrees of freedom (1, n - 2). If the F-value is larger, then the null hypothesis is rejected, and it is concluded that the regression model is significant (at the prespecified level of significance).
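A sketch of this F-test in Python (SciPy assumed; the sums of squares are those reported for the etching data later in these slides):

from scipy import stats

n = 9
SSR, SSE = 730.69, 45.36                       # from the etching-data ANOVA
MSR, MSE = SSR / 1, SSE / (n - 2)
F = MSR / MSE                                  # about 112.8

f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # tabular value at the 5% level
p_value = stats.f.sf(F, dfn=1, dfd=n - 2)      # P(F(1, n-2) > F)

print(F, f_crit, p_value)                      # F > f_crit, so H0: beta = 0 is rejected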

Page 13: Simple Linear Regression


Standard Errors and Confidence Intervals

Here t(n-2; α/2) denotes the upper α/2 critical value of the t-distribution with n - 2 degrees of freedom.

Estimated standard error of b:   s.e.(b) = √( MSE / SXX )

Estimated standard error of a:   s.e.(a) = √( MSE [ 1/n + X̄² / SXX ] )

Estimated mean response at X = x0:   μ̂(x0) = Ŷ(x0) = a + b·x0, with
s.e.[Ŷ(x0)] = √( MSE [ 1/n + (x0 - X̄)² / SXX ] )

Confidence interval for μ(x0):
(a + b·x0) ± t(n-2; α/2) · √( MSE [ 1/n + (x0 - X̄)² / SXX ] )

Prediction interval at X = x0:
(a + b·x0) ± t(n-2; α/2) · √( MSE [ 1 + 1/n + (x0 - X̄)² / SXX ] )
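A sketch of these interval formulas in Python (NumPy and SciPy assumed; x0 = 3.0 is an arbitrary illustrative value, and 95% intervals are used):

import numpy as np
from scipy import stats

x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])
y = np.array([23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0])
n = x.size
xbar = x.mean()

SXX = np.sum((x - xbar) ** 2)
b = np.sum((x - xbar) * (y - y.mean())) / SXX
a = y.mean() - b * xbar
MSE = np.sum((y - (a + b * x)) ** 2) / (n - 2)

x0 = 3.0
t = stats.t.ppf(1 - 0.05 / 2, df=n - 2)        # t_{n-2; alpha/2}
fit = a + b * x0                               # estimate of mu(x0)

ci_half = t * np.sqrt(MSE * (1 / n + (x0 - xbar) ** 2 / SXX))      # CI half-width for mu(x0)
pi_half = t * np.sqrt(MSE * (1 + 1 / n + (x0 - xbar) ** 2 / SXX))  # prediction-interval half-width

print(fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half))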

Page 14: Simple Linear Regression


Excel Worksheet for Regression Computations

X = ChlorineFlow   Y = EtchRate   X²       Y²         XY
1.5                23.0           2.25     529.00     34.50
1.5                24.5           2.25     600.25     36.75
2.0                25.0           4.00     625.00     50.00
2.5                30.0           6.25     900.00     75.00
2.5                33.5           6.25     1122.25    83.75
3.0                40.0           9.00     1600.00    120.00
3.5                40.5           12.25    1640.25    141.75
3.5                47.0           12.25    2209.00    164.50
4.0                49.0           16.00    2401.00    196.00
Sum X = 24         Sum Y = 312.5  Sum X² = 70.5  Sum Y² = 11626.75  Sum XY = 902.25

SXX = 6.5           b   = 10.60256
SYY = 776.055556    a   = 6.448718
SXY = 68.9166667    MSE = 6.480311

Page 15: Simple Linear Regression


Regression Analysis from Minitab

The regression equation is: y = 6.45 + 10.6 x

Predictor    Coef      StDev    T       P
Constant     6.449     2.795    2.31    0.054
x            10.6026   0.9985   10.62   0.000

S = 2.546    R-Sq = 94.2%    R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF   SS       MS       F        P
Regression       1    730.69   730.69   112.76   0.000
Residual Error   7    45.36    6.48
Total            8    776.06
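The same fit can be reproduced in Python with the statsmodels package; this is a sketch under the assumption that statsmodels is installed, not part of the original lecture.

import numpy as np
import statsmodels.api as sm

x = np.array([1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0])
y = np.array([23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0])

X = sm.add_constant(x)          # design matrix with an intercept column
model = sm.OLS(y, X).fit()      # ordinary least squares fit
print(model.summary())          # coefficients, standard errors, t-tests, R-squared, ANOVA F-test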

Page 16: Simple Linear Regression


Fitted Line in Scatterplot with Bands

[Figure: Regression Analysis of the Plasma Etching Data. Scatterplot of EtchRate versus ChlorineFlow with the fitted line Y = 6.44872 + 10.6026X (R-Sq = 94.2%), together with the 95% confidence band (95% CI) and the 95% prediction band (95% PI).]