Chapter 10: Simple Regression ©

Post on 19-Dec-2015

Page 1: Chapter 10 Simple Regression ©. Null Hypothesis The analysis of business and economic processes makes extensive use of relationships between variables

Chapter 10

Simple Regression

©

Page 2

Null Hypothesis

The analysis of business and economic processes makes extensive use of relationships between variables.

$$Y = f(X)$$

Page 3

Correlation Analysis

The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables.

Page 4

Correlation Analysis

The sample correlation coefficient:

$$r = \frac{s_{xy}}{s_x s_y}$$

where:

$$s_{xy} = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
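As a minimal sketch of this formula, the following Python computes r from the sample covariance and the two sample standard deviations. The data are the income and retail sales figures from the Example 10.2 table in this chapter; the function name `sample_correlation` is an illustrative choice, not something from the slides.

```python
import math

# Income (x) and retail sales (y) from the Example 10.2 table.
income = [9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
          11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494]
sales = [5492, 5540, 5305, 5507, 5418, 5320, 5538, 5692, 5871, 6157, 6342,
         5907, 6124, 6186, 6224, 6496, 6718, 6921, 6471, 6394, 6555, 6755]


def sample_correlation(x, y):
    """Return r = s_xy / (s_x * s_y), using (n - 1) denominators throughout."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = sample_correlation(income, sales)
print(round(r, 4))  # should agree with "Multiple R" in the Excel output
```

The result should be close to the Multiple R value of 0.9587 reported in the Excel output for the retail sales model.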

Page 5

Correlation Analysis

The null hypothesis of no linear association:

$$H_0: \rho = 0$$

where the random variable:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

follows a Student's t distribution with (n - 2) degrees of freedom
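The statistic can be evaluated directly once r and n are known. The sketch below assumes the values reported for the retail sales example in this chapter (Multiple R = 0.958748803, 22 observations); the function name is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# r and n as reported in the Excel output for the retail sales model.
t_stat = correlation_t_statistic(0.958748803, 22)
print(round(t_stat, 2))
```

For these inputs the statistic comes out at roughly 15.08, the same value that the Excel output reports as the t Stat for the slope.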

Page 6

Tests for Zero Population Correlation

Let r be the sample correlation coefficient, calculated from a random sample of n pairs of observations from a joint normal distribution. The following tests of the null hypothesis

$$H_0: \rho = 0$$

have a significance level $\alpha$:

1. To test $H_0$ against the alternative

$$H_1: \rho > 0$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} > t_{n-2,\alpha}$$

Page 7

Tests for Zero Population Correlation (continued)

2. To test $H_0$ against the alternative

$$H_1: \rho < 0$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} < -t_{n-2,\alpha}$$

Page 8

Tests for Zero Population Correlation (continued)

3. To test $H_0$ against the two-sided alternative

$$H_1: \rho \neq 0$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} < -t_{n-2,\alpha/2} \text{ or } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} > t_{n-2,\alpha/2}$$

Here, $t_{n-2,\alpha/2}$ is the number for which

$$P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$$

where the random variable $t_{n-2}$ follows a Student's t distribution with (n - 2) degrees of freedom.
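The three decision rules for the zero-correlation test can be sketched as a single function. This is an illustration under stated assumptions: the function name and the `alternative` labels are hypothetical, the test statistic of 15.08 corresponds to the retail sales example with n = 22, and the critical values for 20 degrees of freedom are copied from a standard t table rather than computed.

```python
def reject_zero_correlation(t_stat, t_crit, alternative):
    """Apply the decision rule for H0: rho = 0.

    t_crit is t_{n-2, alpha} for the one-sided alternatives and
    t_{n-2, alpha/2} for the two-sided alternative.
    """
    if alternative == "greater":      # H1: rho > 0
        return t_stat > t_crit
    if alternative == "less":         # H1: rho < 0
        return t_stat < -t_crit
    if alternative == "two-sided":    # H1: rho != 0
        return t_stat < -t_crit or t_stat > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Assumed illustration: alpha = 0.05 and 20 degrees of freedom.
T_20_05 = 1.725    # t_{20, 0.05} from a standard t table
T_20_025 = 2.086   # t_{20, 0.025} from a standard t table

print(reject_zero_correlation(15.08, T_20_05, "greater"))     # True
print(reject_zero_correlation(15.08, T_20_05, "less"))        # False
print(reject_zero_correlation(15.08, T_20_025, "two-sided"))  # True
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; a table lookup or a quantile function could supply it instead.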

Page 9

Linear Regression Model (Example 10.2)

Year   Income (x)   Retail Sales (y)
1      9098         5492
2      9138         5540
3      9094         5305
4      9282         5507
5      9229         5418
6      9347         5320
7      9525         5538
8      9756         5692
9      10282        5871
10     10662        6157
11     11019        6342
12     11307        5907
13     11432        6124
14     11449        6186
15     11697        6224
16     11871        6496
17     12018        6718
18     12523        6921
19     12053        6471
20     12088        6394
21     12215        6555
22     12494        6755

Page 10

Linear Regression Model (Figure 10.1)

[Figure 10.1: Retail Sales per Household vs Per Capita Disposable Income. Scatter plot of Retail Sales (vertical axis, 5000 to 7000) against Income (horizontal axis, 9000 to 12500) with the fitted line y = 0.3815x + 1922.4, R² = 0.9192.]

Page 11

Linear Regression Model

LINEAR REGRESSION POPULATION EQUATION MODEL

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $\beta_0$ and $\beta_1$ are the population model coefficients and $\varepsilon_i$ is a random error term.

Page 12

Linear Regression Outcomes

Linear regression provides two important results:

1. Predicted values of the dependent or endogenous variable as a function of an independent or exogenous variable.

2. Estimated marginal change in the endogenous variable that results from a one unit change in the independent or exogenous variable.

Page 13

Least Squares Procedure

The least-squares procedure obtains estimates of the linear equation coefficients $b_0$ and $b_1$ in the model

$$\hat{y}_i = b_0 + b_1 x_i$$

by minimizing the sum of the squared residuals $e_i$:

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$

This results in a procedure stated as:

Choose $b_0$ and $b_1$ so that the quantity

$$SSE = \sum e_i^2 = \sum (y_i - (b_0 + b_1 x_i))^2$$

is minimized. We use differential calculus to obtain the coefficient estimators that minimize SSE.

Page 14

Least-Squares Derived Coefficient Estimators

The slope coefficient estimator is

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = r_{xy}\,\frac{s_Y}{s_X}$$

and the constant or intercept estimator is

$$b_0 = \bar{Y} - b_1 \bar{X}$$

We also note that the regression line always goes through the point of means $(\bar{X}, \bar{Y})$.
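A minimal sketch of these two estimators, applied to the Example 10.2 income and retail sales data from this chapter. The function name `least_squares` is an illustrative choice.

```python
# Income (x) and retail sales (y) from the Example 10.2 table.
income = [9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
          11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494]
sales = [5492, 5540, 5305, 5507, 5418, 5320, 5538, 5692, 5871, 6157, 6342,
         5907, 6124, 6186, 6224, 6496, 6718, 6921, 6471, 6394, 6555, 6755]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope estimator.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum_xy / sum_xx
    b0 = y_bar - b1 * x_bar  # forces the line through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares(income, sales)
print(round(b0, 1), round(b1, 4))
```

The estimates should be close to the fitted line shown in Figure 10.1, y = 0.3815x + 1922.4. Because b0 is defined as the mean of y minus b1 times the mean of x, evaluating the line at the mean income returns the mean of retail sales.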

Page 15

Standard Assumptions for the Linear Regression Model

The following assumptions are used to make inferences about the population linear model by using the estimated coefficients:

1. The x's are fixed numbers, or they are realizations of a random variable X that are independent of the error terms, the $\varepsilon_i$'s. In the latter case, inference is carried out conditionally on the observed values of the x's.

2. The error terms are random variables with mean 0 and the same variance, $\sigma^2$. The latter is called homoscedasticity or uniform variance.

$$E[\varepsilon_i] = 0 \text{ and } E[\varepsilon_i^2] = \sigma^2 \text{ for } (i = 1, \ldots, n)$$

3. The random error terms, $\varepsilon_i$, are not correlated with one another, so that

$$E[\varepsilon_i \varepsilon_j] = 0 \text{ for all } i \neq j$$

Page 16

Regression Analysis for Retail Sales Analysis (Figure 10.5)

            Coefficients   Standard Error   t Stat        P-value
Intercept   1922.392694    274.9493737      6.99180605    8.74464E-07
X Income    0.38151672     0.025293061      15.08384918   2.17134E-12

The regression equation is

Y Retail Sales = 1922 + 0.382 X Income

(b0 = 1922, b1 = 0.382)

Page 17

Analysis of Variance

The total variability in a regression analysis, SST, can be partitioned into a component explained by the regression, SSR, and a component due to unexplained error, SSE:

$$SST = SSR + SSE$$

with the components defined as follows.

Total sum of squares

$$SST = \sum_{i=1}^{n} (y_i - \bar{Y})^2$$

Error sum of squares

$$SSE = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$$

Regression sum of squares

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2 = b_1^2 \sum_{i=1}^{n} (x_i - \bar{X})^2$$
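The partition SST = SSR + SSE can be checked numerically. The sketch below fits the least-squares line to the Example 10.2 data from this chapter and computes each sum of squares from its definition; all variable names are illustrative.

```python
# Income (x) and retail sales (y) from the Example 10.2 table.
income = [9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
          11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494]
sales = [5492, 5540, 5305, 5507, 5418, 5320, 5538, 5692, 5871, 6157, 6342,
         5907, 6124, 6186, 6224, 6496, 6718, 6921, 6471, 6394, 6555, 6755]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(sales) / n

# Least-squares coefficients.
sum_xx = sum((xi - x_bar) ** 2 for xi in income)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(income, sales)) / sum_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in income]

# The three sums of squares, each computed from its own definition.
sst = sum((yi - y_bar) ** 2 for yi in sales)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(sales, fitted))

print(round(sst), round(ssr), round(sse))
print(abs(sst - (ssr + sse)) < 1e-3)   # the partition holds up to rounding
print(abs(ssr - b1 ** 2 * sum_xx) < 1e-3)  # SSR also equals b1^2 * sum (x - x_bar)^2
```

The three values should be close to the SS column of the analysis of variance table for the retail sales model (Total, Regression, Residual).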

Page 18

Regression Analysis for Retail Sales Analysis (Figure 10.7)

The regression equation is

Y Retail Sales = 1922 + 0.382 X Income

Analysis of Variance
             df   SS            MS            F            Significance F
Regression   1    4961434.406   4961434.406   227.522506   2.17134E-12
Residual     20   436126.9127   21806.34563
Total        21   5397561.318

Page 19

Coefficient of Determination, $R^2$

The Coefficient of Determination for a regression equation is defined as

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

This quantity varies from 0 to 1 and higher values indicate a better regression. Caution should be used in making general interpretations of $R^2$ because a high value can result from either a small SSE or a large SST or both.

Page 20

Correlation and $R^2$

The multiple coefficient of determination, $R^2$, for a simple regression is equal to the simple correlation squared:

$$R^2 = r_{xy}^2$$
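Both forms of the definition, and the link to the squared correlation, can be checked with the figures reported for the retail sales model in this chapter (the SS column of the analysis of variance table and Multiple R from the Excel output). This is only an arithmetic sketch; the variable names are illustrative.

```python
# Sums of squares from the analysis of variance table for the retail sales model.
ssr = 4961434.406
sse = 436126.9127
sst = 5397561.318

# Sample correlation, reported as "Multiple R" in the Excel output.
multiple_r = 0.958748803

r_squared_from_ssr = ssr / sst
r_squared_from_sse = 1 - sse / sst
r_squared_from_corr = multiple_r ** 2

print(round(r_squared_from_ssr, 4),
      round(r_squared_from_sse, 4),
      round(r_squared_from_corr, 4))
```

All three values should round to 0.9192, the R² shown on the scatter plot in Figure 10.1.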

Page 21

Estimation of Model Error Variance

The quantity SSE is a measure of the total squared deviation about the estimated regression line, and $e_i$ is the residual. An estimator for the variance of the population model error is

$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2}$$

Division by n - 2 instead of n - 1 results because the simple regression model uses two estimated parameters, $b_0$ and $b_1$, instead of one.
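As a quick arithmetic check of the estimator, the sketch below uses SSE and n as reported for the retail sales model in this chapter. The variable names are illustrative.

```python
import math

sse = 436126.9127   # residual sum of squares from the analysis of variance table
n = 22              # number of observations

s_e_squared = sse / (n - 2)      # estimated model error variance
s_e = math.sqrt(s_e_squared)     # standard error of the estimate

print(round(s_e_squared, 2), round(s_e, 2))
```

The variance estimate should match the residual mean square (MS) in the analysis of variance table, and its square root should match the Standard Error line of the regression statistics, about 147.67.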

Page 22

Sampling Distribution of the Least Squares Coefficient Estimator

If the standard least squares assumptions hold, then $b_1$ is an unbiased estimator of $\beta_1$ and has a population variance

$$\sigma_{b_1}^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = \frac{\sigma^2}{(n - 1)\, s_X^2}$$

and an unbiased sample variance estimator

$$s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = \frac{s_e^2}{(n - 1)\, s_X^2}$$
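The two equivalent forms of the sample variance estimator can be compared numerically. This sketch assumes the income values from the Example 10.2 table and takes the error variance estimate from the residual mean square in the analysis of variance table; `statistics.variance` is the standard-library sample variance with an (n - 1) denominator.

```python
import math
import statistics

# Income (x) from the Example 10.2 table.
income = [9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
          11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494]

s_e_squared = 21806.34563   # residual mean square from the analysis of variance table
n = len(income)
x_bar = sum(income) / n

# Form 1: divide by the sum of squared deviations of x.
sum_xx = sum((xi - x_bar) ** 2 for xi in income)
var_b1_direct = s_e_squared / sum_xx

# Form 2: divide by (n - 1) times the sample variance of x.
var_b1_via_sx = s_e_squared / ((n - 1) * statistics.variance(income))

s_b1 = math.sqrt(var_b1_direct)
print(round(s_b1, 4))
print(math.isclose(var_b1_direct, var_b1_via_sx))
```

The resulting standard error should be close to the 0.0253 reported for X Income in the Standard Error column of the coefficient table.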

Page 23

Basis for Inference About the Population Regression Slope

Let $\beta_1$ be a population regression slope and $b_1$ its least squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable

$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$

is distributed as Student's t with (n - 2) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.
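A small sketch of this statistic, using the slope estimate and its standard error as reported for the retail sales model in this chapter. The function name and the default hypothesized slope of 0 are illustrative assumptions.

```python
def slope_t_statistic(b1, s_b1, beta1=0.0):
    """t = (b1 - beta1) / s_b1, with n - 2 degrees of freedom."""
    return (b1 - beta1) / s_b1


# b1 and s_b1 from the coefficient table for the retail sales model.
t_b1 = slope_t_statistic(0.38151672, 0.025293061)
print(round(t_b1, 2))
```

With a hypothesized slope of 0 this reproduces the t Stat reported for X Income, about 15.08. A different hypothesized value can be passed as `beta1`.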

Page 24

Excel Output for Retail Sales Model (Figure 10.9)

Regression Statistics
Multiple R          0.958748803
R Square            0.919199267
Adjusted R Square   0.91515923
Standard Error      147.6697181
Observations        22

Analysis of Variance
             df   SS            MS            F            Significance F
Regression   1    4961434.406   4961434.406   227.522506   2.17134E-12
Residual     20   436126.9127   21806.34563
Total        21   5397561.318

            Coefficients   Standard Error   t Stat        P-value       Lower 95%
Intercept   1922.392694    274.9493737      6.99180605    8.74464E-07   1348.858617
X Income    0.38151672     0.025293061      15.08384918   2.17134E-12   0.328756343

The regression equation is

Y Retail Sales = 1922 + 0.382 X Income

Callouts on the output: SSR = 4961434.406, SSE = 436126.9127, SST = 5397561.318, MSR = 4961434.406, MSE = 21806.34563, b0 = 1922.392694, b1 = 0.38151672, sb1 = 0.025293061, tb1 = 15.08384918, se = 147.6697181

Page 25

Tests of the Population Regression Slope

If the regression errors $\varepsilon_i$ are normally distributed and the standard least squares assumptions hold (or if the distribution of $b_1$ is approximately normal), the following tests have significance level $\alpha$:

1. To test either null hypothesis

$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \leq \beta_1^*$$

against the alternative

$$H_1: \beta_1 > \beta_1^*$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} > t_{n-2,\alpha}$$

Page 26

Tests of the Population Regression Slope (continued)

2. To test either null hypothesis

$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \geq \beta_1^*$$

against the alternative

$$H_1: \beta_1 < \beta_1^*$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} < -t_{n-2,\alpha}$$

Page 27

Tests of the Population Regression Slope (continued)

3. To test the null hypothesis

$$H_0: \beta_1 = \beta_1^*$$

against the two-sided alternative

$$H_1: \beta_1 \neq \beta_1^*$$

the decision rule is

$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} < -t_{n-2,\alpha/2} \text{ or } \frac{b_1 - \beta_1^*}{s_{b_1}} > t_{n-2,\alpha/2}$$

Page 28

Confidence Intervals for the Population Regression Slope $\beta_1$

If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, a $100(1 - \alpha)\%$ confidence interval for the population regression slope $\beta_1$ is given by

$$b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$$

where $t_{n-2,\alpha/2}$ is the number for which

$$P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$$

and the random variable $t_{n-2}$ follows a Student's t distribution with (n - 2) degrees of freedom.
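The interval can be sketched as follows for the retail sales model in this chapter. Two assumptions are made for illustration: a 95% level (alpha = 0.05), and a critical value for 20 degrees of freedom read from a standard t table rather than computed. The function name is hypothetical.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_b1."""
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


T_20_025 = 2.086   # t_{20, 0.025} from a standard t table (assumed alpha = 0.05)

# b1 and s_b1 from the coefficient table for the retail sales model.
lower, upper = slope_confidence_interval(0.38151672, 0.025293061, T_20_025)
print(round(lower, 4), round(upper, 4))
```

The lower limit should agree with the Lower 95% value reported for X Income in the Excel output (about 0.3288) to roughly four decimal places; any small difference comes from rounding the tabulated critical value.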

Page 29

F test for Simple Regression Coefficient

We can test the hypothesis

$$H_0: \beta_1 = 0$$

against the alternative

$$H_1: \beta_1 \neq 0$$

by using the F statistic

$$F = \frac{MSR}{MSE} = \frac{SSR}{s_e^2}$$

The decision rule is

$$\text{Reject } H_0 \text{ if } F > F_{1,n-2,\alpha}$$

We can also show that the F statistic is

$$F = t_{b_1}^2$$

for any simple regression analysis.
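The relationship between F and the squared slope t statistic can be checked with the figures reported for the retail sales model in this chapter. The critical value for 1 and 20 degrees of freedom at alpha = 0.05 is an assumption taken from a standard F table, and the variable names are illustrative.

```python
msr = 4961434.406    # regression mean square from the analysis of variance table
mse = 21806.34563    # residual mean square (s_e squared)
t_b1 = 15.08384918   # t Stat for the slope from the coefficient table

f_stat = msr / mse
print(round(f_stat, 2), round(t_b1 ** 2, 2))

F_1_20_05 = 4.35     # F_{1, 20, 0.05} from a standard F table (assumed alpha)
print(f_stat > F_1_20_05)   # True: reject H0 that the slope is zero
```

Both printed statistics should be about 227.52, matching the F value in the analysis of variance table.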

Page 30

Key Words

Analysis of Variance

Assumptions for the Least Squares Coefficient Estimators

Basis for Inference About the Population Regression Slope

Coefficient of Determination, R2

Confidence Intervals for Predictions

Confidence Intervals for the Population Regression Slope b1

Correlation and R2

Estimation of Model Error Variance

F test for Simple Regression Coefficient

Least-Squares Procedure

Linear Regression Outcomes

Page 31

Key Words (continued)

Linear Regression Population Equation Model

Population Model

Sampling Distribution of the Least Squares Coefficient Estimator

Tests for Zero Population Correlation

Tests of the Population Regression Slope