welcome to chapter 13 mba 541


Upload: howard-stevenson

Post on 04-Jan-2016


13-1

Welcome to Chapter 13 MBA 541

BENEDICTINE UNIVERSITY

• Regression and Correlation

• Linear Regression and Correlation

• Chapter 13

13-2

Chapter 13

Please read Chapter 13 in Lind, Statistical Techniques in Business & Economics, before viewing this presentation.

13-3

Goals

When you have completed this chapter, you will be able to:

• ONE: Draw a scatter diagram.

• TWO: Understand and interpret the terms dependent variable and independent variable.

• THREE: Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.

13-4

Goals

When you have completed this chapter, you will be able to:

• FOUR: Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero.

• FIVE: Calculate the least squares regression line and interpret the slope and intercept values.

• SIX: Construct and interpret confidence and prediction intervals for the dependent variable.

• SEVEN: Set up and interpret an ANOVA table.

13-5

Correlation Analysis

• Correlation Analysis is a group of statistical techniques to measure the association between two variables.

• A Scatter Diagram is a chart that portrays the relationship between two variables.

• The Dependent Variable is the variable that is being predicted or estimated.

• The Independent Variable is the variable that provides the basis for estimation. It is the predictor variable.

13-6

Coefficient of Correlation

• The Coefficient of Correlation, r, is a measure of the strength of the relationship between two variables.

– Also called Pearson’s r and Pearson’s product moment correlation coefficient.

– It requires interval or ratio-scaled data.

– It can range from -1.00 to +1.00.

– Values of -1.00 or +1.00 indicate perfect and strong correlation.

– Negative values indicate an inverse relationship and positive values indicate a direct relationship.

– Values close to 0.0 indicate weak correlation.

13-7

Perfect Negative Correlation

[Figure: scatter diagram of Y against X, both axes scaled from 0 to 10.]

13-8

Perfect Positive Correlation

[Figure: scatter diagram of Y against X, both axes scaled from 0 to 10.]

13-9

Zero Correlation

[Figure: scatter diagram of Y against X, both axes scaled from 0 to 10.]

13-10

Strong Positive Correlation

[Figure: scatter diagram of Y against X, both axes scaled from 0 to 10.]

13-11

Formula for r

• We calculate the coefficient of correlation from the following formula.

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y}$$

13-12

Coefficient of Determination

• The Coefficient of Determination, r², is the proportion of the total variation in the dependent variable, Y, that is explained or accounted for by the variation in the independent variable, X.

– It is the square of the coefficient of correlation.

– It ranges from 0 to 1.

– It does not give any information on the direction of the relationship between the variables.

13-13

Example 1

• Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks.

• He believes that there is a relationship between the number of pages in the text and the selling price of the book.

• To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore.

• Draw a scatter diagram.

• Compute the correlation coefficient.

13-14

Example 1 (Continued)

Book                          Pages   Price ($)
Introduction to History       500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       600     81
Fundamentals of Jazz          600     63
Principles of Nursing         800     93

13-15

Example 1 (Continued)

Scatter Diagram of Number of Pages and Selling Price of Text

[Figure: scatter diagram with Page on the horizontal axis (300 to 900) and Price ($) on the vertical axis (50 to 110).]

13-16

Example 1 (Continued)

Output from Excel

Statistic             Page      Price
Mean                  625       79.50
Standard Error        49        4.32
Median                600       78
Mode                  600       #N/A
Standard Deviation    139       12.21
Sample Variance       19,286    149.14
Kurtosis              -0.55     -0.77
Skewness              -0.16     0.40
Range                 400       36
Minimum               400       63
Maximum               800       99
Sum                   5,000     636
Count                 8         8

13-17

Example 1 (Continued)

(a) Page   (b) Price   (c) Page - Mean(page)   (d) Price - Mean(price)   (c)*(d) = (X - X̄)(Y - Ȳ)
500        84          -125                    4.5                       -562.5
700        75          75                      -4.5                      -337.5
800        99          175                     19.5                      3,412.5
600        72          -25                     -7.5                      187.5
400        69          -225                    -10.5                     2,362.5
600        81          -25                     1.5                       -37.5
600        63          -25                     -16.5                     412.5
800        93          175                     13.5                      2,362.5
                                                                 TOTAL = 7,800.0

13-18

Example 1 (Continued)

• The correlation between the number of pages and the selling price of the book is 0.657. This indicates a moderate association between the variables.

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y} = \frac{7800}{(8 - 1)(139)(12.21)} = 0.657$$

$$r^2 = (0.657)^2 = 0.432$$
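
The hand calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
import statistics

# Data from Example 1: number of pages and selling price in dollars.
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def pearson_r(x, y):
    """Coefficient of correlation: sum of cross-products / ((n - 1) * s_x * s_y)."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator).
    return cross_products / ((n - 1) * statistics.stdev(x) * statistics.stdev(y))

r = pearson_r(pages, prices)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.657, r^2 = 0.432
```

Because the sketch carries full precision rather than the rounded standard deviations, small differences in the last decimal place relative to a hand calculation are to be expected.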

13-19

Significance of r

• Did a computed r come from a population of paired observations with zero correlation?

H0: ρ = 0 (The correlation in the population is zero.)

H1: ρ ≠ 0 (The correlation in the population is different from zero.)

• Use the t test for the coefficient of correlation, with n-2 degrees of freedom.

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

13-20

Example 1 (Continued)

• Based on the computed r of 0.657, test the hypothesis that there is no correlation in the population. Use a 0.02 significance level.

• Step 1: State the null and alternate hypotheses.

H0: ρ = 0 (The correlation in the population is zero.)

H1: ρ ≠ 0 (The correlation in the population is different from zero.)

• Step 2: State the level of significance. The 0.02 significance level is stated in the problem.

• Step 3: Find the appropriate test statistic. The test statistic follows the t distribution.

• Step 4: State the decision rule. H0 is rejected if t > 3.143 or if t < -3.143 or if the p-value is less than 0.02. There are (n-2) = (8-2) = 6 degrees of freedom.

13-21

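Example 1 (Continued)

• Step 5: Compute the value of the test statistic and make a decision.

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.657\sqrt{8 - 2}}{\sqrt{1 - (0.657)^2}} = 2.135$$

$$P(|t| \ge 2.135) = 0.077$$

• H0 is not rejected, because 2.135 lies between -3.143 and 3.143 and the p-value of 0.077 is greater than 0.02. We cannot reject the hypothesis that there is no correlation in the population.

• The amount of association could be due to chance.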
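
The test statistic in Step 5 can be reproduced as follows. This is a sketch under stated assumptions: the values of r, n, and the critical value 3.143 are taken from the slides, the variable names are illustrative, and the p-value is not computed because the Python standard library has no t distribution.

```python
import math

r = 0.657           # coefficient of correlation computed in Example 1
n = 8               # number of paired observations
t_critical = 3.143  # two-tailed critical value at the 0.02 level, 6 degrees of freedom

# t test for the coefficient of correlation, with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Decision rule from Step 4: reject H0 if t falls outside (-3.143, 3.143).
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, reject H0: {reject_h0}")  # t = 2.135, reject H0: False
```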

13-22

Regression Analysis

• In Regression Analysis, we use the independent variable, X, to estimate the value of the dependent variable, Y.

– In Linear Regression Analysis, the relationship between the variables is linear.

– Both variables must be at least interval scale.

– The least squares criterion is used to determine the equation. That is, the term Σ(Y-Y’)² is minimized.

13-23

Regression Analysis

The regression equation is:

$$Y' = a + bX$$

where,

Y’ (read Y prime) is the predicted value of the Y variable for a selected X value.

a is the Y-intercept. It is the estimated Y value when X = 0.

b is the slope of the line, or the average change in Y’ for each change of one unit in X.

The least squares principle is used to obtain the values of a and b.

13-24

Regression Analysis

• The least squares principle is used to obtain the values of a and b.

• The equations to determine a and b are:

$$b = r\,\frac{s_Y}{s_X}$$

$$a = \bar{Y} - b\bar{X}$$
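
As a short aside, the slope formula can be connected to the formula for r given earlier. Substituting that formula and using the definition of the sample variance, $s_X^2 = \sum (X - \bar{X})^2 / (n - 1)$, gives an equivalent form of the slope. This is a standard identity, shown here as a sketch rather than as part of the original slides.

```latex
b = r\,\frac{s_Y}{s_X}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_X s_Y} \cdot \frac{s_Y}{s_X}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_X^2}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
```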

13-25

Example 1 (Revisited)

• Develop a regression equation for the information given in Example 1 that can be used to estimate the selling price based on the number of pages.

$$b = r\,\frac{s_Y}{s_X} = 0.657 \cdot \frac{12.21}{139} = 0.0577$$

$$a = \bar{Y} - b\bar{X} = 79.5 - 0.0577(625) = 43.44$$
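
The same two formulas can be applied in Python without intermediate rounding. This is a minimal sketch; the function name `least_squares` is an illustrative choice. With full precision the intercept comes out near 43.39 rather than 43.44, which agrees with the Minitab and Excel outputs reproduced at the end of this presentation; the difference comes from rounding b to 0.0577 in the hand calculation.

```python
import statistics

# Data from Example 1: number of pages and selling price in dollars.
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def least_squares(x, y):
    """Return (a, b) for Y' = a + bX using b = r * s_Y / s_X and a = mean(Y) - b * mean(X)."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = cross_products / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b

a, b = least_squares(pages, prices)
print(f"Y' = {a:.2f} + {b:.4f} X")  # Y' = 43.39 + 0.0578 X
```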

13-26

Example 1 (Revisited)

• The regression equation is:

$$Y' = a + bX = 43.44 + 0.0577X$$

• The slope of the line is 0.0577. Each additional page adds about $0.058, or roughly a nickel, to the estimated price.

• The line crosses the Y-axis at $43.44. (A book with no pages would cost $43.44.)

• The sign of b and the sign of r will always be the same.

13-27

Example 1 (Revisited)

• We can use the regression equation to estimate values of Y.

• The estimated selling price of an 800 page book is $89.60, found by the following equations.

$$\text{Price} = Y' = 43.44 + 0.0577\,(\text{Number of Pages})$$

$$\text{Price} = Y' = 43.44 + 0.0577\,(800) = \$89.60$$

13-28

Standard Error of Estimate

• The Standard Error of Estimate measures the scatter, or dispersion, of the observed values around the line of regression.

• The formula that is used to compute the standard error follows.

$$s_{Y \cdot X} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

13-29

Example 1 (Revisited)

• Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.

$$s_{Y \cdot X} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

Actual Price (Y)   Estimated Price (Y’)   Deviation (Y-Y’)   Deviation Squared (Y-Y’)²
84                 72.28                  11.72              137.41
75                 83.83                  -8.83              78.03
99                 89.61                  9.39               88.15
72                 78.06                  -6.06              36.67
69                 66.50                  2.50               6.25
81                 78.06                  2.94               8.67
63                 78.06                  -15.06             226.67
93                 89.61                  3.39               11.48
Total                                     0.00               593.33

$$s_{Y \cdot X} = \sqrt{\frac{593.33}{8 - 2}} = 9.944$$
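
The deviation table and the standard error can be reproduced with a short script. This is a sketch, not the presentation's own method: the helper names are illustrative, and the slope is computed as Σ(X−X̄)(Y−Ȳ)/Σ(X−X̄)², which is algebraically the same as r·sY/sX.

```python
import math
import statistics

# Data from Example 1: number of pages and selling price in dollars.
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def fit_line(x, y):
    """Least squares intercept a and slope b for Y' = a + bX."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b

a, b = fit_line(pages, prices)

# Sum of squared deviations of the observed prices from the regression line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(pages, prices))

# Standard error of estimate: sqrt(SSE / (n - 2)).
s_yx = math.sqrt(sse / (len(pages) - 2))
print(f"SSE = {sse:.2f}, s_yx = {s_yx:.3f}")  # SSE = 593.33, s_yx = 9.944
```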

13-30

Assumptions Underlying Linear Regression

• For each value of X, there is a group of Y values, and these Y values are normally distributed.

• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.

• The means of these normal distributions of Y values all lie on the straight line of regression.

• The standard deviations of these normal distributions are the same.

13-31

Confidence Interval

The confidence interval for the mean value of Y for a given value of X is given by:

$$\text{C.I.} = Y' \pm t\, s_{Y \cdot X} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

where,

Y’ is the predicted value for any selected X value,

X is a selected value from the data set,

X̄ is the mean of the X’s,

n is the number of pairs of observations,

sY•X is the standard error of the estimate, and

t is the value of t at n-2 degrees of freedom.

13-32

Example 1 (Revisited)

• Find the confidence interval for the earlier price estimate of $89.60 for an 800 page book, assuming a desired 95% confidence.

Page-Mean(page)   (Page-Mean(page))²
-125              15,625
75                5,625
175               30,625
-25               625
-225              50,625
-25               625
-25               625
175               30,625
Total             135,000

13-33

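Example 1 (Revisited)

• Continuing the calculations to find the confidence interval.

Y’ = $89.60, X = 800, X̄ = 625, n = 8

sY•X = 9.944, t = 2.447 at (8-2) = 6 degrees of freedom.

$$\text{C.I.} = Y' \pm t\, s_{Y \cdot X} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}} = 89.60 \pm 2.447\,(9.944) \sqrt{\frac{1}{8} + \frac{(800 - 625)^2}{135{,}000}}$$

$$\text{C.I.} = 89.60 \pm 14.43$$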

13-34

Prediction Interval

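The prediction interval for the range of values of Y for a given value of X is given by:

$$\text{P.I.} = Y' \pm t\, s_{Y \cdot X} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

For the previous example:

$$\text{P.I.} = 89.60 \pm 2.447\,(9.944) \sqrt{1 + \frac{1}{8} + \frac{(800 - 625)^2}{135{,}000}}$$

$$\text{P.I.} = 89.60 \pm 28.29$$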
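
The confidence interval and the prediction interval differ only by the extra 1 under the square root, which a small helper makes explicit. This is a sketch using the rounded values quoted on the slides (Y’ = 89.60, sY•X = 9.944, t = 2.447); the function name `interval_half_width` and its parameters are illustrative assumptions.

```python
import math

def interval_half_width(x0, mean_x, s_xx, n, s_yx, t, individual=False):
    """Half-width of the interval at X = x0.

    individual=False gives the confidence interval for the mean of Y;
    individual=True gives the prediction interval for a single Y.
    """
    inside = 1 / n + (x0 - mean_x) ** 2 / s_xx
    if individual:
        inside += 1  # extra scatter of one observation around the line
    return t * s_yx * math.sqrt(inside)

# Values taken from the slides for Example 1.
y_hat, n, mean_x, s_xx = 89.60, 8, 625, 135_000
s_yx, t = 9.944, 2.447  # t for 95% confidence with 6 degrees of freedom

ci_half = interval_half_width(800, mean_x, s_xx, n, s_yx, t)
pi_half = interval_half_width(800, mean_x, s_xx, n, s_yx, t, individual=True)
print(f"C.I. = {y_hat:.2f} +/- {ci_half:.2f}")  # C.I. = 89.60 +/- 14.43
print(f"P.I. = {y_hat:.2f} +/- {pi_half:.2f}")  # P.I. = 89.60 +/- 28.29
```

The prediction interval is always wider than the confidence interval at the same X, because it must cover an individual book rather than the average of all books with that many pages.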

13-35

Example 1 (Revisited)

• Summarizing the Results:

– The estimated selling price for a book with 800 pages is $89.60.

– The standard error of estimate is $9.94.

– The 95% confidence interval for the mean price of all books with 800 pages is $89.60 ± $14.43. This means that the limits are $75.17 and $104.03.

– The 95% prediction interval for a particular book with 800 pages is $89.60 ± $28.29. This means that the limits are $61.31 and $117.89.

– These results appear in the following Minitab and Excel outputs.

13-36

Example 1 (Revisited)

Regression Analysis

The regression equation is
Price = 43.4 + 0.0578 No of Pages

Predictor      Coef      StDev     T      P
Constant       43.39     17.28     2.51   0.046
No of Pages    0.05778   0.02706   2.13   0.077

S = 9.944   R-Sq = 43.2%   R-Sq(adj) = 33.7%

Analysis of Variance
Source        DF   SS        MS       F      P
Regression    1    450.67    450.67   4.56   0.077
Error         6    593.33    98.89
Total         7    1044.00

Fit     StDev Fit   95.0% CI            95.0% PI
89.61   5.90        (75.17, 104.05)     (61.31, 117.91)

13-37

Example 1 (Revisited)

Regression Statistics
Multiple R           0.657
R Square             0.432
Adjusted R Square    0.337
Standard Error       9.944
Observations         8

ANOVA
              df   SS       MS       F           Significance F
Regression    1    450.67   450.67   4.5573034   0.0767
Residual      6    593.33   98.89
Total         7    1044

              Coefficients   Standard Error   t Stat   P-value
Intercept     43.3889        17.277           2.511    0.0458193
Page          0.0578         0.027            2.135    0.0767009
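
The sums of squares and the F statistic in the ANOVA tables above can be rebuilt from the raw data. This is a minimal sketch with illustrative variable names; it computes the slope as Σ(X−X̄)(Y−Ȳ)/Σ(X−X̄)², which is equivalent to r·sY/sX, and it does not compute the p-value because the Python standard library has no F distribution.

```python
import statistics

# Data from Example 1: number of pages and selling price in dollars.
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

# Least squares line Y' = a + bX.
mean_x, mean_y = statistics.mean(pages), statistics.mean(prices)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, prices))
s_xx = sum((x - mean_x) ** 2 for x in pages)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Partition the total variation in Y into regression and error components.
ss_total = sum((y - mean_y) ** 2 for y in prices)
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(pages, prices))
ss_regression = ss_total - ss_error

# Mean squares use 1 degree of freedom for regression and n - 2 for error.
ms_regression = ss_regression / 1
ms_error = ss_error / (n - 2)
f_stat = ms_regression / ms_error
r_squared = ss_regression / ss_total

print(f"SSR = {ss_regression:.2f}, SSE = {ss_error:.2f}, SST = {ss_total:.2f}")
print(f"F = {f_stat:.3f}, R-Sq = {r_squared:.1%}")
```

With one independent variable, F equals the square of the t statistic for the slope, which is why both tests report the same p-value of about 0.077 in the outputs above.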