Statistics for Business and Economics Chapter 10 Simple Linear Regression

Upload: rinah-hicks

Post on 31-Dec-2015


Page 1: Statistics for Business and Economics

Statistics for Business and Economics

Chapter 10

Simple Linear Regression

Page 2: Statistics for Business and Economics

Learning Objectives

1. Describe the Linear Regression Model

2. State the Regression Modeling Steps

3. Explain Least Squares

4. Compute Regression Coefficients

5. Explain Correlation

6. Predict Response Variable

Page 3: Statistics for Business and Economics

What is Regression?

1. Method of modeling relationships between a response variable Y and one or more predictors X

2. A way of “fitting a line through data”

Page 4: Statistics for Business and Economics

Example 1: Store Site Selection

Model sales Y at existing sites as a function of demographic variables:

X1 = Population in store vicinity
X2 = Income in area
X3 = Age of houses in area
X4 =
X5 =

From equation, predict sales at new sites

Page 5: Statistics for Business and Economics

Example 2: Marketing Research

Model consumer response to a product on basis of product characteristics:

Y = Taste score on soft drink
X1 = Sugar level
X2 = Carbonation level
X3 =
X4 =

Page 6: Statistics for Business and Economics

Example 3: Operations/Quality

Model product quality in plant as a function of manufacturing process characteristics:

Y = Quality score of sheet metal
X1 = Raw material purity score
X2 = Molten aluminum temp
X3 = Line speed
X4 =

Page 7: Statistics for Business and Economics

Example 4: Real Estate Pricing

Y = Selling price of houses
X1 = Square feet
X2 = Taxes
X3 = Lot acreage
X4 =
X5 =
X6 =

Page 8: Statistics for Business and Economics

One-Predictor Regression

[Figure: scatterplot of Price (Y, in US $, 0 to 400,000) against Square Feet (X, 0 to 6,000)]

25 Homes Sold in Essex County, New Jersey, 1996

Page 9: Statistics for Business and Economics

One-Predictor Regression

[Figure: scatterplot of Price (Y, in US $, 0 to 400,000) against Square Feet (X, 0 to 6,000) with the fitted line]

Regression Equation: Y = -39001 + 85.0 X

Page 10: Statistics for Business and Economics

Regression Models

• Answers ‘What is the relationship between the variables?’

• Equation used
– One numerical dependent (response) variable: what is to be predicted
– One or more numerical or categorical independent (explanatory) variables

• Used mainly for prediction and estimation

Page 11: Statistics for Business and Economics

Prediction Using Regression

[Figure: scatterplot of Price (Y, in US $, 0 to 400,000) against Square Feet (X, 0 to 6,000) with the fitted line, read off at X = 4000]

Y = -39001 + 85.0(4000) = $300,999
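As a minimal sketch, the prediction step can be written as a one-line function. The coefficients are the ones quoted on the slide; the function name `predict_price` is only an illustrative choice.

```python
# Fitted line quoted on the slide: price = -39001 + 85.0 * square_feet
INTERCEPT = -39001.0  # estimated y-intercept, in US $
SLOPE = 85.0          # estimated US $ per additional square foot


def predict_price(square_feet: float) -> float:
    """Return the point prediction of selling price for a given size."""
    return INTERCEPT + SLOPE * square_feet


print(predict_price(4000))  # 300999.0
```

This gives only a point estimate; it says nothing yet about how far an individual sale may fall from the line.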

Page 12: Statistics for Business and Economics

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term

• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Page 13: Statistics for Business and Economics

Specifying the Model

1. Define variables
• Conceptual (e.g., Advertising, price)
• Empirical (e.g., List price, regular price)
• Measurement (e.g., $, Units)

2. Hypothesize nature of relationship
• Expected effects (i.e., Coefficients’ signs)
• Functional form (linear or non-linear)
• Interactions

Page 14: Statistics for Business and Economics

Model Specification Is Based on Theory

• Theory of field (e.g., Sociology)

• Mathematical theory

• Previous research

• ‘Common sense’

Page 15: Statistics for Business and Economics

Thinking Challenge: Which Is More Logical?

[Figure: four candidate plots of the relationship between Sales and Advertising]

Page 16: Statistics for Business and Economics

Types of Regression Models

Regression Models
• Simple (1 Explanatory Variable)
– Linear
– Non-Linear
• Multiple (2+ Explanatory Variables)
– Linear
– Non-Linear

Page 17: Statistics for Business and Economics

Linear Regression Model

Relationship between variables is a linear function

$$y = \beta_0 + \beta_1 x + \varepsilon$$

y = Dependent (Response) Variable
x = Independent (Explanatory) Variable
β1 = Population Slope
β0 = Population y-intercept
ε = Random Error

Page 18: Statistics for Business and Economics

Line of Means

[Figure: y against x with the line of means E(y) = β0 + β1x; β0 = y-intercept, β1 = slope = (change in y) / (change in x)]

High school teacher

Page 19: Statistics for Business and Economics

Linear Regression Model

1. Relationship between variables is a linear function

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

Y_i = Dependent (response) variable
X_i = Independent (explanatory) variable
β1 = Population slope
β0 = Population Y-intercept
ε_i = Random error

Page 20: Statistics for Business and Economics

Population Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$$E(Y) = \beta_0 + \beta_1 X_i$$

[Figure: Y against X, showing observed values scattered about the line E(Y); ε_i = random error, the vertical distance from an observed value to the line]

Page 21: Statistics for Business and Economics

Sample Linear Regression Model

[Figure: Y against X, showing an observed value and the true unknown line $Y_i = \beta_0 + \beta_1 X_i$]

Page 22: Statistics for Business and Economics

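Sample Linear Regression Model

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$$

[Figure: Y against X, showing an observed value and the fitted line; $\hat{\varepsilon}_i$ = random error, the vertical distance from the observed value to the fitted line]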

Page 23: Statistics for Business and Economics

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term

• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Page 24: Statistics for Business and Economics

Scattergram

1. Plot of all (xi, yi) pairs

2. Suggests how well model will fit

[Figure: scattergram of y (0 to 60) against x (0 to 60)]

Page 25: Statistics for Business and Economics

Thinking Challenge

[Figure: scattergram of y (0 to 60) against x (0 to 60)]

• How would you draw a line through the points?
• How do you determine which line ‘fits best’?

Page 26: Statistics for Business and Economics

Least Squares

• ‘Best fit’ means the differences between actual y values and predicted y values are a minimum

– But positive differences offset negative ones, so the differences are squared

• Least Squares minimizes the Sum of the Squared Differences (SSE)

$$\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$

Page 27: Statistics for Business and Economics

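Least Squares Graphically

[Figure: y against x, showing four observations, the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, and the residuals $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$; for example, $y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2$]

$$\text{LS minimizes } \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$$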

Page 28: Statistics for Business and Economics

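Coefficient Equations

Slope

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

y-intercept

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Prediction Equation

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$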

Page 29: Statistics for Business and Economics

Computation Table

xi     yi     xi²     yi²     xiyi
x1     y1     x1²     y1²     x1y1
x2     y2     x2²     y2²     x2y2
:      :      :       :       :
xn     yn     xn²     yn²     xnyn
Σxi    Σyi    Σxi²    Σyi²    Σxiyi

Page 30: Statistics for Business and Economics

Interpretation of Coefficients

1. Slope ($\hat{\beta}_1$)
• Estimated y changes by $\hat{\beta}_1$ for each 1-unit increase in x
– If $\hat{\beta}_1$ = 2, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)

2. Y-Intercept ($\hat{\beta}_0$)
• Average value of y when x = 0
– If $\hat{\beta}_0$ = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

Page 31: Statistics for Business and Economics

Least Squares Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Find the least squares line relating sales and advertising.

Page 32: Statistics for Business and Economics

Scattergram Sales vs. Advertising

[Figure: scattergram of Sales (0 to 4) against Advertising (0 to 5)]

Page 33: Statistics for Business and Economics

Parameter Estimation Solution Table

xi     yi     xi²     yi²     xiyi
1      1      1       1       1
2      1      4       1       2
3      2      9       4       6
4      2      16      4       8
5      4      25      16      20
15     10     55      26      37    (column sums)

Page 34: Statistics for Business and Economics

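Parameter Estimation Solution

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = .70$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10$$

$$\hat{y} = -.1 + .7x$$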
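The hand calculation can be checked with a short script. This is a sketch of the computational formulas from the coefficient slide applied to the five Hasbro observations; the function name `least_squares` is just a label chosen for this example.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy = 37 - (15)(10)/5 = 7
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx = 55 - 15^2/5 = 10
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n  # y-bar minus slope times x-bar
    return intercept, slope


ads = [1, 2, 3, 4, 5]    # Ad $
sales = [1, 1, 2, 2, 4]  # Sales (Units)
b0, b1 = least_squares(ads, sales)
print(round(b0, 4), round(b1, 4))  # -0.1 0.7
```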

Page 35: Statistics for Business and Economics

Parameter Estimation Computer Output

Parameter Estimates

                 Parameter   Standard   T for H0:
Variable   DF    Estimate    Error      Param=0     Prob>|T|
INTERCEP   1     -0.1000     0.6350     -0.157      0.8849
ADVERT     1      0.7000     0.1914      3.656      0.0354

The INTERCEP estimate is $\hat{\beta}_0$ and the ADVERT estimate is $\hat{\beta}_1$:

$$\hat{y} = -.1 + .7x$$

Page 36: Statistics for Business and Economics

JMP “Fit Model” Results

Page 37: Statistics for Business and Economics

Coefficient Interpretation Solution

1. Slope ($\hat{\beta}_1$)
• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)

2. Y-Intercept ($\hat{\beta}_0$)
• Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
– Difficult to explain to marketing manager
– Expect some sales without advertising

Page 38: Statistics for Business and Economics

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term

• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Page 39: Statistics for Business and Economics

Linear Regression Assumptions

1. Mean of probability distribution of error, ε, is 0

2. Probability distribution of error has constant variance

3. Probability distribution of error, ε, is normal

4. Errors are independent

Page 40: Statistics for Business and Economics

Error Probability Distribution

[Figure: Y against X, showing the error distribution f(ε) drawn about the regression line at X = X1 and X = X2]


Page 43: Statistics for Business and Economics

Estimating σ²

Recall that:

$$s^2 = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n - 1}$$

where $\bar{X}$ is our estimator of the mean.

Now substitute (1) $Y_i$ for $X_i$, (2) $\hat{Y}_i$ for $\bar{X}$, and (3) n − 2 df for n − 1:

$$s^2 = \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{n - 2}$$
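To make the substitution concrete, the sketch below applies the n − 2 formula to the Hasbro advertising data used elsewhere in the deck, with the fitted coefficients −.1 and .7. The variable names are illustrative only.

```python
import math

ads = [1, 2, 3, 4, 5]    # Ad $
sales = [1, 1, 2, 2, 4]  # Sales (Units)
b0, b1 = -0.1, 0.7       # fitted intercept and slope from the deck

# Sum of squared differences between observed and fitted values (SSE).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, sales))

n = len(ads)
s_squared = sse / (n - 2)  # two degrees of freedom used by the two estimates
s = math.sqrt(s_squared)   # estimated standard deviation of the error

print(round(sse, 2), round(s, 4))  # 1.1 0.6055
```

The result agrees with the value s = .6055 quoted in the later examples.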

Page 44: Statistics for Business and Economics

Why are df = n - 2?

1. Two statistics are estimated to compute the regression line, $\hat{\beta}_0$ and $\hat{\beta}_1$

2. As a result, if I know any n − 2 residuals, the other two can be computed from $\sum \hat{\varepsilon}_i = 0$ and $\sum \hat{\varepsilon}_i X_i = 0$.
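A small numerical illustration of this argument, assuming the Hasbro data and fitted line from the deck: the residuals satisfy both constraints, and the last two residuals can be rebuilt from the first three. The variable names are hypothetical.

```python
ads = [1, 2, 3, 4, 5]    # Ad $
sales = [1, 1, 2, 2, 4]  # Sales (Units)
b0, b1 = -0.1, 0.7       # fitted intercept and slope from the deck

# Residuals e_i = y_i - (b0 + b1 * x_i).
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, sales)]

# The two constraints imposed by least squares (zero up to rounding error).
sum_e = sum(residuals)
sum_ex = sum(e * x for e, x in zip(residuals, ads))

# Rebuild the last two residuals from the first n - 2 using the constraints:
#   e_a + e_b = -S   and   x_a * e_a + x_b * e_b = -T
known_e, known_x = residuals[:-2], ads[:-2]
x_a, x_b = ads[-2], ads[-1]
s_known = sum(known_e)
t_known = sum(e * x for e, x in zip(known_e, known_x))
e_b = (x_a * s_known - t_known) / (x_b - x_a)
e_a = -s_known - e_b

print(round(e_a, 4), round(e_b, 4))  # -0.7 0.6
```

Only n − 2 residuals are free to vary, which is why the error variance is estimated with n − 2 degrees of freedom.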

Page 45: Statistics for Business and Economics

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term

• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Page 46: Statistics for Business and Economics

Test of Slope Coefficient

• Shows if there is a linear relationship between x and y

• Involves population slope β1

• Hypotheses
– H0: β1 = 0 (No Linear Relationship)
– Ha: β1 ≠ 0 (Linear Relationship)

• Theoretical basis is sampling distribution of slope

Page 47: Statistics for Business and Economics

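Sampling Distribution of Sample Slopes

[Figure: y against x, showing the population line together with the Sample 1 and Sample 2 lines]

All Possible Sample Slopes

Sample 1: 2.5
Sample 2: 1.6
Sample 3: 1.8
Sample 4: 2.1
: :
Very large number of sample slopes

[Figure: sampling distribution of $\hat{\beta}_1$, centered at $\beta_1$ with standard error $S_{\hat{\beta}_1}$]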

Page 48: Statistics for Business and Economics

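Slope Coefficient Test Statistic

$$t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}}, \qquad df = n - 2$$

where

$$S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$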

Page 49: Statistics for Business and Economics

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = –.1, $\hat{\beta}_1$ = .7 and s = .6055.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Is the relationship significant at the .05 level of significance?

Page 50: Statistics for Business and Economics

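Test Statistic Solution

$$S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{.6055}{\sqrt{55 - \dfrac{(15)^2}{5}}} = .1914$$

$$t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{.70}{.1914} = 3.657$$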
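The same test can be sketched in a few lines. The inputs are the values quoted in the example; the critical value 3.182 for t with 3 df at α/2 = .025 is the one used in the deck's interval examples and is hard-coded here rather than computed.

```python
import math

s = 0.6055                    # estimated standard deviation of the error
b1 = 0.7                      # estimated slope
n, sum_x, sum_x2 = 5, 15, 55  # sample size and sums from the solution table

ss_xx = sum_x2 - sum_x ** 2 / n  # 55 - 225/5 = 10
se_b1 = s / math.sqrt(ss_xx)     # standard error of the slope, about 0.1915
t_stat = b1 / se_b1              # about 3.656

T_CRIT = 3.182                   # t(.025; 3 df), taken from the deck
reject_h0 = abs(t_stat) > T_CRIT

print(round(t_stat, 3), reject_h0)
```

Without intermediate rounding the statistic is about 3.656, matching the computer output; dividing by the rounded .1914 gives the 3.657 shown on the slide. Either way |t| exceeds 3.182, so H0: β1 = 0 is rejected at the .05 level.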

Page 51: Statistics for Business and Economics

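JMP Fit Model Results

[Figure: JMP output annotated with $\hat{\beta}_1$, $S_{\hat{\beta}_1}$, $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and the P-Value]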

Page 52: Statistics for Business and Economics

Computing a CI for a Slope

A (1 − α)100% confidence interval for the true slope parameter β1 is given by:

$$\hat{\beta}_1 \pm t_{(\alpha/2;\, n-2)}\, S_{\hat{\beta}_1}$$

This assumes that all of the standard regression assumptions are met!

Example from previous slide:

$$0.7000 \pm t_{(.025;\,3)}(.1914) = 0.7000 \pm 3.1824(.1914) = 0.7000 \pm 0.6091$$
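As a quick check of the arithmetic, the interval can be computed directly. This is a sketch using the slope, standard error, and critical value quoted in the example; the variable names are illustrative.

```python
b1 = 0.7         # estimated slope
se_b1 = 0.1914   # standard error of the slope, as quoted in the deck
t_crit = 3.1824  # t(.025; 3 df), as quoted in the deck

margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(round(lower, 4), round(upper, 4))  # 0.0909 1.3091
```

The interval excludes 0, which agrees with rejecting H0: β1 = 0 at the .05 level.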

Page 53: Statistics for Business and Economics

Coefficient of Determination

Proportion of variation ‘explained’ by relationship between x and y

$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SS_{yy} - SSE}{SS_{yy}}$$

$$0 \le r^2 \le 1$$

r² = (coefficient of correlation)²

Page 54: Statistics for Business and Economics

Coefficient of Determination Examples

[Figure: four scatterplots of Y against X, labeled r² = 1, r² = 1, r² = .8, and r² = 0]

Page 55: Statistics for Business and Economics

Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys.

You know r = .904.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Calculate and interpret the coefficient of determination.

Page 56: Statistics for Business and Economics

Coefficient of Determination Solution

r² = (coefficient of correlation)²

r² = (.904)²

r² = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
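The same figure can be reached from the variation formula rather than from r. The sketch below assumes the Hasbro data and the fitted line ŷ = −.1 + .7x; names such as `ss_yy` are chosen for this example.

```python
ads = [1, 2, 3, 4, 5]    # Ad $
sales = [1, 1, 2, 2, 4]  # Sales (Units)
b0, b1 = -0.1, 0.7       # fitted intercept and slope from the deck

y_bar = sum(sales) / len(sales)

# Total variation of y about its mean.
ss_yy = sum((y - y_bar) ** 2 for y in sales)

# Unexplained variation: squared residuals about the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, sales))

r_squared = (ss_yy - sse) / ss_yy
print(round(r_squared, 4))  # 0.8167
```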

Page 57: Statistics for Business and Economics

r² Computer Output

Root MSE    0.60553     R-square    0.8167
Dep Mean    2.00000     Adj R-sq    0.7556
C.V.       30.27650

R-square is r². Adj R-sq is r² adjusted for number of explanatory variables & sample size.

Page 58: Statistics for Business and Economics

JMP Fit Model Results

[Figure: JMP output annotated with r² and with r² adjusted for number of explanatory variables & sample size]

Page 59: Statistics for Business and Economics

Correlation Models

1. Answer ‘How strong is the linear relationship between 2 variables?’

2. Coefficient of correlation used
• Population coefficient denoted ρ (rho)
• Values range from -1 to +1
• Measures degree of association

3. Used mainly for understanding

Page 60: Statistics for Business and Economics

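Sample Coefficient of Correlation

1. Pearson product moment coefficient of correlation, r

$$r = (\text{sign of slope}) \times \sqrt{R^2} = \frac{\sum \left( X_i - \bar{X} \right)\left( Y_i - \bar{Y} \right)}{\sqrt{\sum \left( X_i - \bar{X} \right)^2 \sum \left( Y_i - \bar{Y} \right)^2}}$$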
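A direct implementation of the product moment formula, offered as a sketch; `pearson_r` is a name chosen for this example, and the data are the Hasbro observations used throughout the deck.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation for two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


ads = [1, 2, 3, 4, 5]    # Ad $
sales = [1, 1, 2, 2, 4]  # Sales (Units)
r = pearson_r(ads, sales)
print(round(r, 4), round(r ** 2, 4))  # 0.9037 0.8167
```

Rounded to three decimals this is the r = .904 used in the coefficient of determination example, and its square matches the R-square in the computer output.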

Page 61: Statistics for Business and Economics

Coefficient of Correlation Values

[Figure: scale from -1.0 to +1.0 with marks at -.5, 0, and +.5; -1.0 = Perfect Negative Correlation, 0 = No Correlation, +1.0 = Perfect Positive Correlation]

Page 62: Statistics for Business and Economics

Coefficient of Correlation Examples

[Figure: four scatterplots of Y against X, labeled r = 1, r = -1, r = .89, and r = 0]

Page 63: Statistics for Business and Economics

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term

• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Page 64: Statistics for Business and Economics

Prediction With Regression Models

• Types of predictions
– Point estimates
– Interval estimates

• What is predicted
– Population mean response E(y) for given x (a point on the population regression line)
– Individual response (yi) for given x

Page 65: Statistics for Business and Economics

What Is Predicted

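[Figure: y against x at x = xP, showing the mean of y on the population line $E(y) = \beta_0 + \beta_1 x$, an individual value y, and the prediction $\hat{y}$ on the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$]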

Page 66: Statistics for Business and Economics

Confidence Interval Estimate for Mean Value of y at x = xp

$$\hat{y} \pm (t_{\alpha/2})(s)\sqrt{\frac{1}{n} + \frac{\left( x_p - \bar{x} \right)^2}{SS_{xx}}}$$

df = n – 2

Page 67: Statistics for Business and Economics

Factors Affecting Interval Width

1. Level of confidence (1 – α)
• Width increases as confidence increases

2. Data dispersion (s)
• Width increases as variation increases

3. Sample size
• Width decreases as sample size increases

4. Distance of xp from mean x̄
• Width increases as distance increases

Page 68: Statistics for Business and Economics

Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys.

You find $\hat{\beta}_0$ = -.1, $\hat{\beta}_1$ = .7 and s = .6055.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Find a 95% confidence interval for the mean sales when advertising is $4.

Page 69: Statistics for Business and Economics

Confidence Interval Estimate Solution

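$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{\left( x_p - \bar{x} \right)^2}{SS_{xx}}}$$

$$\hat{y} = -.1 + (.7)(4) = 2.7$$

$$2.7 \pm (3.182)(.6055)\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}}$$

$$1.645 \le E(Y) \le 3.755$$

x to be predicted: $x_p = 4$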
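The interval can be reproduced with a small helper. This is a sketch under the values quoted in the example (s = .6055, t = 3.182, x̄ = 3, SSxx = 10); the function name `mean_confidence_interval` is hypothetical.

```python
import math


def mean_confidence_interval(x_p, b0, b1, s, n, x_bar, ss_xx, t_crit):
    """Confidence interval for the mean of y at x = x_p."""
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


low, high = mean_confidence_interval(
    x_p=4, b0=-0.1, b1=0.7, s=0.6055, n=5, x_bar=3, ss_xx=10, t_crit=3.182
)
print(round(low, 3), round(high, 3))  # 1.645 3.755
```

Moving `x_p` further from `x_bar` enlarges the term under the square root, so the interval widens away from the mean of x.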

Page 70: Statistics for Business and Economics

Prediction Interval of Individual Value of y at x = xp

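$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{\left( x_p - \bar{x} \right)^2}{SS_{xx}}}$$

Note the extra 1 under the square root!

df = n – 2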

Page 71: Statistics for Business and Economics

Why the Extra ‘S’?

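[Figure: y against x at x = xp, showing the expected (mean) y on the population line $E(y) = \beta_0 + \beta_1 x$, the individual y we're trying to predict, and the prediction $\hat{y}$ on the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$]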

Page 72: Statistics for Business and Economics

Prediction Interval Solution

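$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{\left( x_p - \bar{x} \right)^2}{SS_{xx}}}$$

$$\hat{y} = -.1 + (.7)(4) = 2.7$$

$$2.7 \pm (3.182)(.6055)\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}}$$

$$.503 \le y_4 \le 4.897$$

x to be predicted: $x_p = 4$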
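A matching sketch for the prediction interval, again using the values quoted in the example; the only change from the mean-response formula is the extra 1 under the square root. The function name `prediction_interval` is hypothetical.

```python
import math


def prediction_interval(x_p, b0, b1, s, n, x_bar, ss_xx, t_crit):
    """Prediction interval for an individual y at x = x_p."""
    y_hat = b0 + b1 * x_p
    # The leading 1 accounts for the scatter of an individual y about its mean.
    spread = math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    half_width = t_crit * s * spread
    return y_hat - half_width, y_hat + half_width


low, high = prediction_interval(
    x_p=4, b0=-0.1, b1=0.7, s=0.6055, n=5, x_bar=3, ss_xx=10, t_crit=3.182
)
print(round(low, 3), round(high, 3))  # 0.503 4.897
```

The interval is roughly twice as wide as the 95% interval for mean sales at the same advertising level, because an individual outcome varies about the mean in addition to the uncertainty in the fitted line.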

Page 73: Statistics for Business and Economics

Interval Estimate Computer Output

       Dep Var   Pred     Std Err   Low95%   Upp95%   Low95%    Upp95%
Obs    SALES     Value    Predict   Mean     Mean     Predict   Predict
1      1.000     0.600    0.469     -0.892   2.092    -1.837    3.037
2      1.000     1.300    0.332      0.244   2.355    -0.897    3.497
3      2.000     2.000    0.271      1.138   2.861    -0.111    4.111
4      2.000     2.700    0.332      1.644   3.755     0.502    4.897
5      4.000     3.400    0.469      1.907   4.892     0.962    5.837

Observation 4 gives the predicted y when x = 4. Std Err Predict is $S_{\hat{Y}}$, Low95% Mean and Upp95% Mean form the Confidence Interval, and Low95% Predict and Upp95% Predict form the Prediction Interval.

Page 74: Statistics for Business and Economics

Confidence Intervals v. Prediction Intervals

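[Figure: y against x, showing the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ with confidence interval and prediction interval bands around it]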

Page 75: Statistics for Business and Economics

Confidence Intervals vs. Prediction Intervals in JMP

Page 76: Statistics for Business and Economics

Conclusion

1. Described the Linear Regression Model

2. Stated the Regression Modeling Steps

3. Explained Least Squares

4. Computed Regression Coefficients

5. Used the model for prediction and estimation

6. Evaluated model on the basis of significance, R-square, and size of prediction intervals

7. Discussed R-square and the correlation coefficient