
Multiple Regression

Simple Regression in detail

Yi = β0 + β1Xi + εi

Where

• Y => Dependent variable

• X => Independent variable

• β0 => Model parameter – Mean value of the dependent variable (Y) when the independent variable (X) is zero

Simple Regression in detail

• β1 => Model parameter

- Slope that measures the change in the mean value of the dependent variable associated with a one-unit increase in the independent variable

• εi =>

- Error term that describes the effects on Yi of all factors other than the value of Xi
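The model above can be made concrete by simulating from it; this is a minimal sketch with made-up parameter values (β0 = 2, β1 = 0.5) and normally distributed errors:

```python
# Simulating Yi = beta0 + beta1*Xi + ei with made-up parameters
# and normal errors (values are illustrative, not from the slides).
import random

random.seed(0)                       # reproducible error draws
beta0, beta1 = 2.0, 0.5              # assumed "true" parameters
xs = list(range(1, 21))
ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
```

Each simulated ys value is the deterministic line β0 + β1x plus a random error, which is exactly the structure the model asserts for observed data.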

Assumptions of the Regression Model

• Error term is normally distributed (normality assumption)

• Mean of error term is zero (E{εi} = 0)

• Variance of error term is a constant and is independent of the values of X (constant variance assumption)

• Error terms are independent of each other (independent assumption)

• Values of the independent variable X are fixed – no error in the X values.

Estimating the Model Parameters

• Calculate point estimates b0 and b1 of the unknown parameters β0 and β1

• Obtain a random sample and use the sample information to estimate β0 and β1

• Obtain a line of best "fit" for sample data points - least squares line

Ŷi = b0 + b1Xi

Where Ŷi is the predicted value of Yi

Values of Least Squares Estimates bo and b1

b1 = [nΣxiyi - (Σxi)(Σyi)] / [nΣxi² - (Σxi)²]

b0 = ȳ - b1x̄

Where

ȳ = Σyi / n ; x̄ = Σxi / n

• b0 and b1 vary from sample to sample. Their variation is given by the standard errors Sb0 and Sb1
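The least-squares formulas above can be computed directly; the x, y data here are made up purely for illustration:

```python
# Least-squares estimates b0 and b1 via the formulas above.
# The x, y data are made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b0 = sy / n - b1 * sx / n                        # intercept: y-bar minus b1*x-bar
print(round(b1, 2), round(b0, 2))  # 1.96 0.14
```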

Example 1

• To see relationship between Advertising and Store Traffic

• Store Traffic is the dependent variable and Advertising is the independent variable

• Using the formulae we find that b0 = 148.64 and b1 = 1.54

• Are b0 and b1 significant?

• What is Store Traffic when Advertising is 600?
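The last question is a direct plug-in: substituting Advertising = 600 into the fitted line with the estimates quoted above gives the point prediction.

```python
# Predicted Store Traffic at Advertising = 600, using the
# estimates b0 = 148.64 and b1 = 1.54 reported above.
b0, b1 = 148.64, 1.54
traffic = b0 + b1 * 600
print(round(traffic, 2))  # 1072.64
```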

Example 2

• Consider the following data

• Using the formulae we find that b0 = -2.55 and b1 = 1.05

Sales (Y)   Advertising (X)

3   7

8   13

17   13

4   11

15   16

7   6

Example 2

Therefore the regression model would be

Ŷ = -2.55 + 1.05 Xi

r² = (0.74)² = 0.54 (variance in sales (Y) explained by advertising (X))

Assume that Sb0 (standard error of b0) = 0.51 and Sb1 = 0.26, at α = 0.05, df = 4.

Is b0 significant? Is b1 significant?
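One way to answer is to compare each t = estimate / standard error against the two-tailed critical value for df = 4 at the 5% level, which is 2.776 from a standard t table:

```python
# t statistics for Example 2; estimates and standard errors as above.
b0, sb0 = -2.55, 0.51
b1, sb1 = 1.05, 0.26
t0 = b0 / sb0                       # about -5.0
t1 = b1 / sb1                       # about 4.04
t_crit = 2.776                      # two-tailed t, alpha = 0.05, df = 4
print(abs(t0) > t_crit, abs(t1) > t_crit)  # True True
```

Both |t| values exceed the critical value, so both coefficients are significant at the 5% level.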

Idea behind Estimation: Residuals

• Differences between the actual and predicted values are called residuals

• They estimate the error in the population

ei = yi - ŷi = yi - (b0 + b1xi)

Where hatted quantities are predicted values

• bo and b1 minimize the residual or error sums of squares (SSE)

SSE = Σei² = Σ(yi - ŷi)² = Σ[yi - (b0 + b1xi)]²
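To make the residual idea concrete, here are the residuals and SSE for the Example 2 fit, taking sales as Y and advertising as X (consistent with the r² note in Example 2); note the residuals sum to essentially zero, as least-squares fits guarantee:

```python
# Residuals e_i = y_i - (b0 + b1*x_i) and SSE for the Example 2 fit.
x = [7, 13, 13, 11, 16, 6]           # advertising
y = [3, 8, 17, 4, 15, 7]             # sales
b0, b1 = -2.55, 1.05                 # rounded estimates from Example 2
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e * e for e in resid)      # sum(resid) is ~0; sse is about 83.79
```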

Testing the Significance of the Independent Variables

• Null Hypothesis: There is no linear relationship between the independent & dependent variables

• Alternative Hypothesis: There is a linear relationship between the independent & dependent variables

Testing the Significance of the Independent Variables

• Test Statistic

t = (b1 - β1) / Sb1

• Degrees of Freedom

v = n - 2

• Two-Tailed Test

H0: β1 = 0

H1: β1 ≠ 0

• Decision Rule

Reject H0: β1 = 0 if α > p value

Significance Test for Store Traffic Example

• Null hypothesis, H0: β1 = 0

• Alternative hypothesis, HA: β1 ≠ 0

• The test statistic is t = b1 / Sb1 = 1.54 / 0.21 = 7.33

• With α = 0.05 and degrees of freedom v = n - 2 = 18, the value of t from the table is 2.10

• Since t_calc > t_table, we reject the null hypothesis of no linear relationship. Therefore Advertising affects Store Traffic

Predicting the Dependent Variable

• How well does the model ŷi = b0 + b1xi predict?

• Without the independent variable, the error of prediction is yi - ȳ

• With the independent variable, the error of prediction is yi - ŷi

• Thus, by using the independent variable the error in prediction is reduced by (yi - ȳ) - (yi - ŷi) = (ŷi - ȳ)

• It can be shown that

Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

Predicting the Dependent Variable

• Total variation (SST)= Explained variation (SSM) + Unexplained variation (SSE)

• A measure of the model’s ability to predict is the Coefficient of Determination (r2)

r² = (SST - SSE) / SST = SSM / SST

• For our example, r² = 0.74, i.e., 74% of the variation in Y is accounted for by X

• r² is the square of the correlation between X and Y
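The SST = SSM + SSE decomposition and the r² formula can be checked numerically; the data below are made up for illustration (not the store-traffic data):

```python
# SST, SSM, SSE and r-square from their definitions, on made-up data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
sst = sum((yi - ybar) ** 2 for yi in y)            # total variation
ssm = sum((yh - ybar) ** 2 for yh in yhat)         # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
r2 = (sst - sse) / sst                             # equals ssm / sst
```

With exact least-squares estimates the decomposition holds to machine precision; it fails if rounded coefficients are used instead.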

Multiple Regression

• Used when more than one independent variable affects the dependent variable

• General model

Y = β0 + β1X1 + β2X2 + ... + βnXn

Where

Y: Dependent variable

X1, X2, ..., Xn: Independent variables

β1, β2, ..., βn: Coefficients of the n indep variables

β0: A constant (intercept)
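A multiple regression of this form can be fitted by least squares on the design matrix; this is a sketch using numpy's lstsq on made-up data that satisfy an exact linear relation:

```python
# Sketch: fitting Y = b0 + b1*X1 + b2*X2 by least squares with numpy.
# The data are made up and follow the relation exactly (no error term).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 - 0.5 * x2            # "true" coefficients: 1, 2, -0.5
A = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column + X's
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef recovers approximately [1.0, 2.0, -0.5]
```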

Issues in Multiple Regression

• Which variables to include

• Is the relationship between the dep variable and each of the indep variables linear?

• Is the dep variable normally distributed for all values of the indep variables?

• Are each of the indep variables normally distributed (without regard to the dep var)?

• Are there interaction variables?

• Are the indep variables themselves highly correlated?

Example 3

• Cataloger believes that age (AGE) and income (INCOME) can predict amount spent in the last 6 months (DOLLSPENT)

• The regression equation is

DOLLSPENT = 351.29 - 0.65 INCOME + 0.86 AGE

• What happens when income (age) increases?

• Are the coefficients significant?
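The first question can be read straight off the coefficients: holding AGE fixed, each extra unit of INCOME changes the predicted DOLLSPENT by -0.65. A quick numeric check using the fitted equation above:

```python
# Predicted DOLLSPENT from the fitted equation above.
def dollspent(income, age):
    return 351.29 - 0.65 * income + 0.86 * age

# Holding AGE fixed, one more unit of INCOME shifts the prediction
# by the INCOME coefficient, i.e. by about -0.65.
diff = dollspent(51, 40) - dollspent(50, 40)
```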

Example 4

• Which customers are most likely to buy?

• Cataloger believes that the ratio of total orders to total pieces mailed is a good measure of purchase likelihood

• Call this ratio RESP

• Indep variables are

- TOTDOLL: total purchase dollars

- AVGORDR: average dollar order

- LASTBUY: # of months since last purchase

Example 4

• Analysis of Variance table

- How is total sum of squares split up?

- How do you get the various Deg of Freedom?

- How do you get/interpret R-square?

- How do you interpret the F statistic?

- What is the Adjusted R-square?
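For the last question, a common definition of the Adjusted R-square is 1 - (1 - R²)(n - 1)/(n - k - 1), which penalizes R² for the number of predictors k; the numbers below are made up for illustration:

```python
# Adjusted R-square for a model with k independent variables
# fitted to n observations (illustrative values, not from Example 4).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.60, 30, 3), 3))  # 0.554
```

Unlike R², this quantity can fall when a variable that adds little explanatory power is included.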

Example 4

• Parameter estimates table

- What are the t-values corresponding to the estimates?

- What are the p-values corresponding to the estimates?

- Which variables are the most important?

- What are standardized estimates?

- What to do with non-significant variables?
