lecture 22 – thurs., nov. 25


Upload: zena-contreras

Post on 04-Jan-2016


• Nominal explanatory variables (Chapter 9.3)

• Inference for multiple regression (Chapter 10.1-10.2)


Nominal Variables

• To incorporate nominal variables in multiple regression analysis, we use indicator variables.

• Indicator variable to distinguish between two groups: the time of onset (early vs. late) is a nominal variable. To incorporate it into multiple regression analysis, we used the indicator variable early, which equals 1 if early and 0 if late:

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}$
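A minimal Python sketch of building the indicator variable early from a column of onset labels. The function name `make_indicator` and the sample labels are hypothetical, chosen only for illustration.

```python
def make_indicator(labels, level):
    """Return a 0/1 indicator list: 1 where the label equals `level`, else 0."""
    return [1 if label == level else 0 for label in labels]


# Hypothetical onset labels for four observations (illustrative only).
timing = ["early", "late", "early", "late"]
early = make_indicator(timing, "early")  # 1 if early, 0 if late
print(early)  # [1, 0, 1, 0]
```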


Nominal Variables with More than Two Categories

• To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.
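The k-1 rule can be sketched in Python as follows. This is an illustrative helper (the name `indicator_columns` and the sample labels are hypothetical): one level is chosen as the reference and gets no column, so observations at that level have every indicator equal to 0.

```python
def indicator_columns(labels, reference):
    """Build k-1 indicator columns for a nominal variable with k levels.

    The `reference` level gets no column; rows at that level are all 0.
    """
    levels = sorted(set(labels))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in labels")
    columns = {}
    for level in levels:
        if level == reference:
            continue  # the reference level is represented by all indicators = 0
        columns[level] = [1 if label == level else 0 for label in labels]
    return columns


# Hypothetical labels with k = 3 categories, giving k - 1 = 2 indicator columns.
cols = indicator_columns(["a", "b", "c", "a"], reference="c")
print(cols)  # {'a': [1, 0, 0, 1], 'b': [0, 1, 0, 0]}
```

Leaving one level out matters because, with an intercept in the model, all k indicators would sum to the intercept column and the columns would be exactly collinear.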


Nominal Explanatory Variables Example: Auction Car Prices

• A car dealer wants to predict the auction price of a car.
  – The dealer believes that odometer reading and the car color are variables that affect a car's price (data from a sample of cars in auctionprice.JMP).
  – Three color categories are considered:
    • White
    • Silver
    • Other colors

Note: Color is a nominal variable.


Indicator Variables in Auction Car Prices

I1 = 1 if the color is white, 0 if the color is not white

I2 = 1 if the color is silver, 0 if the color is not silver

The category "Other colors" is defined by: I1 = 0; I2 = 0
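The coding above can be written as a small Python function. This is a sketch; the function name and the exact color strings ("White", "Silver", anything else) are assumptions for illustration.

```python
def color_indicators(color):
    """Return (I1, I2) for the auction-car color coding.

    I1 = 1 for white, I2 = 1 for silver; any other color is (0, 0).
    """
    i1 = 1 if color == "White" else 0
    i2 = 1 if color == "Silver" else 0
    return i1, i2


for c in ["White", "Silver", "Other"]:
    print(c, color_indicators(c))
```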


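Auction Car Price Model

• Solution: the proposed model is

$\mu\{Y \mid \text{odometer}, \text{color}\} = \beta_0 + \beta_1\,\text{odometer} + \beta_2 I_1 + \beta_3 I_2$

• The data (first rows shown):

Price    Odometer   I-1   I-2   Color
14636    37388      1     0     White car
14122    44758      1     0     White car
14016    45833      0     0     Other color
15590    30862      0     0     Other color
15568    31705      0     1     Silver color
14718    34010      0     1     Silver color
. . .    . . .      . . . . . .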
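As a sketch of how these columns enter the model, the six rows displayed above can be arranged into a response vector and a design matrix with NumPy. These rows are only an excerpt of the data set, so fitting the model to them alone would not reproduce the JMP estimates.

```python
import numpy as np

# The six rows displayed on the slide: Price, Odometer, I-1, I-2.
rows = np.array([
    [14636, 37388, 1, 0],
    [14122, 44758, 1, 0],
    [14016, 45833, 0, 0],
    [15590, 30862, 0, 0],
    [15568, 31705, 0, 1],
    [14718, 34010, 0, 1],
], dtype=float)

y = rows[:, 0]                 # response: auction price
X = np.column_stack([
    np.ones(len(rows)),        # intercept column (beta0)
    rows[:, 1],                # Odometer (beta1)
    rows[:, 2],                # I-1, white indicator (beta2)
    rows[:, 3],                # I-2, silver indicator (beta3)
])
print(X.shape)  # (6, 4)
```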


Example: Auction Car Price – The Regression Equation

From JMP we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)

[Figure: Price vs. Odometer, showing one fitted line for each color category]

The equation for an "other color" car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer)

The equation for a white car:
Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer)

The equation for a silver car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer)

The three lines are parallel: they share the slope -.0555 and differ only in their intercepts.
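A short Python sketch that evaluates the fitted equation for each color category. The coefficients are the rounded values from the JMP equation; the function name is hypothetical. At Odometer = 0 it returns each line's intercept.

```python
# Rounded coefficients from the JMP regression equation.
B0, B_ODO, B_WHITE, B_SILVER = 16701.0, -0.0555, 90.48, 295.48


def predicted_price(odometer, i1, i2):
    """Fitted mean auction price for an odometer reading and color indicators."""
    return B0 + B_ODO * odometer + B_WHITE * i1 + B_SILVER * i2


# Intercepts of the three parallel lines (Odometer = 0).
intercepts = {
    name: predicted_price(0, i1, i2)
    for name, (i1, i2) in {"other": (0, 0), "white": (1, 0), "silver": (0, 1)}.items()
}
for name, value in intercepts.items():
    print(name, round(value, 2))
```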


A white car sells, on average, for $90.48 more than an "other color" car with the same odometer reading.

A silver car sells, on average, for $295.48 more than an "other color" car with the same odometer reading.

For each additional mile on the odometer, the average auction price decreases by 5.55 cents, for cars of the same color.
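These interpretations can be checked numerically as differences between fitted values. In this sketch the odometer reading of 36,000 miles is a hypothetical choice; because the lines are parallel, any reading gives the same color differences.

```python
# Rounded coefficients from the JMP regression equation.
B0, B_ODO, B_WHITE, B_SILVER = 16701.0, -0.0555, 90.48, 295.48


def predicted_price(odometer, i1, i2):
    """Fitted mean auction price for an odometer reading and color indicators."""
    return B0 + B_ODO * odometer + B_WHITE * i1 + B_SILVER * i2


odo = 36000  # hypothetical reading; differences do not depend on it
white_minus_other = predicted_price(odo, 1, 0) - predicted_price(odo, 0, 0)
silver_minus_other = predicted_price(odo, 0, 1) - predicted_price(odo, 0, 0)
extra_1000_miles = predicted_price(odo + 1000, 0, 0) - predicted_price(odo, 0, 0)
print(round(white_minus_other, 2), round(silver_minus_other, 2), round(extra_1000_miles, 2))
# 90.48 295.48 -55.5
```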


Xm18-02b

Parameter Estimates

Term        Estimate     Std Error    t Ratio    Prob>|t|
Intercept   16700.646    184.3331      90.60     <.0001
Odometer    -0.05554     0.004737     -11.72     <.0001
I-1         90.481959    68.16886       1.33     0.1876
I-2         295.47602    76.36998       3.87     0.0002

There is insufficient evidence to infer that a white car and a car of "other color" sell for a different auction price (p-value = 0.1876 for I-1).

There is sufficient evidence to infer that a silver car sells for a higher price than a car of the "other color" category (p-value = 0.0002 for I-2).
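Each t Ratio in the table is the estimate divided by its standard error, which this Python sketch reproduces from the values in the table. Turning a t ratio into Prob>|t| also needs the residual degrees of freedom, n - p - 1, and the sample size is not shown here, so the sketch stops at the t ratios.

```python
# Estimates and standard errors copied from the Parameter Estimates table.
estimates = {
    "Intercept": (16700.646, 184.3331),
    "Odometer": (-0.05554, 0.004737),
    "I-1": (90.481959, 68.16886),
    "I-2": (295.47602, 76.36998),
}

# t Ratio = Estimate / Std Error (test statistic for H0: coefficient = 0).
t_ratios = {term: est / se for term, (est, se) in estimates.items()}
for term, t in t_ratios.items():
    print(f"{term:10s} t = {t:7.2f}")
```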


Shorthand Notation for Nominal Variables

• Shorthand notation for regression models with nominal variables: use all capital letters for nominal variables.
  – Parallel regression lines model:
  $\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME}$
  – Separate regression lines model:
  $\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME} + (\text{light} \times \text{TIME})$
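The shorthand can be expanded into actual explanatory columns. In this sketch TIME is represented by the 0/1 indicator early, and the separate-lines model adds the light × TIME product column. The function name and the sample values are hypothetical.

```python
def model_columns(light, early, separate=False):
    """Explanatory columns (intercept implicit) for flowers ~ light + TIME.

    `early` is the 0/1 indicator for TIME. With separate=True the
    light x TIME product column is added (separate regression lines).
    """
    cols = {"light": list(light), "early": list(early)}
    if separate:
        cols["light*early"] = [x * e for x, e in zip(light, early)]
    return cols


# Hypothetical light intensities and onset indicators.
light = [150, 300, 450]
early = [1, 0, 1]
print(model_columns(light, early))                 # parallel lines
print(model_columns(light, early, separate=True))  # separate lines
```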


Nominal Variables in JMP

It is not necessary to create indicator variables yourself to represent a nominal variable.

Nominal variables in JMP:
• Make sure that the nominal variable's modeling type is in fact nominal.
• Include the nominal variable in the Construct Model Effects box in Fit Model.
• JMP will create indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1.
• JMP will leave out the level which is highest alphabetically or numerically.


Specially Constructed Explanatory Variables

• Types of specially constructed explanatory variables:
  – Powers of variables
  – Products of variables (interactions)
  – Indicator variables to represent nominal variables
  – Transformations of variables (e.g., log)

• Use a matrix of pairwise scatterplots to initially examine the data and look for needed transformations and powers of variables.
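A NumPy sketch of the constructed-variable types listed above, computed from two hypothetical explanatory variables (the values are illustrative only; `x2` plays the role of an indicator variable).

```python
import numpy as np

# Hypothetical explanatory variables (illustrative values only).
x1 = np.array([1.0, 2.0, 4.0, 8.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])  # e.g., an indicator variable

constructed = {
    "x1_squared": x1 ** 2,      # power of a variable
    "x1_times_x2": x1 * x2,     # product of variables (interaction)
    "log_x1": np.log(x1),       # transformation (natural log)
}
for name, values in constructed.items():
    print(name, values)
```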


Inference for Multiple Regression

• Chapter 10.2
  – Tests for single coefficients
  – Confidence intervals for single coefficients
  – Confidence intervals for the mean response at $X_1, \ldots, X_p$
  – Prediction intervals for a new response at $X_1, \ldots, X_p$

• Chapter 10.3
  – F-test for overall significance of regression
  – F-test for joint significance of several terms (will not cover)
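The test and confidence interval for a single coefficient both use the estimate and its standard error: t = (estimate − hypothesized value) / SE, and the interval is estimate ± t-multiplier × SE. The Python sketch below applies this to the I-2 row of the auction-car table. `T_MULT = 2.0` is a placeholder, not the correct multiplier; for a 95% interval it should be the 97.5th percentile of the t distribution on n − p − 1 degrees of freedom.

```python
def coef_t_statistic(estimate, std_error, null_value=0.0):
    """t-statistic for H0: beta_j = null_value."""
    return (estimate - null_value) / std_error


def coef_confidence_interval(estimate, std_error, t_multiplier):
    """Confidence interval estimate +/- t_multiplier * std_error."""
    half_width = t_multiplier * std_error
    return estimate - half_width, estimate + half_width


# I-2 (silver) estimate and standard error from the Parameter Estimates table.
T_MULT = 2.0  # placeholder: use the t percentile for your degrees of freedom
t_stat = coef_t_statistic(295.47602, 76.36998)
low, high = coef_confidence_interval(295.47602, 76.36998, T_MULT)
print(round(t_stat, 2), (round(low, 2), round(high, 2)))
```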


Case Study 10.1.2

• Question: Do echolocating bats expend more energy than nonecholocating bats after accounting for body size?

• Data: Body mass and flight energy expenditure for 4 nonecholocating bats, 12 nonecholocating birds and 4 echolocating bats.

• Strategy: Build a multiple regression model for mean energy expended as a function of type of flying vertebrate (echolocating bat, nonecholocating bat, nonecholocating bird) and body size.
  – Explore (resolve the need for transformation)
  – Test for interaction
  – If there is no interaction, answer the question with the three parallel lines model


Coded Scatterplots

• To construct a coded scatterplot, create the columns energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat. The column energy nonecholocating bat should contain only the energies for nonecholocating bats and a blank for all other species; the other two columns are built the same way.

• Click Graph, then Overlay Plot; put energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat in Y and mass in X.
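The column-splitting step can be sketched in Python: each group gets its own column holding that group's values, with `None` standing in for JMP's blank cells. The function name, the values and the short type labels are hypothetical.

```python
def split_by_group(values, groups, group_names):
    """Return one column per group; entries outside the group are None (blank)."""
    return {
        name: [v if g == name else None for v, g in zip(values, groups)]
        for name in group_names
    }


energy = [1.2, 3.4, 5.6]              # hypothetical energies
vtype = ["nebat", "nebird", "ebat"]   # hypothetical type labels
cols = split_by_group(energy, vtype, ["nebat", "nebird", "ebat"])
print(cols["nebat"])  # [1.2, None, None]
```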


Coded Scatterplots

[Figure: two JMP overlay plots coded by group (nonecholocating bat, nonecholocating bird, echolocating bat). First plot: Energy vs. Mass. Second plot: Log Energy vs. Log Mass.]


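Separate/Parallel Regression Lines Model

• Separate regression lines model:

$\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\} = \text{lmass} + \text{TYPE} + (\text{lmass} \times \text{TYPE})$
$= \beta_0 + \beta_1 I_{\text{nebat}} + \beta_2 I_{\text{nebird}} + \beta_3\,\text{lmass} + \beta_4 (\text{lmass} \times I_{\text{nebat}}) + \beta_5 (\text{lmass} \times I_{\text{nebird}})$

• Parallel regression lines model:

$\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\} = \text{lmass} + \text{TYPE}$
$= \beta_0 + \beta_1 I_{\text{nebat}} + \beta_2 I_{\text{nebird}} + \beta_3\,\text{lmass}$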
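A NumPy sketch of the design matrices for the two bat models, assuming the coefficient ordering β0 + β1 I_nebat + β2 I_nebird + β3 lmass, plus β4 (lmass × I_nebat) and β5 (lmass × I_nebird) for the separate-lines model. Echolocating bats are the reference level. The type labels "ebat", "nebat", "nebird" and the sample values are hypothetical.

```python
import numpy as np


def bat_design_matrix(lmass, vtype, separate=False):
    """Design matrix with columns 1, I_nebat, I_nebird, lmass
    (plus lmass*I_nebat, lmass*I_nebird when separate=True).
    "ebat" is the reference level: both indicators are 0."""
    lmass = np.asarray(lmass, dtype=float)
    vtype = np.asarray(vtype)
    i_nebat = (vtype == "nebat").astype(float)
    i_nebird = (vtype == "nebird").astype(float)
    cols = [np.ones_like(lmass), i_nebat, i_nebird, lmass]
    if separate:
        cols += [lmass * i_nebat, lmass * i_nebird]  # interaction columns
    return np.column_stack(cols)


X_par = bat_design_matrix([1.0, 2.0, 3.0], ["ebat", "nebat", "nebird"])
X_sep = bat_design_matrix([1.0, 2.0, 3.0], ["ebat", "nebat", "nebird"], separate=True)
print(X_par.shape, X_sep.shape)  # (3, 4) (3, 6)
```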


Inferences for Echolocating Bats

• Is the parallel regression lines model appropriate? Test $H_0: \beta_4 = 0$ and $H_0: \beta_5 = 0$.

• There is no evidence against the parallel regression lines model, so we go ahead and use it to answer the question of interest: do echolocating bats use less energy than nonecholocating bats of the same body size ($H_0: \beta_1 = 0$) and nonecholocating birds of the same body size ($H_0: \beta_2 = 0$)?
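The tests of H0: β1 = 0 and H0: β2 = 0 are t-tests from the least squares fit of the parallel lines model. The NumPy sketch below fits that model to **simulated** data and computes each t-statistic as the estimate divided by its standard error. The group sizes match the case study, but the masses, the coefficients in `true_beta` and the noise level are invented for illustration and are not the case-study data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data only; group sizes mirror the case study (4 ebat, 4 nebat, 12 nebird).
vtype = np.array(["ebat"] * 4 + ["nebat"] * 4 + ["nebird"] * 12)
lmass = rng.uniform(1.0, 7.0, size=vtype.size)
i_nebat = (vtype == "nebat").astype(float)
i_nebird = (vtype == "nebird").astype(float)
X = np.column_stack([np.ones_like(lmass), i_nebat, i_nebird, lmass])
true_beta = np.array([-1.5, 0.1, 0.05, 0.8])  # hypothetical coefficients
y = X @ true_beta + rng.normal(scale=0.2, size=vtype.size)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates
resid = y - X @ beta_hat
n, k = X.shape                                    # k = p + 1 coefficients
df = n - k                                        # residual degrees of freedom
sigma2_hat = resid @ resid / df                   # estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se                           # t-statistics for H0: beta_j = 0

names = ["beta0", "beta1 (I_nebat)", "beta2 (I_nebird)", "beta3 (lmass)"]
for name, b, s, t in zip(names, beta_hat, se, t_stats):
    print(f"{name:17s} {b:8.3f} {s:7.3f} {t:7.2f}")
```

The p-value for each t-statistic comes from the t distribution on `df` degrees of freedom.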


Inferences for Echolocating Bats Cont.

• No strong evidence that echolocating bats use less energy than either nonecholocating bats (p-value = 0.35) or nonecholocating birds (p-value = 0.77) of the same body size.

• 95% confidence interval for the difference in mean log energy for nonecholocating bats and echolocating bats of the same body size: (-0.51, 0.35).

• This means that a 95% confidence interval for the ratio of median energy for nonecholocating bats and echolocating bats of the same body size is $(e^{-0.51}, e^{0.35}) = (0.60, 1.42)$.

• Summary of findings: Although there is no strong evidence that echolocating bats use less energy than nonecholocating bats of the same body size, it is still plausible that they use quite a bit less energy (60% as much at the median). The study is inconclusive.
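The back-transformation is just exponentiation of both endpoints. Exponentiating works because, when log energy is symmetrically distributed, the mean of the log equals the log of the median. A minimal Python sketch (the function name is hypothetical):

```python
import math


def ratio_interval(low, high):
    """Back-transform a CI for a difference in mean log response
    into a CI for a ratio of medians."""
    return math.exp(low), math.exp(high)


lo, hi = ratio_interval(-0.51, 0.35)
print(round(lo, 2), round(hi, 2))  # 0.6 1.42
```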


Prediction Intervals

• To find a 95% prediction interval for the log energy of an individual flying vertebrate of a given type and mass:
  – Fit the multiple regression model.
  – Click the red triangle next to the response log energy, click Save Columns, then click Predicted Values and also Indiv Confid Interval. This saves the predicted values and the lower and upper 95% prediction interval endpoints for each observation in the data set.
  – To get a prediction interval for X's that are not in the data set, enter a row with those X's and then exclude the observation.
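For readers who want to see the arithmetic behind the saved columns, here is a NumPy sketch of the standard least squares formulas. It is not JMP's own code. The prediction interval adds the residual variance to the variance of the estimated mean, so it is always wider than the confidence interval for the mean at the same X's. The data values are hypothetical, and 3.182 is the 97.5th percentile of t on 3 degrees of freedom, which matches this toy example (n = 5, k = 2).

```python
import numpy as np


def mean_ci_and_prediction_interval(X, y, x0, t_multiplier):
    """OLS fit at new row x0 (leading 1 for the intercept), with a CI for the
    mean response and a prediction interval for an individual response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    n, k = X.shape
    sigma2_hat = resid @ resid / (n - k)
    xtx_inv = np.linalg.inv(X.T @ X)
    fit = x0 @ beta_hat
    se_mean = np.sqrt(sigma2_hat * (x0 @ xtx_inv @ x0))  # SE of estimated mean
    se_pred = np.sqrt(sigma2_hat + se_mean ** 2)         # SE for a new individual
    ci = (fit - t_multiplier * se_mean, fit + t_multiplier * se_mean)
    pi = (fit - t_multiplier * se_pred, fit + t_multiplier * se_pred)
    return fit, ci, pi


# Hypothetical data: intercept plus one explanatory variable (n = 5, k = 2).
X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
fit, ci, pi = mean_ci_and_prediction_interval(X, y, x0=[1, 6], t_multiplier=3.182)
print(fit, ci, pi)
```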