Lecture 22 – Thurs., Nov. 25
• Nominal explanatory variables (Chapter 9.3)
• Inference for multiple regression (Chapter 10.1-10.2)
Nominal Variables
• To incorporate nominal variables in multiple regression analysis, we use indicator variables.
• Indicator variable to distinguish between two groups
– The time of onset (early vs. late) is a nominal variable. To incorporate it into multiple regression analysis, we used the indicator variable early, which equals 1 if early, 0 if late.
$\mu\{flowers \mid light, early\} = \beta_0 + \beta_1\, light + \beta_2\, early$
Nominal Variables with More than Two Categories
• To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.
Nominal Explanatory Variables Example: Auction Car Prices
• A car dealer wants to predict the auction price of a car.
– The dealer believes that odometer reading and car color are variables that affect a car’s price (data from a sample of cars in auctionprice.JMP).
– Three color categories are considered:
   • White
   • Silver
   • Other colors
Note: Color is a nominal variable.
I1 = 1 if the color is white, 0 if the color is not white
I2 = 1 if the color is silver, 0 if the color is not silver
The category “Other colors” is defined by: I1 = 0; I2 = 0
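The coding above can be sketched in a few lines of Python. This is only an illustration: the function name `color_indicators` and the lowercase color strings are assumptions made for the example, not part of the JMP data file.

```python
# Hypothetical helper: code a three-category color variable with k - 1 = 2 indicators.
# "Other colors" is the reference category, so it gets I1 = 0 and I2 = 0.

def color_indicators(color):
    """Return (I1, I2): I1 = 1 for white, I2 = 1 for silver, both 0 otherwise."""
    key = color.strip().lower()
    i1 = 1 if key == "white" else 0
    i2 = 1 if key == "silver" else 0
    return i1, i2


for c in ("White", "Silver", "Red"):
    print(c, color_indicators(c))
```

Three categories need only two indicators because the third category is identified by both indicators being 0.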
Indicator Variables in Auction Car Prices
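• Solution
– The proposed model regresses price (Y) on odometer and the two color indicators I1 and I2.
– The data

Price   Odometer  I-1  I-2  Color category
14636   37388     1    0    White
14122   44758     1    0    White
14016   45833     0    0    Other color
15590   30862     0    0    Other color
15568   31705     0    1    Silver
14718   34010     0    1    Silver
.       .         .    .
.       .         .    .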
Auction Car Price Model
$\mu\{Y \mid odometer, color\} = \beta_0 + \beta_1\, odometer + \beta_2 I_1 + \beta_3 I_2$
From JMP we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)
[Figure: Price versus Odometer, with one fitted line for each color category]
• The equation for an “other color” car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer)
• The equation for a white car:
Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer)
• The equation for a silver car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer)
Example: Auction Car Price The Regression Equation
• A white car sells, on average, for $90.48 more than a car of the “Other color” category with the same odometer reading.
• A silver car sells, on average, for $295.48 more than a car of the “Other color” category with the same odometer reading.
• For each additional mile on the odometer, the mean auction price decreases by 5.55 cents for cars of the same color.
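As a rough check on these interpretations, the fitted equation can be evaluated directly. The sketch below copies the coefficients from the JMP equation in the lecture; the function name `predicted_price`, the lowercase color labels, and the odometer value of 40,000 miles are assumptions chosen for illustration.

```python
# Coefficients copied from the JMP regression equation in the lecture.
B0, B_ODOMETER, B_WHITE, B_SILVER = 16701.0, -0.0555, 90.48, 295.48


def predicted_price(odometer, color):
    """Estimated mean auction price for a given odometer reading and color."""
    i1 = 1 if color == "white" else 0   # indicator I-1
    i2 = 1 if color == "silver" else 0  # indicator I-2
    return B0 + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


odometer = 40000  # arbitrary illustrative reading
for color in ("other", "white", "silver"):
    print(color, round(predicted_price(odometer, color), 2))

# The gap between colors does not depend on the odometer reading,
# which is what makes the three fitted lines parallel.
print(round(predicted_price(odometer, "white") - predicted_price(odometer, "other"), 2))
```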
Parameter Estimates (Xm18-02b)

Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   16700.646   184.3331     90.60    <.0001
Odometer    -0.05554    0.004737    -11.72    <.0001
I-1         90.481959   68.16886      1.33    0.1876
I-2         295.47602   76.36998      3.87    0.0002

• There is insufficient evidence to infer that a white car and a car of “other color” sell for a different auction price (Prob>|t| = 0.1876 for I-1).
• There is sufficient evidence to infer that a silver car sells for a larger price than a car of the “other color” category (Prob>|t| = 0.0002 for I-2).
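Each t ratio in the Parameter Estimates table is the estimate divided by its standard error. The short sketch below reproduces that column from the table's own numbers; the dictionary layout and the name `t_ratio` are assumptions made for the example.

```python
# Estimates and standard errors copied from the JMP Parameter Estimates table.
estimates = {
    "Intercept": (16700.646, 184.3331),
    "Odometer": (-0.05554, 0.004737),
    "I-1": (90.481959, 68.16886),
    "I-2": (295.47602, 76.36998),
}


def t_ratio(estimate, std_error):
    """t ratio for testing that a single coefficient equals 0."""
    return estimate / std_error


t_ratios = {term: round(t_ratio(est, se), 2) for term, (est, se) in estimates.items()}
for term, t in t_ratios.items():
    print(term, t)
```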
Shorthand Notation for Nominal Variables
• Shorthand notation for regression models with nominal variables: use all capital letters for nominal variables.
– Parallel regression lines model:
$\mu\{flowers \mid light, TIME\} = light + TIME$
– Separate regression lines model:
$\mu\{flowers \mid light, TIME\} = light + TIME + (light \times TIME)$
Nominal Variables in JMP
It is not necessary to create indicator variables yourself to represent a nominal variable.
Nominal variables in JMP:
• Make sure that the nominal variable’s modeling type is in fact nominal.
• Include the nominal variable in the Construct Model Effects box in Fit Model.
• JMP will create indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1.
• JMP will leave out the level which is highest alphabetically or numerically.
Specially Constructed Explanatory Variables
• Types of specially constructed explanatory variables:
– Powers of variables
– Products of variables (interactions)
– Indicator variables to represent nominal variables
– Transformations of variables (e.g., log)
• Use a matrix of pairwise scatterplots to initially examine the data and to look for needed transformations or powers of variables.
Inference for Multiple Regression
• Chapter 10.2
– Tests for single coefficients
– Confidence intervals for single coefficients
– Confidence intervals for mean response at $X_1, \ldots, X_p$
– Prediction intervals for $X_1, \ldots, X_p$
• Chapter 10.3
– F-test for overall significance of regression
– F-test for joint significance of several terms (will not cover)
Case Study 10.1.2
• Question: Do echolocating bats expend more energy than nonecholocating bats after accounting for body size?
• Data: Body mass and flight energy expenditure for 4 nonecholocating bats, 12 nonecholocating birds and 4 echolocating bats.
• Strategy: Build a multiple regression model for mean energy expended as a function of type of flying vertebrate (echolocating bat, nonecholocating bat, nonecholocating bird) and body size.
– Explore (resolve need for transformation)
– Test for interaction
– If no interaction, answer question with the three parallel lines model
Coded Scatterplots
• To construct a coded scatterplot, create columns energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat. The column energy nonecholocating bat should contain only the energies for nonecholocating bats and a blank for all other species.
• Click graph, overlay plot, put energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat in Y and mass in X.
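The column construction described above can be sketched outside JMP as follows. The function name `coded_columns`, the type labels, and the energy values are all made up for illustration; they are not the case study data.

```python
# Split one response column into one column per group, leaving blanks (None)
# in the rows that belong to other groups.

def coded_columns(energies, types, levels):
    """Return {level: column}, where each column keeps only that level's energies."""
    return {
        level: [e if t == level else None for e, t in zip(energies, types)]
        for level in levels
    }


# Illustrative values only (NOT the bat and bird measurements from the case study).
energy = [1.5, 2.0, 8.0, 9.5]
vertebrate_type = ["nonecholocating bat", "echolocating bat",
                   "nonecholocating bird", "nonecholocating bat"]
levels = ["nonecholocating bat", "nonecholocating bird", "echolocating bat"]

columns = coded_columns(energy, vertebrate_type, levels)
for level in levels:
    print("energy " + level, columns[level])
```

Plotting each of these columns against mass with its own symbol gives the coded scatterplot.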
[Figure: overlay plot of Y (Energy nonecholocating bat, Energy nonecholocating bird, Energy echolocating bat) versus MASS]
[Figure: overlay plot of Y (Log Energy Nonecholocating Bat, Log Energy Nonecholocating Bird, Log Energy Echolocating Bat) versus Log Mass]
Separate/Parallel Regression Lines Model
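• Separate regression lines model:
$\mu\{lenergy \mid lmass, TYPE\} = \beta_0 + \beta_1 I_{nebat} + \beta_2 I_{nebird} + \beta_3\, lmass + \beta_4\, lmass \cdot I_{nebat} + \beta_5\, lmass \cdot I_{nebird}$
$\mu\{lenergy \mid lmass, TYPE\} = lmass + TYPE + lmass \cdot TYPE$
• Parallel regression lines model:
$\mu\{lenergy \mid lmass, TYPE\} = \beta_0 + \beta_1 I_{nebat} + \beta_2 I_{nebird} + \beta_3\, lmass$
$\mu\{lenergy \mid lmass, TYPE\} = lmass + TYPE$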
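One way to see the difference between the two models is to write out the row of explanatory variables that each model uses for a single animal. This is a minimal sketch: the function name `design_row` and the type labels "ebat", "nebat" and "nebird" are assumptions, with echolocating bats taken as the reference category (both indicators 0).

```python
# Build the explanatory variables for one animal under either model.
# Column order: intercept, I_nebat, I_nebird, lmass,
# then (separate lines only) lmass * I_nebat, lmass * I_nebird.

def design_row(lmass, vertebrate_type, separate=False):
    i_nebat = 1 if vertebrate_type == "nebat" else 0
    i_nebird = 1 if vertebrate_type == "nebird" else 0
    row = [1, i_nebat, i_nebird, lmass]
    if separate:
        # Interaction terms let the slope on lmass differ by type.
        row += [lmass * i_nebat, lmass * i_nebird]
    return row


print(design_row(2.0, "nebat"))                  # parallel lines model
print(design_row(2.0, "nebird", separate=True))  # separate lines model
print(design_row(3.0, "ebat", separate=True))    # reference category
```

The parallel lines model is the separate lines model with the coefficients on the two product columns set to 0.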
Inferences for Echolocating Bats
• Is the parallel regression lines model appropriate? Test $H_0: \beta_4 = 0$ and $H_0: \beta_5 = 0$ in the separate regression lines model.
• There is no evidence against the parallel regression lines model, so we go ahead and use it to answer the question of interest: do echolocating bats use less energy than nonecholocating bats of the same body size ($H_0: \beta_1 = 0$) and nonecholocating birds of the same body size ($H_0: \beta_2 = 0$)?
Inferences for Echolocating Bats Cont.
• No strong evidence that echolocating bats use less energy than either nonecholocating bats (p-value = 0.35) or nonecholocating birds (p-value = 0.77) of same body size.
• 95% Confidence interval for difference in mean of log energy for nonecholocating bats and echolocating bats of same body size: (-0.51,0.35).
• This means that a 95% confidence interval for the ratio of median energy for nonecholocating bats and echolocating bats of the same body size is $(e^{-0.51}, e^{0.35}) = (0.60, 1.42)$.
• Summary of findings: Although there is no strong evidence that echolocating bats use less energy than nonecholocating bats of the same body size, it is still plausible that they use quite a bit less energy (60% as much at the median). The study is inconclusive.
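The back-transformation from the log scale to a ratio of medians can be checked numerically. In the sketch below the endpoints are the ones reported in the lecture; the function name `ratio_interval` is an assumption.

```python
import math


def ratio_interval(log_lower, log_upper):
    """Exponentiate the endpoints of an interval for a difference in mean logs."""
    return math.exp(log_lower), math.exp(log_upper)


lower, upper = ratio_interval(-0.51, 0.35)
print(round(lower, 2), round(upper, 2))
```

Exponentiating works here because a difference in means on the log scale corresponds to a ratio of medians on the original scale.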
Prediction Intervals
• To find a 95% prediction interval for the log energy of a flying vertebrate of a given type and mass:
– Fit the multiple regression model.
– Click the red triangle next to the response log energy, click save columns, click predicted values and also click indiv confid interval. This saves the predicted values, lower 95% prediction interval endpoint and upper 95% prediction interval endpoint for each observation in the data set.
– To get a prediction interval for X’s that are not in the data set, enter a row with those X’s and then exclude the observation.