practice+problems2_4031_f14

1

ISyE4031 Regression and Forecasting

Practice Problems 2

Fall 2014

1. In mobile ad hoc computer networks, messages must be forwarded from computer to computer

until they reach their destinations. The data overhead is the number of bytes of information that

must be transmitted, and it measures the success of a protocol. A regression study was

performed to predict the data overhead (kB) by using the independent variables average speed

(m/s), pause time (s), and link change rate (LCR, 100/s). The studied model was

2163253143322110 xxxxxxxxy

The regression equation is

Overhead = 406 1.93 Speed 0.387 Pause + 1.02 LCR 0.0618 Speed*LCR + 0.290 Pause*LCR + 0.0401 Speed**2

Predictor Coef SE Coef T P Significant?(Y/N) Keep? Why?

Constant 406.491 9.032 45.01 0.000

Speed -1.932 1.137 -1.70 0.107

Pause -0.3866 0.2202 -1.76 0.096

LCR 1.0153 0.9208 1.10 0.285

Speed*LCR -0.06182 0.02102 -2.94 0.009

Pause*LCR 0.28985 0.04259 6.81 0.000

Speed**2 0.04013 0.02056 1.95 0.067

a. By looking at the output and by considering the p values, state whether a predictor should be

kept in the model by using = 0.05, and why? Fill in the table.

b. What would the expected data overhead be if the average speed is 15 m/s, the pause time is 10

s, and the LCR is 20?

2. The weekly sales (in $1000 per week) for fast-food outlets in each of four cities were

collected. The objective is to model sales (y) as a function of traffic flow (in 1,000 of cars),

adjusting for city-to-city variations that might be due to size or other market conditions. The

linear regression model is therefore:

y = + x + C +C + C +, where x is traffic flow,

C1=

otherwise ,0

1city if ,1 , C2 =

otherwise ,0

2city if ,1 , and C3 =

otherwise ,0

3city if ,1,

and City 4 is the base level.

The following output was obtained.

SALES = 1.08 - 1.22 CITY_1 - 0.531 CITY_2 - 1.08 CITY_3 + 0.104 TRAFFIC

Predictor Coef SE Coef T P

Constant 1.0834 0.3210 3.37 0.003

CITY_1 -1.2158 0.2054 -5.92 0.000

CITY_2 -0.5308 0.2848 -1.86 0.078

CITY_3 -1.0765 0.2265 -4.75 0.000

TRAFFIC 0.103673 0.004094 25.32 0.000

S = 0.362307 R-Sq = 97.9% R-Sq(adj) = 97.5%

2

Analysis of Variance

Source DF SS MS F P

Regression 4 116.656 29.164 222.17 0.000

Residual Error 19 2.494 0.131

Total 23 119.150

Answer the following questions by using = 0.05.

a. Are the mean weekly sales identical for all four cities? Why? I.e., support your answer by

hypothesis testing and expected values. Use =0.05.

b. Does City 2 have more expected sales than City 4? State the hypothesis that you consider

explicitly.

c. Which city has the most expected sales? Explain by referring to the coefficients, hypothesis

testing, and expected values.

d. What is the expected sales in City 3 when the traffic flow is recorded as 60,000 cars?

e. Suppose that you modeled and solved the problem as a simple linear regression model:

y = + x + , where y: Sales and x: Traffic flow. In order to decide on the significance of the cities, compare this reduced model and the complete model by using a partial (nested) F test.

State the hypothesis explicitly.

SALES = 0.018 + 0.108 TRAFFIC


Source DF SS MS F P

Regression 1 111.34 111.34 313.75 0.000


Total 23 119.15

3. Screening Techniques.

a. In a linear regression study five predictors are being evaluated by using a stepwise regression.

By considering the quantities given below, perform the first two steps of the stepwise regression,

i.e., write down the selected variable(s) in step1 and step 2. Assume entry= remove= 0.10.

Step1 Predictors t-stat for each Xj

p-value for each Xj

X1 11.21

0.000

X2 22.90

0.000

X3 2.75

0.015

X4 22.41

0.000

X5 10.71

0.000

Step2 Predictors t-stat for each Xj

p-value for each Xj

X1, X2 3.91,9.92

0.002,0.000

X1, X3 10.41,2.37

0.000,0.033

X1, X4 3.84,9.67

0.002,0.000

X1, X5 3.06,2.74

0.009,0.016

X2, X3 24.43,-3.40

0.000,0.004

Predictors

t-stat for each Xj

p-value for each Xj

X2, X4 0.72,-0.41

0.486,0.690

X2, X5 7.19,1.34

0.000,0.200

X3, X4 -3.31,23.86

0.005,0.000

X3, X5 2.02,9.48

0.063,0.000

X4, X5 6.97,1.39

0.000,0.250

Step 1:

Step 2:

b. Consider the best subset output given below. What is the best subset of variables? State your

reasons.

3

Response is y

Mallows x x x x x x

Vars R-Sq R-Sq(adj) Cp S 1 2 3 4 5 6

1 51.2 45.1 144.9 2496.8 X

1 47.6 41.1 156.0 2587.1 X

2 93.9 92.1 14.9 945.09 X X

2 79.6 73.8 59.1 1726.6 X X

3 97.2 95.9 6.6 686.37 X X X

3 95.8 93.7 11.1 848.68 X X X

4 98.8 97.9 3.7 492.34 X X X X

4 97.6 95.8 7.3 694.63 X X X X

5 99.0 97.7 5.2 510.32 X X X X X

5 98.8 97.3 5.7 549.94 X X X X X

6 99.0 97.1 7.0 574.97 X X X X X X

4. Residual analysis and diagnostics.

a. A linear regression model, y = + x + x + , was fitted to 24 heat treatment data points, and several test results for diagnostics statistics were produced. Consider the following

four of those 24 observations and state whether they are unusual or not. If an observation is

unusual, is it outlier, leverage, and/or influential? State your reasons explicitly by referring to the

diagnostics statistics.

Observation y SRES1 TRES1 HI1 COOK1

1 0.013 -1.47200 -1.50366 0.053974 0.04121

2 0.068 -3.17206 -3.33242 0.689127 3.48608

3 0.056 -0.12900 -0.12679 0.269057 1.02204

4 0.014 -2.87807 -2.90441 0.058732 0.11762

Observation #1:

Observation #2:

Observation #3:

Observation #4:

5. The data on annual sale revenues (in billions of dollars) of the Eastman Kodak Company over

a 25-year period were considered. The following time series plot depicts the actual annual

revenues (y) over the 25 years.

24222018161412108642

20.0

17.5

15.0

12.5

10.0

7.5

5.0

Index

Y

Time Series Plot of Y

4

A quadratic trend model, yt = + t + t2 + was studied and the following results were

obtained. The regression equation is

Y = 3.15 + 1.46 t - 0.0411 t^2

Predictor Coef SE Coef T P

Constant 3.152 1.137 2.77 0.011

t 1.4617 0.2194 6.66 0.000

t^2 -0.041122 0.008832 -4.66 0.000

S = 2.04889 R-Sq = 80.6% R-Sq(adj) = 78.9%


Source DF SS MS F P

Regression 2 384.04 192.02 45.74 0.000


Total 24 476.39

a. Would you consider this model and the predictors useful (significant)? Support your answer

by looking at the p values, and the tests. Assume = 0.10.

b. Would you change your answer in part (a) when you saw the residual plots given below by

considering the assumptions on errors? In other words, check whether the results justify the

basic error term assumptions or not, i.e., i ~ i.i.d. Normal(0, 2). State the hypotheses that you

are testing and the residual plot that you are referring to explicitly, and use .

5.02.50.0-2.5-5.0

99

90

50

10

1

Residual

Pe

rce

nt

15105

4

2

0

-2

-4

Fitted Value

Re

sid

ua

l

43210-1-2-3

8

6

4

2

0

Residual

Fre

qu

en

cy

24222018161412108642

4

2

0

-2

-4

Observation Order

Re

sid

ua

l

Normal Probability Plot Versus Fits

Histogram Versus Order

Residual Plots for Y

A-D = 0.397, p-value = 0.343

Durbin-Watson statistic = 0.532552

- Each i has a normal distribution:

- Each i has an identical distribution:

- Each i is independent:

6. In a chemical experiment, the true relationship between yield (y) and reaction time (x) is

assumed to be: y = ee x21 .

5

a. First, apply a transformation to the equation so that a simple linear regression solution can be

found.

b. Then, consider the solution to the transformed model: *y = -2.3+0.6 x.

What are 1 , 2 , and the predicted value of y (i.e., y ) when x = 5?

7. Answer the following short-answer questions.

a. If the variance inflation factor (VIF) is not less than 1, we can say that there exists

multicollinearity between independent variables. True or False?

b. An unusual data point can be both an outlier and a high leverage point. True or False?

c. What can we detect when t-tests for all (or nearly all) parameters are non-significant

whereas the F-test for overall model adequacy (H0: 1==k = 0) is significant?

d. Suppose a fitted linear regression model is y = 15 + 2 x1 3x2 + 2

2x + 4 x1 x2. What is the

amount of change in the expected value of y for every one-unit increase in x1, holding x2 fixed

at 2?

practice+problems2_4031_f14

Documents