
Upload: dodien

Post on 27-Jul-2018


Page 1: Regression Everything is naturally associated. - Mathacle's …mathacle.com/MathPSet/Stats/Mathacle_Pset_Stats... · 2017-07-09 · Does the plot seem “linearly correlated?” 4.2

Mathacle

PSet ----- Stats, Regression Analysis

Level ---- 1

Number --- 1

Name: ___________________ Date: _____________

1

Regression – Everything is naturally associated.

IV. REGRESSION ANALYSIS

Regression analysis studies the relationships among two or more variables.

The basic assumptions for two-variable (bivariate) regression analysis are:

The sample is representative of the population about which inferences and predictions are made.

The error is a random variable with a mean of zero conditional on the explanatory variable, and the errors are uncorrelated.

The variance of the error is constant across observations.


4.1. LSRL – Least Squares Regression Line

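The sample standard deviation of X: $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

The sample standard deviation of Y: $s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$

The sample covariance of X and Y: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

The sample correlation coefficient:

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i} z_{y_i}$$

The slope of the best-fitting line: $b_1 = r\frac{s_y}{s_x}$

The intercept of the best-fitting line: $b_0 = \bar{y} - b_1\bar{x}$, or $\bar{y} = b_0 + b_1\bar{x}$

The predicted values for Y: $\hat{y} = b_0 + b_1 x$

The predicted z-score for Y:

$$z_{\hat{y}} = \frac{\hat{y} - \bar{y}}{s_y} = \frac{(b_0 + b_1 x) - (b_0 + b_1\bar{x})}{s_y} = b_1\frac{x - \bar{x}}{s_y} = r\frac{s_y}{s_x}\cdot\frac{x - \bar{x}}{s_y} = r\frac{x - \bar{x}}{s_x} = r z_x$$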


That is, in the two-dimensional z-score space, the “best-fit” line

$$z_{\hat{y}} = r z_x$$

passes through the origin (the point of means) with $r$ as its slope.
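The formulas of this section can be sketched in Python as a quick way to check hand computations. This is a minimal illustration, not the worksheet's own method: the function name `lsrl_summary` is hypothetical, it uses the $n-1$ divisors given above, and it assumes the x-values are not all equal (otherwise $s_x = 0$ and the division fails). The usage lines also check numerically that $z_{\hat{y}} = r z_x$ for the height/weight data of Example 4.1.1.

```python
import math

def lsrl_summary(xs, ys):
    """Return (s_x, s_y, s_xy, r, b1, b0) using the n - 1 formulas of Section 4.1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / (s_x * s_y)
    b1 = r * s_y / s_x           # slope of the LSRL
    b0 = y_bar - b1 * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return s_x, s_y, s_xy, r, b1, b0


# Data from Example 4.1.1
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
s_x, s_y, s_xy, r, b1, b0 = lsrl_summary(heights, weights)

# Check z_yhat = r * z_x at one x value (x = 70 chosen arbitrarily)
x = 70
x_bar = sum(heights) / len(heights)
y_bar = sum(weights) / len(weights)
z_x = (x - x_bar) / s_x
z_yhat = ((b0 + b1 * x) - y_bar) / s_y
print(f"r = {r:.4f}, b1 = {b1:.3f}, b0 = {b0:.2f}")
print(math.isclose(z_yhat, r * z_x))
```

Computing $b_1$ from unrounded $r$, $s_x$, $s_y$ keeps the intercept from drifting, which matters when comparing against calculator output.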

Example 4.1.1. A set of sample data about heights and weights is given below:

Height $x_i$    Weight $y_i$
61    105
62    120
63    120
65    160
65    120
68    145
69    175
70    160
72    185
75    210

To analyze the data, a good first step is to visualize it. Sketch the scatterplot.


Solution:

Does the plot seem “linearly correlated?”

4.2. Find the Regression Line and Residuals

Example 4.2.1. The data is given in the table below:

Height $x_i$    Weight $y_i$
61    105
62    120
63    120
65    160
65    120
68    145
69    175
70    160
72    185
75    210

Find the following values by hand:

a.) $n =$ _________, $\bar{x} =$ _________, $\bar{y} =$ _________

b.) $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} =$ _________


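$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
61
62
63
65
65
68
69
70
72
75

$\sum(x_i - \bar{x})^2 =$

c.) $s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2} =$ _________

$y_i$    $y_i - \bar{y}$    $(y_i - \bar{y})^2$
105
120
120
160
120
145
175
160
185
210

$\sum(y_i - \bar{y})^2 =$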


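d.) $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) =$ _________

$x_i$    $x_i - \bar{x}$    $y_i$    $y_i - \bar{y}$    $(x_i - \bar{x})(y_i - \bar{y})$
61        105
62        120
63        120
65        160
65        120
68        145
69        175
70        160
72        185
75        210

$\sum(x_i - \bar{x})(y_i - \bar{y}) =$

e.) $r = \frac{s_{xy}}{s_x s_y} =$ ____________

f.) $r^2 =$ ____________

g.) $b_1 = r\frac{s_y}{s_x} =$ _____________

h.) $b_0 = \bar{y} - b_1\bar{x} =$ ___________

i.) $\hat{y} = b_0 + b_1 x =$ __________________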

Solution:

a.) $n = 10$, $\bar{x} = 67$, $\bar{y} = 150$

b.) $s_x \approx 4.57$


$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
61    -6    36
62    -5    25
63    -4    16
65    -2    4
65    -2    4
68    1    1
69    2    4
70    3    9
72    5    25
75    8    64

$\sum(x_i - \bar{x})^2 = 188$

c.) $s_y \approx 33.99$

$y_i$    $y_i - \bar{y}$    $(y_i - \bar{y})^2$
105    -45    2025
120    -30    900
120    -30    900
160    10    100
120    -30    900
145    -5    25
175    25    625
160    10    100
185    35    1225
210    60    3600

$\sum(y_i - \bar{y})^2 = 10400$

d.) $s_{xy} \approx 145.56$

$x_i$    $x_i - \bar{x}$    $y_i$    $y_i - \bar{y}$    $(x_i - \bar{x})(y_i - \bar{y})$
61    -6    105    -45    270
62    -5    120    -30    150
63    -4    120    -30    120
65    -2    160    10    -20
65    -2    120    -30    60
68    1    145    -5    -5
69    2    175    25    50
70    3    160    10    30
72    5    185    35    175
75    8    210    60    480

$\sum(x_i - \bar{x})(y_i - \bar{y}) = 1310$

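e.) $r = \frac{1310}{\sqrt{188 \times 10400}} \approx 0.94$

f.) $r^2 \approx 0.88$

g.) $b_1 = r\frac{s_y}{s_x} = \frac{1310}{188} \approx 6.97$

h.) $b_0 = \bar{y} - b_1\bar{x} \approx 150 - 6.968(67) \approx -316.86$

i.) $\hat{y} = -316.86 + 6.97x$

Rounding $r$ to 0.94 before computing the slope gives $b_1 \approx 6.99$ and $b_0 \approx -318.33$; carrying unrounded values avoids this drift and agrees with the calculator output.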

[TI-84] Find the regression line:

STAT -> CALC -> 8. LinReg(a+bx) L1, L2, Y1

To get Y1:

VARS -> Y-VARS -> 1. Function -> 1. Y1

Use the calculator to find the following:

$\bar{x} =$ _________________
$S_x =$ _________________
$\bar{y} =$ _________________
$S_y =$ _________________
$r =$ _________________
$b_1 =$ _________________
$b_0 =$ _________________
$\hat{y} =$ _________________
$r^2 =$ _________________


Residual Plot

The error, or residual, is given by

$$e = y - \hat{y}$$

A residual plot shows the residuals on the y-axis and the explanatory variable on the x-axis. If the residual plot has a distinct pattern rather than a random scatter of points, the “linear model” may not be suitable to “best fit” the data.

[TI-84] Find the residual plot:

1.) Turn off StatPlot 1 and turn on StatPlot 2.

2.) In StatPlot 2, change L2 to RESID.

3.) To find RESID: 2nd -> LIST -> 7: RESID

4.) Graph the residuals: ZOOM -> ZoomStat
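Without a TI-84, the points of the residual plot can be computed directly from $e_i = y_i - \hat{y}_i$. The sketch below is an illustration only (plain Python, no plotting library assumed); it refits the line for the data of Example 4.2.1 using $b_1 = \sum(x_i - \bar{x})(y_i - \bar{y}) / \sum(x_i - \bar{x})^2$, which equals $r\,s_y/s_x$, and prints the $(x_i, e_i)$ pairs to sketch.

```python
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
b1 = sxy / sxx                  # same as r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Residual e_i = y_i - y-hat_i; these are the points (x_i, e_i) of the residual plot
residuals = [y - (b0 + b1 * x) for x, y in zip(heights, weights)]

for x, e in zip(heights, residuals):
    print(f"x = {x:>2}   e = {e:7.2f}")
```

The residuals should add up to zero (up to floating-point error), which is the property shown in Section 4.4.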

Example 4.2.2. Sketch the residual plot for the regression line in Example 4.2.1.


Solution:

4.3. Regression with Nonlinear Regressors

Example 4.3.1. A set of sample data is given below:

$x_i$    $y_i$
1    2
2    1
3    6
4    14
5    15
6    30
7    40
8    74
9    75

1.) Sketch the original graph


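2.) Sketch the residual graph

Use the quadratic transformation: regress $\sqrt{y}$ on $x$, since a quadratic model $y = (b_0 + b_1 x)^2$ becomes the straight line $\sqrt{y} = b_0 + b_1 x$.

$x_i$ (L1)    $y_i$ (L2)    $\sqrt{y_i}$ (L3)
1    2    1.4142
2    1    1
3    6    2.4495
4    14    3.7417
5    15    3.873
6    30    5.4772
7    40    6.3246
8    74    8.6023
9    75    8.6603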

3.) Sketch the “transformed” graph


4.) Sketch the “transformed” residual graph

Solution:

1.) The original graph:

2.) The residual graph:

The residual plot looks more like a parabola than random scatter, so a straight line does not fit the original data well.


3.) Sketch the “transformed” graph

4.) Sketch the “transformed” residual graph

The residual plot now looks more like random noise, so a linear model fits the transformed data well.

4.4. Properties of LSRL

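a.) The predicted z-score is never larger in magnitude than $z_x$, that is, $|z_{\hat{y}}| \le |z_x|$ (regression toward the mean).

From the Cauchy–Schwarz inequality,

$$\left|\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right| \le \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

or $|r| \le 1$. Since $z_{\hat{y}} = r z_x$, in z-score space the change in the estimate $\hat{y}$ is never larger than the change in $x$, i.e., the regression line makes an angle of at most $45^\circ$ with the horizontal axis.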


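b.) The Sum of Residuals Is Zero

The residual is defined as

$$e_i = y_i - \hat{y}_i$$

Using $b_0 = \bar{y} - b_1\bar{x}$, the sum of the residuals is

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n}(y_i - \hat{y}_i) = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i) = \sum_{i=1}^{n}(y_i - \bar{y} + b_1\bar{x} - b_1 x_i) = \sum_{i=1}^{n}(y_i - \bar{y}) - b_1\sum_{i=1}^{n}(x_i - \bar{x}) = 0$$

since deviations from a mean always sum to zero.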

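c.) Coefficient of Determination $r^2$

The normalized sum of squared residuals (in z-scores):

$$SSR = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_{y_i} - z_{\hat{y}_i}\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_{y_i} - r z_{x_i}\right)^2 = 1 - 2r^2 + r^2 = 1 - r^2$$

using $\frac{1}{n-1}\sum z_{y_i}^2 = \frac{1}{n-1}\sum z_{x_i}^2 = 1$ and $\frac{1}{n-1}\sum z_{x_i} z_{y_i} = r$.

The percentage of variability that cannot be explained is $1 - r^2$, so the percentage of variability that can be explained is $r^2$.

$r^2$ is called the coefficient of determination.

Do not place too much importance on small differences between $r^2$ values. Keep in mind that $r$ and $r^2$ values are mainly useful for comparing candidate regression models relative to one another.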
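The identity $SSR = 1 - r^2$ can be verified numerically. This sketch (standard-library Python; the helper `z_scores` is hypothetical) standardizes the height/weight data from Example 4.2.1 with the sample mean and $n-1$ standard deviation, computes $r$ as the average product of z-scores, and compares the normalized sum of squared z-residuals with $1 - r^2$.

```python
import statistics

heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

def z_scores(values):
    """Standardize with the sample mean and sample SD (n - 1 divisor)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

zx = z_scores(heights)
zy = z_scores(weights)
n = len(zx)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)   # r as the average product of z-scores

# Normalized sum of squared residuals in z-score space
ssr = sum((b - r * a) ** 2 for a, b in zip(zx, zy)) / (n - 1)

print(f"r^2 = {r * r:.4f}, 1 - SSR = {1 - ssr:.4f}")
```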


d.) Rule of Thumb for Correlation Strength

$0 \le |r| < 0.3$: weak correlation

$0.3 \le |r| < 0.7$: moderate correlation

$|r| \ge 0.7$: strong correlation

A point is influential if removing it changes the slope of the line and the correlation coefficient greatly. Influential points are outliers, but outliers are not necessarily influential. When the data set is large, a single outlier may not be influential.


e.) Residual Plot

The residual plot shows the signed distances between the actual data values and the values predicted by the model. A good linear model has residuals that are close to zero and randomly scattered.

4.5. Models with Nonlinear Regressors That Are Linear in the Coefficients

A logarithmic model $y = a + b\log_c(x)$ may be appropriate if $(\ln x, y)$ or $(\log x, y)$ appears to be linear. The logarithmic model can be expressed as

$$\hat{y} = b_0 + b_1\ln(x) \quad\text{or}\quad \hat{y} = b_0 + b_1\log(x)$$

An exponential model $y = a\,b^x$ may be appropriate if $(x, \ln y)$ or $(x, \log y)$ appears to be linear. The exponential model can be expressed as

$$\ln\hat{y} = b_0 + b_1 x \quad\text{or}\quad \log\hat{y} = b_0 + b_1 x$$

Or in exponential form: $\hat{y} = e^{b_0 + b_1 x}$ or $\hat{y} = 10^{b_0 + b_1 x}$

A power model $y = a\,x^b$ may be appropriate if $(\ln x, \ln y)$ or $(\log x, \log y)$ appears to be linear. The power model can be expressed as

$$\ln\hat{y} = b_0 + b_1\ln(x) \quad\text{or}\quad \log\hat{y} = b_0 + b_1\log(x)$$

Or in exponential form: $\hat{y} = e^{b_0}x^{b_1}$ or $\hat{y} = 10^{b_0}x^{b_1}$


Start with an analysis of the original $(x, y)$ data. If necessary, check $r$ and the residuals for all three models to see which one is most “linear.”

The exponential model may be better for a “faster-growing” response variable, while the logarithmic model may be better for a “slower-growing” response variable.

When applying a power $y^b$ to “straighten” the curve, use $b > 1$ to “bend” concave-down curves, and use $0 < b < 1$ to “bend” concave-up curves (as with $\sqrt{y}$ in Example 4.3.1).

Be sure to keep the lists of variables straight, and label sketches/graphs with the appropriate variables.

Example 4.5.1. A Xerox machine dealer has data on the number $x$ of Xerox machines at each of 89 customer locations and the number $y$ of service calls in a month at each location. Summary calculations are:

$\bar{x} = 8.4$, $\bar{y} = 14.2$, $r = 0.86$, $s_x = 2.1$, $s_y = 3.8$

What is the y-intercept of the LSRL?

Solution:

$$b_1 = r\frac{s_y}{s_x} = 0.86 \times \frac{3.8}{2.1} \approx 1.556, \qquad b_0 = \bar{y} - b_1\bar{x} = 14.2 - 0.86 \times \frac{3.8}{2.1} \times 8.4 = 14.2 - 13.072 = 1.128$$
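The two-step arithmetic (slope from summary statistics, then intercept through the means) can be wrapped in a small function. The name `intercept_from_summary` is hypothetical; it simply applies $b_1 = r s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$ without rounding in between.

```python
def intercept_from_summary(x_bar, y_bar, r, s_x, s_y):
    """b1 = r * s_y / s_x, then b0 = y_bar - b1 * x_bar (the LSRL passes through the means)."""
    b1 = r * s_y / s_x
    return y_bar - b1 * x_bar

b0 = intercept_from_summary(x_bar=8.4, y_bar=14.2, r=0.86, s_x=2.1, s_y=3.8)
print(round(b0, 3))
```

For the Xerox data this should give about 1.128, matching the solution.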

Example 4.5.2. Which of the following would not be a correct conclusion based on a

correlation of 0.27?

(A) There is a weak linear relationship between the 2 variables.

(B) Approximately 7.3% of variation in y can be explained by linear relationship with x.

(C) In general, as one variable increases, the other variable tends to increase as well.

(D) There is a positive association between the two variables.

(E) The relationship must not follow a linear pattern since the correlation value is so low.

Solution:

The answer is E. Even though the correlation is low, i.e., only about $0.27^2 \approx 7.3\%$ of the variation can be

explained, we cannot conclude that the relationship between the two variables is not linear.


Example 4.5.3. A study was conducted to determine the relationship between the number

of mishandled bags per 1000 customers ($x$) and the percentage of on-time arrivals

for various airlines. The LSRL was found to be $\hat{y} = 97 - 5.08x$ with $r^2 = 0.5384$. What is

the correlation?

Solution:

Since $b_1 < 0$, the correlation coefficient must be negative: $r = -\sqrt{0.5384} \approx -0.7338$.
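Recovering $r$ from $r^2$ needs the sign of the slope, since $r$ and $b_1 = r s_y/s_x$ always share a sign. A minimal sketch with a hypothetical helper name, using `math.copysign` to attach the slope's sign to $\sqrt{r^2}$:

```python
import math

def r_from_r_squared(r_squared, slope):
    """r has the same sign as the LSRL slope b1; its magnitude is sqrt(r^2)."""
    return math.copysign(math.sqrt(r_squared), slope)

print(round(r_from_r_squared(0.5384, -5.08), 4))
```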

Example 4.5.4. Suppose that the scatterplot of log X and log Y shows a strong positive

correlation close to one. Which of the following is true?

I. The variables X and Y also have a correlation close to one.

II. A scatterplot of the variables X and Y shows a strong nonlinear pattern.

III. The residual plot of the variables X and Y shows a random pattern.

Solution:

That log X and log Y show a strong positive correlation implies the power model should

be used:

$$\log\hat{Y} = b_0 + b_1\log X \quad\text{or}\quad \hat{Y} = 10^{b_0}X^{b_1}$$

So only II is true.

Example 4.5.5. [2014APStatsFRQs, #6] Graph I is a scatterplot showing the lengths of

66 cars plotted with the fuel consumption rate (FCR). One point on the graph is labeled A.

A computer output from a linear regression is shown below.


a.) The point on the graph labeled A represents one car of length 175 inches and an FCR

of 5.88. Calculate and interpret the residual for the car relative to the least squares

regression line.

Graph II is a scatterplot showing the engine size of the 66 cars plotted with the

corresponding residuals from the regression of FCR on length. Graph III is a scatterplot

showing the wheel base of the 66 cars plotted with the corresponding residuals from the

regression of FCR on length.

b.) In graph II, the point labeled A corresponds to the same car whose point was labeled

A in graph I. The measurements for the car represented by point A are given below.

(i) Circle the point on graph III that corresponds to the car represented by point A on

graphs I and II.


(ii) There is a point on graph III labeled B. It is very close to the horizontal line at 0.

What does that indicate about the FCR of the car represented by point B?

c.) Write a few sentences to compare the association between the variables in graph II

with the association between the variables in graph III.

d.) Jamal wants to predict FCR using length and one of the other variables, engine size

or wheel base. Based on your response to part (c), which variable, engine size or wheel

base, should Jamal use in addition to length if he wants to improve the prediction?

Explain why you chose that variable.

Solution:

a.) $e_A = y_A - \hat{y}_A = 5.88 - (-1.595789 + 0.0372614 \times 175) \approx 0.955$. The prediction

underestimated the FCR by 0.955 for this car of length 175 inches.

b.) Point B, a car with a wheel base of about 120, has a residual close to 0, so the regression model

based on length predicts this car's FCR accurately.

c.) There is a moderate positive association in Graph II and a very weak association in

Graph III. The association between the residuals and engine size is stronger than that between

the residuals and wheel base.

d.) Since engine size shows a stronger positive association with the residuals, it provides

more useful additional information for predicting FCR beyond length.
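The residual in part a.) is a single substitution into $e = y - \hat{y}$. This tiny sketch (hypothetical helper name) takes the intercept and slope as read from the regression output used in part a.), namely an intercept of -1.595789 and a slope of 0.0372614 per inch.

```python
def residual(observed, intercept, slope, x):
    """e = y - y-hat for the LSRL y-hat = intercept + slope * x."""
    return observed - (intercept + slope * x)

# Coefficients as used in part a.)
e_A = residual(observed=5.88, intercept=-1.595789, slope=0.0372614, x=175)
print(round(e_A, 3))
```

A positive residual means the observed value lies above the line, i.e., the model underestimated it.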


Practice Questions -- LSRL

1. The residual value of $(\bar{x}, \bar{y})$ in a linear regression is

(A) Negative

(B) Zero

(C) Positive

(D) Dependent on the value of r

(E) None of these

2. If $(60, 12)$ is an influential point for the regression line $\hat{y} = 7.908 + 4.098x$, then which

of the following must be true?

(A) Removal of $(60, 12)$ will improve $r$

(B) Removal of $(60, 12)$ will not affect $r$

(C) Removal of $(60, 12)$ will change the value of the slope of the regression line

(D) $(60, 12)$ has a large residual

(E) None of these

3. A statistics student calculated a LSRL to describe the relationship between two

variables and then realized that he had mistakenly interchanged the explanatory and

response variables. When the LSRL is recalculated, what can be said about the

correlation?

(A) The correlation does not change when the variables are switched.

(B) The correlation will have the same absolute value, but the sign will change.

(C) The correlation will increase once the explanatory and response variables are

correctly identified and used appropriately in the calculation of the LSRL

(D) The correlation will decrease because of the student’s initial mistake.

(E) We cannot be sure how the correlation of the new LSRL will compare to the

correlation that was found originally.


4. Suppose a data set is transformed using $(x, y) \to (x, \log y)$ and a least-squares linear regression

procedure is performed on the transformed data. If the residual plot of this regression

shows a curved pattern, which of the following is an appropriate conclusion?

(A) A quadratic model should be used with the original data.

(B) A square root transformation should be applied to the transformed data

(C) The correlation coefficient of the set of transformed data is zero.

(D) The exponential transformation is not appropriate.

(E) None of these is appropriate.

5. After data are collected from an agricultural experiment, suppose a transformation is

performed on the bivariate set (inches of water, total plant growth). If the linear

regression of the transformed data has the equation $\log(\text{growth}) = 0.7 + 1.93\log(\text{water})$,

the regression model of the original data is

(A) $\text{growth} = 0.7 + 1.93(\text{water})$

(B) $\text{growth} = 5.01 + 1.93(\text{water})$

(C) $\text{growth} = 5.01(1.93)^{\text{water}}$

(D) $\text{growth} = 5.01(\text{water})^{1.93}$

(E) none of these

6. Residuals are

(A) possible models not explored by the researcher.

(B) variation in the response variable that is explained by the model.

(C) the difference between the observed response and the values predicted by the model.

(D) data collected from individuals that is not consistent with the rest of the group.

(E) a measure of the strength of the linear relationship between x and y .

7. Data were collected on two variables x and y, and a least squares regression line was

fitted to the data. The resulting equation is $\hat{y} = -2.29 + 1.7x$. What is the residual for

the point $(5, 6)$?

(A) -2.91

(B) -0.21

(C) 0.21

(D) 6.21

(E) 7.91


8. Given a set of ordered pairs $(x, y)$ with $s_x = 2.5$, $s_y = 1.9$, $r = 0.63$, what is the slope

of the regression line of y on x?

(A) 0.48

(B) 0.65

(C) 1.32

(D) 1.90

(E) 2.63

9. All but one of these statements is false. Which one could be true?

(A) The correlation between a football player’s weight and the position he plays is 0.54

(B) The correlation between a car’s length and its fuel efficiency is 0.71 miles per gallon.

(C) There is a high correlation (1.09) between height of a corn stalk and its age in weeks.

(D) Correlation between amount of fertilizer used and quantity of beans harvested is

0.42

(E) There is a correlation of 0.63 between gender and political party.

10. Which is true?

I. Random scatter in the residuals indicates a linear model.

II. If two variables are very strongly associated, then the correlation between them will be

near 1.0 or -1.0.

III. Changing the units of measurement for x or y changes the correlation coefficient.

(A) I only

(B) II only

(C) I and II only

(D) II and III only

(E) I, II and III

11. If the coefficient of determination $r^2$ is calculated as 0.49, then the correlation

coefficient

(A) can not be determined without the data

(B) is 0.70

(C) is 0.2401

(D) is 0.70

(E) is 0.7599


12. Which of the following is a correct conclusion based on the residual plot displayed?

(A) The line overestimates the data.

(B) The line underestimates the data.

(C) It is not appropriate to fit a line to these data since there is clearly no correlation.

(D) The data are not related.

(E) There is a nonlinear relationship between the variables.

13. What conclusion can be reached based on the residual plot shown below?

(A) Since the plot shows a pattern, a linear model is not appropriate for the data.

(B) Since the plot shows a pattern, the data is not roughly normally distributed.

(C) Since both parts of the plot show a roughly linear pattern, a linear model is

appropriate for the data.

(D) Since both parts of the plot follow a roughly linear pattern, the data is approximately

normally distributed.

(E) The data appears to follow a quadratic or absolute value model.


14. Which of the following would not be correct based on a correlation of -0.92?

(A) There is a negative association between the two variables.

(B) Since correlation is a resistant measure, there must not be any outliers.

(C) In general, as the value of one variable increases, the value of the other variable tends

to decrease.

(D) There is a strong linear relationship between the two variables

(E) Approximately 84.6% of the variation in y can be explained by the linear

relationship with x .

15. Data that follows an exponential model in $(x, y)$ can be re-expressed as a linear model

if you plot:

(A) $(\log x, y)$

(B) ,x y

(C) $(\log x, \log y)$

(D) 2,x y

(E) $(x, \log y)$

16. An LSRL is found to be: $\log\hat{y} = 2.35 - 0.62x$. The equation can be rewritten as:

(A) ˆ 0.24 (223.87)xy

(B) ˆ 223.87 0.24y x

(C) ˆ 223.87 (0.24)xy

(D) ˆ 223.87(0.24)xy

(E) ˆ 0.24 (223.87)xy

17. An LSRL was calculated and the residual for a certain data point is found to be

0.8074 . This tells us that the predicted cost is

(A) wrong

(B) higher than our observed cost

(C) lower than our observed cost

(D) the result of extrapolation

(E) the result of interpolation


18. An LSRL was computed for $(\log x, \log y)$. The resulting equation was:

ˆlog 3.1 2log( )y x . Find the predicted value of y when 1x .

(A) 5.1

(B) 510.00

(C) 1258.9

(D) 5100.0

(E) 126,000.0

19. Which of the following statements are true?

I. When the data set includes an influential point, the data set is nonlinear.

II. Influential points always reduce the coefficient of determination.

III. All outliers are influential data points.

(A) I only

(B) II only

(C) III only

(D) All of the above

(E) None of the above

20. In the scatterplot of y versus x shown below, the least squares regression line is

superimposed on the plot. Which of the following points has the largest residual?

(A) A

(B) B

(C) C

(D) D

(E) E


21. Consider $n$ pairs of numbers $(x_1, y_1)$, $(x_2, y_2)$, …, $(x_n, y_n)$. The mean and standard

deviation of the x-values are $\bar{x} = 5$ and $s_x = 4$, respectively. The mean and standard

deviation of the y-values are $\bar{y} = 10$ and $s_y = 10$, respectively. Of the following, which

could be the least squares regression line?

(A) ˆ 5.0 3.0y x

(B) ˆ 3.0y x

(C) ˆ 5.0 2.5y x

(D) ˆ 8.5 0.3y x

(E) ˆ 10.0 0.4y x

22. Researchers studying growth patterns of children collect data on the heights of fathers

and sons. The correlation between the fathers’ heights and the heights of their 16-year-old

sons is most likely to be

(A) -1.0

(B) near zero

(C) near 0.7

(D) exactly +1.0

(E) somewhat greater than 1.0

23. The auto insurance industry crashed some test vehicles into a cement barrier at speeds

of 5 to 25 mph to investigate the amount of damage to the cars. They found a correlation

of r = 0.60 between speed (MPH) and damage ($). If the speed at which a car hits the

barrier is 1.5 standard deviations above the mean speed, we expect the damage to be

______ the mean damage.

(A) equal to

(B) 0.36 SD above

(C) 0.60 SD above

(D) 0.90 SD above

(E) 1.5 SD above


24. The correlation between X and Y is $r = 0.35$. If we double each X value, decrease

each Y value by 0.20, and exchange the variables (put X on the Y-axis and vice versa), the new

correlation

(A) is 0.35

(B) is 0.50

(C) is 0.70

(D) is 0.90

(E) can not be determined

25. The correlation between a family’s weekly income and the amount they spend on

restaurant meals is found to be $r = 0.30$. Which must be true?

I. Families tend to spend about 30% of their incomes in restaurants

II. In general, the higher the income, the more the family spends in restaurants.

III. The line of best fit passes through 30% of (income, restaurant$) data points

(A) I only

(B) II only

(C) III only

(D) II and III only

(E) I, II and III

26. Education research consistently shows that students from wealthier families tend to

have higher SAT scores. The slope of the line that predicts SAT score from family

income is 6.25 points per $1000, and the correlation between the variables is 0.48. Then

the slope of the line that predicts family income from SAT scores (in $1000 per point) is

(A) 0.037

(B) 0.16

(C) 3.00

(D) 6.25

(E) 13.02


27. A regression analysis of company profits and amount of money the company spent on

advertising found $r^2 = 0.72$. Which of these is true?

I. The model can correctly predict the profit for 72% of companies

II. On average, about 72% of the company’s profits result from advertising.

III. On average, companies spend about 72% of their profits on advertising

(A) none

(B) I only

(C) II only

(D) III only

(E) I and III

28. A least squares line of regression has been fitted to a scatterplot; the model’s residuals

plot is shown below. Which of the following statements is true?

(A) The linear model is appropriate.

(B) The linear model is poor because some residuals are large.

(C) The linear model is poor because the correlation is near zero.

(D) A curved model would be better.

(E) None of the above.

29. The correlation between two scores X and Y equals 0.8. If both the X scores and

the Y scores are converted to z-scores, then the correlation between the z-scores for X

and the z-scores for Y would be

(A) -0.8

(B) -0.2

(C) 0.0

(D) 0.2

(E) 0.8


30. A least squares regression line was fitted to the weights (in pounds) versus age (in months) of a group of many young children. The equation of the line is $\hat{y} = 16.6 + 0.65t$, where $\hat{y}$ is the predicted weight and $t$ is the age of the child. A 20-month-old child in this group has an actual weight of 25 pounds. Which of the following is the residual weight, in pounds, for this child?

(A) -7.85

(B) -4.60

(C) 4.60

(D) 5.0

(E) 7.85

31. A college’s job placement office collected data about students’ GPAs and the salaries they earned in their first jobs after graduation. The mean GPA was 2.9 with a standard deviation of 0.4. Starting salaries had a mean of $47200 with an SD of $8500. The correlation between the two variables was r = 0.72. The association appeared to be linear in the scatterplot.

a.) Write an equation of the model that can predict salary based on GPA.

b.) Do you think these predictions will be reliable? Explain.

c.) Your brother just graduated from that college with a GPA of 3.30. He tells you that, based on this model, the residual for his pay is -$1880. What salary is he earning?


32. [2011APStatsFRQs, #5] Windmills generate electricity by transferring energy from wind to a turbine. A study was conducted to examine the relationship between wind velocity in miles per hour (mph) and electricity production in amperes for one particular windmill. For the windmill, measurements were taken on twenty-five randomly selected days, and the computer output for the regression analysis for predicting electricity production based on wind velocity is given below. The regression model assumptions were checked and determined to be reasonable over the interval of wind speeds represented in the data, which were from 10 miles per hour to 40 miles per hour.

Predictor        Coef     SE Coef    T        P
Constant         0.137    0.126      1.09     0.289
Wind velocity    0.240    0.019      12.63    0.000

S = 0.237    R-Sq = 0.873    R-Sq(adj) = 0.868

(a) Use the computer output above to determine the equation of the least squares regression line. Identify all variables used in the equation.

(b) How much more electricity would the windmill be expected to produce on a day when the wind velocity is 25 mph than on a day when the wind velocity is 15 mph? Show how you arrived at your answer.

(c) What proportion of the variation in electricity production is explained by its linear

relationship with wind velocity?

33. [CollegeBoardAPStatsPracticeProblem] Exercise physiologists are investigating the

relationship between lean body mass (in kilograms) and the resting metabolic rate (in

calories per day) in sedentary males.

Based on the computer output above, which of the following is the best interpretation of

the value of the slope of the regression line?


(A) For each additional kilogram of lean body mass, the resting metabolic rate increases

on average by 22.563 calories per day.

(B) For each additional kilogram of lean body mass, the resting metabolic rate increases

on average by 264.0 calories per day.

(C) For each additional kilogram of lean body mass, the resting metabolic rate increases

on average by 144.9 calories per day.

(D) For each additional calorie per day for the resting metabolic rate, the lean body mass

increases on average by 22.563 kilograms.

(E) For each additional calorie per day for the resting metabolic rate, the lean body mass

increases on average by 264.0 kilograms.


Answers:

1. B. $(\bar{x}, \bar{y})$ is on the line, so $e = \bar{y} - \hat{y} = \bar{y} - (b_0 + b_1\bar{x}) = 0$.

2. C. D may not be true.

3. A. $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$.

4. D. $\log y = a + bx$, so $y = 10^{a + bx}$.

5. D. $\widehat{\text{growth}} = 10^{0.7 + 1.93\log(\text{water})} = 10^{0.7}(\text{water})^{1.93} \approx 5.01(\text{water})^{1.93}$.

6. C.

7. B. 21.0))5(7.129.2(6

8. A. $b_1 = r\frac{s_y}{s_x} = 0.63\cdot\frac{1.9}{2.5} = 0.4788$.

9. D. (A) Position is not a quantitative variable; (B) correlation has no units; (C) correlation cannot be greater than 1; (E) gender and political party are not quantitative variables.

10. C.

11. A. The sign of the correlation cannot be determined.

12. B.

13. A.

14. B.

15. E.

16. D. $\hat{y} = 10^{2.35 - 0.62x} = 10^{2.35}\left(10^{-0.62}\right)^x \approx 223.87(0.24)^x$.

17. C. 8074.0ˆ,8074.0ˆ yyyye .

18. C. $\log\hat{y} = 3.1$, so $\hat{y} = 10^{3.1} \approx 1258.9$.

19. E.

20. A.

21. D. $(\bar{x}, \bar{y}) = (5, 10)$ is on the line.

22. C.

23. D. $\hat{z}_d = r z_x = 0.6 z_x = 0.6(1.5) = 0.9$.

24. A.

25. B. $\widehat{\text{restaurant}}(\$) = b_0 + b_1(\text{income})$. I and III do not make sense.

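26. A. $b_{\text{sat}} = r\frac{s_{\text{sat}}}{s_{\text{income}}} = 6.25$ (points per thousand dollars), so $\frac{s_{\text{income}}}{s_{\text{sat}}} = \frac{r}{b_{\text{sat}}}$ and $b_{\text{income}} = r\frac{s_{\text{income}}}{s_{\text{sat}}} = \frac{r^2}{b_{\text{sat}}} = \frac{(0.48)^2}{6.25} \approx 0.0369$ (thousand dollars per point).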


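27. A. $r^2 = 0.72$ only says that about 72% of the variation in profits is explained by the linear relationship with advertising spending; statements I, II and III all misread it. In z-scores the model is $\hat{z}_p = r z_a \approx 0.85 z_a$ (taking $r = +\sqrt{0.72}$).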

28. A.

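29. E. Converting to z-scores is a linear change of scale, which does not change the correlation: $r_{z_x z_y} = r_{xy} = 0.8$.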

30. B. $y_i = 25$, $\hat{y}_i = 16.6 + 0.65(20) = 29.6$, $e_i = y_i - \hat{y}_i = 25 - 29.6 = -4.6$.

31. $b_1 = r\frac{s_y}{s_x} = 0.72\cdot\frac{8500}{0.4} = 15300$, $b_0 = \bar{y} - b_1\bar{x} = 47200 - 15300(2.9) = 2830$.

a.) $\widehat{\text{Salary}} = 2830 + 15300(\text{GPA})$

b.) $R^2 = (0.72)^2 \approx 52\%$: about 52% of the variation in starting salary is explained by the linear model on GPA, so the predictions are only somewhat reliable.

c.) $\widehat{\text{Salary}} = 2830 + 15300(3.30) = 53320$. Since $e_b = y_b - \hat{y}_b = -1880$, his salary is $y_b = \hat{y}_b + e_b = 53320 - 1880 = \$51440$.

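32.

(a) $\widehat{\text{Production}} = 0.137 + 0.240(\text{Velocity})$, where Production is electricity production in amperes and Velocity is wind velocity in mph.

(b) Difference in predicted production $= 0.240(25 - 15) = 2.4$ amperes.

(c) $R^2 = 0.873$, i.e., 87.3%.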

33. A. $\widehat{\text{calories}} = 264.0 + 22.563(\text{mass})$.
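The arithmetic in answers 26, 29, 30, 31 and 32 can be re-checked with a short Python sketch. The helper names `slope_from_summary` and `correlation` are hypothetical, and the five-point data set used for question 29 is made up only to show that standardizing leaves $r$ unchanged; every other number is taken from the questions.

```python
from statistics import mean, stdev


def slope_from_summary(r, s_x, s_y):
    """Least-squares slope b1 = r * s_y / s_x (hypothetical helper)."""
    return r * s_y / s_x


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y, the formula in answer 3 (hypothetical helper)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Q26: b_income = r * s_income / s_sat = r**2 / b_sat.
b_sat = 6.25                      # SAT points per $1000 of income
b_income = 0.48 ** 2 / b_sat      # $1000 of income per SAT point

# Q29: made-up data; standardizing both variables does not change r.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
zx = [(x - mean(xs)) / stdev(xs) for x in xs]
zy = [(y - mean(ys)) / stdev(ys) for y in ys]
r_raw = correlation(xs, ys)
r_z = correlation(zx, zy)

# Q30: residual = observed - predicted.
e30 = 25 - (16.6 + 0.65 * 20)

# Q31: slope and intercept from summary statistics, then the brother's salary.
b1 = slope_from_summary(0.72, 0.4, 8500)   # r, s_GPA, s_salary
b0 = 47200 - b1 * 2.9                      # b0 = y_bar - b1 * x_bar
salary_hat = b0 + b1 * 3.30
salary = salary_hat + (-1880)              # y = y_hat + e

# Q32(b): change in predicted production from 15 mph to 25 mph.
diff32 = 0.240 * (25 - 15)

print(round(b_income, 4), round(r_raw, 4), round(r_z, 4))
print(round(e30, 2), round(b1), round(b0), round(salary))
print(round(diff32, 2))
```

Note that $b_{\text{income}}$ is not $1/b_{\text{sat}}$: regressing income on SAT is not the inverse of regressing SAT on income unless $|r| = 1$.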


**6.6. Testing different models

[Figure: two plots of Y versus X]

a.) Line Model: $y = b_0 + b_1 x$ or $y = a + bx$

For the TI-84:

Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2

Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph type (scatterplot)

Step C: Display the data: Zoom -> ZoomStat

Step D: Regression Line: STAT -> Calc -> 8: LinReg(a+bx) L1, L2, Y1

To get Y1: VARS -> Y-VARS -> 1: Function -> 1: Y1

Step E: Equation: _______________________________

Step F: $R$ and $R^2$: $R$ = __________ and $R^2$ = __________

Step G: Sketch the data and equation:


Step H: Residual Plot:

1.) Turn off StatPlot #1 and turn on StatPlot #2.

2.) In StatPlot #2, change the Ylist from L2 to RESID.

To find RESID on the TI-84: 2nd -> LIST -> #7: RESID

3.) Zoom -> ZoomStat

Sketch the residual plot

b.) Log Model: $y = b_0 + b_1\log(x)$ or $y = b_0 + b_1\ln(x)$ or $y = a + b\ln(x)$

For the TI-84:

Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2

Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph type (scatterplot)


Step C: Display the data: Zoom -> ZoomStat

Step D: Log Model: STAT -> Calc -> 9: LnReg L1, L2, Y1

To get Y1: VARS -> Y-VARS -> 1: Function -> 1: Y1

Step E: Equation: _______________________________

Step F: $R$ and $R^2$: $R$ = __________ and $R^2$ = __________

Step G: Sketch the data and equation:

Step H: Residual Plot:

1.) Turn off StatPlot #1 and turn on StatPlot #2.

2.) In StatPlot #2, change the Ylist from L2 to RESID.

To find RESID on the TI-84: 2nd -> LIST -> #7: RESID

3.) Zoom -> ZoomStat

Sketch the residual plot


c.) Exponential Model: $\ln(y) = b_0 + b_1 x$, or $y = ab^x$

For the TI-84:

Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2

Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph type (scatterplot)

Step C: Display the data: Zoom -> ZoomStat

Step D: Exponential Model: STAT -> Calc -> 0: ExpReg L1, L2, Y1

To get Y1: VARS -> Y-VARS -> 1: Function -> 1: Y1

Step E: Equation: _______________________________

Step F: $R$ and $R^2$: $R$ = __________ and $R^2$ = __________

Step G: Sketch the data and equation:


Step H: Residual Plot:

1.) Turn off StatPlot #1 and turn on StatPlot #2.

2.) In StatPlot #2, change the Ylist from L2 to RESID.

To find RESID on the TI-84: 2nd -> LIST -> #7: RESID

3.) Zoom -> ZoomStat

Sketch the residual plot

d.) Power Model: $\ln(y) = b_0 + b_1\ln(x)$, or $y = ax^b$

For the TI-84:

Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2

Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph type (scatterplot)

Step C: Display the data: Zoom -> ZoomStat

Step D: Power Model: STAT -> Calc -> A: PwrReg L1, L2, Y1

To get Y1: VARS -> Y-VARS -> 1: Function -> 1: Y1

Step E: Equation: _______________________________


Step F: $R$ and $R^2$: $R$ = __________ and $R^2$ = __________

Step G: Sketch the data and equation:

Step H: Residual Plot:

1.) Turn off StatPlot #1 and turn on StatPlot #2.

2.) In StatPlot #2, change the Ylist from L2 to RESID.

To find RESID on the TI-84: 2nd -> LIST -> #7: RESID

3.) Zoom -> ZoomStat

Sketch the residual plot

Conclusions: ___________________________________________________

e.) Regression models in R script:
