unit 8 (powerpoint)

Unit 8 Linear Modeling

Upload: hondafanatics

Post on 26-Jun-2015


Page 1: Unit 8 (powerpoint)

Unit 8

Linear Modeling

Page 2: Unit 8 (powerpoint)

Linear Models

• The correlation coefficient measures the strength of the linear relationship between two quantitative variables x and y.

• A linear equation describing how a dependent variable, y, is associated with an explanatory variable, x, looks like

y = a + bx

Page 3: Unit 8 (powerpoint)

Example

A college charges a basic fee of $100 a semester for a meal plan plus $2 a meal. The linear equation describing the association between the cost of the meal plan, y, and the number of meals purchased, x, is:

y = 100 + 2x
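As a minimal sketch, the meal plan equation can be evaluated for a few meal counts. The function name `meal_plan_cost` and the chosen meal counts are illustrative assumptions, not part of the slides.

```python
def meal_plan_cost(meals):
    """Cost y (in dollars) of the meal plan for x meals: y = 100 + 2x."""
    base_fee = 100  # y-intercept a: cost when no meals are purchased
    per_meal = 2    # slope b: added cost for each meal
    return base_fee + per_meal * meals

# Illustrative meal counts (not from the slides).
for meals in (0, 10, 50):
    print(meals, meal_plan_cost(meals))
```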

Page 4: Unit 8 (powerpoint)

Linear Equations

A linear equation takes the form

y = a + bx

b = slope

a = y-intercept

The slope measures the rate of change of y with respect to x

The y-intercept measures the initial value of y (value of y when x = 0)

Page 5: Unit 8 (powerpoint)

Linear Modeling

• Rarely does an exact linear relationship exist between two studied variables.

• The correlation coefficient and the scatter plot help us decide if there is a reasonably strong linear relationship between two studied variables.

Page 6: Unit 8 (powerpoint)

Data

The table gives the age and systolic blood pressure (SBP) of 30 subjects.

Individual  SBP (Y)  Age (X)    Individual  SBP (Y)  Age (X)
1           144      39         16          130      48
2           220      47         17          135      45
3           138      45         18          114      17
4           145      47         19          116      20
5           162      65         20          124      19
6           142      46         21          136      36
7           170      67         22          142      50
8           124      42         23          120      39
9           158      67         24          120      21
10          154      56         25          160      44
11          162      64         26          158      53
12          150      56         27          144      63
13          140      59         28          130      29
14          110      34         29          125      25
15          128      42         30          175      69

Page 7: Unit 8 (powerpoint)

Approximate Positive Linear Relationship

[Figure: Scatterplot of Systolic Blood pressure vs Age. Horizontal axis: Age (10 to 70). Vertical axis: Systolic Blood pressure (100 to 220).]

Page 8: Unit 8 (powerpoint)

Equation of Fitted Line

SBP = 98.7 + 0.97(AGE)

y = 98.7 + 0.97x

[Figure: Scatterplot of Systolic Blood pressure vs Age. Horizontal axis: Age (10 to 70). Vertical axis: Systolic Blood pressure (100 to 220).]

Page 9: Unit 8 (powerpoint)

Interpretation of Slope

• The slope of the SBP vs Age fitted equation is 0.97

• 0.97 = rate of change of SBP with respect to age

• On average, each additional year of age is associated with an increase of approximately 0.97 units in SBP.

Page 10: Unit 8 (powerpoint)

Least Squares Method for Line of Best Fit

Interactive Unit D2, Basics, Basics 1

Interactive Unit D2, Basics, Practice 1

Page 11: Unit 8 (powerpoint)

Residuals

• One method for assessing how well a linear equation models the data is assessing the extent to which points differ from the line.

• A residual is the difference between an observed y value and the corresponding value of y on the fitted line (predicted y)

• Residual = Observed y - Predicted y
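The residual definition can be sketched for a single subject. This assumes the fitted line SBP = 98.7 + 0.97(AGE) from the earlier slide and uses subject 1 from the data table (age 39, SBP 144); the function names are illustrative.

```python
def predicted_sbp(age):
    """Predicted y on the fitted line from the slides: 98.7 + 0.97 * age."""
    return 98.7 + 0.97 * age

def residual(observed_y, predicted_y):
    """Residual = Observed y - Predicted y."""
    return observed_y - predicted_y

# Subject 1 in the data table: age 39, observed SBP 144.
r1 = residual(144, predicted_sbp(39))
print(round(r1, 2))  # positive: the observed point lies above the line
```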

Page 12: Unit 8 (powerpoint)

Sum of Squares of the Residuals

• The line of best fit is the one with the smallest sum of squares of the residuals

• It is called the least squares line or sometimes the least squares regression line

• The challenge is to find the slope and y-intercept of this least squares line

Page 13: Unit 8 (powerpoint)

More Practice with Finding the Least Squares Line

• Interactive D2, Basics, Basics2

Page 14: Unit 8 (powerpoint)

The “Formulas”

The methods of calculus can be used to find equations for the slope and y-intercept of the least squares line. Here are the results.

$$b = \frac{\sum (x - \bar{X})(y - \bar{Y})}{\sum (x - \bar{X})^2}$$

$$a = \bar{Y} - b\bar{X}$$
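The slope and y-intercept formulas can be applied directly to the age and SBP data. The following is a minimal sketch in plain Python; the function name `least_squares` and the list names are illustrative assumptions, and the data are transcribed from the 30-subject table.

```python
# Age (x) and SBP (y) for subjects 1 to 30, transcribed from the data table.
ages = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
        48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
sbps = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
        130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]

def least_squares(x, y):
    """Return (a, b): y-intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = least_squares(ages, sbps)
print(round(a, 1), round(b, 2))
```

After rounding, the result should agree with the fitted line SBP = 98.7 + 0.97(AGE) reported on the slides.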

Page 15: Unit 8 (powerpoint)

The Good News

Many computer programs including Excel and MINITAB as well as graphing calculators provide the slope and y-intercept of the least squares line

Page 16: Unit 8 (powerpoint)

Example

Find the slope and y-intercept for the least squares line describing the association between age and blood pressure suggested by this data.

Individual  SBP (Y)  Age (X)    Individual  SBP (Y)  Age (X)
1           144      39         16          130      48
2           220      47         17          135      45
3           138      45         18          114      17
4           145      47         19          116      20
5           162      65         20          124      19
6           142      46         21          136      36
7           170      67         22          142      50
8           124      42         23          120      39
9           158      67         24          120      21
10          154      56         25          160      44
11          162      64         26          158      53
12          150      56         27          144      63
13          140      59         28          130      29
14          110      34         29          125      25
15          128      42         30          175      69

Page 17: Unit 8 (powerpoint)

The Line of Best Fit

• The line that best fits the data is taken to be the one with the “smallest” residuals.

• Since residuals can be both positive and negative, they are squared to ensure that none are negative

• The squared residuals are then added to find a measure of the total amount the fitted values deviate from the observed values
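To illustrate the idea of the "smallest" residuals, the sketch below compares the sum of squared residuals for the fitted line from the slides against two alternative lines. The alternative intercepts and slopes are hypothetical values chosen only for comparison, and the function name is illustrative.

```python
# Age (x) and SBP (y) for subjects 1 to 30, transcribed from the data table.
ages = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
        48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
sbps = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
        130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]

def sum_squared_residuals(a, b, x, y):
    """Add up (observed y - predicted y)^2 for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

fitted = sum_squared_residuals(98.7, 0.97, ages, sbps)    # line from the slides
steeper = sum_squared_residuals(90.0, 1.2, ages, sbps)    # hypothetical alternative
flatter = sum_squared_residuals(110.0, 0.7, ages, sbps)   # hypothetical alternative

print(fitted < steeper, fitted < flatter)
```

Both comparisons should come out in favor of the fitted line, since the least squares line is the one that minimizes this sum.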

Page 18: Unit 8 (powerpoint)

Least Squares Line

Y = SBP X = Age

Y = 98.7 + 0.97X

Page 19: Unit 8 (powerpoint)

Predictions

The prediction equation y = 98.7 + 0.97x

can be used to predict a person’s SBP based on their age

For a randomly selected person who is 40 years old, the least squares equation predicts an SBP of

98.7 + 0.97(40) = 137.5

Page 20: Unit 8 (powerpoint)

Making Predictions

Use the sample least squares line

y = 98.7 + 0.97x

to complete the table

Age   35   45   55   65
SBP
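One way to fill in such a table is to wrap the prediction equation in a small function. This is a sketch only; the function name `predict_sbp` is an illustrative assumption.

```python
def predict_sbp(age):
    """Predicted SBP from the sample least squares line y = 98.7 + 0.97x."""
    return 98.7 + 0.97 * age

# Ages from the table to be completed.
for age in (35, 45, 55, 65):
    print(age, round(predict_sbp(age), 2))
```

The same function reproduces the earlier worked prediction for a 40-year-old, 98.7 + 0.97(40) = 137.5.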

Page 21: Unit 8 (powerpoint)

Back to Residuals

SSRes = $\sum (y_{\text{observed}} - y_{\text{predicted}})^2$

is a measure of the total amount of deviation from the fitted line.

It is a measure of the variability in the data that is not explained by the linear relationship with the variable x.

It measures the variability due to factors other than the explanatory variable x.

Page 22: Unit 8 (powerpoint)

Back to Age vs SBP

• SSRes = $\sum (y_{\text{observed}} - y_{\text{predicted}})^2$ = 8393.44

• SSTotal = $\sum (y - \bar{Y})^2$ = 14787.47

• $\frac{SSRes}{SSTotal} = \frac{8393.44}{14787.47} = 0.5676$

• 56.76% of the variability in the SBP data is explained by factors other than age

• 1 - 56.76% = 43.24% of the variability in SBP can be explained by the linear relationship with age
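The two sums of squares can be recomputed from the data as a check. The sketch below fits the least squares line with the slope and y-intercept formulas given earlier, then computes SSRes, SSTotal, and their ratio; the variable names are illustrative assumptions.

```python
# Age (x) and SBP (y) for subjects 1 to 30, transcribed from the data table.
ages = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
        48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
sbps = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
        130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(sbps) / n

# Least squares slope and y-intercept (unrounded).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, sbps))
     / sum((x - x_bar) ** 2 for x in ages))
a = y_bar - b * x_bar

# Variability left over after fitting the line.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, sbps))
# Total variability of SBP about its mean.
ss_total = sum((y - y_bar) ** 2 for y in sbps)

unexplained = ss_res / ss_total
print(round(ss_res, 2), round(ss_total, 2), round(unexplained, 4))
print(round(1 - unexplained, 4))  # share explained by the linear relationship
```

Using the unrounded slope and intercept matters here: plugging in the rounded line 98.7 + 0.97x gives a slightly larger SSRes.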

Page 23: Unit 8 (powerpoint)

The value of r²

• The correlation coefficient, r, for the SBP vs Age data is 0.65757

• r² = (0.65757)² = 0.4324

• When r² is converted to a percent, 43.24%, it corresponds to the percent of the variability in SBP that is explained by age
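The value of r can be recomputed from the data. The slides do not give a formula for r, so the sketch below assumes the standard Pearson correlation formula; the function name `correlation` is illustrative.

```python
from math import sqrt

# Age (x) and SBP (y) for subjects 1 to 30, transcribed from the data table.
ages = [39, 47, 45, 47, 65, 46, 67, 42, 67, 56, 64, 56, 59, 34, 42,
        48, 45, 17, 20, 19, 36, 50, 39, 21, 44, 53, 63, 29, 25, 69]
sbps = [144, 220, 138, 145, 162, 142, 170, 124, 158, 154, 162, 150, 140, 110, 128,
        130, 135, 114, 116, 124, 136, 142, 120, 120, 160, 158, 144, 130, 125, 175]

def correlation(x, y):
    """Pearson correlation coefficient r (standard formula, assumed here)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r = correlation(ages, sbps)
print(round(r, 5), round(r ** 2, 4))
```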

Page 24: Unit 8 (powerpoint)

Interpretation of r²

When r² is converted to a percent it can be interpreted as the percent of the variability in the response variable, y, that can be explained by the linear relationship with the explanatory variable, x.

Page 25: Unit 8 (powerpoint)

Find the least squares line and the values of r and r². Interpret r². Interpret the slope.

Model                Weight (pounds)  City MPG
BMW 318Ti            2790             23
BMW Z3               2960             19
Chevrolet Camaro     3545             17
Chevrolet Corvette   3295             17
Ford Mustang         3270             17
Honda prelude        3040             22
Hyundai Tiburon      2705             22
Mazda Miata          2365             25
Mercury Cougar       3140             20
Mercedes Benz SLK    3020             22
Mitsubishi Eclipse   3235             23
Pontiac Firebird     3545             18
Porsche Boxster      2905             19
Saturn SC            2420             27
Toyota Celica        2720             22

Page 26: Unit 8 (powerpoint)

Scatter Graph

r = -0.816

[Figure: Scatterplot of City MPG vs Weight. Horizontal axis: Weight (2500 to 3500). Vertical axis: City MPG (16 to 28).]

Page 27: Unit 8 (powerpoint)

Residuals

Model                Weight  City MPG  Residual
BMW 318Ti            2790    23         0.69556
BMW Z3               2960    19        -2.12366
Chevrolet Camaro     3545    17        -0.06038
Chevrolet Corvette   3295    17        -1.79682
Ford Mustang         3270    17        -1.97047
Honda prelude        3040    22         1.43200
Hyundai Tiburon      2705    22        -0.89483
Mazda Miata          2365    25        -0.25640
Mercury Cougar       3140    20         0.12658
Mercedes Benz SLK    3020    22         1.29309
Mitsubishi Eclipse   3235    23         3.78643
Pontiac Firebird     3545    18         0.93962
Porsche Boxster      2905    19        -2.50568
Saturn SC            2420    27         2.12562
Toyota Celica        2720    22        -0.79065

Page 28: Unit 8 (powerpoint)

Vehicles with the Largest Positive and Negative Residuals

• Mitsubishi Eclipse got 3.786 city MPG more than expected

• Porsche Boxster got 2.506 city MPG less than expected

Page 29: Unit 8 (powerpoint)

Analysis

• City MPG = 41.7 - 0.00695 Weight

• Each additional pound translates into a loss of approximately 0.00695 city MPG

• Each additional 1000 pounds translates into a loss of approximately 6.95 city MPG

• r² = 66.6%

• 66.6% of the variability in city MPG can be explained by the linear association with the weight of the vehicle. 33.4% of the variability in city MPG is due to factors other than the weight of the vehicle.
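The analysis can be reproduced from the vehicle data with the least squares formulas. This is a minimal sketch in plain Python; the variable names are illustrative assumptions, and r is computed with the standard Pearson formula, which the slides do not state explicitly.

```python
from math import sqrt

# Weight (pounds) and city MPG for the 15 vehicles, in table order.
weights = [2790, 2960, 3545, 3295, 3270, 3040, 2705, 2365,
           3140, 3020, 3235, 3545, 2905, 2420, 2720]
city_mpg = [23, 19, 17, 17, 17, 22, 22, 25, 20, 22, 23, 18, 19, 27, 22]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(city_mpg) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, city_mpg))
sxx = sum((x - x_bar) ** 2 for x in weights)
syy = sum((y - y_bar) ** 2 for y in city_mpg)

b = sxy / sxx            # slope: change in city MPG per additional pound
a = y_bar - b * x_bar    # y-intercept
r = sxy / sqrt(sxx * syy)

print(round(a, 1), round(b, 5))
print(round(1000 * b, 2))           # change in city MPG per additional 1000 pounds
print(round(r, 3), round(r ** 2, 3))
```

The slope is negative, so heavier vehicles are predicted to get fewer city miles per gallon.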