REGRESSION ANALYSIS
In the OUTCOME that you will commence second week back, you might be given data and asked to perform a REGRESSION ANALYSIS.
YOU NEED TO KNOW WHAT THIS MEANS
REGRESSION ANALYSIS is the process of fitting a linear model to a data set.
The aim is to determine the best linear model possible and to use
it to make predictions.
What do we mean by the “best possible linear model”?
The best possible linear model is the one in which:
a. The data is linear or has been linearized by a data transformation:
and we also want
b. The linear model which has the greatest possible value of r².
REMEMBER: The value of the coefficient of determination, r², measures the predictive power of our regression model.
If r² > 30%, then our model will have predictive power.
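For reference, one standard way to write the coefficient of determination for a least squares line is sketched below. The symbols $\hat{y}_i$ (the value predicted by the line) and $\bar{y}$ (the mean of the observed y values) are notation introduced here and do not appear on the slides.

$$
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

Read as a percentage, r² is the proportion of the variation in y that is explained by the linear model.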
STEP 1: Construct a scatterplot of the RAW (original) data and note:
a. Its shape
b. The value of the coefficient of determination
We are predicting LIFE EXPECTANCY from GDP, so:
FIRST: We must decide
Which is the INDEPENDENT (x) VARIABLE: GDP
Which is the DEPENDENT (y) VARIABLE: LIFE EXPECTANCY
GDP lifeex
950 58
1670 65
4250 68
11520 74
12280 73
4170 73
14300 75
5540 71
9830 72
1680 61
320 67
22260 66
550 50
930 66
940 64
2670 72
11220 74
1420 48
150 41
330 44
520 44
940 49
350 48
180 48
List A = gdp, List B = le (life expectancy)
CONCLUSION: Data is NON-LINEAR
From the Home screen, determine the value of r². Value of r² = 0.3665.
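The same value can be checked away from the calculator. The following is a minimal Python sketch, not the calculator's own method: the helper name `r_squared` is made up for illustration, and the two lists are the GDP and life expectancy columns from the slide.

```python
# Raw data from the slides: GDP (x) and life expectancy (y) for 24 countries.
gdp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
le = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
      50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]


def r_squared(xs, ys):
    """Coefficient of determination for the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # For a straight-line fit, r^2 is the square of the correlation coefficient.
    return sxy ** 2 / (sxx * syy)


r2_raw = r_squared(gdp, le)
print(f"r^2 for the raw data: {r2_raw:.4f}")
```

The printed value should agree with the Home screen figure to about four decimal places.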
STEP 2: We seek a Transformation to linearize the data.
CHECK THE CIRCLE OF TRANSFORMATIONS!!
Our scatterplot most closely resembles Quadrant 2!
[Diagram: circle of transformations, Quadrants 1 to 4]
POTENTIALLY SUITABLE TRANSFORMATIONS are:
y²
log x
1/x
Try each of these transformations to determine which one effectively linearizes the data and gives the highest value for r².
STEP 3: In each case, obtain a RESIDUAL PLOT to confirm that the transformed data is linear.
Y SQUARED TRANSFORMATION
gdp le lesqu
950 58 3364
1670 65 4225
4250 68 4624
11520 74 5476
12280 73 5329
4170 73 5329
14300 75 5625
5540 71 5041
9830 72 5184
1680 61 3721
320 67 4489
22260 66 4356
550 50 2500
930 66 4356
940 64 4096
2670 72 5184
11220 74 5476
1420 48 2304
150 41 1681
330 44 1936
520 44 1936
940 49 2401
350 48 2304
180 48 2304
List A: gdp (x variable)
List B: le (y variable)
List C: lesqu (y² transformed variable)
TRANSFORMED DATA STILL APPEARS NON-LINEAR
r² = 38.3%
Establish the value of r² on the Home screen.
CONFIRM WITH RESIDUAL PLOT
Remember: to get the correct residual plot use the split screen view. Make sure that the scatterplot at the top has the correct transformed variable.
CONCLUSION: The residual plot shows a definite curved pattern, indicating that the transformed data is still not linear. The y² transformation has NOT succeeded in producing an effective linear model.
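A residual is the observed value minus the value predicted by the regression line. As a rough sketch of what the residual plot is built from, the Python below fits the y² model and lists the sign of each residual in order of increasing GDP. The helper names `fit_line` and `residuals` are hypothetical, and reading long runs of one sign as a hint of curvature is only a text substitute for looking at the plot itself.

```python
gdp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
le = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
      50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys):
    """Observed minus predicted value for each data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# y squared transformation: x stays as GDP, y becomes life expectancy squared.
le_squared = [y ** 2 for y in le]
res = residuals(gdp, le_squared)

# Signs of the residuals, ordered by increasing GDP.
signs = "".join("+" if r > 0 else "-" for _, r in sorted(zip(gdp, res)))
print(signs)
```

For a least squares line with an intercept the residuals always sum to zero, so it is the pattern in the plot, not the total, that indicates whether the model is linear.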
NEXT STEP….
You guessed it!!
Now we try the next potential candidate transformation.
It was the log x transformation!
GDP lifeex logGDP
950 58 2.98
1670 65 3.22
4250 68 3.63
11520 74 4.06
12280 73 4.09
4170 73 3.62
14300 75 4.16
5540 71 3.74
9830 72 3.99
1680 61 3.23
320 67 2.51
22260 66 4.35
550 50 2.74
930 66 2.97
940 64 2.97
2670 72 3.43
11220 74 4.05
1420 48 3.15
150 41 2.18
330 44 2.52
520 44 2.72
940 49 2.97
350 48 2.54
180 48 2.26
(Delete the y² column, as we have discarded this transformation.)
List A = gdp, List B = le, List C = loggdp
r² = 66.0%
CONCLUSION: It appears that the log(GDP) transformation has successfully linearized the data! The scatterplot appears linear, and r² has increased.
NOTE THE VARIABLES ARE LISTED HERE SO YOU CAN CHECK
Now confirm this by creating a RESIDUAL PLOT for the log(x) transformation. Open a new graphing screen!!
CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the transformed data is linear. The value of r² has now increased to 66.0%. The log x transformation has succeeded in producing an effective linear model for the data with significant predictive power.
And now……
Yes you guessed it!
We need to check out the reciprocal x (1/x) transformation, because maybe it will give a higher coefficient of determination than log x!
(here we go again)
List A, List B, List C = log(list1), List D = 1/list1
950 58 2.98 0.00105
1670 65 3.22 0.00060
4250 68 3.63 0.00024
11520 74 4.06 0.00009
12280 73 4.09 0.00008
4170 73 3.62 0.00024
14300 75 4.16 0.00007
5540 71 3.74 0.00018
9830 72 3.99 0.00010
1680 61 3.23 0.00060
320 67 2.51 0.00313
22260 66 4.35 0.00004
550 50 2.74 0.00182
930 66 2.97 0.00108
940 64 2.97 0.00106
2670 72 3.43 0.00037
11220 74 4.05 0.00009
1420 48 3.15 0.00070
150 41 2.18 0.00667
330 44 2.52 0.00303
520 44 2.72 0.00192
940 49 2.97 0.00106
350 48 2.54 0.00286
180 48 2.26 0.00556
Don’t delete log x column because we think this model was effective!
List A = gdp List B = le List C = loggdp List D = recgdp
[Scatterplot: life expectancy against 1/GDP]
r² = 51.5%
CONCLUSION: The transformed data appears to be linear, but the value of the coefficient of determination is 51.5%, lower than for the log(GDP) transformation.
Coefficient of determination: r² = 51.5%
CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the 1/x transformation has made the data linear.
Remember to create a new graphing screen for the new transformation!!
OVERALL CONCLUSIONS
We have tested three transformations:
y² transformation: Ineffective (did not linearize the data)
log(x) transformation: Effective in linearizing data, with r² = 66.0%
1/x transformation: Effective in linearizing data, with r² = 51.5%
Based on this regression analysis, we conclude that the log(GDP) transformation provides the best model for making predictions from this data.
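The r² comparison behind these conclusions can be reproduced in a few lines of Python. This is a sketch under two assumptions: the calculator's log is base 10 (which matches the logGDP column, for example log(950) = 2.98), and the helper name `r_squared` and the dictionary labels are invented for illustration.

```python
import math

gdp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
le = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
      50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]


def r_squared(xs, ys):
    """Coefficient of determination for the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Each candidate is a pair (x list, y list) after applying the transformation.
candidates = {
    "y^2 against x": (gdp, [y ** 2 for y in le]),
    "y against log x": ([math.log10(x) for x in gdp], le),
    "y against 1/x": ([1 / x for x in gdp], le),
}

results = {name: r_squared(xs, ys) for name, (xs, ys) in candidates.items()}
for name, r2 in results.items():
    print(f"{name}: r^2 = {r2:.1%}")

best = max(results, key=results.get)
print("Highest r^2:", best)
```

A high r² on its own does not show that a transformation has linearized the data, so each candidate still needs its residual plot checked, as was done on the earlier slides.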
MAKING A PREDICTION
Use your linear regression model to predict the Life Expectancy in a country where the GDP is $8000.
gdp le loggdp (List 3)
950 58 2.98
1670 65 3.22
4250 68 3.63
11520 74 4.06
12280 73 4.09
4170 73 3.62
14300 75 4.16
5540 71 3.74
9830 72 3.99
1680 61 3.23
320 67 2.51
22260 66 4.35
550 50 2.74
930 66 2.97
940 64 2.97
2670 72 3.43
11220 74 4.05
1420 48 3.15
150 41 2.18
330 44 2.52
520 44 2.72
940 49 2.97
350 48 2.54
180 48 2.26
Find the equation of the LEAST SQUARES REGRESSION line for the log transformation.
Regression (a+bx): Xlist = log(GDP), Ylist = le
a = 14.3
b = 14.5
Life Expectancy = 14.3 + 14.5 × log(GDP)
Life Expectancy = 14.3 + 14.5 × log(8000)
= 70.9
Predicted Life Expectancy = 70.9 years
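The fit and the prediction can be sketched in Python as a cross-check. The assumptions here are that log means log base 10 (consistent with the table, where log(950) = 2.98) and that `fit_line` is a hypothetical helper, not a calculator function. Coefficients fitted from unrounded logs may differ slightly from the rounded values quoted on the slide.

```python
import math

gdp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
le = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
      50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


# Fit life expectancy against log10(GDP).
log_gdp = [math.log10(x) for x in gdp]
a, b = fit_line(log_gdp, le)
print(f"a = {a:.1f}, b = {b:.1f}")

# Prediction using the rounded coefficients quoted on the slide.
slide_prediction = 14.3 + 14.5 * math.log10(8000)
print(f"Slide coefficients: {slide_prediction:.1f} years")

# Prediction using the unrounded fitted coefficients.
fitted_prediction = a + b * math.log10(8000)
print(f"Fitted coefficients: {fitted_prediction:.1f} years")
```

Note that the x value substituted into the equation is log(8000), not 8000, because the model was fitted to the transformed variable.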