Correlation & Regression
1. Scatterplot
2. Simple (Linear) Regression
3. Simple Non-Linear Regression
Data: Country database website
– information on national characteristics of 160 countries
– measures of quality of life of the population (e.g. life expectancy and infant mortality)
– measures of wealth (e.g. GNP)
Simple Regression/Correlation
Research Question:
Is there a relationship between population growth rate and level of urbanization (measured as the percent of population living in urban areas)?
A question of “whether or not there is,” as well as “how much.”
Scatterplot & Linear Simple Regression
T-tests and chi-square tests
Correlation and regression analysis
1. Scatterplot: Graphs/Scatter
Simple scatterplot, Y axis: popgrow, X axis: urb
[Figure: scatterplot of POPGROW (y-axis) against URB (x-axis).]
Each dot represents a case/country.
Chart/Options
To make changes to the chart, double-click the chart in the output window. The Chart Editor will appear.
Chart/Options: the least-squares line
Chart/Axis
1. Scatterplot: Graphs/Scatter
[Figure: scatterplot of GROWTH (y-axis) against URB (x-axis), with the least-squares line drawn through the points.]
To output the chart: File/Print, Export Chart to export as a graphics file (e.g. .jpg) for Word, or Copy/Paste as a picture into Word.
1. Scatterplot: Graphs/Scatter
Identify cases using the Point ID tool in the Chart Editor window. Select the button in the menu, and use the pointer to select the point in the upper right corner with high urbanization and high growth rate. (Case number 46: United Arab Emirates.)
Format/Color or Format/Marker
In the Chart Editor, click on a feature (e.g. the least-squares line or the dots) to change its color (using the button menu above), or right-click and select “properties window” to change the line or marker/dot type.
2. Simple Linear Regression: Analyze/Regression/Linear...
Dependent: popgrow
Independent(s): urb
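The fit that a simple linear regression produces can be sketched in a few lines. This is a minimal illustration of ordinary least squares for one predictor, not SPSS's own implementation; the function names and the five (urb, popgrow) pairs are hypothetical and are not taken from the country database.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    b0, b1 = least_squares_line(x, y)
    y_bar = mean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical values for illustration only (not the real data set).
urb = [20.0, 35.0, 50.0, 65.0, 80.0]
popgrow = [3.0, 2.6, 2.1, 1.9, 1.2]

b0, b1 = least_squares_line(urb, popgrow)
print(f"popgrow = {b0:.3f} + ({b1:.4f}) * urb, r2 = {r_squared(urb, popgrow):.3f}")
```

The slope plays the role of B for the independent variable in the Coefficients table, and the intercept plays the role of (Constant).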
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .249(a)   .062       .057                1.1537
a. Predictors: (Constant), URB
Regression Results
r2 = 0.06
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       URB(a)              .                   Enter
a. All requested variables entered.
b. Dependent Variable: POPGROW
About 6% of the variation in popgrow can be explained by variation in urb.
Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B            Std. Error       Beta                        t        Sig.
(Constant)    2.511        .220                                         11.389   .000
URB           -1.25E-02    .004             -.249                       -3.357   .001
a. Dependent Variable: POPGROW
ANOVA(b)
Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    15.002           1     15.002        11.271   .001(a)
Residual      226.272          170   1.331
Total         241.274          171
a. Predictors: (Constant), URB
b. Dependent Variable: POPGROW
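The Model Summary figures can be recovered from the ANOVA sums of squares, which is a useful check when reading the output. The sketch below assumes the standard definitions (R Square = regression SS / total SS, F = regression mean square / residual mean square, standard error of the estimate = square root of the residual mean square). The function name is hypothetical, and the number of cases (172) is inferred here from the total df of 171.

```python
import math


def summarize_anova(ss_regression, ss_residual, n_predictors, n_cases):
    """Derive Model Summary statistics from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_cases - n_predictors - 1
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / df_residual
    r2 = ss_regression / ss_total
    return {
        "r_squared": r2,
        # Adjusted R Square penalizes for the number of predictors.
        "adj_r_squared": 1 - (1 - r2) * (n_cases - 1) / df_residual,
        "f": ms_regression / ms_residual,
        "std_error_estimate": math.sqrt(ms_residual),
    }


# Sums of squares from the POPGROW on URB ANOVA table.
summary = summarize_anova(15.002, 226.272, n_predictors=1, n_cases=172)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Rounded, these values agree with the R Square, Adjusted R Square, F, and Std. Error of the Estimate shown in the tables.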
Regression Results
Sig. = .001 < 0.05 and |t| = 3.357 > 1.96
Reject H0: No linear relationship between growth and urbanization
Confidence interval on the slope for URB, b: [-0.0125 ± 1.96 * 0.004] = [-0.02034 ≤ b ≤ -0.00466] --> does not contain 0
There is a significant linear relationship (reject H0).
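The interval arithmetic above can be sketched as follows, using the slope and standard error as displayed in the Coefficients table (B = -0.0125, Std. Error = 0.004) and the large-sample critical value 1.96. The function name is hypothetical, and the bounds shift slightly depending on how many digits of the slope are carried.

```python
def slope_confidence_interval(b, std_error, critical=1.96):
    """Return (lower, upper) bounds of b +/- critical * std_error."""
    margin = critical * std_error
    return b - margin, b + margin


# Values as displayed in the Coefficients table for URB.
lower, upper = slope_confidence_interval(-0.0125, 0.004)

# The slope is significant at the 5% level if the interval excludes 0.
excludes_zero = not (lower <= 0.0 <= upper)
print(f"[{lower:.5f}, {upper:.5f}] excludes 0: {excludes_zero}")
```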
There is a weak negative relationship between POPGROW and URB.
Only 6% of the variation in growth rate is explained by variation in the level of urbanization (r2=0.062)
The Linear Model (refer to B in coefficients):
POPGROW = 2.511 - 0.0125 * URB
The predicted growth rate for the US (URB=77.2%) is about 1.55% (2.511 - 0.0125*77.2), in comparison to the actual rate of 1%.
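A prediction from the linear model is just the fitted equation evaluated at a given URB. The sketch below uses the coefficients as displayed (2.511 and -0.0125), which gives roughly 1.55 for URB = 77.2; a figure computed from differently rounded coefficients will differ in the second decimal. The function name is hypothetical.

```python
def predict_popgrow(urb, intercept=2.511, slope=-0.0125):
    """Predicted growth rate (%) from the fitted line POPGROW = intercept + slope * URB."""
    return intercept + slope * urb


us_predicted = predict_popgrow(77.2)
us_actual = 1.0  # actual US growth rate quoted in the text
residual = us_actual - us_predicted  # negative: the model over-predicts

print(f"predicted = {us_predicted:.3f}, residual = {residual:.3f}")
```

The sizeable residual is consistent with the low r2: urbanization alone leaves most of the variation in growth unexplained.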
Analysis of Regression Results
Research Question:
Perhaps the average number of live births per female (FERTIL) will also explain variation in growth rate?
1. Scatterplot & 2. Linear Simple Regression
1. Scatterplot: Graphs/Scatter
Y axis: popgrow, X axis: fertil
[Figure: scatterplot of POPGROW (y-axis) against FERTIL (x-axis).]
2. Simple Linear Regression
Dependent: popgrow
Independent(s): fertil
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .740(a)   .547       .544                .7892
a. Predictors: (Constant), FERTIL
Regression Results
r2 = 0.547
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       FERTIL(a)           .                   Enter
a. All requested variables entered.
b. Dependent Variable: POPGROW
About 55% of the variation in popgrow can be explained by variation in fertil.
Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B       Std. Error            Beta                        t        Sig.
(Constant)    .245    .132                                              1.851    .066
FERTIL        .469    .033                  .740                        14.027   .000
a. Dependent Variable: POPGROW
ANOVA(b)
Model 1       Sum of Squares   df    Mean Square   F         Sig.
Regression    122.537          1     122.537       196.759   .000(a)
Residual      101.513          163   .623
Total         224.051          164
a. Predictors: (Constant), FERTIL
b. Dependent Variable: POPGROW
Regression Results
Sig. = .000 < 0.05 and t = 14.027 > 1.96
Reject H0: No linear relationship between growth and fertility
Confidence interval on the slope for FERTIL, b: [0.469 ± 1.96 * 0.033] = [0.404 < b < 0.534] --> does not contain 0
There is a significant linear relationship (reject H0).
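The |t| > 1.96 rule can be applied to every row of a Coefficients table, not only the slope. The sketch below reads the t statistics for the FERTIL model straight from the table and checks each against the two-sided 5% cutoff; the function and dictionary names are hypothetical.

```python
def is_significant(t_stat, critical=1.96):
    """True if the coefficient differs from 0 at the two-sided 5% level."""
    return abs(t_stat) > critical


# t statistics as reported in the Coefficients table for the FERTIL model.
t_statistics = {"(Constant)": 1.851, "FERTIL": 14.027}

results = {name: is_significant(t) for name, t in t_statistics.items()}
for name, significant in results.items():
    print(f"{name}: t = {t_statistics[name]}, significant = {significant}")
```

Taking the absolute value matters for negative slopes, such as the URB coefficient with t = -3.357 in the earlier model.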
There is a moderate positive relationship between POPGROW and FERTIL.
About 55% of the variation in growth rate is explained by variation in fertility rate (r2=0.547)
The Linear Model (refer to B in coefficients):
POPGROW = 0.245 + 0.469 * FERTIL
The t-score and significance level indicate that the coefficient on FERTIL is significantly different from 0 (t = 14.027, Sig. = .000); the constant is not significant at the 0.05 level (t = 1.851, Sig. = .066).
However, the relationship does not look linear on the scatterplot.
Analysis of Regression Results
Test alternative regression models for the relationship between popgrow and fertil using the Analyze/Regression/Curve Estimation package.
3. Simple Non-Linear Regression
Dependent: popgrow
Independent(s): fertil
Linear, Quadratic, Logarithmic
Analysis Results
LINEAR: popgrow = b0 + b1 * fertil
popgrow = .245 + .469 * fertil   r2 = .547
LOGARITHMIC: popgrow = b0 + b1 * log(fertil)
popgrow = .045 + 1.67 * log(fertil)   r2 = .635
QUADRATIC: popgrow = b0 + b1 * fertil + b2 * fertil^2
popgrow = -1.337 + 1.508 * fertil - .132 * fertil^2   r2 = .655
About 64% (logarithmic) to 66% (quadratic) of the variation in growth rates can be explained by variation in fertility, compared with only 55% using the linear model.
Independent: FERTIL
Dependent   Mth   Rsq    d.f.   F        Sigf   b0        b1       b2
POPGROW     LIN   .547   163    196.76   .000   .2447     .4686
POPGROW     LOG   .635   163    284.08   .000   .0445     1.6661
POPGROW     QUA   .655   162    154.03   .000   -1.3371   1.5078   -.1315
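To see how the three fitted models differ, it helps to evaluate each at a few fertility levels. The sketch below uses the b0, b1, b2 values from the table and assumes that log in the logarithmic model is the natural logarithm; that assumption is consistent with the magnitude of the fitted coefficients but is not stated in the output. The dictionary name is hypothetical.

```python
import math

# Fitted curves, using the coefficients reported in the table above.
MODELS = {
    "linear": lambda f: 0.2447 + 0.4686 * f,
    # Assumes natural log.
    "logarithmic": lambda f: 0.0445 + 1.6661 * math.log(f),
    "quadratic": lambda f: -1.3371 + 1.5078 * f - 0.1315 * f ** 2,
}

print("fertil  " + "  ".join(f"{name:>11}" for name in MODELS))
for fertil in (1, 3, 5, 7):
    row = "  ".join(f"{model(fertil):11.3f}" for model in MODELS.values())
    print(f"{fertil:6d}  {row}")
```

At low fertility the linear model predicts noticeably higher growth than the two curved models, which is where the scatterplot departs most from a straight line.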
[Figure: POPGROW (y-axis) against FERTIL (x-axis), showing the observed values with the fitted Linear, Logarithmic, and Quadratic curves.]
Analysis Results
Both the logarithmic and quadratic curves seem to better represent the relationship between growth rate and fertility: growth rates increase at a decreasing rate with fertility.
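"Increasing at a decreasing rate" can be checked by looking at the slope of each fitted curve, that is, the extra growth associated with one more birth per woman. This sketch differentiates the fitted equations by hand, again assuming the natural logarithm for the logarithmic model; the function names are hypothetical.

```python
def log_slope(fertil, b1=1.6661):
    """Slope of b0 + b1 * ln(fertil): shrinks as fertility rises, but stays positive."""
    return b1 / fertil


def quad_slope(fertil, b1=1.5078, b2=-0.1315):
    """Slope of b0 + b1 * fertil + b2 * fertil^2."""
    return b1 + 2 * b2 * fertil


# Fertility level at which the fitted parabola stops rising.
turning_point = -1.5078 / (2 * -0.1315)

for fertil in (2, 4, 6):
    print(f"fertil={fertil}: log slope={log_slope(fertil):.3f}, "
          f"quadratic slope={quad_slope(fertil):.3f}")
print(f"quadratic turning point at fertil = {turning_point:.2f}")
```

One difference between the two curves follows from the algebra: the logarithmic slope never reaches zero, while the fitted quadratic flattens at a fertility of about 5.7 and bends downward beyond it, so the quadratic should be read with care at the high end of the fertility range.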