![Page 1: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/1.jpg)
Optical illusion?
Correlation (r or R or ρ)
-- One-number summary of the strength of a relationship
-- How to recognize
-- How to compute
Regressions
-- Any model has predicted values and residuals. (Do we always want a model with small residuals?)
-- Regression lines
--- how to use
--- how to compute
-- The “regression effect” (Why did Galton call these things “regressions”?)
-- Pitfalls: Outliers
-- Pitfalls: Extrapolation
-- Conditions for a good regression
![Page 2: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/2.jpg)
Which looks like a stronger relationship?
[Figure: two scatterplots of Y against X, each drawn with different axis ranges.]
![Page 3: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/3.jpg)
[Figure: two plots, both titled “Mortality vs. Education”, with axis ticks running from 9 to 13 and from 800 to 1150; axis label: Education.]
Optical illusion?
![Page 4: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/4.jpg)
Kinds of Association…
Positive vs. Negative
Strong vs. Weak
Linear vs. Non-linear
![Page 5: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/5.jpg)
CORRELATION
CORRELATION (or, the CORRELATION COEFFICIENT) measures the strength of a linear relationship.
If the relationship is non-linear, it measures the strength of the linear part of the relationship. But then it doesn’t tell the whole story.
Correlation can be positive or negative.
![Page 6: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/6.jpg)
[Figure: two scatterplots of Y against X, one with correlation = .97 and one with correlation = .71.]
![Page 7: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/7.jpg)
[Figure: two scatterplots of Y against X, one with correlation = -.97 and one with correlation = -.71.]
![Page 8: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/8.jpg)
[Figure: two scatterplots of Y against X, both with correlation = .97.]
![Page 9: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/9.jpg)
[Figure: two scatterplots of Y against X, one with correlation = .24 and one with correlation = .90.]
![Page 10: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/10.jpg)
[Figure: two scatterplots of Y against X, one with correlation = .50 and one with correlation = 0.]
![Page 11: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/11.jpg)
Computing correlation…
1. Replace each variable with its standardized version:

$$x_i' = \frac{x_i - \bar{x}}{s_x}, \qquad y_i' = \frac{y_i - \bar{y}}{s_y}$$

2. Take an “average” of ( xi’ times yi’ ):

$$r = \frac{\sum x_i' \, y_i'}{n}$$

(use n-1 in place of n if you used n-1 to standardize)
![Page 12: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/12.jpg)
Computing correlation
$$r = \frac{\sum x_i' \, y_i'}{n}$$

-- r: also written R, or Greek ρ (rho)
-- numerator: sum of all the products
-- denominator: n-1 or n?
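
The two steps above (standardize, then average the products) can be sketched in plain Python. This is a minimal illustration, not a library routine: the function names, the `ddof` parameter, and the data are assumptions made up for the example. Setting `ddof=1` means "use n-1 everywhere"; `ddof=0` means "use n everywhere".

```python
from math import sqrt

def standardize(values, ddof=1):
    """Return (value - mean) / sd for each value; ddof=1 uses n-1 in the SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))
    return [(v - mean) / sd for v in values]

def correlation(xs, ys, ddof=1):
    """'Average' the products of standardized values, with the same divisor."""
    zx = standardize(xs, ddof)
    zy = standardize(ys, ddof)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - ddof)

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)               # n-1 throughout
r_with_n = correlation(xs, ys, ddof=0)  # n throughout
print(round(r, 4), round(r_with_n, 4))
```

As long as the same divisor is used in both steps, the two versions give the same r, which is why the "n-1 or n?" question does not change the answer.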
![Page 13: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/13.jpg)
Good things about correlation
It’s symmetric (the correlation of x and y is the same as the correlation of y and x).
It doesn’t depend on scale or units: adding a constant to either variable, or multiplying it by a positive constant, doesn’t change r. Of course not; r depends only on the standardized versions. (Multiplying by a negative constant flips the sign of r.)
r is always in the range from -1 to +1:
-- +1 means perfect positive correlation; dots on line
-- -1 means perfect negative correlation; dots on line
-- 0 means no relationship, OR no linear relationship
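
A quick numerical check of these properties, as a sketch. The helper `corr` and the height/weight numbers are invented for illustration; any paired data would do.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: heights in inches, weights in pounds.
heights_in = [60.0, 62.0, 65.0, 70.0, 72.0]
weights_lb = [115.0, 120.0, 140.0, 155.0, 180.0]

r = corr(heights_in, weights_lb)
r_swapped = corr(weights_lb, heights_in)                      # symmetry
r_rescaled = corr([2.54 * h for h in heights_in], weights_lb)  # inches -> cm
r_shifted = corr([h + 100.0 for h in heights_in], weights_lb)  # add a constant
r_negated = corr([-h for h in heights_in], weights_lb)         # negative multiplier

print(r, r_swapped, r_rescaled, r_shifted, r_negated)
```

The first four values agree up to rounding error; the last one has the same size but the opposite sign.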
![Page 14: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/14.jpg)
Bad things about correlation
Sensitive to outliers
Misses non-linear relationships
Doesn’t imply causality
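
Two of these weaknesses are easy to see with tiny made-up data sets. The sketch below (helper name and numbers are assumptions for illustration) shows a perfect non-linear relationship with r = 0, and a perfectly linear data set whose r changes sign after one outlier is added.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Misses non-linear relationships: y is completely determined by x, yet r is 0.
xs_curve = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = corr(xs_curve, ys_curve)

# Sensitive to outliers: five points exactly on a line, then one stray point.
xs_line = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_line = [1.0, 2.0, 3.0, 4.0, 5.0]
r_clean = corr(xs_line, ys_line)
r_outlier = corr(xs_line + [6.0], ys_line + [-10.0])

print(r_curve, r_clean, r_outlier)
```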
![Page 15: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/15.jpg)
Made-up Examples
[Figure: scatterplot with axes labeled PERCENT TAKING SAT and STATE AVE SCORE.]
![Page 16: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/16.jpg)
Made-up Examples
[Figure: scatterplot with axes labeled SHOE SIZE and IQ.]
![Page 17: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/17.jpg)
Made-up Examples
[Figure: JUDGE’S IMPRESSION plotted against BAKING TEMP (ticks at 250, 350, 450).]
![Page 18: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/18.jpg)
Made-up Examples
[Figure: LIFE EXPECTANCY plotted against GDP PER CAPITA.]
![Page 22: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/22.jpg)
Observed Values, Predictions, and Residuals
[Figure: resp. var. (response variable) plotted against the explanatory variable.]
Observed value
Predicted value
Residual = observed – predicted
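
As a small worked example of these three quantities, here is a sketch in Python. The model (intercept 0, slope 2) and the data are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Hypothetical linear model: predicted = intercept + slope * x.
intercept, slope = 0.0, 2.0

# Made-up observations of the explanatory and response variables.
xs = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 3.9, 6.2, 7.8]

predicted = [intercept + slope * x for x in xs]
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

for x, obs, pred, res in zip(xs, observed, predicted, residuals):
    # A positive residual means the point lies above the model's prediction.
    print(f"x={x}: observed={obs}, predicted={pred}, residual={res:+.2f}")
```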
![Page 23: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/23.jpg)
Linear models and non-linear models
Model A: y = a + bx + error
Model B: y = a x^(1/2) + error
Model B has smaller errors. Is it a better model?
![Page 24: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/24.jpg)
aa opas asl poasie ;aaslkf 4-9043578
y = 453209)_(*_n &*^(*LKH l;j;)(*&)(*& + error
This model has even smaller errors. In fact, zero errors.
Tradeoff: Small errors vs. complexity.
(We’ll only consider linear models.)
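
The tradeoff can be made concrete with a sketch. Below, a straight line is compared with a polynomial that passes exactly through every data point (built here by Lagrange interpolation, which is my choice for the illustration and not something the slides specify). The data and function names are made up.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial of degree n-1 that passes through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(xs, ys)
line_residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
poly_residuals = [y - lagrange_predict(xs, ys, x) for x, y in zip(xs, ys)]

# Predictions a little beyond the data.
line_at_7 = intercept + slope * 7.0
poly_at_7 = lagrange_predict(xs, ys, 7.0)

print(line_residuals, poly_residuals, line_at_7, poly_at_7)
```

The complex model has zero residuals on the data it was built from, but at x = 7 it predicts 54 while the line predicts 6.4. Small errors alone do not make a model better.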
![Page 25: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/25.jpg)
JPM (vertical axis) vs. DJI (horizontal axis) daily changes
[Figure: scatterplot of daily changes, JPM (vertical axis, ticks from -10 to 10) against DJI (horizontal axis, ticks from -6 to 6).]
![Page 26: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/26.jpg)
JPM (vertical axis) vs. DJI (horizontal axis) daily changes
[Figure: scatterplot of daily changes, JPM (vertical axis, ticks from -10 to 10) against DJI (horizontal axis, ticks from -6 to 6).]
![Page 27: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/27.jpg)
About Lines
y = mx + b
[Figure: a line with y intercept b and slope = m.]
![Page 28: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/28.jpg)
About Lines
y = mx + b
(m is the slope; b is the y intercept)
[Figure: a line with y intercept b and slope = m.]
![Page 34: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/34.jpg)
About Lines
y = mx + b
y = b + mx
y = α + βx
y = β0 + β1x
y = b0 + b1x (b0 is the y intercept; b1 is the slope)
[Figure: a line with y intercept b0 and slope = b1.]
![Page 35: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/35.jpg)
Computing the best-fit line
In STANDARDIZED scatterplot:
-- goes through origin
-- slope is r
In ORIGINAL scatterplot:
-- goes through “point of means”
-- slope is r × s_y / s_x
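
The recipe for the original scatterplot can be sketched as follows: compute r, take slope = r × s_y / s_x, and choose the intercept so the line passes through the point of means. The function name and data are made up for illustration.

```python
from math import sqrt

def best_fit_line(xs, ys):
    """Return (intercept, slope) using slope = r * s_y / s_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / ((n - 1) * s_x * s_y))
    slope = r * s_y / s_x
    # Forcing the line through (mean_x, mean_y) fixes the intercept.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = best_fit_line(xs, ys)
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
print(intercept, slope, intercept + slope * mean_x, mean_y)
```

For this data the line is approximately y = 2.2 + 0.6x, and plugging in the mean of x returns the mean of y.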
![Page 36: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/36.jpg)
[Figure: four scatterplots: y1 against x1, y2 against x2, y3 against x3, and y4 against x4.]
![Page 37: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/37.jpg)
The “Regression” Effect
A preschool program attempts to boost children’s reading scores.
Children are given a pre-test and a post-test.
Pre-test: mean score ≈ 100, SD ≈ 10
Post-test: mean score ≈ 100, SD ≈ 10
The program seems to have no effect.
![Page 39: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/39.jpg)
A closer look at the data shows a surprising result:
Children who were below average on the pre-test tended to gain about 5-10 points on the post-test
Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.
Maybe we should provide the program only for children whose pre-test scores are below average?
![Page 40: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/40.jpg)
Fact: In most test–retest and analogous situations, the bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse.
Other examples:
• Students who score high on the midterm tend on average to score high on the final, but not as high.
• An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, “Sports Illustrated Jinx”)
• Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)
![Page 41: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/41.jpg)
[Figure: scatterplot of post-test score against pre-test score.]
![Page 42: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/42.jpg)
It works the other way, too:
• Students who score high on the final tend to have scored high on the midterm, but not as high.
• Tall sons tend to have fathers who are tall, but not as tall.
• Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.
![Page 43: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/43.jpg)
Students can do well on the pretest…
-- because they are good readers, or
-- because they get lucky.
The good readers, on average, do exactly as well on the post-test. The lucky group, on average, score lower.
Students can get unlucky, too, but fewer of that group are among the high-scorers on the pre-test.
So the top group on the pre-test, on average, tends to score a little lower on the post-test.
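
The "skill plus luck" explanation can be checked with a small simulation. This is only a sketch under assumed numbers: each child has a fixed reading ability (mean 100, SD 8) and each test adds independent luck (SD 6), so both tests have mean about 100 and SD about 10. The program itself does nothing at all here.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable
n = 10_000

# Assumed model: score = fixed ability + independent luck on each test.
ability = [random.gauss(100, 8) for _ in range(n)]
pre = [a + random.gauss(0, 6) for a in ability]
post = [a + random.gauss(0, 6) for a in ability]

top = [i for i in range(n) if pre[i] > 110]     # high scorers on the pre-test
bottom = [i for i in range(n) if pre[i] < 90]   # low scorers on the pre-test

top_pre = mean(pre[i] for i in top)
top_post = mean(post[i] for i in top)
bottom_pre = mean(pre[i] for i in bottom)
bottom_post = mean(post[i] for i in bottom)

print(f"overall: pre {mean(pre):.1f}, post {mean(post):.1f}")
print(f"top group: pre {top_pre:.1f}, post {top_post:.1f}")
print(f"bottom group: pre {bottom_pre:.1f}, post {bottom_post:.1f}")
```

Even though nothing changed between the two tests, the top group drops and the bottom group rises on average, while the overall mean stays put.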
![Page 45: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/45.jpg)
Extrapolation
Interpolation: Using a model to estimate Y for an X value within the range on which the model was based.
Extrapolation: Estimating based on an X value outside the range.
Interpolation Good, Extrapolation Bad.
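
One simple way to respect this rule in code is to have the prediction routine report whether it is interpolating or extrapolating. The function below is a hypothetical helper, and the line y = 2.2 + 0.6x fitted on x from 1 to 5 is an assumed example.

```python
def predict_with_range_check(x, intercept, slope, x_min, x_max):
    """Return (prediction, kind), where kind says whether x is inside the fitted range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return intercept + slope * x, kind

# Assumed example: a line fitted to data with x between 1 and 5.
intercept, slope = 2.2, 0.6
x_min, x_max = 1.0, 5.0

inside = predict_with_range_check(3.0, intercept, slope, x_min, x_max)
outside = predict_with_range_check(50.0, intercept, slope, x_min, x_max)

print(inside)   # prediction inside the range of the data
print(outside)  # prediction far outside it; treat with suspicion
```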
![Page 49: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/49.jpg)
Nixon’s Graph: Economic Growth
[Figure: economic growth chart marked with “Start of Nixon Adm.”, “Now”, and “Projection”.]
![Page 50: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/50.jpg)
Conditions for regression
“Straight enough” condition (linearity)
Errors are mostly independent of X
Errors are mostly independent of anything else you can think of
Errors are more-or-less normally distributed
![Page 51: Optical illusion ? Correlation ( r or R or )](https://reader036.vdocuments.us/reader036/viewer/2022062517/5681301f550346895d959f50/html5/thumbnails/51.jpg)
How to test the quality of a regression:
Plot the residuals. Pattern bad, no pattern good.
R²
How sure are you of the coefficients?
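
For the R² check, a minimal sketch: R² is computed here as 1 minus (sum of squared residuals) / (sum of squared deviations of y from its mean), which is the usual definition though the slides do not spell it out. The data and the fitted line y = 2.2 + 0.6x are an assumed example.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values accounted for by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up data and its least-squares line (intercept 2.2, slope 0.6).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = 2.2, 0.6

predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]
r2 = r_squared(ys, predicted)

# Listing residuals against x is a text stand-in for a residual plot.
for x, res in zip(xs, residuals):
    print(f"x={x}: residual={res:+.2f}")
print(f"R^2 = {r2:.2f}")
```

For a straight-line fit with one explanatory variable, this R² equals the square of the correlation r (here 0.6, with r about 0.77).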