Prediction II
Assumptions and Interpretive Aspects
Assumptions of Regression
• Normal Distribution
– Both variables should be normally distributed
– For non-normal distributions we use non-parametric tests
• Continuous Variables
– Variables must be measured on an interval or ratio scale
– Non-parametric tests are better suited to scores collected on nominal and ordinal scales
• Linearity
– The relation between the two variables should be linear
• Homoscedasticity
– The variability of the actual Y values about the predicted value Y′ must be the same for all values of X
Linearity
• When the relation is non-linear, r is lower than its real value
– So, prediction is less successful
• Some characteristics in nature are curvilinearly related
• For such variables, we need to use more advanced techniques
• For instance, the relationship between anxiety and success is curvilinear
• When anxiety is low, success is low (motivation is low)
• When anxiety is medium, success is high (motivation is high and anxiety does not have a detrimental effect)
• When anxiety is high, success is low (the organism is shocked)
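A minimal Python sketch of why r understates a curvilinear relation. The anxiety and success scores here are hypothetical, invented only to form a perfect inverted U; they are not data from the lecture. The helper `pearson_r` is likewise written just for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores: success peaks at medium anxiety (inverted U).
anxiety = [1, 2, 3, 4, 5, 6, 7, 8, 9]
success = [16 - (a - 5) ** 2 for a in anxiety]

# A straight line cannot capture the inverted U, so r is close to zero
# even though success is completely determined by anxiety.
r_linear = pearson_r(anxiety, success)

# Correlating success with the squared distance from the midpoint
# recovers the relationship (a perfect negative one in this toy data).
distance_sq = [(a - 5) ** 2 for a in anxiety]
r_curved = pearson_r(distance_sq, success)

print(f"linear r = {r_linear:.2f}, r after transformation = {r_curved:.2f}")
```

Transforming the predictor is only one possible "advanced technique"; it is used here because it keeps the example short.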
Homoscedasticity
Interpretive Aspects
Factors Influencing r
• Range of Talent
– When the range of Y, X, or both is restricted, r is lower than its real value
– This is because r depends on both S²_YX and S²_Y
• That is, on the ratio S²_YX / S²_Y in formula B
• If we restrict the variance of Y, for instance, the standard error of prediction stays the same, so r gets lower
– See figure 11.1 on page 195
– This is what we call the ceiling and floor effect
Interpretive Aspects
Factors Influencing r
• Range of Talent

                      Items per minute
Worker   Payment $    Apparent   Real
A        12           8          8
B        13           11         11
C        14           12         12
D        15           12         12
E        16           13         13
F        17           15         14
G        18           15         15
I        19           15         15
J        20           15         16
K        21           15         17
L        22           15         19
M        23           15         20
N        24           15         21
r =                   0.84       0.98

[Figure: scatter plot of payment ($) against items per minute, with linear trend lines for the Apparent and Real series]
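The two correlations reported for the worker data can be checked with a short Python sketch. The scores are the ones listed on the slide; the helper `pearson_r` is an illustrative function, not something from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Payment in dollars for the 13 workers (A to N, no worker H in the slide).
payment = list(range(12, 25))

# Apparent output is capped at 15 items per minute (a ceiling),
# while real output keeps rising with payment.
apparent = [8, 11, 12, 12, 13, 15, 15, 15, 15, 15, 15, 15, 15]
real = [8, 11, 12, 12, 13, 14, 15, 15, 16, 17, 19, 20, 21]

r_apparent = pearson_r(payment, apparent)
r_real = pearson_r(payment, real)

print(f"apparent r = {r_apparent:.2f}, real r = {r_real:.2f}")
# prints: apparent r = 0.84, real r = 0.98
```

Capping the scores at 15 restricts the range of Y, and the correlation drops accordingly.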
Interpretive Aspects
Factors Influencing r
• Heterogeneity of Samples
– When samples are pooled, the correlation for the aggregated data depends on where the sample values lie relative to one another in both the X and Y dimensions
• Let’s say Professors Aktan and Göktürk prepared final exams for two courses: Statistics and Int. Research Methods
Interpretive Aspects
Factors Influencing r
• Heterogeneity of Samples
– Students always get 20 points higher on Göktürk’s exams

          Statistics   Research
Göktürk   65           67
          72           75
          64           63
          80           81
          74           82
          67           71
          88           92
          75           82         r = 0.95
Aktan     45           47
          52           55
          44           43
          60           61
          54           62
          47           51
          68           72
          55           62         r = 0.95
Pooled                            r = 0.98

[Figure: scatter plot of the Statistics and Research scores, with a linear trend line for each series]
Interpretive Aspects
Factors Influencing r
• Heterogeneity of Samples
– Aktan insists on giving his own Statistics exam

          Statistics   Research
Göktürk   65           67
          72           75
          64           63
          80           81
          74           82
          67           71
          88           92
          75           82         r = 0.95
Aktan     45           67
          52           75
          44           63
          60           81
          54           82
          47           71
          68           92
          55           82         r = 0.95
Pooled                            r = 0.58

[Figure: scatter plot of the Statistics and Research scores, with linear trend lines for each group and for the pooled data]
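The effect of pooling can be reproduced with a short Python sketch using the exam scores listed for the two professors. The helper `pearson_r` and the variable names are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Göktürk's students: Statistics and Research scores.
gokturk_stats = [65, 72, 64, 80, 74, 67, 88, 75]
gokturk_research = [67, 75, 63, 81, 82, 71, 92, 82]

# Case 1: Aktan's students score 20 points lower on BOTH exams.
aktan_stats = [s - 20 for s in gokturk_stats]
aktan_research_both = [s - 20 for s in gokturk_research]

# Case 2: Aktan gives only his own Statistics exam, so only the
# Statistics scores are 20 points lower.
aktan_research_same = list(gokturk_research)

r_within = pearson_r(gokturk_stats, gokturk_research)
r_pooled_both = pearson_r(gokturk_stats + aktan_stats,
                          gokturk_research + aktan_research_both)
r_pooled_stats_only = pearson_r(gokturk_stats + aktan_stats,
                                gokturk_research + aktan_research_same)

print(f"within each group: {r_within:.2f}")                 # 0.95
print(f"pooled, shift on both exams: {r_pooled_both:.2f}")  # 0.98
print(f"pooled, shift on Statistics only: {r_pooled_stats_only:.2f}")  # 0.58
```

When the groups differ on both X and Y, pooling stretches the data along the trend and raises r; when they differ on X only, the two clouds sit side by side and r falls, even though the correlation within each group is unchanged.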
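Interpretive Aspects
Regression Equation
• The β coefficient shows the slope of the regression line
– General equation of a straight line
• Y = bX + c
– Regression of Y on X
• Y′ = r(S_Y / S_X)X + [Ȳ − r(S_Y / S_X)X̄]
• β = r(S_Y / S_X) is the slope and c = Ȳ − r(S_Y / S_X)X̄ is the intercept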
Interpretive Aspects
Regression Equation
• The β coefficient shows the slope of the regression line
– To see this, let’s use two z-score distributions, in which the mean is 0 and the SD is 1
– Now the means of z_X and z_Y are both 0, so c = 0
– S_zY / S_zX is equal to 1/1, so the predicted z_Y = r(1/1)z_X = r·z_X
– As you can see, β is equal to r for z-score distributions
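A small Python sketch can confirm this numerically. It borrows Göktürk's Statistics and Research scores from the heterogeneity example in these slides; the helper functions (`mean`, `sd`, `fit_line`) are illustrative assumptions, and the line is fitted by ordinary least squares.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation (dividing by n)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def fit_line(xs, ys):
    """Least-squares slope b and intercept c for Y' = bX + c."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return b, my - b * mx

stats = [65, 72, 64, 80, 74, 67, 88, 75]
research = [67, 75, 63, 81, 82, 71, 92, 82]

# Raw scores: the slope equals r * (S_Y / S_X).
b_raw, c_raw = fit_line(stats, research)
r = b_raw * sd(stats) / sd(research)

# z scores: mean 0 and SD 1 for both variables.
z_stats = [(x - mean(stats)) / sd(stats) for x in stats]
z_research = [(y - mean(research)) / sd(research) for y in research]
b_z, c_z = fit_line(z_stats, z_research)

print(f"r = {r:.3f}")
print(f"raw scores: slope = {b_raw:.3f}, intercept = {c_raw:.3f}")
print(f"z scores:   slope = {b_z:.3f}, intercept = {c_z:.3f}")
```

With z scores the fitted slope matches r and the intercept is zero up to floating-point rounding.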
Interpretive Aspects
Regression Equation
• Now, let’s say we calculated r between statistics and research scores for students of Çağ University, ODTÜ, and Mersin University
– For Çağ University r = .82
– For Mersin University r = .62
– For ODTÜ r = .35

              Research (z)
Statistics (z)   Çağ      Mersin   ODTÜ
-3               -2.46    -1.86    -1.05
-2               -1.64    -1.24    -0.70
-1               -0.82    -0.62    -0.35
 0                0        0        0
 1                0.82     0.62     0.35
 2                1.64     1.24     0.70
 3                2.46     1.86     1.05

[Figure: Research z scores plotted against Statistics z scores, one line each for Çağ, Mersin, and ODTÜ]
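The Research values for the three universities follow directly from the rule that, in z scores, the predicted value is r times the predictor. A brief Python sketch, with variable names chosen only for illustration:

```python
# Correlations between statistics and research scores, as given in the slide.
correlations = {"Çağ": 0.82, "Mersin": 0.62, "ODTÜ": 0.35}

# Statistics scores expressed as z scores.
z_statistics = [-3, -2, -1, 0, 1, 2, 3]

# Predicted research z score = r * statistics z score.
predicted = {
    university: [round(r * z, 2) for z in z_statistics]
    for university, r in correlations.items()
}

for university, values in predicted.items():
    print(university, values)
```

The larger the correlation, the steeper the line: a student three SDs above the mean in statistics is predicted to be 2.46 SDs above the mean in research at Çağ, but only 1.05 SDs at ODTÜ.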
Interpretive Aspects
Proportion of Variance in Y Associated with Variance in X
• The correlation coefficient has a special meaning
– The squared correlation coefficient is equal to the proportion of variance in Y that is explained by the variance in X
• That is, the explained variance
– r² = proportion of explained variance
– 1 − r² = proportion of unexplained variance
– Let’s say the correlation between depression and GPA is .67
• So, variance in depression explains 45% of the variance in GPA
– r = .67, so r² = .45
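To see the split into explained and unexplained variance in numbers, here is a minimal Python sketch. The slides give no raw data for depression and GPA, so it borrows Göktürk's Statistics and Research scores from the heterogeneity example as stand-in data; all function and variable names are illustrative assumptions.

```python
import math

stats = [65, 72, 64, 80, 74, 67, 88, 75]
research = [67, 75, 63, 81, 82, 71, 92, 82]

n = len(stats)
mean_x = sum(stats) / n
mean_y = sum(research) / n

sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(stats, research))
ss_x = sum((x - mean_x) ** 2 for x in stats)
ss_total = sum((y - mean_y) ** 2 for y in research)

r = sp / math.sqrt(ss_x * ss_total)

# Least-squares regression of Y on X.
slope = sp / ss_x
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in stats]

# Variation of the predicted scores about the mean (explained) and of
# the actual scores about the predictions (unexplained).
ss_explained = sum((p - mean_y) ** 2 for p in predicted)
ss_unexplained = sum((y - p) ** 2 for y, p in zip(research, predicted))

print(f"r squared            = {r ** 2:.3f}")
print(f"explained / total    = {ss_explained / ss_total:.3f}")
print(f"unexplained / total  = {ss_unexplained / ss_total:.3f}")

# The slide's own example: r = .67 gives r squared of about .45.
print(f"0.67 squared         = {0.67 ** 2:.2f}")
```

The explained proportion equals r², the unexplained proportion equals 1 − r², and the two add up to 1.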
Interpretive Aspects
Proportion of Variance in Y Associated with Variance in X
• We can see the meaning of this in the Figure below