9 correlation regression
Post on 21-May-2017
1
Correlation and Regression
Elementary Statistics
Larson Farber
Chapter 9
[Scatter plot: Accidents versus Hours of Training]
Ch. 9 Larson/Farber 2
Correlation
A relationship between two variables.

x: Explanatory (Independent) Variable
y: Response (Dependent) Variable

x                            y
Hours of Training            Number of Accidents
Shoe Size                    Height
Cigarettes smoked per day    Lung Capacity
Score on SAT                 Grade Point Average
Height                       IQ

What type of relationship exists between the two variables, and is the correlation significant?
Ch. 9 Larson/Farber 3
Scatter Plots and Types of Correlation
Negative correlation: as x increases, y decreases.
x = hours of training, y = number of accidents
[Scatter plot: Accidents versus hours of training]
Ch. 9 Larson/Farber 4
Scatter Plots and Types of Correlation
Positive correlation: as x increases, y increases.
x = SAT score, y = GPA
[Scatter plot: GPA versus SAT score]
Ch. 9 Larson/Farber 5
Scatter Plots and Types of Correlation
No linear correlation
x = height, y = IQ
[Scatter plot: IQ versus height]
Ch. 9 Larson/Farber 6
Application

Absences (x)    Final Grade (y)
 8              78
 2              92
 5              90
12              58
15              43
 9              74
 6              81

[Scatter plot: Final Grade versus Absences]
Ch. 9 Larson/Farber 7
Correlation Coefficient
A measure of the strength and direction of a linear relationship between two variables
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
The range of r is from -1 to 1.
If r is close to 1, there is a strong positive correlation.
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
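The formula for r can be sketched in a few lines of Python. This is only an illustration: the function name `correlation_coefficient` is hypothetical, and the sample lists are the absences and final grades from the Application slide.

```python
import math

def correlation_coefficient(xs, ys):
    """Correlation coefficient r computed from the sums in the formula for r."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the Application slide: absences (x) and final grades (y).
absences = [8, 2, 5, 12, 15, 9, 6]
final_grades = [78, 92, 90, 58, 43, 74, 81]

r = correlation_coefficient(absences, final_grades)
print(round(r, 3))  # -0.975
```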
Ch. 9 Larson/Farber 8
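Computation of r

        x      y      xy    x^2     y^2
1       8     78     624     64    6084
2       2     92     184      4    8464
3       5     90     450     25    8100
4      12     58     696    144    3364
5      15     43     645    225    1849
6       9     74     666     81    5476
7       6     81     486     36    6561
Sum    57    516    3751    579   39898

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - (57)^2}\,\sqrt{7(39898) - (516)^2}} = \frac{-3155}{\sqrt{804}\,\sqrt{13030}} = -0.975$$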
Ch. 9 Larson/Farber 9
Test for the Significance of r
r is the correlation coefficient for the sample. The correlation coefficient for the population is ρ (rho).
Hypothesis test for the significance of r:
Left-tailed test: H0: ρ ≥ 0 (no significant negative correlation), Ha: ρ < 0 (significant negative correlation)
Right-tailed test: H0: ρ ≤ 0 (no significant positive correlation), Ha: ρ > 0 (significant positive correlation)
Two-tailed test: H0: ρ = 0 (no significant correlation), Ha: ρ ≠ 0 (significant correlation)
The sampling distribution for r is a t-distribution with n-2 degrees of freedom.
Standardized test statistic:
$$t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
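As a rough sketch, the standardized test statistic can be computed directly from r and n. The function name `t_statistic_for_r` is hypothetical, and the inputs are the values from the absences example (r = -0.975, n = 7). The last decimal may differ slightly from a hand calculation because of rounding.

```python
import math

def t_statistic_for_r(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error

# Values from the absences and final grade example.
t = t_statistic_for_r(r=-0.975, n=7)
print(round(t, 2))
```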
Ch. 9 Larson/Farber 10
Test for Significance of r
In finding the correlation between the number of times absent and a final grade, you used seven pairs of data to find r = -0.975. Test the significance of this correlation. Use α = 0.01.
1. Write the null and alternative hypotheses.
H0: ρ = 0 (no significant correlation)
Ha: ρ ≠ 0 (significant correlation, two-tailed)
2. State the level of significance.
α = 0.01
3. Identify the sampling distribution.
A t-distribution with n - 2 = 7 - 2 = 5 degrees of freedom.
Ch. 9 Larson/Farber 11
4. Find the critical values.
t0 = ±4.032
5. Find the rejection regions.
t < -4.032 or t > 4.032
6. Find the test statistic.
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{7 - 2}}} = -9.811$$
Ch. 9 Larson/Farber 12
7. Make your decision.
t = -9.811 falls in the rejection region. Reject the null hypothesis.
8. Interpret your decision.
There is a significant correlation between the number of times absent and final grades.
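The decision step of the two-tailed test might be sketched as follows. The function name `two_tailed_r_test` is hypothetical. The critical value is passed in rather than computed, since it is normally read from a t-table; 4.032 is the two-tailed table value for α = 0.01 with n - 2 = 5 degrees of freedom.

```python
import math

def two_tailed_r_test(r, n, critical_value):
    """Return (t, reject_h0) for H0: rho = 0 against Ha: rho != 0."""
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    # Reject H0 when t falls in either tail beyond the critical value.
    reject_h0 = abs(t) > critical_value
    return t, reject_h0

# Assumed inputs: r and n from the absences example, and the table
# value for alpha = 0.01 (two tails) with 5 degrees of freedom.
t_value, reject = two_tailed_r_test(r=-0.975, n=7, critical_value=4.032)
print(reject)  # True
```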
13
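[Scatter plot: revenue versus Ad $, with the regression line, a data point (x_i, y_i), the point (x_i, ŷ_i) on the line, and the vertical distance d_i between them]
(x_i, y_i) = a data point
(x_i, ŷ_i) = a point on the line with the same x-value
d_i = y_i - ŷ_i is called a residual.
For the regression line, Σd_i^2 is a minimum.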
Ch. 9 Larson/Farber 14
The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
From algebra, the equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is: ŷ = mx + b
The slope m is found by
$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
The y-intercept is
$$b = \bar{y} - m\bar{x}$$
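A minimal Python sketch of the slope and intercept formulas, using the absences and final grades data as sample input. The function name `regression_line` is hypothetical. Because the code keeps full precision for m when computing b, the intercept can differ in the third decimal from a hand calculation that rounds m first.

```python
def regression_line(xs, ys):
    """Return slope m and y-intercept b of the least squares line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = (mean of y) - m * (mean of x)
    b = sum_y / n - m * (sum_x / n)
    return m, b

# Sample data: absences (x) and final grades (y).
slope, intercept = regression_line([8, 2, 5, 12, 15, 9, 6],
                                   [78, 92, 90, 58, 43, 74, 81])
print(round(slope, 3), round(intercept, 2))
```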
Ch. 9 Larson/Farber 15
Calculate m and b
Write the equation of the line of regression with x = number of times absent and y = final grade.

        x      y      xy    x^2     y^2
1       8     78     624     64    6084
2       2     92     184      4    8464
3       5     90     450     25    8100
4      12     58     696    144    3364
5      15     43     645    225    1849
6       9     74     666     81    5476
7       6     81     486     36    6561
Sum    57    516    3751    579   39898

$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{7(3751) - (57)(516)}{7(579) - (57)^2} = -3.924$$
$$b = \bar{y} - m\bar{x} = 73.714 - (-3.924)(8.143) = 105.667$$
The line of regression is: ŷ = -3.924x + 105.667
Ch. 9 Larson/Farber 16
Line of Regression
m = -3.924 and b = 105.667
The line of regression is: ŷ = -3.924x + 105.667
[Scatter plot: Final Grade versus Absences, with the line of regression]
Note that the point of means (x̄, ȳ) = (8.143, 73.714) is on the line.
Ch. 9 Larson/Farber 17
Predicting Values
The regression line can be used to predict values of y for values of x within the range of the data.
The regression equation for number of times absent and final grade is:
ŷ = -3.924x + 105.667
Use this equation to predict the expected grade for a student with
(a) 3 absences
(b) 12 absences
(a) ŷ = -3.924(3) + 105.667 = 93.895
(b) ŷ = -3.924(12) + 105.667 = 58.579
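The prediction step, including the restriction to x-values within the range of the data, could be sketched like this. The function name `predict_grade` and its default bounds are illustrative assumptions; the bounds 2 and 15 are the smallest and largest number of absences in the sample.

```python
def predict_grade(absences, m=-3.924, b=105.667, x_min=2, x_max=15):
    """Predict a final grade from the regression line y-hat = m*x + b.

    Raises ValueError for x-values outside the range of the sample data,
    where the regression line should not be used for prediction.
    """
    if not x_min <= absences <= x_max:
        raise ValueError("x is outside the range of the data")
    return m * absences + b

print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```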
Ch. 9 Larson/Farber 18
The Coefficient of Determination
The coefficient of determination, r^2, is the ratio of the explained variation to the total variation.
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
The correlation coefficient of number of times absent and final grade is r = -0.975. Then the coefficient of determination is r^2 = (-0.975)^2 = 0.9506.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence or amount of time studied.
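To illustrate the ratio definition, the sketch below computes the explained and total variation directly from the absences data. The function name `coefficient_of_determination` is hypothetical, and the slope is computed in the equivalent deviation form. The result is about 0.950, slightly different from 0.9506 because the hand calculation squares the rounded value r = -0.975.

```python
def coefficient_of_determination(xs, ys):
    """r^2 as explained variation divided by total variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least squares slope (deviation form) and intercept.
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    # Explained variation: predicted values about the mean of y.
    explained = sum((m * x + b - y_bar) ** 2 for x in xs)
    # Total variation: observed values about the mean of y.
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

r_squared = coefficient_of_determination([8, 2, 5, 12, 15, 9, 6],
                                         [78, 92, 90, 58, 43, 74, 81])
print(round(r_squared, 2))  # 0.95
```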
Ch. 9 Larson/Farber 19
The Standard Error of Estimate
The standard error of estimate s_e is the standard deviation of the observed y_i values about the predicted value ŷ_i.
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

        x      y        ŷ    (y - ŷ)^2
1       8     78   74.275     13.8756
2       2     92   97.819     33.8608
3       5     90   86.047     15.6262
4      12     58   58.579      0.3352
5      15     43   46.807     14.4932
6       9     74   70.351     13.3152
7       6     81   82.123      1.2611
Sum                            92.767

$$s_e = \sqrt{\frac{92.767}{5}} = 4.307$$
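A short sketch of the standard error of estimate, assuming the rounded regression coefficients m = -3.924 and b = 105.667 for the absences data. The function name `standard_error_of_estimate` is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys, m, b):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    n = len(xs)
    sum_sq_residuals = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sum_sq_residuals / (n - 2))

s_e = standard_error_of_estimate([8, 2, 5, 12, 15, 9, 6],
                                 [78, 92, 90, 58, 43, 74, 81],
                                 m=-3.924, b=105.667)
print(round(s_e, 3))  # 4.307
```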
Ch. 9 Larson/Farber 20
Prediction Intervals
Given a specific linear regression equation and x_0, a specific value of x, a c-prediction interval for y is:
$$\hat{y} - E < y < \hat{y} + E$$
where
$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$$
The point estimate is ŷ and E is the maximum error of estimate.
Use a t-distribution with n-2 degrees of freedom.
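The interval formula might be coded as below. This is a sketch under stated assumptions: the function name `prediction_interval` is hypothetical, the critical value t_c is supplied by the caller from a t-table (2.015 corresponds to c = 0.90 with 5 degrees of freedom), and the point estimate and s_e are the rounded values for the absences data at x_0 = 6. Endpoints can differ in the third decimal from a hand calculation that rounds the mean of x.

```python
import math

def prediction_interval(x0, xs, y_hat, s_e, t_c):
    """Return (lower, upper) endpoints of the c-prediction interval for y at x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    # Maximum error of estimate E.
    margin = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - margin, y_hat + margin

lower, upper = prediction_interval(x0=6, xs=[8, 2, 5, 12, 15, 9, 6],
                                   y_hat=82.123, s_e=4.307, t_c=2.015)
print(round(lower, 2), round(upper, 2))
```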
21
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
1. Find the point estimate:
ŷ = -3.924x + 105.667 = -3.924(6) + 105.667 = 82.123
The point (6, 82.123) is the point on the regression line with x-coordinate of 6.
22
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
2. Find E
$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = 2.015(4.307)\sqrt{1 + \frac{1}{7} + \frac{7(6 - 8.14)^2}{7(579) - (57)^2}} = 2.015(4.307)\sqrt{1.18273} = 9.438$$
At the 90% level of confidence, the maximum error of estimate is 9.438.
23
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
3. Find the endpoints
ŷ - E = 82.123 - 9.438 = 72.685
ŷ + E = 82.123 + 9.438 = 91.561
72.685 < y < 91.561
When x = 6, the 90% prediction interval is from 72.685 to 91.561.