overview
Post on 23-Feb-2016
Overview
4.2 Introduction to Correlation
4.3 Introduction to Regression
Scatterplots
Used to summarize the relationship between two quantitative variables that have been measured on the same element
A graph of points (x, y), each of which represents one observation from the data set
One of the variables is measured along the horizontal axis and is called the x variable
The other variable is measured along the vertical axis and is called the y variable
Predictor Variable and Response Variable
The value of the x variable can be used to predict or estimate the value of the
y variable
The x variable is referred to as the predictor variable
The y variable is called the response variable
Scatterplot Terminology
Note the terminology in the caption to Figure 4.2.
When describing a scatterplot, always indicate the y variable first and use the term versus (vs.) or against before the x variable.
This terminology reinforces the notion that the y variable depends on the x variable.
FIGURE 4.2 Scatterplot of sales price versus square footage.
Positive relationship
As the x variable increases in value, the y variable also tends to increase.
FIGURE 4.3 (a) Scatterplot of a positive relationship.
Negative relationship
As the x variable increases in value, the y variable tends to decrease.
FIGURE 4.3 (b) Scatterplot of a negative relationship.
No apparent relationship
As the x variable increases in value, the y variable tends to remain unchanged.
FIGURE 4.3 (c) Scatterplot of no apparent relationship.
4.2 Introduction to Correlation
Objective: By the end of this section, I will be able to…
1) Calculate and interpret the value of the correlation coefficient.
Correlation Coefficient r
Measures the strength and direction of the linear relationship between two variables.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$
$s_x$ is the sample standard deviation of the x data values.
$s_y$ is the sample standard deviation of the y data values.
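The definition of r (the sum of the products of the x and y deviations from their means, divided by (n - 1) times the two sample standard deviations) can be sketched in Python as follows. The function names and the data are illustrative assumptions; the numbers are made up and are not the values from Table 4.11.

```python
from math import sqrt


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient r from the definitional formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the two means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_std(xs) * sample_std(ys))


# Made-up illustrative data, NOT the temperatures from Table 4.11.
low = [10, 20, 30, 40, 50]
high = [22, 29, 41, 52, 58]
print(round(correlation(low, high), 4))
```

The sketch assumes both lists have the same length, at least two observations, and some variation in each variable (otherwise a standard deviation is zero and the division fails).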
Example 4.5 - Calculating the correlation coefficient r
Find the value of the correlation coefficient r for the temperature data in Table 4.11.
Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities
Interpreting the Correlation Coefficient r
1) Values of r close to 1 indicate a positive relationship between the two variables.
The variables are said to be positively correlated.
As x increases, y tends to increase as well.
2) Values of r close to -1 indicate a negative relationship between the two variables.
The variables are said to be negatively correlated.
As x increases, y tends to decrease.
3) Other values of r indicate the lack of either a positive or negative linear relationship between the two variables.
The variables are said to be uncorrelated.
As x increases, y tends to neither increase nor decrease linearly.
Guidelines for Interpreting the Correlation Coefficient r
If the correlation coefficient between two variables is
greater than 0.7, the variables are positively correlated.
between 0.33 and 0.7, the variables are mildly positively correlated.
between –0.33 and 0.33, the variables are not correlated.
between –0.7 and –0.33, the variables are mildly negatively correlated.
less than –0.7, the variables are negatively correlated.
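The guidelines above amount to a lookup on the value of r, which a short Python sketch can make concrete. The function name `interpret_r` is hypothetical, and the guidelines do not say which category the exact boundary values (0.33, 0.7, and their negatives) belong to, so the assignment below is an assumption.

```python
def interpret_r(r):
    """Map a correlation coefficient to the guideline wording.

    Assumption: exact boundary values (0.7, 0.33, -0.33, -0.7) are
    placed in the weaker category; the guidelines leave this open.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:
        return "mildly positively correlated"
    if r >= -0.33:
        return "not correlated"
    if r >= -0.7:
        return "mildly negatively correlated"
    return "negatively correlated"


print(interpret_r(0.9761))  # prints: positively correlated
```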
Example 4.6 - Interpreting the correlation coefficient
Interpret the correlation coefficient found in Example 4.5.
Solution
In Example 4.5, we found the correlation coefficient for the relationship between high and low temperature to be r = 0.9761.
r = 0.9761 is very close to 1. We would therefore say that high and low temperatures for these 10 American cities are strongly positively correlated.
As low temperature increases, high temperature also tends to increase.
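Equivalent Computational Formula for Calculating the Correlation Coefficient r
$$r = \frac{\sum xy - \left(\sum x \sum y\right)/n}{\sqrt{\sum x^2 - \left(\sum x\right)^2/n}\;\sqrt{\sum y^2 - \left(\sum y\right)^2/n}}$$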
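The computational formula needs only five running totals (the sums of x, y, xy, x squared, and y squared), which makes it convenient to code. The following is a minimal sketch; the function name is hypothetical and the sample data are made up for illustration.

```python
from math import sqrt


def correlation_computational(xs, ys):
    """Correlation coefficient r from the computational (sums) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (sqrt(sum_x2 - sum_x ** 2 / n)
                   * sqrt(sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Made-up data: the sums are 6, 6, 13, 14, 14, so r = 1 / 2.
print(correlation_computational([1, 2, 3], [1, 3, 2]))
```

Because the formula is algebraically equivalent to the definition of r, both approaches should agree up to floating-point rounding.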
Example 4.7
Use the computational formula to calculate the correlation coefficient r for the relationship between square footage and sales price of the eight home lots for sale in Glen Ellyn from Table 4.6 (Example 4.3 in Section 4.1).
Summary
Section 4.2 introduces the correlation coefficient r, a measure of the strength of linear association between two numeric variables.
Values of r close to 1 indicate that the variables are positively correlated.
Values of r close to –1 indicate that the variables are negatively correlated.
Values of r close to 0 indicate that the variables are not correlated.
4.3 Introduction to Regression
Objectives: By the end of this section, I will be able to…
1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line.
2) Predict values of y for given values of x.
Equation of the Regression Line
Approximates the relationship between x and y
The equation is
$$\hat{y} = b_0 + b_1 x$$
where the regression coefficients are the slope, b1, and the y intercept, b0.
The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.
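The transcript does not preserve the steps used to compute b0 and b1, so the sketch below falls back on the standard least-squares formulas as an assumption: the slope is the sum of the products of deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The function name and data are illustrative only.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x.

    Assumption: standard least-squares formulas, which may be arranged
    differently from the steps in the original slides.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y intercept
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x.
b0, b1 = regression_coefficients([1, 2, 3], [3, 5, 7])
print(b0, b1)  # prints: 1.0 2.0
```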
Example 4.8 - Calculating the regression coefficients b0 and b1
Find the value of the regression coefficients b0 and b1 for the temperature data in Table 4.11.
Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities
Example 4.8 continued
Step 4:
Thus, the equation of the regression line for the temperature data is
$$\hat{y} = 10.0533 + 0.9865x$$
Since y and x represent high and low temperatures, respectively, this equation is read as follows:
“The estimated high temperature for an American city is 10.0533 degrees Fahrenheit plus 0.9865 times the low temperature for that city.”
Using the Regression Equation to Make Predictions
For any particular value of x, the predicted value for y lies on the regression line.
Example 4.11
Suppose we are considering moving to a city that has a low temperature of 47 degrees Fahrenheit (ºF) on this particular winter’s day. What would the estimated high temperature be for this city?
Example 4.11 continued
Solution
Plug the value of 47ºF for the variable low into the regression equation from Example 4.8:
$$\hat{y} = 10.0533 + 0.9865(\text{low}) = 10.0533 + 0.9865(47) = 56.4188$$
We would say: “The estimated high temperature for an American city with a low of 47ºF is 56.4188ºF.”
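The same prediction can be checked with a few lines of Python. The coefficients 10.0533 and 0.9865 come from Example 4.8; the constant and function names are hypothetical.

```python
# Regression coefficients from Example 4.8 (temperature data).
B0 = 10.0533  # y intercept
B1 = 0.9865   # slope


def predict_high(low):
    """Estimated high temperature (ºF) for a given low temperature (ºF)."""
    return B0 + B1 * low


print(round(predict_high(47), 4))  # prints: 56.4188
```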
Interpreting the Slope
Relationship Between Slope and Correlation Coefficient
The slope b1 of the regression line and the correlation coefficient r always have the same sign.
b1 is positive if and only if r is positive.
b1 is negative if and only if r is negative.
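One way to see why the signs must agree is the standard identity relating the least-squares slope to r. The identity is not shown in this transcript and is included here as a supporting assumption:

$$b_1 = r \cdot \frac{s_y}{s_x}$$

Since the sample standard deviations $s_x$ and $s_y$ are both positive whenever the data vary, the ratio $s_y / s_x$ is positive, so $b_1$ takes the sign of $r$.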